Personal Notes: Data Science

The following are the topics I studied (and planned to study) during my time in the Korean military until November 2022. As these notes are primarily for my personal use, I did not spend as much time writing them in a manner that is clear for all readers, but since it would be a waste not to share them, I have uploaded them to this website. All of my personal notes are free to download, use, and distribute under the Creative Commons "Attribution-NonCommercial-ShareAlike 4.0 International" license. Please contact me if you find any errors in my notes or have any further questions. I used the LaTeX editor Overleaf to create my notes; diagrams are often drawn using the tikz package or iPad Notes.

- Sampling Distributions:
*Confidence Intervals, Hypothesis Testing, Central Limit Theorem*

- Bayes Rule:
*Prior & Posterior Distributions, Likelihood, Marginalization, Bayes Box, Common Distributions, Beta Family, Multivariate Gaussians*

- Bayesian Inference:
*Parameter Estimation, Beta-Binomial Distribution, Conjugate Distributions & Priors, Credible Intervals, Point Estimates, Exponential Family*

- Linear Regression:
*Bayesian & Frequentist Regression, Basis Functions, Hyperparameters & Hierarchical Priors, Parameter Distributions & Predictive Functions, Bayesian Model Selection & Averaging, Gaussian Error/OLS & Laplace Error/LAV, L1 & L2 Regularization w/ Laplace & Gaussian Priors, Sparse Models, Equivalent Kernel*

- Markov Chain Monte Carlo:
*Metropolis-Hastings, Detailed Balance, Monte Carlo Integration, Gibbs Sampling*
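
As a quick illustration of the Metropolis-Hastings material above, here is a minimal random-walk Metropolis sampler of my own (the function names are mine, not from the notes). With a symmetric Gaussian proposal, the Hastings correction drops out and detailed balance reduces to the simple acceptance ratio:

```python
import numpy as np

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' = x + step * N(0, 1),
    accept with probability min(1, p(x') / p(x))."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.normal()
        # Symmetric proposal: the Hastings ratio reduces to the target ratio
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples[i] = x
    return samples

# Target: standard normal, known only up to a constant
samples = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_samples=20000)
```

After discarding a burn-in, the sample mean and standard deviation should match the target's 0 and 1.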

**Introduction to Machine Learning**

- Regression:
*Least-Squares, Normal Equations, Batch/Stochastic Gradient Descent, Polynomial Regression*

- Classification:
*K-Nearest Neighbors, Perceptron, Logistic Regression*

- GLMs:
*Exponential Family, Link Functions, GLM Construction, Softmax Regression, Poisson Regression*

- Generative Learning Algorithms:
*Gaussian Discriminant Analysis, Naive Bayes, Laplace Smoothing*

- Kernel Methods:
*Feature Maps, Kernel Trick*

- SVM:
*Functional & Geometric Margins, Optimal Margin Classifiers, Lagrange Duality, Primal vs Dual Optimization*

- Deep Learning:
*Nonlinear Regression, Mini-batch SGD, Activation Functions (ReLU), 2-Layer & Multilayered Neural Networks, Vectorization, Backpropagation, Convolutional Neural Networks, Graph Neural Networks*

- Decision Trees:
*Recursive Binary Splitting (Greedy Algorithms), Classification Error, Discrete/Continuous Features, Overfitting, Pruning Trees, Random Forest*

- Unsupervised Learning:
*K-Means, Mixture of Gaussians, EM Algorithm, Convexity, Evidence Lower Bound*

- PCA:
*Factor Analysis, EM Algorithm, Component Eigenvectors, SVD, Eigenfaces*
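
As a small sketch tying the regression topics together (my own example, not taken from the notes): batch gradient descent on the least-squares loss converges to the same solution as the closed-form normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])  # design matrix with bias column
true_w = np.array([2.0, -3.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

# Closed form via the normal equations: (X^T X) w = X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the mean squared-error loss
w = np.zeros(2)
lr = 0.1
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)  # gradient of (1/2n) ||Xw - y||^2
    w -= lr * grad
```

Both estimates should agree, and should recover the true weights up to the noise level.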

- Statistical Learning Theory
- Low and High Dimensional Linear Regression and Classification
- Low and High Dimensional Nonparametric Regression and Classification
- Cross Validation
- Decision Theory
- Generalized Linear Models
- Boosting and Bagging
- Density Estimation and Clustering
- Graphical Models
- Factor Analysis
- Dimensionality Reduction
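
Cross validation is simple enough to sketch from scratch; the following is my own minimal k-fold implementation (the ridge penalty of 0.1 in the example is an arbitrary choice for illustration):

```python
import numpy as np

def k_fold_cv(X, y, fit, predict, k=5, seed=0):
    """Estimate test error by averaging held-out MSE over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errors.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
    return np.mean(errors)

# Example model: ridge regression with a fixed (hypothetical) penalty
fit = lambda X, y: np.linalg.solve(X.T @ X + 0.1 * np.eye(X.shape[1]), X.T @ y)
predict = lambda w, X: X @ w

X = np.random.default_rng(1).normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.default_rng(2).normal(size=200)
cv_error = k_fold_cv(X, y, fit, predict)
```

On this well-specified linear problem the cross-validated MSE should sit near the noise variance of 0.01.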

- This topic is at the heart of statistics, so it is worth making a set of notes that outlines the main sampling and optimization algorithms and the theory behind them. I focus on convex optimization.
*Random Walk Metropolis w/ Preconditioning & Adaptation, Automatic Differentiation, Gradient Descent, SGLD, MALA*
*Phase Flows, Hamiltonian Integration, Langevin Integration, Leapfrog Integrator, Splitting Methods*
*Hamiltonian Monte Carlo, NUTS*
*Newton's Optimization Method, BFGS, Simulated Annealing, Adam*
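
The leapfrog integrator listed above is the workhorse behind HMC, so here is a short sketch of it (my own code, assuming unit mass so the kinetic energy is p²/2):

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """Leapfrog integration of H(q, p) = U(q) + p^2 / 2:
    half step in momentum, full steps in position, half step in momentum."""
    p = p - 0.5 * eps * grad_U(q)
    for _ in range(n_steps - 1):
        q = q + eps * p
        p = p - eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)
    return q, p

# Harmonic oscillator U(q) = q^2 / 2: the symplectic leapfrog scheme
# keeps the total energy close to its initial value of 0.5
q, p = leapfrog(1.0, 0.0, lambda q: q, eps=0.1, n_steps=100)
```

The energy error stays bounded at O(eps²) rather than drifting, which is exactly why HMC uses this splitting.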

- Multilayer Perceptron:
*Activation Functions, Preprocessing, Weight Initialization, Weight Space Symmetries*

- Network Training:
*Automatic Differentiation, Forward/Back Propagation, Numpy Implementation, PyTorch*

- Regularization and Stability:
*Early Stopping, Dropout, L1/L2 Penalty Terms, Max Norm Regularization, Normalization Layers, Data Augmentation, Sharpness-Aware Minimization, Network Pruning, Guidelines*

- Convolutional Neural Nets:
*Kernels, Convolutional Layers, Pooling Layers, Architectures*

- Recurrent Neural Nets:
*Uni/Bi-directional RNNs, Stacked RNNs, Loss Functions, LSTMs, GRUs*

- Autoencoders:
- Transformers:
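
In the spirit of the "Numpy Implementation" item above, here is my own sketch of forward and backward passes for a 2-layer ReLU network with squared loss; a finite-difference check is the standard way to verify the backpropagated gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2-layer network: x -> ReLU(x W1 + b1) W2 + b2, with squared loss
def forward(params, X, y):
    W1, b1, W2, b2 = params
    h = np.maximum(0, X @ W1 + b1)          # hidden ReLU activations
    out = h @ W2 + b2
    loss = 0.5 * np.mean((out - y) ** 2)
    return loss, (h, out)

def backward(params, X, y):
    W1, b1, W2, b2 = params
    _, (h, out) = forward(params, X, y)
    d_out = (out - y) / len(y)              # dLoss / d_out
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (h > 0)          # backprop through the ReLU
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)
    return [dW1, db1, dW2, db2]

X = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 1))
params = [rng.normal(size=(3, 4)), np.zeros(4), rng.normal(size=(4, 1)), np.zeros(1)]
grads = backward(params, X, y)
```

Perturbing any single weight by ±ε and differencing the loss should reproduce the corresponding backprop gradient.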

- Image Processing:
*OpenCV Functionality, Transforming, Drawing, Masking, Kernels, Color Channels*

- Convolutional Neural Nets:
*Convolution Layers, Pooling Layers, Architectures*

- Network Training:
*Backpropagation, Implementation from Scratch*
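
The kernel/convolution idea above fits in a few lines of plain numpy; this is my own "from scratch" sketch (a valid-mode cross-correlation, which is what convolutional layers actually compute):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image
    and take elementwise products summed at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Example kernel: a 3x3 averaging (box blur) filter, as in OpenCV tutorials
blur = np.ones((3, 3)) / 9.0
```

A constant image is unchanged by the blur, and a kernel with a single central 1 simply crops the image border.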

- Basics:
*Regular Expressions, Tokenization, Lemmatization, Stemming, NLTK*

- Classical Learning:
*N-Gram Model, Naive Bayes, Logistic Regression, Sentiment Analysis*

- Embeddings:
*Frequency Semantics, Word2Vec, Doc2Vec, gensim*

- Recurrent Neural Nets: