Statistics
If you were to invent statistics from scratch, how would you do it? Statistics can be seen as a ``converse'' of probability, and it is essentially a field that branches out from mathematics. In probability, one takes a distribution and attempts to describe what samples from it look like. In statistics, we are given the samples first and then try to infer what the distribution is. This is usually an extremely difficult problem, so rather than trying to describe the entire distribution, we settle for describing certain parameters of the distribution (e.g. what are the mean and variance?).
At the heart of statistics is the seemingly unrelated field of information theory, which studies how much information (e.g. bits) can be transmitted through a noisy channel. The concepts of entropy and KL-divergence provide a good transition between probability and statistics. At this point, we can branch off into two paradigms of inference. First is the frequentist approach, which treats the parameter as a fixed unknown and studies estimators of it through their sampling distributions. The second is the Bayesian approach, which places a prior distribution on the parameter; combining this prior with the data through Bayes' rule yields the posterior.
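For concreteness, the two formulas behind this transition are Bayes' rule and the KL-divergence, written here in generic notation for a discrete distribution:

    p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)},
    \qquad
    D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}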
Once the foundations of these two theories are established, the main applications lie in machine learning, which uses computer science to design algorithmic approaches to statistical inference. This field can be seen as an integration of statistics and computer science, since we heavily use the theory of algorithms to optimize objective functions determined by statistics (e.g. greedy algorithms to fit decision trees, L1 regularization as a convex approximation of best subset regression). In fact, optimization is such an important part of machine learning that it deserves its own set of notes. Here, I go through the main concepts in convex optimization, along with an index of other non-convex methods. It turns out that the fields of optimization and sampling (which is also used for numerical integration) are heavily related, as slight modifications of optimizers lead to samplers (e.g. SGD vs SGLD, as sketched below).
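To make the optimizer-to-sampler connection concrete, here is a minimal toy sketch (not taken from the notes) contrasting a plain gradient step with an SGLD step on a generic log-posterior; the gradient function grad_log_post and the step size eps are placeholders.

    import numpy as np

    def sgd_step(theta, grad_log_post, eps):
        # Plain gradient ascent on the log-posterior: an optimizer.
        return theta + eps * grad_log_post(theta)

    def sgld_step(theta, grad_log_post, eps, rng):
        # Stochastic Gradient Langevin Dynamics: the same update plus
        # Gaussian noise of variance eps, which turns the optimizer
        # into an (approximate) posterior sampler.
        noise = rng.normal(scale=np.sqrt(eps), size=theta.shape)
        return theta + 0.5 * eps * grad_log_post(theta) + noise

    # Toy usage: "sample" from a standard 2D Gaussian, whose log-density
    # has gradient -theta.
    rng = np.random.default_rng(0)
    theta = np.zeros(2)
    for _ in range(1000):
        theta = sgld_step(theta, lambda t: -t, eps=0.1, rng=rng)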
With the universal approximation theorem, better engineering, and exponentially-increasing computational power, deep neural networks have become extremely powerful models for complex and high-dimensional data. They start out as simple multilayer perceptrons, but recent research has pushed the architectures to CNNs, RNNs, LSTMs, energy models, encoder-decoders, flow models, attention layers, and most recently diffusion models. While these models are inherently black-box in nature, several heuristics and architectures have been developed to push their applications to the fields of computer vision (CV) and natural language processing (NLP). The most notable successes of these applications come in autonomous driving and large language models (LLMs).
Finally, another subfield of machine learning is reinforcement learning, which teaches agents to make decisions through trial and error in simulated environments. These models are widely used in robotics and simulations, and this framework of optimizing rewards and penalties relies heavily on game theory.
All of my personal notes are free to download, use, and distribute under the Creative Commons "Attribution-NonCommercial-ShareAlike 4.0 International" license. Please contact me if you find any errors in my notes or have any further questions.
Sampling, Optimization, and Integration
- This is at the heart of all things statistics, so it's worth making a set of notes that outlines the main sampling and optimization algorithms and the theory behind them. I focus on convex optimization.
- Random Walk Metropolis w/ Preconditioning & Adaptation, Automatic Differentiation, Gradient Descent, SGLD, MALA (a minimal random-walk Metropolis sketch follows this list)
- Phase Flows, Hamiltonian Integration, Langevin Integration, Leapfrog Integrator, Splitting Methods
- Hamiltonian Monte Carlo, NUTS
- Newton's Optimization Method, BFGS, Simulated Annealing, Adam
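As referenced above, here is a minimal random-walk Metropolis sketch for a generic (unnormalized) log-density; the proposal scale, iteration count, and Gaussian target below are arbitrary placeholder choices.

    import numpy as np

    def random_walk_metropolis(log_prob, theta0, n_iters=5000, scale=0.5, seed=0):
        # Random-walk Metropolis with an isotropic Gaussian proposal.
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        samples = []
        for _ in range(n_iters):
            proposal = theta + scale * rng.normal(size=theta.shape)
            # Accept with probability min(1, p(proposal) / p(theta)).
            if np.log(rng.uniform()) < log_prob(proposal) - log_prob(theta):
                theta = proposal
            samples.append(theta.copy())
        return np.array(samples)

    # Example: sample from a standard 2D Gaussian.
    draws = random_walk_metropolis(lambda t: -0.5 * t @ t, theta0=np.zeros(2))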
Frequentist Statistics
- Sampling Distributions: Confidence Intervals, Hypothesis Testing, Central Limit Theorem (a toy confidence-interval sketch follows)
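A toy sketch of the ideas in this set of notes, assuming i.i.d. data: by the CLT, the sample mean plus or minus 1.96 standard errors gives an approximate 95% confidence interval. The exponential data below is a placeholder.

    import numpy as np

    def normal_ci(x, z=1.96):
        # Approximate 95% CI for the mean of i.i.d. data via the CLT.
        x = np.asarray(x, dtype=float)
        mean = x.mean()
        se = x.std(ddof=1) / np.sqrt(len(x))
        return mean - z * se, mean + z * se

    rng = np.random.default_rng(0)
    lower, upper = normal_ci(rng.exponential(scale=2.0, size=500))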
Bayesian Statistics
- Bayes Rule: Prior & Posterior Distributions, Likelihood, Marginalization, Bayes Box, Common Distributions, Beta Family, Multivariate Gaussians
- Bayesian Inference: Parameter Estimation, Beta-Binomial Distribution, Conjugate Distributions & Priors, Credible Intervals, Point Estimates, Exponential Family (a Beta-Binomial conjugate-update sketch follows this list)
- Linear Regression: Bayesian & Frequentist Regression, Basis Functions, Hyperparameters & Hierarchical Priors, Parameter Distributions & Predictive Functions, Bayesian Model Selection & Averaging, Gaussian Error/OLS & Laplace Error/LAV, L1 & L2 Regularization w/ Laplace & Gaussian Priors, Sparse Models, Equivalent Kernel
- Markov Chain Monte Carlo: Metropolis-Hastings, Detailed Balance, Monte Carlo Integration, Gibbs Sampling
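As referenced above, a minimal sketch of the Beta-Binomial conjugate update: with a Beta(a, b) prior on the success probability and k successes in n Bernoulli trials, the posterior is Beta(a + k, b + n - k). The prior hyperparameters and data below are placeholders.

    from scipy import stats

    def beta_binomial_update(a, b, k, n):
        # Posterior hyperparameters after observing k successes in n trials.
        return a + k, b + (n - k)

    # Beta(2, 2) prior, 7 successes out of 10 trials.
    a_post, b_post = beta_binomial_update(2, 2, k=7, n=10)
    posterior = stats.beta(a_post, b_post)
    print(posterior.mean())          # posterior point estimate (mean)
    print(posterior.interval(0.95))  # 95% equal-tailed credible interval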
Classical Machine Learning
- Statistical Learning Theory
- Low and High Dimensional Linear Regression and Classification
- Low and High Dimensional Nonparametric Regression and Classification
- Cross Validation (a k-fold sketch follows this list)
- Decision Theory
- Generalized Linear Models
- Boosting and Bagging
- Density Estimation and Clustering
- Graphical Models
- Factor Analysis
- Dimensionality Reduction
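As referenced in the cross-validation item, a bare-bones sketch of k-fold cross-validation; the fit/predict/loss interface is a hypothetical placeholder rather than any particular library's API.

    import numpy as np

    def k_fold_cv(X, y, fit, predict, loss, k=5, seed=0):
        # Average held-out loss over k folds.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        folds = np.array_split(idx, k)
        scores = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            model = fit(X[train], y[train])
            scores.append(loss(y[test], predict(model, X[test])))
        return float(np.mean(scores))

    # Example plug-ins: ordinary least squares with squared-error loss.
    ols_fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
    ols_predict = lambda w, X: X @ w
    mse = lambda y, yhat: float(np.mean((y - yhat) ** 2))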
Deep Learning
- Multilayer Perceptron: Activation Functions, Preprocessing, Weight Initialization, Weight Space Symmetries
- Network Training: Automatic Differentiation, Forward/Back Propagation, Numpy Implementation, PyTorch (a small numpy forward/backward sketch follows this list)
- Regularization and Stability: Early Stopping, Dropout, L1/L2 Penalty Terms, Max Norm Regularization, Normalization Layers, Data Augmentation, Sharpness Awareness Maximization, Network Pruning, Guidelines
- Convolutional Neural Nets: Kernels, Convolutional Layers, Pooling Layers, Architectures
- Recurrent Neural Nets: Uni/Bi-directional RNNs, Stacked RNNs, Loss Functions, LSTMs, GRUs
- Autoencoders:
- Transformers:
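As referenced in the network-training item, a small numpy sketch of the forward and backward pass for a one-hidden-layer network with ReLU activation and squared-error loss; the layer sizes, data, and learning rate are arbitrary placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 3))                  # toy inputs
    y = rng.normal(size=(64, 1))                  # toy targets
    W1, b1 = 0.1 * rng.normal(size=(3, 16)), np.zeros(16)
    W2, b2 = 0.1 * rng.normal(size=(16, 1)), np.zeros(1)
    lr = 1e-2

    for _ in range(200):
        # Forward pass.
        h_pre = X @ W1 + b1
        h = np.maximum(h_pre, 0.0)                # ReLU
        y_hat = h @ W2 + b2
        loss = np.mean((y_hat - y) ** 2)

        # Backward pass: gradients of the mean squared error.
        g_yhat = 2.0 * (y_hat - y) / len(y)
        g_W2, g_b2 = h.T @ g_yhat, g_yhat.sum(axis=0)
        g_h = g_yhat @ W2.T
        g_pre = g_h * (h_pre > 0)                 # ReLU derivative
        g_W1, g_b1 = X.T @ g_pre, g_pre.sum(axis=0)

        # Gradient descent update.
        W1 -= lr * g_W1; b1 -= lr * g_b1
        W2 -= lr * g_W2; b2 -= lr * g_b2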
Computer Vision
- Image Processing: OpenCV Functionality, Transforming, Drawing, Masking, Kernels, Color Channels
- Convolutional Neural Nets: Convolution Layers, Pooling Layers, Architectures
- Network Training: Backpropagation, Implementation from Scratch
Natural Language Processing
- Basics: Regular Expressions, Tokenization, Lemmatization, Stemming, NLTK
- Classical Learning: N-Gram Model, Naive Bayes, Logistic Regression, Sentiment Analysis (a bigram-count sketch follows this list)
- Embeddings: Frequency Semantics, Word2Vec, Doc2Vec, gensim
- Recurrent Neural Nets:
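As referenced in the classical-learning item, a tiny bigram model with add-one (Laplace) smoothing using only the standard library; the two-sentence corpus is obviously a placeholder.

    from collections import Counter

    corpus = ["the cat sat on the mat", "the dog sat on the rug"]
    tokens = [s.split() for s in corpus]

    unigrams = Counter(w for sent in tokens for w in sent)
    bigrams = Counter(pair for sent in tokens for pair in zip(sent, sent[1:]))
    vocab_size = len(unigrams)

    def bigram_prob(a, b):
        # P(b | a) with add-one smoothing.
        return (bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size)

    print(bigram_prob("the", "cat"))   # a seen pair gets higher probability
    print(bigram_prob("cat", "dog"))   # an unseen pair still gets nonzero mass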
Introduction to Machine Learning
- Regression: Least-Squares, Normal Equations, Batch/Stochastic Gradient Descent, Polynomial Regression (a normal-equations sketch follows this list)
- Classification: K-Nearest Neighbors, Perceptron, Logistic Regression
- GLMs: Exponential Family, Link Functions, GLM Construction, Softmax Regression, Poisson Regression
- Generative Learning Algorithms: Gaussian Discriminant Analysis, Naive Bayes, Laplace Smoothing
- Kernel Methods: Feature Maps, Kernel Trick
- SVM: Functional & Geometric Margins, Optimal Margin Classifiers, Lagrange Duality, Primal vs Dual Optimization
- Deep Learning: Nonlinear Regression, Mini-batch SGD, Activation Functions (ReLU), 2-Layer & Multilayered Neural Networks, Vectorization, Backpropagation, Convolutional Neural Networks, Graph Neural Networks
- Decision Trees: Recursive Binary Splitting (Greedy Algorithms), Classification Error, Discrete/Continuous Features, Overfitting, Pruning Trees, Random Forest
- Unsupervised Learning: K-Means, Mixture of Gaussians, EM-Algorithm, Convexity, Evidence Lower Bound
- PCA: Factor Analysis, EM Algorithm, Component Eigenvectors, SVD, Eigenfaces
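As referenced in the regression item, a minimal sketch of least squares via the normal equations, (X^T X) w = X^T y, on synthetic placeholder data; in practice a QR/SVD-based solver such as np.linalg.lstsq is preferred for numerical stability.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])   # intercept + features
    w_true = np.array([1.0, 2.0, -1.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=n)

    # Normal equations: solve (X^T X) w = X^T y.
    w_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # More numerically stable equivalent.
    w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)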