Muchang Bahng

Statistics

If I were to invent statistics from scratch, how would I do it? Statistics can be seen as the "converse" of probability, and it is essentially a field that branches out from mathematics. In probability, we take a distribution and attempt to describe what its samples look like. In statistics, we are given the samples first and then try to infer what the distribution is. This is usually an extremely difficult problem, so rather than trying to describe the entire distribution, we settle for describing certain parameters of the distribution (e.g. its mean and variance).
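
To make this contrast concrete, below is a minimal sketch (assuming NumPy; the distribution and sample size are arbitrary choices for illustration) of the two directions: probability goes from a known distribution to its samples, while statistics goes from the samples back to estimates of the parameters.

    import numpy as np

    rng = np.random.default_rng(0)

    # Probability: the distribution N(mu=2, sigma=3) is known; describe its samples.
    mu, sigma = 2.0, 3.0
    samples = rng.normal(mu, sigma, size=10_000)

    # Statistics: pretend we only see the samples; infer mu and sigma from them.
    mu_hat = samples.mean()            # estimate of the mean
    sigma_hat = samples.std(ddof=1)    # estimate of the standard deviation

    print(mu_hat, sigma_hat)           # close to 2 and 3, but not exact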

At the heart of statistics lies the seemingly unrelated field of information theory, which studies how much information (e.g. in bits) can be transmitted through a noisy channel. The concepts of entropy and KL-divergence provide a good transition between probability and statistics. At this point, we can branch off into two paradigms of inference. The first is the frequentist approach, which treats the parameter as a fixed unknown quantity and models its estimator as a random variable whose behavior is described by a sampling distribution. The second is the Bayesian approach, which places a prior distribution on the parameter and updates it with the data via Bayes' rule to obtain the posterior.
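
In the Bayesian case, writing theta for the parameter and x for the observed data (generic notation, not tied to any particular model in these notes), the update is

    p(theta | x) = p(x | theta) p(theta) / p(x),    i.e.    posterior ∝ likelihood × prior.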

Once the foundations of these two theories are established, the main applications lie in machine learning, which uses computer science to design algorithmic approaches to statistical inference. This field can be seen as an integration of statistics and computer science, since we heavily use the theory of algorithms to optimize objective functions determined by statistical models (e.g. greedy algorithms to fit decision trees, L1 regularization as a convex approximation of best subset regression). In fact, optimization is such an important part of machine learning that it deserves its own set of notes. There, I go through the main concepts of convex optimization, along with an index of other non-convex methods. It turns out that the fields of optimization and sampling (the latter also being used for numerical integration) are heavily related, as slight modifications of optimizers lead to samplers (e.g. SGD vs. SGLD).
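
As a concrete instance of that last point, here is a minimal sketch (assuming NumPy; grad_log_post is a hypothetical function returning a stochastic gradient of the log posterior on a minibatch) of how adding a single noise term turns the SGD update into the SGLD sampler:

    import numpy as np

    rng = np.random.default_rng(0)

    def sgd_step(theta, grad_log_post, lr):
        # optimizer: gradient ascent on the log posterior
        # (equivalently, descent on the negative log posterior / loss)
        return theta + lr * grad_log_post(theta)

    def sgld_step(theta, grad_log_post, lr):
        # sampler: the same gradient move, plus N(0, lr) noise, so the iterates
        # explore the posterior instead of collapsing to its mode
        noise = rng.normal(0.0, np.sqrt(lr), size=np.shape(theta))
        return theta + 0.5 * lr * grad_log_post(theta) + noise

The factor of 1/2 on the gradient term and the N(0, lr) noise come from discretizing the Langevin diffusion; dropping the noise recovers plain SGD.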

Sampling, Optimization, and Integration

  • This is at the heart of all things statistics, so it is worth making a set of notes outlining the main sampling and optimization algorithms and the theory behind them. I focus on convex optimization.
  • Random Walk Metropolis w/ Preconditioning & Adaptation (sketched below), Automatic Differentiation, Gradient Descent, SGLD, MALA
  • Phase Flows, Hamiltonian Integration, Langevin Integration, Leapfrog Integrator, Splitting Methods
  • Hamiltonian Monte Carlo, NUTS
  • Newton's Method, BFGS, Simulated Annealing, Adam
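
As a taste of the first item, here is a minimal sketch of a single Random Walk Metropolis step without preconditioning or adaptation (assuming NumPy; log_target is a hypothetical function returning the log of the unnormalized target density):

    import numpy as np

    rng = np.random.default_rng(0)

    def rwm_step(theta, log_target, step_size):
        # propose a Gaussian random-walk move around the current state
        proposal = theta + step_size * rng.normal(size=np.shape(theta))
        # accept with probability min(1, target(proposal) / target(theta))
        log_alpha = log_target(proposal) - log_target(theta)
        if np.log(rng.uniform()) < log_alpha:
            return proposal    # accept the move
        return theta           # reject: the chain stays put

Iterating this step yields a Markov chain whose stationary distribution is the target; preconditioning and adaptation replace the isotropic proposal with one shaped by an estimated covariance of the target.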

Frequentist Statistics

  • Sampling Distributions: Confidence Intervals, Hypothesis Testing, Central Limit Theorem

Bayesian Statistics

  • Bayes Rule: Prior & Posterior Distributions, Likelihood, Marginalization, Bayes Box, Common Distributions, Beta Family, Multivariate Gaussians
  • Bayesian Inference: Parameter Estimation, Beta-Binomial Distribution, Conjugate Distributions & Priors, Credible Intervals, Point Estimates, Exponential Family
  • Linear Regression: Bayesian & Frequentist Regression, Basis Functions, Hyperparameters & Hierarchical Priors, Parameter Distributions & Predictive Functions, Bayesian Model Selection & Averaging, Gaussian Error/OLS & Laplace Error/LAV, L1 & L2 Regularization w/ Laplace & Gaussian Priors, Sparse Models, Equivalent Kernel
  • Markov Chain Monte Carlo: Metropolis-Hastings, Detailed Balance, Monte Carlo Integration, Gibbs Sampling

Classical Machine Learning

  • Statistical Learning Theory
  • Low and High Dimensional Linear Regression and Classification
  • Low and High Dimensional Nonparametric Regression and Classification
  • Cross Validation
  • Decision Theory
  • Generalized Linear Models
  • Boosting and Bagging
  • Density Estimation and Clustering
  • Graphical Models
  • Factor Analysis
  • Dimensionality Reduction

Deep Learning

  • Multilayer Perceptron: Activation Functions, Preprocessing, Weight Initialization, Weight Space Symmetries
  • Network Training: Automatic Differentiation, Forward/Back Propagation, NumPy Implementation, PyTorch
  • Regularization and Stability: Early Stopping, Dropout, L1/L2 Penalty Terms, Max Norm Regularization, Normalization Layers, Data Augmentation, Sharpness-Aware Minimization, Network Pruning, Guidelines
  • Convolutional Neural Nets: Kernels, Convolutional Layers, Pooling Layers, Architectures
  • Recurrent Neural Nets: Uni/Bi-directional RNNs, Stacked RNNs, Loss Functions, LSTMs, GRUs
  • Autoencoders:
  • Transformers:

Computer Vision

  • Image Processing: OpenCV Functionality, Transforming, Drawing, Masking, Kernels, Color Channels
  • Convolutional Neural Nets: Convolution Layers, Pooling Layers, Architectures
  • Network Training: Backpropagation, Implementation from Scratch

Natural Language Processing

  • Basics: Regular Expressions, Tokenization, Lemmatization, Stemming, NLTK
  • Classical Learning: N-Gram Model, Naive Bayes, Logistic Regression, Sentiment Analysis
  • Embeddings: Frequency Semantics, Word2Vec, Doc2Vec, gensim
  • Recurrent Neural Nets:

Introduction to Machine Learning

  • Regression: Least-Squares, Normal Equations, Batch/Stochastic Gradient Descent, Polynomial Regression
  • Classification: K-Nearest Neighbors, Perceptron, Logistic Regression
  • GLMs: Exponential Family, Link Functions, GLM Construction, Softmax Regression, Poisson Regression
  • Generative Learning Algorithms: Gaussian Discriminant Analysis, Naive Bayes, Laplace Smoothing
  • Kernel Methods: Feature Maps, Kernel Trick
  • SVM: Functional & Geometric Margins, Optimal Margin Classifiers, Lagrange Duality, Primal vs Dual Optimization
  • Deep Learning: Nonlinear Regression, Mini-batch SGD, Activation Functions (ReLU), 2-Layer & Multilayered Neural Networks, Vectorization, Backpropagation, Convolutional Neural Networks, Graph Neural Networks
  • Decision Trees: Recursive Binary Splitting (Greedy Algorithms), Classification Error, Discrete/Continuous Features, Overfitting, Pruning Trees, Random Forest
  • Unsupervised Learning: K-Means, Mixture of Gaussians, EM-Algorithm, Convexity, Evidence Lower Bound
  • PCA: Factor Analysis, EM Algorithm, Component Eigenvectors, SVD, Eigenfaces