Contents
1 An overview of machine learning and deep learning
1.1 A first look at machine/deep learning: A paradigm shift in computation
1.2 A function approximation view of machine learning: Models and their training
1.3 A simple machine learning model: The cat brain
1.4 Geometrical view of machine learning
1.5 Regression vs. classification in machine learning
1.6 Linear vs. nonlinear models
1.7 Higher expressive power through multiple nonlinear layers: Deep neural networks
2 Vectors, matrices, and tensors in machine learning
2.1 Vectors and their role in machine learning
The geometric view of vectors and its significance in machine learning
2.2 PyTorch code for vector manipulations
PyTorch code for the introduction to vectors
2.3 Matrices and their role in machine learning
Matrix representation of digital images
2.4 Python code: Introducing matrices, tensors, and images via PyTorch
2.5 Basic vector and matrix operations in machine learning
Dot product of two vectors and its role in machine learning
Matrix multiplication and machine learning
Length of a vector (L2 norm): Model error
Geometric intuitions for vector length
Geometric intuitions for the dot product: Feature similarity
2.6 Orthogonality of vectors and its physical significance
2.7 Python code: Basic vector and matrix operations via PyTorch
PyTorch code for a matrix transpose
PyTorch code for a dot product
PyTorch code for matrix-vector multiplication
PyTorch code for matrix-matrix multiplication
PyTorch code for the transpose of a matrix product
2.8 Multidimensional line and plane equations and machine learning
Multidimensional line equation
Multidimensional planes and their role in machine learning
2.9 Linear combinations, vector spans, basis vectors, and collinearity preservation
Vector spaces, basis vectors, and closure
2.10 Linear transforms: Geometric and algebraic interpretations
Generic multidimensional definition of linear transforms
All matrix-vector multiplications are linear transforms
2.11 Multidimensional arrays, multilinear transforms, and tensors
Array view: Multidimensional arrays of numbers
2.12 Linear systems and matrix inverse
Linear systems with zero or near-zero determinants, and ill-conditioned systems
PyTorch code for inverse, determinant, and singularity testing of matrices
Over- and underdetermined linear systems in machine learning
Moore-Penrose pseudo-inverse of a matrix
Pseudo-inverse of a matrix: A beautiful geometric intuition
PyTorch code to solve overdetermined systems
2.13 Eigenvalues and eigenvectors: Swiss Army knives of machine learning
Eigenvectors and linear independence
Symmetric matrices and orthogonal eigenvectors
PyTorch code to compute eigenvectors and eigenvalues
2.14 Orthogonal (rotation) matrices and their eigenvalues and eigenvectors
Orthogonality of rotation matrices
PyTorch code for orthogonality of rotation matrices
Eigenvalues and eigenvectors of a rotation matrix: Finding the axis of rotation
PyTorch code for eigenvalues and vectors of rotation matrices
2.15 Matrix diagonalization
PyTorch code for matrix diagonalization
Solving linear systems without inversion via diagonalization
PyTorch code for solving linear systems via diagonalization
Matrix powers using diagonalization
2.16 Spectral decomposition of a symmetric matrix
PyTorch code for the spectral decomposition of a matrix
2.17 An application relevant to machine learning: Finding the axes of a hyperellipse
PyTorch code for hyperellipses
3 Classifiers and vector calculus
3.1 Geometrical view of image classification
Classifiers as decision boundaries
Sign of the surface function in binary classification
3.3 Minimizing loss functions: Gradient vectors
Gradients: A machine learning-centric introduction
Level surface representation and loss minimization
3.4 Local approximation for the loss function
Multidimensional Taylor series and the Hessian matrix
3.5 PyTorch code for gradient descent, error minimization, and model training
PyTorch code for linear models
Autograd: PyTorch automatic gradient computation
A linear model for the cat brain in PyTorch
3.6 Convex and nonconvex functions, and global and local minima
Convexity and the Taylor series
4 Linear algebraic tools in machine learning
4.1 Distribution of feature data points and true dimensionality
4.2 Quadratic forms and their minimization
Symmetric positive (semi)definite matrices
4.3 Spectral and Frobenius norms of a matrix
4.4 Principal component analysis
PyTorch code: PCA and dimensionality reduction
4.5 Singular value decomposition
Informal proof of the SVD theorem
Applying SVD: Solving arbitrary linear systems
PyTorch code for solving linear systems with SVD
PyTorch code for PCA computation via SVD
Applying SVD: Best low-rank approximation of a matrix
4.6 Machine learning application: Document retrieval
Using TF-IDF and cosine similarity
PyTorch code to compute LSA and SVD on a large dataset
5 Probability distributions in machine learning
5.1 Probability: The classical frequentist view
5.3 Basic concepts of probability theory
Probabilities of impossible and certain events
Exhaustive and mutually exclusive events
5.4 Joint probabilities and their distributions
Dependent events and their joint probability distribution
5.5 Geometrical view: Sample point distributions for dependent and independent variables
5.6 Continuous random variables and probability density
5.7 Properties of distributions: Expected value, variance, and covariance
Variance, covariance, and standard deviation
5.8 Sampling from a distribution
5.9 Some famous probability distributions
Gaussian (normal) distribution
Categorical distribution and one-hot vectors
6 Bayesian tools for machine learning
6.1 Conditional probability and Bayes’ theorem
Joint and marginal probability revisited
6.2 Entropy
Geometrical intuition for entropy
6.3 Cross-entropy
6.4 KL divergence
6.5 Conditional entropy
Chain rule of conditional entropy
6.6 Model parameter estimation
Likelihood, evidence, and posterior and prior probabilities
Maximum likelihood parameter estimation (MLE)
Maximum a posteriori (MAP) parameter estimation and regularization
6.7 Latent variables and evidence maximization
6.8 Maximum likelihood parameter estimation for Gaussians
Python PyTorch code for maximum likelihood estimation
Python PyTorch code for maximum likelihood estimation using gradient descent
6.9 Gaussian mixture models
Probability density function of the GMM
Latent variables for class selection
Maximum likelihood estimation of GMM parameters (GMM fit)
7 Function approximation: How neural networks model the world
7.1 Neural networks: A 10,000-foot view
7.2 Expressing real-world problems: Target functions
Logical functions in real-world problems
Classifier functions in real-world problems
General functions in real-world problems
7.3 The basic building block or neuron: The perceptron
Perceptrons and classification
Modeling common logic gates with perceptrons
7.4 Toward more expressive power: Multilayer perceptrons (MLPs)
7.5 Layered networks of perceptrons: MLPs or neural networks
Modeling logical functions with MLPs
Cybenko’s universal approximation theorem
MLPs for polygonal decision boundaries
8 Training neural networks: Forward propagation and backpropagation
8.1 Differentiable step-like functions
8.2 Why layering?
8.3 Linear layers
Linear layers expressed as matrix-vector multiplication
Forward propagation and grand output functions for an MLP of linear layers
8.4 Training and backpropagation
Loss and its minimization: Goal of training
Loss surface and gradient descent
Why a gradient provides the best direction for descent
Gradient descent and local minima
Putting it all together: Overall training algorithm
8.5 Training a neural network in PyTorch
9 Loss, optimization, and regularization
9.1 Loss functions
Quantification and geometrical view of loss
Binary cross-entropy loss for image and vector mismatches
9.2 Optimization
Geometrical view of optimization
Stochastic gradient descent and minibatches
Geometric view: Constant loss contours, gradient descent, and momentum
Nesterov accelerated gradients
9.3 Regularization
Minimum description length: An Occam’s razor view of optimization
Sparsity: L1 vs. L2 regularization
Bayes’ theorem and the stochastic view of optimization
10 Convolutions in neural networks
10.1 One-dimensional convolution: Graphical and algebraic view
Curve smoothing via 1D convolution
Curve edge detection via 1D convolution
One-dimensional convolution as matrix multiplication
PyTorch: One-dimensional convolution with custom weights
10.3 Two-dimensional convolution: Graphical and algebraic view
Image smoothing via 2D convolution
Image edge detection via 2D convolution
PyTorch: Two-dimensional convolution with custom weights
Two-dimensional convolution as matrix multiplication
10.4 Three-dimensional convolution
Video motion detection via 3D convolution
PyTorch: Three-dimensional convolution with custom weights
10.5 Transposed convolution or fractionally strided convolution
Application of transposed convolution: Autoencoders and embeddings
Transposed convolution output size
Upsampling via transposed convolution
10.6 Adding convolution layers to a neural network
PyTorch: Adding convolution layers to a neural network
10.7 Pooling
11 Neural networks for image classification and object detection
11.1 CNNs for image classification: LeNet
PyTorch: Implementing LeNet for image classification on MNIST
11.2 Toward deeper neural networks
VGG (Visual Geometry Group) Net
Inception: Network-in-network paradigm
ResNet: Why stacking layers to add depth does not scale
11.3 Object detection: A brief history
11.4 Faster R-CNN: A deep dive
Other object-detection paradigms
12 Manifolds, homeomorphism, and neural networks
12.1 Manifolds
12.2 Homeomorphism
12.3 Neural networks and homeomorphism between manifolds
13 Fully Bayes model parameter estimation
13.1 Fully Bayes estimation: An informal introduction
Parameter estimation and belief injection
13.2 MLE for Gaussian parameter values (recap)
13.3 Fully Bayes parameter estimation: Gaussian, unknown mean, known precision
13.4 Small and large volumes of training data, and strong and weak priors
13.5 Conjugate priors
13.6 Fully Bayes parameter estimation: Gaussian, unknown precision, known mean
Estimating the precision parameter
13.7 Fully Bayes parameter estimation: Gaussian, unknown mean, unknown precision
Estimating the mean and precision parameters
13.8 Example: Fully Bayesian inferencing
13.9 Fully Bayes parameter estimation: Multivariate Gaussian, unknown mean, known precision
13.10 Fully Bayes parameter estimation: Multivariate Gaussian, unknown precision, known mean
14 Latent space and generative modeling, autoencoders, and variational autoencoders
14.1 Geometric view of latent spaces
14.3 Benefits and applications of latent-space modeling
14.4 Linear latent space manifolds and PCA
PyTorch code for dimensionality reduction using PCA
14.5 Autoencoders
14.6 Smoothness, continuity, and regularization of latent spaces
14.7 Variational autoencoders
VAE training, losses, and inferencing
Stochastic mapping leads to latent-space smoothness
Direct minimization of the posterior requires prohibitively expensive normalization
Choice of prior: Zero-mean, unit-covariance Gaussian