Index

A

AdaGrad algorithm 326–327

Adam optimizer algorithm 328–329

AlexNet 391

argmaxonehot function 307–308

arrays 53

asymptotically approaching zero 174

autoencoders 469, 478–481

autoencoder decoder, code listing 481

autoencoder encoder, code listing 480

autoencoder training, code listing 481

decoder, definition of 478

definition of 305, 377, 478

embedding an image 305

encoder and decoder neural networks, discussion of 479–480

encoder, definition of 478

hyperparameter, definition of 478

PCA and 481

reconstruction loss, definition of 478

representation learning, definition of 478

schematic representation of an autoencoder 479

as unsupervised 479

See also variational autoencoders (VAEs)

average pooling, definition of 381

B

backpropagation algorithm 286–294

algorithm for training a neural network 294–295

backpropagation algorithm on an arbitrary network of linear layers 290–294

definition of 286

evaluating on a simple MLP with a single neuron per layer 286

forward and backward propagation, code listing 289–290

forward pass, definition of 289

Hadamard product, definition of 292

performing min-max normalization in PyTorch, code listing 296

training a neural network in PyTorch 295–298

using an optimizer to update weights 298

Bayes’ theorem 194, 196–198, 448

Bernoulli distribution 188–189

binary classifiers 83, 88

binomial distribution 180–184

bounding box, definition of 385

C

categorical variables, definition of 242

centroid, definition of 163

classification loss, definition of 423

classifiers

binary classifiers 88

charted examples of good and bad decision boundaries 250

charts of cat-brain threat-model decision boundaries 247–249

as decision boundaries 84–85, 246–247

decision boundary as a hypersurface 249

definition of 12, 245

estimating a decision boundary 251

feature space 246

forming mental pictures of hyperspaces with 3D analogs 249

geometric depiction of a classification problem 85

as a hypersurface 85

input space 246

modeling the classifier, definition of 86

continuous random variable 152

continuous variables, definition of 242

convex and nonconvex functions

convex curves and surfaces, three definitions of 110–112

convexity and the Taylor series 112–113

examples of convex functions 113

introduction to 109–110

convolution

convolution layers 344, 380–381

convolution output size 356

description of 343–345

expressing convolution layers as matrix-vector multiplications 344

one-dimensional 345–356

three-dimensional 368–374

transposed 374–379

two-dimensional 356–368

convolution, one-dimensional 345–356

1D edge detection, code listing 355

1D local averaging convolution, code listing 354–355

convolution output size 356

curve edge detection via 1D convolution 350–351

curve smoothing via 1D convolution 350

detecting edges as a way to understand images 351

directly invoking the convolution function, code listing 356

edge, definition of 351

formula for generating a single output value in 1D convolution 349

graphical and algebraical view 345

how to visualize a 1D convolution 345

input, definition of 345

kernel, definition of 345

as matrix multiplication 351–354

output, definition of 345

padding, definition of 346–347

same (zero) padding 347, 352

setting the weights of a 1D kernel 354

stride, definition of 345–346

valid padding 347, 352

convolution, three-dimensional 368–374

3D convolution with custom weights, code listing 373–374

diagrams illustrating the spatio-temporal view of 3D convolution 370

generating a single output value in 3D convolution 370

how a kernel extracts motion information from video frames 371

how to visualize a 3D convolution 369

illustration of a 3D convolution motion detector 372

video as a 3D entity extending over a spatio-temporal volume 368

video motion detection via 3D convolution 370–371

convolution, transposed 374–379

2D convolution and its transpose 377

autoencoder, definition of 377

decoder, definition of 376

descriptor vector, definition of 375

embedding as an effective compression technique 377

embedding, definition of 375

encoder, definition of 375

end-to-end learning, definition of 377

fractionally strided convolution 375

illustration of a 1D convolution and its transpose 376

illustration of a 2D convolution and its transpose 377

output size 377

upsampling using transpose convolutions, code listing 378–379

why autoencoders need transposed convolution 375

convolution, two-dimensional 356–368

2D convolution as matrix multiplication 366–368

2D convolution with custom weights 363–365

2D edge detection, code listing 366

2D local averaging convolution, code listing 365–366

comparing Euclidean distance to Manhattan distance 358

generating a single output value in 2D convolution 361–362

graphical and algebraic view 356–358

how to visualize a 2D convolution 358

image edge detection via 2D convolution 362363

image smoothing via 2D convolution 362

image, definition of 356–357

input, definition of 359

kernel, definition of 359

output, definition of 359

padding, definition of 361

same (zero) padding 361

stride, definition of 360–361

two-dimensional neighborhoods not preserved by rasterization 358

valid padding 361

convolutional neural networks (CNNs)

AlexNet 391

benefits of neural networks with multiple convolutional layers 388–391

bounding box, definition of 385

feature map, definition of 388

GoogLeNet 398

image classification, definition of 386

Inception v1 architecture 397–401

LeNet architecture, components of 387–389

MNIST data set, sample images from 387

object detection, definition of 386

ResNet architecture 401–406

VGG (Visual Geometry Group) Net 391–397

covariance

covariance as the multivariate analog of variance 165–167

covariance of a multivariate Gaussian distribution 178–180

variance, covariance, and standard deviation 164–165

zero-mean, unit-covariance Gaussian for the known prior 490–492

See also variance

cross-entropy loss

binary cross-entropy loss, code listing 305

definition of 303

Cybenko’s universal approximation theorem 261–262

D

data imbalance, definition of 310–311

decision making 24

decoder, definition of 376, 478

deep learning, overview of 117

deep neural networks, definition of 260

dependent events 157–159

descriptor vector, definition of 375

determinants 499

differentiable step-like functions 273–276

graph of the derivatives of 1D sigmoid and tanh functions 276

Heaviside step function as not differentiable 273

sigmoid function and its properties 273–275

tanh function 275–276

dimensionality reduction, definition of 469

discriminative functions 252

document descriptor space 116–118

document retrieval problem 141–147

dot product and cosine of the angle between two vectors 497–498

downsampling, definition of 381

dropout 336–339

E

eigenvalues and eigenvectors 62–65, 67–69, 72–73

encoder, definition of 375, 478

entropy 198–212

applying to continuous and multidimensional random variables 201

chain rule of conditional entropy 212

charts of entropies of peaked and flat distributions 202

charts of KLD between example distributions 211

computing the cross-entropy of a Gaussian, code listing 202

computing the entropy of a Gaussian distribution, code listing 204

computing the KLD, code listing 210

conditional entropy 210–212

cross-entropy 204–207

definition of 200–201

entropy of Gaussians 203–204

examples of 199

geometrical intuition for entropy 201–203

Huffman encoding 200

Kullback–Leibler divergence (KLD) 207–210

prefix coding 200

quantifying the uncertainty associated with a chancy event 199

variable bit-rate coding 200

epoch, definition of 302

error. See loss function (error)

evidence lower bound (ELBO) 488–490

expected value (mean) 162–163

expressive power 14, 16, 240

F

Fast R-CNN architecture 429

Faster R-CNN, high level architecture 414

feature space, definition of 10

first-order approximation 101

fixed point, definition of 231

focal loss 310312

frequentist paradigm 151

Frobenius norms, definition of 122–123

Fully Bayes estimation 448–453

Bayes’ theorem 448–449

Bayesian estimation with unknown mean, known variance, code listing 452–453

Bayesian estimation with unknown mean, unknown variance, code listing 459

Bayesian estimation with unknown variance, known mean, code listing 456

Bayesian inference 460461

computing posterior probability using Bayesian inference, code listing 461

conjugate priors 454

estimating precision 464–466

estimating the mean and precision parameters 457–458

estimating the precision parameter when the mean is known 455–456

Fully Bayesian inferencing 459–461

Gaussian, unknown mean, known precision 450–453

Gaussian, unknown precision, known mean 454

maximum a posteriori (MAP) estimation 448

maximum likelihood estimation 460

maximum likelihood parameter estimation (MLE) 448

MLE for Gaussian parameter values, recap of 449–450

multivariate Bayesian inferencing, unknown mean 463

multivariate Gaussian, unknown mean, known precision 461–463

multivariate, unknown precision, known mean 463–466

normal-gamma distribution 457–459

parameter estimation and belief injection 448–449

prior probability density 448

Wishart distribution 454, 463–464

fully connected layer 277

function family 86

function-fitting problem 89

G

Gamma distribution 454–455, 503–505

Gamma function, overview of 502–503

Gaussian (normal) distribution 173–180

asymptotically approaching zero 174

bell-shaped curve 174

Bernoulli distribution 188–189

binomial distribution 180–184

categorical distribution and one-hot vectors 189–190

chart of a univariate Gaussian random probability density function 177

computing the variance of 499–501

covariance of a multivariate Gaussian distribution 178–180

expected value of a Bernoulli distribution 188–189

expected value of a binomial distribution 184–185

expected value of a categorical distribution 190–191

expected value of a Gaussian distribution 176–177

Gaussian probability density function 174

geometry of sampled point clouds 180

log probability of a Bernoulli distribution, code listing 188

log probability of a binomial distribution, code listing 183–184

log probability of a multinomial distribution, code listing 186

log probability of a univariate normal distribution, code listing 175

mean and variance of a Bernoulli distribution, code listing 189

mean and variance of a multinomial distribution, code listing 187–188

mean and variance of a multivariate normal distribution, code listing 179

mean and variance of a univariate Gaussian, code listing 178

multinomial distribution 185–187

multivariate Gaussian 175–176

multivariate Gaussian point clouds and hyper-ellipses 180

outlier values 174

probability of a categorical distribution 190

variance of a Bernoulli distribution 189

variance of a binomial distribution 185

Gaussian mixture models (GMM) 215–237

algorithm of GMM fit (MLE of GMM parameters) 236

charts of two-dimensional GMMs with circular and elliptical bases 226

classification via GMM 230

fixed point, definition of 231

Gaussian mixture model distribution 229

GMM fit, code listing 236–237

latent variables for class selection 227–229

maximum likelihood estimation of GMM parameters (GMM fit) 230–237

probability density function of the GMM 223–227

progression of maximum likelihood estimation for GMM parameters 232

generative functions 252

generative modeling, definition of 447–448

global minimum 303

GoogLeNet 398

gradients 95–99

becoming zero at the optimum 96

gradient example in 3D 98–99

gradient vectors and minimizing loss functions 89–90

introduction to 90–91

as the vector of all the partial derivatives 95

ground truth 6, 241

GT vector 301, 303

H

Hadamard product, definition of 292

Hausdorff property, definition of 441–442

Heaviside step function 252–253, 273

Hessian matrix 101, 284

hidden layers 260

hinge loss function 312–314

histograms 152–153

homeomorphism 443–445

Huffman encoding 200

human labeling (human curation), definition of 6

hyperparameter, definition of 478

hyperplanes 85, 253–254

I

image 83–84, 386

Inception v1 architecture 397–401

description of its network-in-network paradigm 397–399

diagram of 398

GoogLeNet 398

implementing a dimensionality-reduced Inception block 400–401

implementing a naive Inception block, code listing 399

inferencing 4, 6, 10, 241

input variables 242

inputs, normalizing 7

iterative training 89

J

Jensen’s inequality theorem 501–502

joint probability, definition of 155

Jupyter Notebook 18–19, 22

K

Kullback–Leibler divergence (KLD) 207–210

L

L’Hospital’s rule 500

L1 regularization 333–334

L2 regularization 332–334

labeling 241, 273

latent or hidden variables/parameters 216, 228

latent semantic analysis (LSA) 118, 142–147

latent spaces 468–496

comparing discriminative and generative models 471–472

considering the space of natural and digital images 469

dimensionality reduction 469, 475

dimensionality reduction using PCA, code listing 477–478

discriminative classifiers, definition of 471

generative classifiers 471–472

generative models, properties of 471–472

geometric view of 469–471

illustration of good and bad discriminative classifiers 472

latent space modeling briefly explained 470–471

latent vector, definition of 468

latent-space modeling, benefits and applications 472–473

linear latent space manifolds and PCA 474–478

manifold as capturing the essence of a common property 469

mapping from a 2D input space to a 1D latent space 482

observed vector, definition of 468

PCA as a special case of latent space representation 471

regularization as creating a more compact latent space 483

smoothness, continuity, and regularization of 481–483

steps involved in a PCA-based dimensionality reduction 475–477

two examples of latent subspaces, with planar and curved manifolds 470

learning rate (LR) 285, 315

LeNet architecture 387–389

implementing LeNet for image classification on MNIST, code listing 388–389

output feature map passed through two fully connected (FC) layers 388

PyTorch Lightning 406–411

subsampling (pooling) layers 388

tanh activation layer 388

three convolutional layers of 5×5 kernels 387–388

level contours 97

linear layers 277–281

algorithm for training a neural network 294–295

backpropagation algorithm 286–294

diagram of a complete multilayered neural network 278

forward and backward propagation, code listing 289–290

forward propagation of a single linear layer 280–281

forward propagation, code listing 281

fully connected layer 277

gradient descent and local minima 285–286

Hadamard product, definition of 292

Hessian matrix 284

learning rate 285

loss and its minimization 282–283

loss surface and gradient descent 283–286

as matrix-vector multiplication 277–280

mean squared error (MSE) function 282

MSE loss, code listing 283

never using test data for training 281

performing min-max normalization in PyTorch, code listing 296

training a neural network in PyTorch 298

training and backpropagation 281–282

tunable hyperparameter 285

using an optimizer to update weights 298

linear vs. nonlinear models 12–14

local minimum 303

local response normalization (LRN) layers 391

local translation invariance, definition of 381

logical functions 242–245

definition of 242

logical AND 243–244

logical NOT 244–245

logical OR 242–243

logical XOR 244–245

m-out-of-n trigger 244

multi-input logical AND 244

multi-input logical OR 244

log-sum inequality theorem 502

loss function (error)

autoencoders, definition of 305

binary cross-entropy loss for image and vector mismatches 305–306

binary cross-entropy loss, code listing 305

computing the gradient of the loss function 316

creating a custom neural network model, code listing 318

creating a custom PyTorch data set, code listing 317

cross-entropy loss, code listing 304

cross-entropy loss, definition of 303

data imbalance, definition of 310–311

definition of 89–91, 301

epoch, definition of 302

equation for describing a full neural network 301

focal loss 310–312

generating one training loop, code listing 319

global minimum 303

gradient vectors and minimizing loss functions 89–90

GT vector, definition of 301–302

local approximation for the loss function 99–100

local minimum 303

loss function and SGD optimizer, code listing 318

loss surfaces and their minimization 303

loss surfaces, description of 302–303

minimizing 83, 88–89

multi-dimensional loss functions 93–94

one-dimensional loss functions 91–93

output vector 303

prediction vector 303

regression loss, code listing 303

running the training loop num epochs times, code listing 319

softmax function 306–310

total error, definition of 89

total training loss, definition of 301

using a squared error function 89

visualizing loss surfaces 97

M

machine learning

analogy to the human brain 5

cat brain model 7

chart of the 2D input point space for the cat brain model 11

classifier, definition of 12

collinearity as implying linear dependence 47

computing eigenvectors and eigenvalues 67

computing the simplified cat brain threat score model 11

defining the span of a set of vectors 47–48

dot product and the difference between two unit vectors 38–39

dot product of two vectors 29–30

eigenvalues and eigenvectors 62–65

eigenvectors and linear independence 65–66

entropy 198–212

equations for describing a multilayered neural network 16

estimating a threat score 7

example cat-brain dataset matrix 24

example training dataset 24

expressive power 14, 16

feature space, definition of 10

finding the axes of a hyperellipse 78–79

formula for transforming an arbitrary input value to a normalized value 7

from arbitrary input to the desired output during inferencing 5

generating the right outcome on never-before-seen data 5

generic multidimensional definition of linear transforms 51

geometric intuitions for dot product and vector length 36–37

geometrical view of 1011

introduction to vectors via PyTorch 23

Kullback–Leibler divergence (KLD) 207–210

latent semantic analysis (LSA) 118

learning, definition of 5

linear dependence 46–47

linear systems with zero or near-zero determinants 55–57

linear vs. nonlinear models 12–14

list of problem solving stages 4

machine learning model error 34–36

matrix diagonalization 73–74

matrix powers using diagonalization 76–77

matrix-matrix multiplication 32–33

matrix-vector multiplication 31–32

matrix-vector multiplications as linear transforms 52–53

measuring the component of a vector along a coordinate axis 37–38

minimizing a quadratic form in machine learning problems 121

model estimation 8

multidimensional line and plane equations 42–46

multidimensional line equation 42–43

multidimensional planes 43–46

multilayered neural network, diagram of 15

natural language processing (NLP) 118

over-determined and under-determined linear systems 57–59

as a paradigm shift in computing 3

performing basic vector and matrix operations 26–28

principal component analysis (PCA) 118

producing Python code using Jupyter Notebook 22

quadratic form, definition of 118

regressor, definition of 12

retrieving documents that match a query phrase 140–147

role of matrices in 23

role of vectors in 19–21

sigmoid function, definition of 14

singular value decomposition (SVD) 130–140

solving linear systems without inversion via diagonalization 74–75

spectral decomposition of a symmetric matrix 77

squared error 34

sticking to any fixed coordinate system 22

supervised machine learning 194

supervised vs. unsupervised learning 4

symmetric matrices and orthogonal eigenvectors 66

target output 4

training data, definition of 4

training, definition of 5

transpose of matrix products 33

trying to model the unknown transformation function 6

unsupervised machine learning 193–194

using 3D analogues for higher dimensional spaces 22

using PyTorch code for vector manipulations 22

See also neural networks

manifolds 438–443

applying calculus to a locally Euclidean property 440–441

bounded, compact, and precompact sets 443

definition of 438

d-manifold, definition of 440

example manifolds and non-manifolds in 1D and 2D 440

Hausdorff property, definition of 441–442

manifolds as locally Euclidean 440

mapping points from one manifold to another 439

neural networks and 438

open sets, closed sets, and boundaries 442

second countable property of manifolds 442–443

mathematical notations used throughout the text 506

matrices

applying rotation matrices 69

basic vector and matrix operations in machine learning 26–28

converting a matrix into a vector via rasterization 84

data matrix columns as dimensions in the feature space 115

data matrix rows as representing feature vectors 115

example cat-brain dataset matrix 24

Frobenius norms, definition of 122–123

full-rank matrices, definition of 137

introducing matrices via PyTorch 25–26

inverting a matrix and computing its determinant 57

linear systems and matrix inverse 53–55

matrix and vector transpose 28–29

matrix diagonalization 73–74

matrix powers using diagonalization 76–77

matrix, definition of 23

matrix-matrix multiplication 32–33

matrix-vector multiplication 31–32

Moore–Penrose pseudo-inverse of a matrix 59–62

orthogonal (rotation) matrices and their eigenvalues and eigenvectors 67–69

orthogonality and length-preservation 71

orthogonality of rotation matrices 71–72

rank of a matrix, definition of 137

representing digital images as matrices 25

role in machine learning 23

slicing and dicing matrices 26

solving an overdetermined system using the pseudo-inverse 62

spectral norms, definition of 122

symmetric positive semidefinite matrices 121–122

transpose of matrix products 33

using linear algebraic tools to analyze matrix structures 115–116

max pooling, definition of 381

maximum a posteriori (MAP) estimation 448

maximum likelihood parameter estimation (MLE) 448

mean squared error (MSE) function 282

model architecture 4, 6, 8

model parameter estimation 213–222

estimating the model parameters from the unlabeled training data 213

examining the likelihood term 213

examining the prior probability term 213

Gaussian mixture models (GMMs) 215

Gaussian negative log-likelihood for training data, code listing 219–220

Gaussian negative log-likelihood with regularization, code listing 221–222

latent variables and evidence maximization 215–216

likelihood, evidence, and posterior and prior probabilities 213–214

maximum a posteriori (MAP) parameter estimation and regularization 215

maximum likelihood estimate for a Gaussian, code listing 219

maximum likelihood parameter estimation (MLE) 214–215

maximum likelihood parameter estimation for Gaussians 216–218

minimizing MLE loss via gradient descent, code listing 220

using the log-likelihood trick 214

modeling

inferencing 4

linear vs. nonlinear models 12–14

model architecture selection 86

model training 4, 8–10, 86

overall algorithm for training a supervised model 90

training error, definition of 86

trying to model the unknown transformation function 6

momentum 320–325

AdaGrad algorithm 326–327

Adam optimizer algorithm with bias correction 328–329

chart showing an overfitting of data points in a binary classifier 331

explanation of 320–321

L1 regularization 333–334

L2 regularization 332–334

momentum-based gradient descent 322

Nesterov accelerated gradients 322–325

overfitting and underfitting 330

regularization 330

root mean squared propagation (RMSProp) 327–328

viewing regularization as minimizing descriptor length 332

Multibox Single-Shot Detector (SSD) 436

multidimensional functions 93–95

multidimensional integral 161–162

N

natural language processing (NLP) 118

Nesterov accelerated gradients 322–325

neural networks

adjusting its architecture and parameter values 240

algorithm for training a neural network 294–295

backpropagation algorithm 286–294

categorical variables, definition of 242

charts of cat-brain threat-model decision boundaries 247–249

charts of good and bad decision boundaries 250

choosing an architecture 240

classifier functions 245–246

classifying into supervised and unsupervised neural networks 241

continuous variables, definition of 242

decision boundaries, definition of 246–247

decision boundary as a hypersurface 249

determining parameter values through training 240

diagram of a complete multilayered neural network 278

diagram of a multilayered neural network 15

differentiable step-like functions 273–276

discriminative functions 252

equations for describing a multilayered neural network 16

estimating a decision boundary 251

expressing real-world problems in target functions 240242

expressive power, definition of 240

feature space 246

forming mental pictures of hyperspaces with 3D analogs 249

forward and backward propagation, code listing 289–290

fully connected layer 277

generative functions 252

gradient descent and local minima 285–286

ground truth 241

Hadamard product, definition of 292

Heaviside step function 252–253, 273

Hessian matrix 284

hyperplanes 253–254

inferencing 241

input space 246

input variables 242

labeling 241, 273

learning rate 285

linear layers 276–281

logical AND function 243–244

logical functions, definition of 242–245

logical NOT function 244–245

logical OR function 242–243

logical XOR function 244–245

making a probabilistic statement of output correctness 241

manual annotation 241

mean squared error (MSE) function 282

m-out-of-n trigger function 244

multi-input logical AND function 244

multi-input logical OR function 244

multilayer perceptrons (MLPs) 259–269

neuron, basic description of 240, 252

output variables 242

overview of 240–241

perceptrons 254–269

performing min-max normalization in PyTorch, code listing 296

sigmoid function and its properties 275

supervised neural networks 273

supervised training data 241

tanh function 275–276

target output 241

training a neural network in PyTorch 295–298

training data, definition of 272

tunable hyperparameter 285

using an optimizer to update weights 298

weights 241

See also machine learning

neuron, description of 240, 252

non-maxima suppression (NMS) algorithm 425–426

O

object detectors 411–436

anchors and their configurations, description of 415

assigning GT labels for each anchor box, code listing 420

assigning targets to anchor boxes 421–422

classification loss, definition of 423

classifier predicting an objectness value 417

contributions and improvements of Fast R-CNN 412–413

dealing with the imbalance between negative and positive anchors 421

Fast R-CNN and RoIs 427–435

Fast R-CNN architecture 429

Fast R-CNN inference 433–434

Fast R-CNN loss function 431–432

Fast R-CNN RoI head, code listing 430–431

Faster R-CNN and its two core modules 412–413

Faster R-CNN, high level architecture 414

FCN of the RPN, code listing 418–419

Feature Pyramid Network (FPN) 436

FRCNN guidelines for assigning labels to anchor boxes 420

fully convolutional network (FCN) architecture 417–418

generating a target (GT) for an RPN 421

generating all anchors for a given image 417

generating anchors at a particular grid point, code listing 416

generating region proposals 424–425

Multibox Single-Shot Detector (SSD) 435–436

NMS of RoIs, code listing 427

non-maxima suppression (NMS) algorithm 425–426

other object-detection paradigms 435–436

R-CNN module 413–414

Region proposal network (RPN) 413–415

regression loss, definition of 423

RoI pooling 428–429

RPN loss function 423–424

three stages in the R-CNN approach to object detection 411–412

training the Fast R-CNN 431

training the Faster R-CNN 434–435

You Only Look Once (YOLO) 435

observed vector, definition of 468

one-dimensional loss functions 91–93

optimization 314–316

AdaGrad algorithm 326–327

Adam optimizer algorithm with bias correction 328–329

Bayes’ theorem and the stochastic view of optimization 334–335

creating a custom neural network model, code listing 318

creating a custom PyTorch data set, code listing 317

definition of 301–302, 315

dropout 336–339

generating one training loop, code listing 319

L1 regularization 333–334

L2 regularization 332–334

learning rate (LR) 315

loss function and SGD optimizer, code listing 318

MAP optimization 335–336

MLE-based optimization 335

overfitting and underfitting 330

overfitting of data points in a binary classifier 331

random shuffling of training data after every epoch 315

regularization 330

root mean squared propagation (RMSProp) 327–328

running the training loop num epochs times, code listing 319

stochastic gradient descent (SGD) 315–316

viewing regularization as minimizing descriptor length 332

output variables 242

output vector 303

P

parameterized function, threat score 4

partial derivatives, definition of 94

perceptrons 254–269

classification and 254

code listing for 256

Cybenko’s universal approximation theorem 261–262

deep neural networks, definition of 260

definition of 254

generating 2D steps and waves with perceptrons 264–267

generating a 1D tower with perceptrons 262–264

hidden layers 260

introduction to modeling common logic gates with perceptrons 256

layering for organizing perceptrons into a neural network 260

MLP for a logical XOR function 259–260

MLPs for polygonal decision boundaries 268–269

modeling logical gates, code listing 258

multilayer perceptrons (MLPs) 259–269

multiple perceptrons 256

partitioning with a planar decision surface

perceptron for a logical AND function 257

perceptron for a logical NOT function 258

perceptron for a logical OR function 258

perceptrons and MLPs in 1D, code listing 267

perceptrons and MLPs in 2D, code listing 267–268

truth table for two-variable logical functions 261

pixel, definition of 83

pooling 381–383

prediction vector 303

prefix coding 200

principal component analysis (PCA) 118, 123–130

applying PCA on correlated and uncorrelated datasets 128

calculating the direction of maximum spread 125–127

dimensionality reduction via PCA 127–128

introduction to 123–125

limitations of PCA 129–130

linear latent space manifolds and PCA 478

PCA and data compression 130

PCA computation, code listing 128–129

PCA on synthetic correlated data, code listing 129

PCA on synthetic nonlinearly correlated data, code listing 130

PCA on synthetic uncorrelated data 129

as a special case of latent space representation 471

use in JPEG image compression techniques 130

probability density function (PDF) 152, 198

probability distributions

continuous random variable 152

definition of 153

discrete random variable 151

emphasizing the geometrical view of multivariate statistics 150

example graph for the weights of adults in Statsville 154

fitting probability distributions to specific groups of people 150

frequentist paradigm 151

loosely structured point distributions in high-dimensional spaces 149

probabilities as always less than or equal to 1 151

probability density 152

PyTorch distributions package 150, 162

random variable, definition of 151

semantic segmentation 150

using histograms to visualize discrete random variables 152–153

using probabilistic models in unsupervised and minimally supervised learning 150

using uppercase letters to denote random variables 152

variational autoencoders (VAEs) 150

See also probability theory

probability theory

asymptotically approaching zero 174

basic concepts of 154–155

bell-shaped curve 174

Bernoulli distribution 188–189

binomial distribution 180–184

Cartesian product 157

categorical distribution and one-hot vectors 189–190

centroid, definition of 163

chart of a univariate Gaussian random probability density function 177

chart of bivariate uniform random probability density function 173

conditional probability 196

continuous random variables and probability density 160–162

covariance as the multivariate analog of variance 165–167

covariance of a multivariate Gaussian distribution 178–180

dependent events and their joint probability distribution 157–159

dependent vs. independent variables 196

entropy 198–212

entropy, definition of 200–201

exhaustive and mutually exclusive events 154–155

expected value (mean) 162–164

expected value of a Bernoulli distribution 188–189

expected value of a binomial distribution 184–185

expected value of a categorical distribution 190–191

expected value of a function of a random variable 163

expected value of a Gaussian distribution 176–177

expected value of a linear combination of random variables 164

expected value of a uniform distribution 171

Gaussian (normal) distribution 173–180

Gaussian probability density function 174

geometry of sampled point clouds 180

graphical visualization of joint probability distributions 160

independent events 155

joint and marginal probability 194–196

joint probabilities and their distributions 157

Kullback–Leibler divergence (KLD) 207–210

log probability of a Bernoulli distribution, code listing 188

log probability of a binomial distribution, code listing 183–184

log probability of a multinomial distribution, code listing 186

log probability of a univariate normal distribution, code listing 175

log probability of a univariate uniform random distribution, code listing 171

marginal probabilities 157

marginal probability for a variable 195

mean and variance of a Bernoulli distribution, code listing 189

mean and variance of a multinomial distribution, code listing 187–188

mean and variance of a multivariate normal distribution, code listing 179

mean and variance of a uniform random distribution, code listing 172

mean and variance of a univariate Gaussian, code listing 178

multidimensional integral 161–162

multinomial distribution 185–187

multivariate Gaussian 175–176

multivariate Gaussian point clouds and hyper-ellipses 180

outlier values 174

probabilities of impossible and certain events 154

probability density function (PDF) 16, 198

probability of a categorical distribution 190

product rule, definition of 155

properties of distributions 162–167

sample point distributions for dependent and independent variables 159–160

sampling from a distribution 167–169

sum rule 195

uniform distributions as multivariate 173

uniform random distributions 170–171

variance and expected value 167

variance of a Bernoulli distribution 189

variance of a binomial distribution 185

variance of a uniform distribution 172

variance, covariance, and standard deviation 164–165

See also probability distributions

product rule, definition of 155

Python code 18–19

applying PCA on correlated and uncorrelated datasets 128

computing LSA 145–146

computing LSA and SVD on a large dataset 146–147

computing PCA directly using SVD 139

dot product of two vectors 40

eigenvalues and eigenvectors of a rotation matrix 72

examining linear models 101–105

examining nonlinear models 105–107

finding the axes of a hyperellipse 79–80

introducing matrices via PyTorch 25–26

matrix diagonalization 74

matrix-vector multiplication 40–41

orthogonality of rotation matrices 71–72

PCA computation, code listing 128–129

PCA on synthetic correlated data, code listing 129

PCA on synthetic nonlinearly correlated data, code listing 130

PCA on synthetic uncorrelated data 129

performing a matrix transpose 39–40

slicing and dicing matrices 26

solving linear systems via diagonalization 76

solving linear systems with SVD 137–139

spectral decomposition of a symmetric matrix 77–78

tensors and images in PyTorch 26

training a linear model for the cat brain 108

transpose of a matrix product 42

using for vector manipulations 22

PyTorch

creating a custom PyTorch data set, code listing 317

inferencing a model, PyTorch Trainer code listing 411

introducing matrices via PyTorch 2526

introducing vectors via PyTorch 23

MNIST data module, PyTorch DataModule code listing 407–408

performing min-max normalization in PyTorch, code listing 296

PyTorch code for solving linear systems with SVD 137–139

PyTorch distributions package 150, 162

tensors and images in PyTorch 26

training a neural network in PyTorch 295–298

using PyTorch code for vector manipulations 22

PyTorch Autograd 103

PyTorch DataLoader 316

PyTorch Lightning 406–411

DataModule component 407–408

implementing LeNet as a PyTorch Lightning module, code listing 408–410

inferencing a model, PyTorch Trainer code listing 411

LightningModule component 410

MNIST data module, PyTorch DataModule code listing 407–408

Trainer component 410–411

Q

quadratic forms, minimizing 118–121

quantitative estimation 23

quantitative inputs 4

R

random variable 151–153, 160–162

rasterized vector, creating 84

reconstruction loss, definition of 478

rectified linear unit (ReLU) 392–394

regression loss, definition of 303, 423

regressors, definition of 2, 12

regularization 330

representation learning, definition of 478

ResNet architecture 401–406

components of the core architecture 403

examining how to solve the degradation problem 401–403

identity shortcut connection 402

implementing a basic skip connection block (BasicBlock) 403–404

PyTorch Lightning 406–411

ResidualConvBlock, code listing 405

ResNet-34, code listing 405–406

root mean squared propagation (RMSProp) 327–328

S

semantic segmentation 150

sigmoid function 14, 273–276

singular value decomposition (SVD) 130–140

applying SVD by solving arbitrary linear systems 135–136

applying SVD to find the best low-rank approximation of a matrix 139–140

applying SVD via PCA computation 135

computing PCA directly using SVD 139

full-rank matrices, definition of 137

linear system as degenerate 137

PyTorch code for solving linear systems with SVD 137–139

rank of a matrix, definition of 137

SVD theorem 131–134

softmax function 306–314

spectral norms, definition of 122

standard deviation 164–165

stochastic gradient descent (SGD) 315, 317

stochastic mapping, definition of 484

supervised learning 4, 194

supervised neural networks 273

supervised training data 241

T

tanh function 275276

target functions 240–242

target output 4, 241

Taylor series 100–101, 112–113

tensors 25–26

term frequency (TF) 141

threat score 4, 7

thresholding 10, 12

torchvision package 397

total training loss, definition of 301

training data 4–6, 86, 272, 315

transposed convolution. See convolution, transposed

tunable hyperparameter 285

U

unit vector 36

unsupervised machine learning 193194

V

vanishing gradient problem 394

variable bit-rate coding 200

variance

Bayesian estimation with unknown mean, known variance, code listing 452–453

Bayesian estimation with unknown mean, unknown variance, code listing 459

Bayesian estimation with unknown variance, known mean, code listing 456

computing the variance of a Gaussian distribution 499–501

covariance as the multivariate analog of variance 165–167

mean and variance of a Bernoulli distribution, code listing 189

mean and variance of a multinomial distribution, code listing 187–188

mean and variance of a multivariate normal distribution, code listing 179

mean and variance of a univariate Gaussian, code listing 178

variance and expected value 167

variance of a Bernoulli distribution 189

variance of a binomial distribution 185

variance of a uniform distribution 171–172

variance, covariance, and standard deviation 164–165

See also covariance

variational autoencoders (VAEs) 150, 241, 483–495

autoencoders vs. VAEs 494

comparing autoencoder- and VAE-reconstructed images on the MNIST data set 494

computing the reconstruction loss and KL divergence loss 485

differences between the learned latent spaces of the autoencoder and VAE 495

evidence lower bound (ELBO) 488490

examples of high and low KL divergence loss 486

geometric overview of 483484

KLD loss as regularizing the latent space 486

minimizing reconstruction loss as leading to ELBO maximization 490

physical significance of ELBO maximization 489

reparameterization trick, code listing 492

stochastic mapping as leading to latent-space smoothness 487

stochastic mapping, definition of 484

VAE decoder, code listing 493

VAE loss, code listing 494

VAE training, code listing 494

VAE training, losses, and inferencing 485–486

VAE, code listing 493

VAEs and Bayes’ theorem 487

zero-mean, unit-covariance Gaussian for the known prior 490492

See also autoencoders

vectors

basic vector and matrix operations in machine learning 26–28

basis vectors 48

creating feature vectors that describe a document 20

defining the span of a set of vectors 47–48

definition of 19

describing a point’s position in a coordinate system 21

document feature vectors 141

dot product of two vectors 29–30

feature vector, definition of 19

geometric intuitions for vector length 36

geometric view of 21

as inputs to a machine learning system 84

introduction to vectors via PyTorch 23

linear transforms 49–51

mapping input points to output points in a high-dimensional space 21

matrix and vector transpose 28–29

minimal and complete basis 48–49

orthogonality of vectors 39

representing both inputs and outputs 19

representing the parameters of the model function 19

role in machine learning 19–21

unit vector 36

vector spaces 48–49

VGG (Visual Geometry Group) Net 391–397

common structural elements of the VGG family of networks 391–392

convolutional backbone, code listing 395–396

graph of a 1D sigmoid function and its derivative 394

instantiating a VGG network from a specific config 397

rectified linear unit (ReLU) 392–394

removal of the local response normalization (LRN) layers 391

single convolutional block, code listing 395

torchvision package 397

use of smaller (3×3) convolution filters 391

vanishing gradient problem 394

VGG network, code listing 396

VGG-11 architecture diagram 393

W

weights 4, 6, 241

Wishart distribution 454, 463–464
