front matter
foreword
As a lifelong student of the business of technological innovation, I have often wondered: what sets an expert apart from regular practitioners in any area of technology? An expert tends to have many micro-insights into the subject that often elude the ordinary practitioner. This enables them to come up with solutions that are not visible to others. The primary appeal of this book is that it builds that kind of micro-intuition about the complex subject of machine learning. For all their ubiquity, episodic internet recipes do not build such intuitions in a systematic, connected way. This book does.
I also agree with the author’s position that such intuitions are impossible to build without a firm grasp of the mathematics underlying the core principles of machine learning. Of course, all this has to be combined with programming knowledge, without which it becomes idle theory. I like the way this book attends to both the theory and practice of machine learning by presenting the mathematics alongside PyTorch code snippets.
At present, deep learning is indeed shaping human history. Machine learning and data science jobs are consistently rated among the best. If you are looking for a rewarding career in technology, this may be the area for you. And if you are looking for a book that gives you expert-level understanding while assuming only fairly basic knowledge of mathematics and programming, this is your book. With its joint, side-by-side treatment of math and PyTorch programming, it is perfect for professionals who want to become serious practitioners of the art and science of machine learning. Machine learning lies at the confluence of linear algebra, multivariate statistics, and Python programming, and this book combines them into a single coherent narrative—starting from the basics but rapidly moving into advanced topics.
A particularly delightful aspect of the book is how it creates geometric intuitions behind complex mathematical concepts. Symbols may be forgotten, but the picture remains in the head.
—Prith Banerjee, Chief Technology Officer, ANSYS, Inc.; former Senior Vice President of Research and Director, HP Labs; formerly Professor and Director of Computational Science and Engineering, University of Illinois at Urbana-Champaign
preface
Artificial intelligence (machine learning or deep learning to insiders) is quite the rage at this point in time. The media are full of eager and/or paranoid predictions about a world governed by this new technology, and quite justifiably so. It’s a knowledge revolution happening in front of our very eyes.
Working on computer vision and image-processing problems for decades, for my PhD, then at Adobe Systems, then at Google, and then at Drishti Technologies (the Silicon Valley start-up that I co-founded), I have been at the bleeding edge of this revolution for a long time. I’ve seen not only what works, but also—perhaps more importantly—what does not work and what almost works. This gives me a unique perspective. Often, when trying to solve practical problems, none of the textbook theories will work directly. We must mix various ideas to create a winning concoction. This requires a feel for what works and why, and what doesn’t work and why. It is this feel, this understanding of the inner workings of machine/deep learning theory, along with the insights and intuitions, that I hope to transmit to my readers.
This brings me to another point. Because of the popularity of the subject, a large volume of “deep-learning-made-easy”-type material exists in print and/or online. These articles don’t do justice to the subject. My reaction to them is, “Everything should be made as simple as possible, but not simpler.” Deep learning can’t be learned by going through a small, fragmented set of simplified recipes from which all the math has been scrubbed out. This is a mathematical topic, and mastery requires understanding the math along with the programming. What is needed is a resource that presents the topic with the requisite amount of math—no more and no less—with the connection between deep learning and the math explicitly spelled out. This is exactly what this book strives to provide, with its dual presentation of the math and corresponding PyTorch code snippets.
acknowledgments
The authors would collectively like to thank all their colleagues at Drishti Technologies, especially Etienne Dejoie and Soumya Dipta Biswas, who actively engaged in many lively discussions of the topics covered in the book; Pinakpani Mukherjee, who created some of the early diagrams; and all the MEAP reviewers whose anonymous contributions made the book possible. They would also like to thank the Manning team for their professionalism and competence, in particular Tiffany Taylor for her sharp and deep reviews.
To all the reviewers: Al Krinker, Atul Saurav, Bobby Filar, Chris Giblin, Ekkehard Schnoor, Erik Hansson, Gaurav Bhardwaj, Grigory Sapunov, Ian Graves, James J. Byleckie, Jeff Neumann, Jehad Nasser, Juan Jose Rubio Guillamon, Julien Pohie, Kevin Cheung, Krzysztof Kamyczek, Lucian Mircea Sasu, Matthias Busch, Mike Wall, Mortaza Doulaty, Morteza Kiadi, Nelson González, Nicole Königstein, Ninoslav Čerkez, Obiamaka Agbaneje, Pejvak Moghimi, Peter Morgan, Rauhsan Jha, Sean T. Booker, Sebastián Palma Mardones, Stefano Ongarello, Tony Holdroyd, Vishwesh Ravi Shrimali, and Wiebe de Jong, your suggestions helped make this a better book.
From Krish Chaudhury: First and foremost, I would like to thank my family:
- Devyani (my wife), for covering my back for all these years despite an abundance of reasons not to, and for teaching me the value of pursuing excellence in whatever I do.
- Anwesa (my daughter), who fills my life with indescribable joy with her love, positive attitude, and empathy.
- Gouri (my mother), for her unquestioning faith in me.
- (Late) Dr. Sujit Chaudhury (my father), for teaching me the value of insights, sincerity, and a life of letters as a goal in itself.

I would also like to thank Dr. Vineet Gupta (my former colleague from Google) and Dr. Srayanta Mukherjee (my former colleague from Flipkart) for their valuable comments and encouragement.
From Ananya Honnedevasthana Ashok: Writing this book has been much harder than I initially expected. It has been a massive learning experience that wouldn’t have been possible without the unwavering support of my family. In particular, I’d like to thank:
- Dr. Ashok (my father), for being a perennial role model and always being there for me.
- Jayanthi (my mother), for her unequivocal belief in me.
- Susheela (my grandmother), for her unconditional love, despite chiding me for spending long hours on the book during weekends.

I would also like to thank all my teachers, especially Dr. Viraj Kumar and Prof. N.S. Kumar, for inspiring and instilling a love of learning within me.
From Sujay Narumanchi: This book has been a labor of love, requiring more effort than I anticipated but giving me a truly fulfilling learning experience that I will forever cherish. My family and friends have been my pillars of strength throughout this journey. I’d like to thank:
- Sivakumar (my father), for always believing in me and encouraging me to pursue my dreams.
- Vinitha (my mother), for being my rock and providing unwavering support throughout my life.
- Prabhu (my brother), for being a constant source of fun and wisdom.
- (Late) Ramachandran (my grandfather), for instilling in me a love of mathematics and teaching me the value of learning from first principles.
- My friends Ambika, Anoop, Bharat, Neel, Pranav, and Sanjana, for providing a listening ear and a shoulder to lean on.
From Devashish Shankar: I would like to begin by thanking my parents, Dr. Shiv Shanker and Dr. Sadhana Shanker, for their unwavering support, love, and guidance. Additionally, I would like to honor the memory of my late grandfather, Dr. Ajai Shanker, who instilled in me a deep sense of curiosity and a passion for scientific thinking that has guided me throughout my life. I am also deeply grateful to my mentors and colleagues for their guidance and support.
about this book
Are you the type of person who wants to know why and how things work? Instead of feeling satisfied, even grateful, that a tool solves the problem at hand, do you try to understand what the tool is really doing, why it behaves a certain way, and whether it will work under different circumstances? If yes, you have our sympathy—life won’t be peaceful for you. You also have our best wishes—these pages are dedicated to you.
The internet abounds with prebuilt deep learning models and training systems that hardly require you to understand the underlying principles. But practical problems often do not fit any of the publicly available models. These situations call for the development of a custom model architecture. Developing such an architecture requires understanding the mathematical underpinnings of optimization and machine learning.
Deep learning and computer vision are very practical subjects, so these questions are relevant: “Is the math necessary? Shouldn’t we spend the time learning, say, the Python nuances of deep learning?” Well, yes and no. Programming skills (in particular, Python) are mandatory. But without an intuitive understanding of the mathematics, the how and why and the answer to “Can I repurpose this model?” will not be visible to you. Mathematics allows you to see the abstractions behind the implementation.
In many ways, the ability to form abstractions is the essence of higher intelligence. Abstraction enabled early humans to divine a digging and defending tool in what was, to other animals, merely a sharply pointed stone. The abstraction of describing where something is with respect to another thing fixed in the environment (aka coordinate systems and vectors) has done wonders for human civilization. Mathematics is the language for abstractions: the most precise, succinct, and unambiguous language known to humankind. Hence, mathematics is absolutely necessary as a tool to study deep learning. But we must remember that it is a tool—no more and no less. The ultimate purpose of all the math in the book is to bring out the intuitions and insights that are necessary to gain expertise in the complex world of machine learning.
Another equally important tool is the programming language—we have chosen PyTorch—without which all the wisdom cannot be put to practical use. This book connects the two pillars of machine learning—mathematics and programming—via numerous code snippets typically presented together with the math. The book is accompanied by fully functional code in the GitHub repository. We expect readers to work out the math with paper and pencil and then run the code on a computer to understand the results. This book is not bedtime reading.
Having (hopefully) made a case for studying the underlying mathematical principles of deep learning and computer vision, we hasten to add that mathematical rigor is not the goal of this book. Rather, the goal is to provide mathematical (in particular, geometrical) insights that make the subject more intuitive and less like black magic. At the same time, we provide Python coding exercises and visualization aids throughout. Thus, reading this book can be regarded as learning the mathematical foundations of deep learning via geometrical examples and Python exercises.
Mastery over the material presented in this book will enable you to
- Understand state-of-the-art deep learning research papers. The book provides in-depth, intuitive explanations of some of today’s seminal papers.
- Study and understand a deep learning code base.
- Use code snippets from the book in your tasks.
- Prepare for an interview for a role as a machine learning engineer/scientist.
- Determine whether a real-life problem is amenable to machine/deep learning.
- Troubleshoot neural network quality issues.
- Identify the right neural network architecture to solve a real-life problem.
- Quickly implement a prototype architecture and train a deep learning model for a real-life problem.
A word of caution: we often start with the basics but quickly go deeper. It’s important to read individual chapters from beginning to end, even if you’re familiar with the material presented at the start.
Finally, the ultimate justification for an intellectual endeavor is to have fun pursuing it. So, the authors will consider themselves successful if you enjoy reading this book.
Who should read this book?
This book is aimed toward the reader with a basic understanding of engineering mathematics and Python programming, with a serious intent to learn deep learning. For maximum benefit, the math should be worked out with paper and pencil and the PyTorch programs executed on a computer. Here are some possible reader profiles:
- A person with a degree in engineering, science, or math, possibly acquired a while ago, who is considering a career switch to deep learning. No prior knowledge of machine learning or deep learning is required.
- An entry- or mid-level machine learning practitioner who wants to gain deeper insight into the workings of various techniques, graduate from downloading models from the internet and trying them out to developing custom deep learning solutions for real problems, and/or develop the ability to read and understand research publications on the topic.
- A college student embarking on a career in deep learning.
How this book is organized: A road map
This book consists of 14 chapters and an appendix. In general, all mathematical concepts are examined from a machine learning point of view. Geometric insights are brought out and PyTorch code is provided wherever appropriate.
- Chapter 1 is an overview of machine learning and deep learning. Its purpose is to establish the big-picture context in the reader’s mind and familiarize the reader with machine learning concepts such as input space, feature space, model training, architecture, loss, and so on.
- Chapter 2 covers the core concepts of vectors and matrices, which form the building blocks of machine learning. It introduces the notions of dot product, vector length, orthogonality, linear systems, eigenvalues and eigenvectors, the Moore-Penrose pseudo-inverse, matrix diagonalization, spectral decomposition, and so on.
- Chapter 3 provides an overview of the vector calculus concepts needed to understand deep learning. We introduce gradients, local approximation of multidimensional functions via Taylor expansion in arbitrary-dimensional spaces, Hessian matrices, gradient descent, convexity, and the connection of all these with the idea of loss minimization in machine learning. This chapter provides the first taste of PyTorch model building.
- Chapter 4 introduces principal component analysis (PCA) and singular value decomposition (SVD)—key linear algebraic tools for machine learning. We provide an end-to-end PyTorch implementation of an SVD-based document retrieval system.
- Chapter 5 explains the basic concepts of probability distributions from a deep learning point of view. We look at important properties of distributions such as expected value, variance, and covariance, and we also cover some of the most popular probability distributions: Gaussian, Bernoulli, binomial, multinomial, categorical, and so on. We also introduce the PyTorch distributions package.
- Chapter 6 explores Bayesian tools for machine learning. We study Bayes’ theorem and model parameter estimation techniques such as maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation. We also look at latent variables, regularization, MLE for Gaussian distributions, entropy, cross entropy, conditional entropy, and KL divergence. Finally, we look at Gaussian mixture models (GMMs) and how to model and estimate the parameters of a GMM.
- Chapter 7 dives deep into neural networks. We study perceptrons, the basic building blocks of neural networks, and how multilayered perceptrons can model arbitrary polygonal decision boundaries as well as common logic gate operations. This enables them to perform classification. We discuss Cybenko’s universal approximation theorem.
- Chapter 8 covers activation functions for neural networks and the importance of and intuition behind layers. We look at forward propagation and backpropagation (with mathematical proofs) and implement a simple neural network with PyTorch. We study how to train a neural network end to end.
- Chapter 9 provides an in-depth look at various loss functions, which are crucial for effective learning in neural networks. We study the math and intuitions behind popular loss functions such as cross entropy loss, regression loss, and focal loss, implementing them via PyTorch. We look at the geometric insights underlying various optimization techniques such as SGD, Nesterov, Adagrad, Adam, and others. Additionally, we understand why regularization is important and its relationship with MLE and MAP.
- Chapter 10 introduces convolution, a core operation in computer vision models. We study 1D, 2D, and 3D convolutions, as well as transposed convolutions, and their intuitive interpretations. We also implement a simple convolutional neural network via PyTorch.
- Chapter 11 introduces various neural network architectures for image classification and object detection in images. We look at several image classification architectures in detail, such as LeNet, VGG, Inception, and ResNet. We also provide an in-depth study of Faster R-CNN for object detection.
- Chapter 12 explores manifolds and their properties, such as homeomorphism, the Hausdorff property, and second countability, and shows how manifolds tie in with neural networks.
- Chapter 13 provides an introduction to Bayesian parameter estimation. We look at the injection of prior belief into parameter estimation and how it can be used in unsupervised/semi-supervised settings. Additionally, we understand conjugate priors and the estimation of Gaussian likelihood parameters under conditions of known/unknown mean and variance.
- Chapter 14 explores latent spaces and generative modeling. We develop a geometric view of latent spaces and the benefits of latent space modeling. We take another look at PCA with this new lens, along with studying autoencoders and variational autoencoders. We study how variational autoencoders regularize the latent space and hence exhibit superior properties compared to autoencoders.
- The appendix covers mathematical proofs and derivations for some of the mathematical properties introduced in the chapters.
About the code
This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes code is also in bold to highlight code that has changed from previous steps in the chapter, such as when a new feature adds to an existing line of code.
In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.
You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/math-and-architectures-of-deep-learning. Fully functional code backing the theory discussed in the book can be found on GitHub at https://github.com/krishnonwork/mathematical-methods-in-deep-learning-ipython and from the Manning website at www.manning.com. The code is presented in the form of Jupyter notebooks (organized by chapter) that can be executed independently. The code is written in Python and uses the popular PyTorch library. Important code snippets are presented as code listings throughout the book, and key concepts are highlighted using code annotations. To get started with the code, clone the repository and follow the steps described in the README.
liveBook discussion forum
Purchase of Math and Architectures of Deep Learning includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/math-and-architectures-of-deep-learning/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/discussion.
Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the authors some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website for as long as the book is in print.
about the authors
Krishnendu Chaudhury is the CTO and a co-founder of Drishti Technologies in Palo Alto, California, which applies AI to manufacturing. He has been a technology leader and inventor in the field of deep learning and computer vision for decades. Before starting Drishti, Krishnendu spent over 20 years at premier organizations, including Google (2004–2015) and Adobe Systems (1996–2004). He was with Flipkart as head of image sciences from 2015 to 2017. In 2017, he left Flipkart to start Drishti. Krishnendu earned his PhD in computer science from the University of Kentucky in Lexington. He has several dozen patents and publications in leading journals and global conferences to his credit.
Ananya Honnedevasthana Ashok, Sujay Narumanchi, and Devashish Shankar are practicing machine learning engineers with multiple patents in the deep learning and computer vision area. They are all members of the founding engineering team at Drishti.
about the cover illustration
The figure on the cover of Math and Architectures of Deep Learning is “Femme Wotyak,” or “Wotyak Woman,” taken from a collection by Jacques Grasset de Saint-Sauveur, published in 1797. Each illustration is finely drawn and colored by hand.
In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.