CS568 Deep Learning
Spring 2023
Nazar Khan

The ability of biological brains to sense, perceive, analyse and recognise patterns can only be described as stunning. Furthermore, they have the ability to learn from new examples. Mankind's understanding of exactly how biological brains operate is embarrassingly limited. However, there do exist numerous 'practical' techniques that give machines the 'appearance' of being intelligent. This is the domain of statistical pattern recognition and machine learning. Instead of attempting to mimic the complex workings of a biological brain, this course aims to explain mathematically well-founded techniques for analysing patterns and learning from them.

Artificial Neural Networks, as extremely simplified models of the human brain, have existed for almost 75 years. However, the last 25 years have seen a tremendous unlocking of their potential. This progress has been a direct result of a collection of network architectures and training techniques that have come to be known as Deep Learning. As a result, Deep Learning has taken over its parent fields of Neural Networks, Machine Learning and Artificial Intelligence. Deep Learning is quickly becoming must-have knowledge in many academic disciplines as well as in industry.

This course is a mathematically involved introduction to the wonderful world of deep learning. It will prepare students for further study and research in Pattern Recognition, Machine Learning, Computer Vision, Data Analysis, Natural Language Processing, Speech Recognition, Machine Translation, Autonomous Driving and other areas attempting to solve Artificial Intelligence (AI) type problems.

CS 568 is a graduate course worth 3 credit hours.

Lectures: Tuesday and Thursday, 10:00 a.m. - 11:30 a.m. in Room B4.
Office Hours: Tuesday and Thursday, 11:30 a.m. - 12:00 p.m. in Visiting Faculty Office.
Google Classroom: https://classroom.google.com/c/NjA4MjQyNjgzNTkx?cjc=d3paa5d
Online Quiz: Friday, 8:30 a.m. via Google Classroom.
Recitations: Friday, 8:40 a.m. - 10:10 a.m.


Prerequisites

  1. Python
  2. Basic Calculus (Differentiation, Partial derivatives, Chain rule)
  3. Linear Algebra (Vectors, Matrices, Dot-product, Orthogonality, Eigenvectors)
  4. Basic Probability (Bernoulli, Binomial, Gaussian, Discrete, Continuous)

Books and Other Resources

No single book will be followed as the primary text. Helpful online and offline resources include:

  1. Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2017. Available online
  2. Pattern Recognition and Machine Learning by Christopher Bishop, 2006
  3. Neural Networks and Deep Learning by Michael Nielsen, 2016. Available online
  4. Deep Learning with Python by J. Brownlee
  5. Deep Learning with Python by Francois Chollet

Grades

Grading sheet (Accessible only through your PUCIT email account)

Lectures

Each lecture entry below lists the lecture number and date, the topics covered, the slides and videos, and any associated recitation, readings and miscellaneous items (quizzes and assignments).

Lecture 1 (May 8)
  Topics:
    • Course Details
    • Introduction to Machine Learning
    • Introduction to Neural Computations
  Slides: Introduction to Deep Learning; Introduction to Neural Computations
  Videos: Video


Lecture 2 (June 5)
  Topics:
    • Mathematical Modelling of Neural Computations
      • McCulloch & Pitts Neurons
      • Hebbian Learning
      • Rosenblatt's Perceptron
    • XOR Problem
    • Multilayer Perceptrons
  Slides: History of Neural Computation
  Videos: Video
  Recitation: Friday, May 19: Recitation 1
  Miscellaneous: Quiz 1

Lecture 3 (June 7)
  Topics:
    • Universal Approximation Theorem for Multilayer Perceptrons
      • For Boolean functions
      • For classification boundaries
      • For continuous functions
  Slides: MLPs and Universal Approximation Theorem
  Videos: Video1, Video2


Lecture 4 (June 13)
  Topics:
    • Training a perceptron
      • Minimization
      • Gradient Descent
      • Perceptron learning rule
  Slides: Perceptron Training
  Videos: Video 1, Video 2
  Recitation: Friday, May 26: Recitation 2
  Miscellaneous: Quiz 2
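The perceptron learning rule covered in this lecture can be summarised in a few lines of NumPy. This is a minimal sketch rather than the exact formulation from the slides; the toy dataset, learning rate and bias handling are illustrative assumptions.

    import numpy as np

    def train_perceptron(X, y, lr=0.1, epochs=100):
        """Rosenblatt's perceptron learning rule.
        X: (N, d) inputs, y: (N,) labels in {-1, +1}."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                # Update only when the current sample is misclassified.
                if y_i * (np.dot(w, x_i) + b) <= 0:
                    w += lr * y_i * x_i
                    b += lr * y_i
        return w, b

    # Example: a linearly separable toy problem (logical AND with +/-1 labels).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([-1, -1, -1, 1])
    w, b = train_perceptron(X, y)
    print(np.sign(X @ w + b))   # reproduces y once the rule has converged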

Lecture 5 (June 15)
  Topics:
    • Loss Functions and Activation Functions
      • Loss Functions for Regression
        • Univariate
        • Multivariate
      • Loss Functions for Classification
        • Binary
        • Multiclass
      • Activation Functions
        • Linear
        • Logistic Sigmoid
        • Softmax
  Slides: Loss Functions and Activation Functions for Machine Learning
  Videos: Video
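As a quick reference for the activation functions and losses listed in this lecture, here is a minimal NumPy sketch. The small epsilon terms for numerical stability are implementation details, not something prescribed by the lecture.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def softmax(a):
        # Subtract the max for numerical stability; does not change the result.
        e = np.exp(a - np.max(a, axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def mse_loss(y_pred, y_true):
        # Squared-error loss for (multivariate) regression.
        return 0.5 * np.mean(np.sum((y_pred - y_true) ** 2, axis=-1))

    def binary_cross_entropy(p, y, eps=1e-12):
        # Loss for binary classification with sigmoid outputs p in (0, 1).
        p = np.clip(p, eps, 1 - eps)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def cross_entropy(p, y_onehot, eps=1e-12):
        # Loss for multiclass classification with softmax outputs.
        return -np.mean(np.sum(y_onehot * np.log(p + eps), axis=-1))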

Lecture 6 (June 20)
  Topics:
    • Training Neural Networks
      • Forward Propagation
      • Backward Propagation
  Slides: Training Neural Networks: Forward and Backward Propagation
  Videos: Video
  Recitation: Friday, June 2: Recitation 3
  Miscellaneous: Quiz 3
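To make the forward and backward passes concrete, the sketch below trains a one-hidden-layer MLP with sigmoid units and a squared-error loss on the XOR problem from Lecture 2. The architecture, learning rate and number of epochs are illustrative assumptions, not the lecture's choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Toy data: XOR, the classic problem a single perceptron cannot solve.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # One hidden layer with 8 units.
    W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
    W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

    lr = 1.0
    for epoch in range(5000):
        # Forward propagation.
        h1 = sigmoid(X @ W1 + b1)
        out = sigmoid(h1 @ W2 + b2)

        # Backward propagation of the squared-error loss 0.5 * (out - y)^2.
        d_out = (out - y) * out * (1 - out)      # delta at the output layer
        d_h1 = (d_out @ W2.T) * h1 * (1 - h1)    # delta at the hidden layer

        # Gradient-descent parameter updates.
        W2 -= lr * h1.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h1;   b1 -= lr * d_h1.sum(axis=0)

    print(out.round(3))   # typically approaches [0, 1, 1, 0]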

Lecture 7 (June 22)
  Topics:
    • Backpropagation and Vanishing Gradients
      • Numerical derivative check
      • Efficiency of backpropagation
      • Vanishing gradient problem
      • Activation functions for Deep Learning
        • Tanh
        • ReLU
        • Leaky ReLU
        • ELU
  Slides: Backpropagation and Vanishing Gradients
  Videos: Video
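The numerical derivative check mentioned above compares an analytic (backpropagated) gradient with central finite differences. A minimal sketch of the idea; the quadratic test function is only a stand-in for a network loss.

    import numpy as np

    def numerical_gradient(f, w, eps=1e-6):
        """Central-difference approximation of the gradient of f at w."""
        g = np.zeros_like(w)
        for i in range(w.size):
            w_plus, w_minus = w.copy(), w.copy()
            w_plus[i] += eps
            w_minus[i] -= eps
            g[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
        return g

    # Stand-in for a network loss; its analytic gradient is A w.
    A = np.diag([1.0, 2.0, 3.0])
    f = lambda w: 0.5 * w @ A @ w
    w = np.array([1.0, -2.0, 0.5])

    analytic = A @ w
    numeric = numerical_gradient(f, w)
    # The difference should be tiny if the analytic gradient is correct.
    print(np.max(np.abs(analytic - numeric)))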

Lecture 8 (June 27)
  Topics:
    • Gradient Descent Variations - I
      • Problems with vanilla gradient descent
      • First-order methods
        • Resilient Propagation (Rprop)
      • Second-order methods
        • Taylor series approximation
        • Newton's Method for finding stationary points
        • Quickprop
  Slides: Variations of Gradient Descent
  Videos: Video
  Recitation: Friday, June 9: Recitation 4
  Miscellaneous: Quiz 4; Assignment 1: Backpropagation for MLPs
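Of the second-order ideas listed above, Newton's method for finding stationary points is the easiest to illustrate: repeatedly jump to the stationary point of the local second-order Taylor approximation. A one-dimensional sketch on an assumed test function (not taken from the lecture):

    def newton_stationary_point(f_prime, f_double_prime, w, steps=10):
        """Newton updates w <- w - f'(w) / f''(w) for a 1-D function."""
        for _ in range(steps):
            w = w - f_prime(w) / f_double_prime(w)
        return w

    # Assumed test function f(w) = w^4 - 3w^2 + w.
    f_prime = lambda w: 4 * w**3 - 6 * w + 1
    f_double_prime = lambda w: 12 * w**2 - 6
    print(newton_stationary_point(f_prime, f_double_prime, w=2.0))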

Lecture 9 (June 29)
  Topics:
    • Gradient Descent Variations - II
      • Momentum-based first-order methods
        • Momentum
        • Nesterov Accelerated Gradient
        • RMSprop
        • ADAM
  Slides: Momentum-based Gradient Descent
  Videos: Video
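The momentum-based update rules listed above differ mainly in how they accumulate gradient statistics. Below is a minimal sketch of classical momentum and ADAM on a single parameter vector; the hyperparameter values are the commonly quoted defaults, used here as assumptions.

    import numpy as np

    def momentum_step(w, grad, v, lr=0.01, beta=0.9):
        """Classical momentum: accumulate a velocity and move along it."""
        v = beta * v - lr * grad
        return w + v, v

    def adam_step(w, grad, m, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        """ADAM: first and second moment estimates with bias correction."""
        m = beta1 * m + (1 - beta1) * grad
        s = beta2 * s + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**t)          # bias-corrected first moment
        s_hat = s / (1 - beta2**t)          # bias-corrected second moment
        w = w - lr * m_hat / (np.sqrt(s_hat) + eps)
        return w, m, s

    # Usage sketch: minimise f(w) = ||w||^2 whose gradient is 2w.
    w = np.array([1.0, -2.0])
    m = np.zeros_like(w); s = np.zeros_like(w)
    for t in range(1, 501):
        w, m, s = adam_step(w, 2 * w, m, s, t, lr=0.05)
    print(w)   # approximately the minimiser [0, 0]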

July 7: Mid-term Exam
Jul 10 till Sep 10: Summer Break

Lecture 10 (Sep 12)
  Topics:
    • Automatic Differentiation
      • Analytic vs Automatic Differentiation
      • Linear Regression via Automatic Differentiation
      • Logistic Regression via Automatic Differentiation
  Slides: Automatic Differentiation
  Readings: Notes
  Videos: Video
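This lecture contrasts analytic and automatic differentiation; the sketch below fits linear regression purely through an autodiff library's reverse-mode gradients. It assumes PyTorch, which is not prescribed by the course; any autodiff framework would serve equally well.

    import torch

    # Synthetic data from an assumed ground truth y = 3x + 1 plus noise.
    torch.manual_seed(0)
    x = torch.linspace(-1, 1, 100).unsqueeze(1)
    y = 3 * x + 1 + 0.1 * torch.randn_like(x)

    # Parameters whose gradients will be computed automatically.
    w = torch.zeros(1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)

    lr = 0.1
    for _ in range(500):
        loss = ((x * w + b - y) ** 2).mean()   # squared-error loss
        loss.backward()                        # reverse-mode automatic differentiation
        with torch.no_grad():
            w -= lr * w.grad
            b -= lr * b.grad
            w.grad.zero_()
            b.grad.zero_()

    print(w.item(), b.item())   # should be close to 3 and 1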

Lecture 11 (Sep 14)
  Topics:
    • Regularization - I
      • Primer on ML
        • Capabilities of polynomials
        • Everything contains noise
        • Overfitting vs Generalisation
      • Regularization Methods
        • Weight Penalty
        • Early Stopping
        • Data Augmentation
        • Label Smoothing
  Slides: Regularization
  Videos: Video
  Recitation: Friday, Sep 15: Recitation 5
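Two of the regularization methods listed above are simple enough to state in a couple of lines. This is a generic sketch rather than the exact formulation from the slides; the penalty coefficient and the smoothing factor 0.1 are assumed values.

    import numpy as np

    def l2_penalised_loss(data_loss, weights, lam=1e-4):
        # Weight penalty: add lambda * ||w||^2 to the data term.
        return data_loss + lam * sum(np.sum(w ** 2) for w in weights)

    def smooth_labels(y_onehot, eps=0.1):
        # Label smoothing: move eps of the probability mass from the true
        # class to a uniform distribution over all classes.
        k = y_onehot.shape[-1]
        return (1 - eps) * y_onehot + eps / k

    y = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)
    print(smooth_labels(y))   # first row becomes roughly [0.033, 0.033, 0.933]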

Lecture 12 (Sep 19)
  Topics:
    • Regularization - II
      • Dropout
      • BatchNorm
  Slides: Dropout and BatchNorm
  Videos: Video
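A minimal sketch of the training-time forward passes of (inverted) dropout and batch normalisation. Details such as the keep probability and the running statistics used at test time are assumptions for illustration, not taken from the slides.

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout_forward(h, p_keep=0.8, train=True):
        """Inverted dropout: scale at training time so no rescaling is needed at test time."""
        if not train:
            return h
        mask = rng.random(h.shape) < p_keep
        return h * mask / p_keep

    def batchnorm_forward(x, gamma, beta, eps=1e-5):
        """Normalise each feature over the mini-batch, then scale and shift."""
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mu) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    x = rng.normal(size=(4, 3))                  # a mini-batch of 4 examples, 3 features
    print(batchnorm_forward(x, gamma=np.ones(3), beta=np.zeros(3)))
    print(dropout_forward(x))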

Lecture 13 (Sep 21)
  Topics:
    • Convolutional Neural Networks
      • Convolution
      • Neurons as detectors
      • Pooling
      • Forward Propagation
      • Covariance of CNNs
  Slides: Convolutional Neural Networks
  Videos: Video
  Recitation: Friday, Sep 22: Recitation 6
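A naive NumPy sketch of the two building blocks listed above: 2-D convolution (implemented as cross-correlation, as is standard in CNN libraries) and max pooling. Stride, padding and multiple channels are omitted for brevity.

    import numpy as np

    def conv2d(image, kernel):
        """Valid 2-D cross-correlation of a single-channel image with one kernel."""
        H, W = image.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    def max_pool(x, size=2):
        """Non-overlapping max pooling with a size x size window."""
        H, W = x.shape
        out = np.zeros((H // size, W // size))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = x[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
        return out

    image = np.arange(36, dtype=float).reshape(6, 6)
    kernel = np.ones((3, 3)) / 9.0               # a 3x3 averaging filter
    print(max_pool(conv2d(image, kernel)))       # (6,6) -> (4,4) -> (2,2)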

Lecture 14 (Sep 26)
  Topics:
    • Variations of Convolutional Neural Networks - I
      • 1x1 Convolutions
      • Depthwise Separable Convolutions
      • Transposed Convolutions
  Slides: Variations of Convolutional Neural Networks
  Videos: Video

Lecture 15 (Sep 28)
  Topics:
    • Variations of Convolutional Neural Networks - II
      • Unpooling
      • Fully Convolutional Networks
      • ResNet
  Slides: Variations of Convolutional Neural Networks
  Videos: Video
  Recitation: Friday, Sep 29: Recitation 7

Lecture 16 (Oct 3)
  Topics:
    • Recurrent Neural Networks (RNN)
      • Static vs. Dynamic Inputs
      • Temporal, sequential and time-series data
      • Folding in space
      • Folding in time
      • Unfolding in time
      • Forward propagation in RNN
  Slides: Recurrent Neural Networks
  Videos: Video
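Forward propagation in a vanilla RNN is the same cell applied at every time step (unfolding in time). A minimal sketch with assumed dimensions and randomly initialised weights standing in for learnt parameters:

    import numpy as np

    rng = np.random.default_rng(0)

    def rnn_forward(xs, W_xh, W_hh, b_h):
        """Run a vanilla tanh RNN over a sequence xs of shape (T, input_dim)."""
        h = np.zeros(W_hh.shape[0])
        hs = []
        for x_t in xs:                       # unfolding in time: same weights each step
            h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
            hs.append(h)
        return np.stack(hs)                  # hidden state at every time step

    T, input_dim, hidden_dim = 5, 3, 4
    xs = rng.normal(size=(T, input_dim))
    W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
    W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
    b_h = np.zeros(hidden_dim)
    print(rnn_forward(xs, W_xh, W_hh, b_h).shape)   # (5, 4)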

Lecture 17 (Oct 5)
  Topics:
    • RNN variants, benefits and stability
      • Bidirectional RNN
      • Some problems are inherently recurrent
      • Exploding gradients
  Slides: RNN variants, benefits and stability
  Videos: Video

Lecture 18 (Oct 10)
  Topics:
    • Long Short-Term Memory (LSTM)
      • RNN cell and its weakness
      • Building blocks of the LSTM cell
      • The LSTM cell
      • How does the LSTM cell remember the past?
      • Variants
        • Peephole connections
        • Coupled forget and input gates
        • Gated Recurrent Unit (GRU)
  Slides: Long Short-Term Memory
  Videos: Video
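A minimal sketch of a single LSTM step, showing how the forget, input and output gates control the cell state. The weight layout, shapes and initialisation are assumptions for illustration.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One LSTM step. W has shape (4*hidden, input+hidden), b has shape (4*hidden,)."""
        hidden = h_prev.size
        z = W @ np.concatenate([x, h_prev]) + b
        f = sigmoid(z[0 * hidden:1 * hidden])        # forget gate
        i = sigmoid(z[1 * hidden:2 * hidden])        # input gate
        o = sigmoid(z[2 * hidden:3 * hidden])        # output gate
        g = np.tanh(z[3 * hidden:4 * hidden])        # candidate cell state
        c = f * c_prev + i * g                       # cell state carries the past forward
        h = o * np.tanh(c)                           # hidden state / output
        return h, c

    rng = np.random.default_rng(0)
    input_dim, hidden = 3, 4
    W = rng.normal(scale=0.1, size=(4 * hidden, input_dim + hidden))
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden); c = np.zeros(hidden)
    for x in rng.normal(size=(6, input_dim)):        # run over a length-6 sequence
        h, c = lstm_step(x, h, c, W, b)
    print(h)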

Lecture 19 (Oct 12)
  Topics:
    • Language Modelling
      • Modelling input text as numeric vectors
      • Text generation
      • Language translation
      • Beam Search
  Slides: Language Modelling
  Videos: Video

Lecture 20 (Oct 17)
  Topics:
    • Attention
      • Attention-based decoder for
        • Language translation
        • Image captioning
        • Handwritten text recognition
  Slides: Attention
  Videos: Video

Lecture 21 (Oct 19)
  Topics:
    • Transformers
      • Encoding with attention
        • Self-attention
        • Residual connection
        • Layer-norm
        • Parallelism by removing recurrence
        • Multiheaded self-attention
        • Positional encoding
      • Self-attention based Decoder
        • Encoder-decoder-attention
  Slides: Transformers
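Scaled dot-product self-attention, the core operation behind the encoder topics listed above, fits in a few lines of NumPy. Multiple heads, masking and training are omitted; the random projection matrices stand in for learnt parameters.

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def self_attention(X, W_q, W_k, W_v):
        """Single-head scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)      # similarity of every position with every other
        weights = softmax(scores)            # each row sums to 1
        return weights @ V                   # weighted mixture of value vectors

    rng = np.random.default_rng(0)
    T, d_model, d_k = 5, 8, 4
    X = rng.normal(size=(T, d_model))        # stand-in for embedded, positionally encoded tokens
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))
    print(self_attention(X, W_q, W_k, W_v).shape)   # (5, 4)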