Deep Learning (CS-563)

Department of Computer Science
University of the Punjab

Instructor: Nazar Khan
Semester: Fall 2025
Credit Hours: 3
Level: Graduate
Location: FCIT, Allama Iqbal Campus, AlKhwarizmi Lecture Theater
Class Times: Tue. and Thurs., 8:15 AM - 9:45 AM

Google Classroom
Textbook: Bishop
Deep Learning: Foundations and Concepts
Chris Bishop and Hugh Bishop
Springer Nature, 2024
Reference 1: Murphy
Machine Learning: A Probabilistic Perspective
Kevin P. Murphy
MIT Press, 2012
PDF
Reference 2: DRL
Deep Reinforcement Learning
Aske Plaat
Springer Nature, 2022
Preprint

The ability of biological brains to sense, perceive, analyse and recognise patterns can only be described as stunning. Furthermore, they can learn from new examples. Mankind's understanding of exactly how biological brains operate is embarrassingly limited. However, numerous 'practical' techniques exist that give machines the 'appearance' of being intelligent. This is the domain of statistical pattern recognition and machine learning. Instead of attempting to mimic the complex workings of a biological brain, this course explains mathematically well-founded techniques for analysing patterns and learning from them.

Artificial Neural Networks, extremely simplified models of the human brain, have existed for almost 75 years. However, the last 25 years have seen a tremendous unlocking of their potential, driven by a collection of network architectures and training techniques that have come to be known as Deep Learning. As a result, Deep Learning has overtaken its parent fields of Neural Networks, Machine Learning and Artificial Intelligence, and is quickly becoming must-have knowledge in many academic disciplines as well as in industry.

This course is a mathematically involved introduction to the wonderful world of deep learning. It will prepare students for further study and research in Pattern Recognition, Machine Learning, Computer Vision, Data Analysis, Natural Language Processing, Speech Recognition, Machine Translation, Autonomous Driving and other areas that attempt to solve Artificial Intelligence (AI) problems.

Lectures

# Topics Slides Videos Recitations Readings Miscellaneous

1

  • Course Details
  • Introduction to Machine Learning
  • Introduction to Neural Computations

Introduction to Deep Learning

Introduction to Neural Computations

Video


2

  • Mathematical Modelling of Neural Computations
    • McCulloch & Pitts Neurons
    • Hebbian Learning
    • Rosenblatt's Perceptron
  • XOR Problem
  • Multilayer Perceptrons

History of Neural Computation

Video

Recitation 1

Quiz 1
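The McCulloch & Pitts model from this lecture fits in a few lines. The sketch below is an illustrative aside, not part of the course materials: a unit fires when enough of its binary inputs are active, so AND and OR differ only in their thresholds, while XOR cannot be realised by any single such unit.

```python
# A McCulloch-Pitts neuron: binary inputs and a hard threshold.
def mp_neuron(inputs, threshold):
    """Fire (1) iff the number of active inputs meets the threshold."""
    return 1 if sum(inputs) >= threshold else 0

# AND over two inputs needs both active; OR needs at least one.
AND = lambda a, b: mp_neuron([a, b], threshold=2)
OR = lambda a, b: mp_neuron([a, b], threshold=1)
```

No single threshold separates XOR's positive cases from its negative ones, which is exactly the limitation that motivates multilayer perceptrons.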

3

  • Universal Approximation Theorem for Multilayer Perceptrons
    • For Boolean functions
    • For classification boundaries
    • For continuous functions

MLPs and Universal Approximation Theorem

Video1

Video2


4

  • Training a perceptron
    • Minimization
    • Gradient Descent
    • Perceptron learning rule

Perceptron Training

Video 1

Video 2

Recitation 2


Quiz 2
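The perceptron learning rule covered in this lecture can be sketched as follows (an illustrative toy, with made-up data for the OR function; not taken from the course slides):

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=100):
    """Rosenblatt's rule: on each mistake, nudge w by lr * y_i * x_i.
    A constant bias column is appended to X; labels y are in {-1, +1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # bias trick
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi    # perceptron update
                mistakes += 1
        if mistakes == 0:            # converged on separable data
            break
    return w

# Linearly separable toy data: the OR function with +/-1 labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([-1, 1, 1, 1])
w = train_perceptron(X, y)
preds = np.sign(np.hstack([X, np.ones((4, 1))]) @ w)
```

On separable data like this the update provably stops after finitely many mistakes; on the XOR labelling it would cycle forever.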

5

  • Loss Functions and Activation Functions
    • Loss Functions for Regression
      • Univariate
      • Multivariate
    • Loss Functions for Classification
      • Binary
      • Multiclass
    • Activation Functions
      • Linear
      • Logistic Sigmoid
      • Softmax

Loss Functions and Activation Functions for Machine Learning

Video
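For the multiclass case in this lecture, softmax converts scores to probabilities and cross-entropy penalises low probability on the true class. A minimal sketch (generic, not from the course slides), using the standard max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, onehot):
    """Multiclass cross-entropy loss for a single prediction."""
    return -np.sum(onehot * np.log(p + 1e-12))

logits = np.array([2.0, 1.0, 0.1])   # raw scores for 3 classes
p = softmax(logits)                   # probabilities summing to 1
loss = cross_entropy(p, np.array([1.0, 0.0, 0.0]))
```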

6

  • Training Neural Networks
    • Forward Propagation
    • Backward Propagation

Training Neural Networks: Forward and Backward Propagation

Video

Recitation 3

Quiz 3
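Forward and backward propagation through a tiny one-hidden-layer network can be sketched in a few lines (an illustrative aside with arbitrary random weights, not course code); the finite-difference comparison at the end is the standard sanity check on the chain-rule gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# One sigmoid hidden layer, scalar linear output, squared-error loss.
x = rng.normal(size=3)
t = 1.0
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=4)

def loss(W1, W2):
    h = sigmoid(W1 @ x)   # forward: hidden activations
    y = W2 @ h            # forward: linear output
    return 0.5 * (y - t) ** 2

# Backward pass: apply the chain rule layer by layer.
h = sigmoid(W1 @ x)
y = W2 @ h
dy = y - t                            # dL/dy
gW2 = dy * h                          # dL/dW2
dh = dy * W2                          # dL/dh
gW1 = np.outer(dh * h * (1 - h), x)   # dL/dW1, using sigmoid'(a) = h(1-h)

# Numerical check of one entry of gW1 by finite differences.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
num = (loss(W1p, W2) - loss(W1, W2)) / eps
```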

7

  • Backpropagation and Vanishing Gradients
    • Numerical derivative check
    • Efficiency of backpropagation
    • Vanishing gradient problem
    • Activation functions for Deep Learning
      • Tanh
      • ReLU
      • Leaky ReLU
      • ELU

Backpropagation and Vanishing Gradients

Video
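The deep-learning activations listed above are one-liners; the ReLU family keeps a non-vanishing gradient for positive inputs, which is what mitigates the vanishing-gradient problem. A generic vectorised sketch (not course code):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # zero for negatives, identity otherwise

def leaky_relu(z, a=0.01):
    return np.where(z > 0, z, a * z)   # small slope keeps negatives alive

def elu(z, a=1.0):
    return np.where(z > 0, z, a * (np.exp(z) - 1))  # smooth negative saturation

z = np.array([-2.0, 0.0, 3.0])
```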

8

  • Gradient Descent Variations - I
    • Problems with vanilla gradient descent
    • First-order methods
      • Resilient Propagation (Rprop)
    • Second-order methods
      • Taylor series approximation
      • Newton's Method for finding stationary points
      • Quickprop

Variations of Gradient Descent

Video

Recitation 4

Quiz 4

Assignment 1: Backpropagation for MLPs.

9

  • Gradient Descent Variations - II
    • Momentum-based first-order methods
      • Momentum
      • Nesterov Accelerated Gradient
      • RMSprop
      • ADAM

Momentum-based Gradient Descent

Video
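A single ADAM update combines the ideas from this lecture: a running average of the gradient (momentum) and of its square (RMSprop-style scaling), each bias-corrected because both averages start at zero. A minimal sketch on a toy quadratic, with the usual default hyperparameters (illustrative, not course code):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update for parameters w given gradient g at step t (1-based)."""
    m = b1 * m + (1 - b1) * g          # running mean of the gradient
    v = b2 * v + (1 - b2) * g * g      # running mean of its square
    m_hat = m / (1 - b1 ** t)          # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimise f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 1001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
```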

10

  • Automatic Differentiation
    • Analytic vs Automatic Differentiation
    • Linear Regression via Automatic Differentiation
    • Logistic Regression via Automatic Differentiation

Automatic Differentiation

Notes

Video
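Deep-learning frameworks use reverse-mode automatic differentiation, but the core idea is easiest to see in forward mode with dual numbers: every value carries (value, derivative), every operation updates both via the chain rule, and the result is exact rather than a numerical approximation. An illustrative sketch (not from the course notes):

```python
# Forward-mode automatic differentiation with dual numbers.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def derivative(f, x):
    """Seed the input derivative with 1 and read off df/dx."""
    return f(Dual(x, 1.0)).dot

# d/dx (3x^2 + 2x) at x = 4 is 6*4 + 2 = 26.
g = derivative(lambda x: 3 * x * x + 2 * x, 4.0)
```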

11

  • Regularization - I
    • Primer on ML
      • Capabilities of polynomials
      • Everything contains noise
      • Overfitting vs Generalisation
    • Regularization Methods
      • Weight Penalty
      • Early Stopping
      • Data Augmentation
      • Label Smoothing

Regularization

Video

Recitation 5

12

  • Regularization - II
    • Dropout
    • BatchNorm

Dropout and BatchNorm

Video
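Dropout as used in practice is "inverted" dropout: units are zeroed with probability p during training and the survivors are scaled by 1/(1-p), so expected activations match test time, where the layer is simply the identity. A generic sketch (illustrative, not course code):

```python
import numpy as np

def dropout(h, p_drop, train=True, rng=np.random.default_rng(0)):
    """Inverted dropout: zero units w.p. p_drop during training and
    rescale survivors so the expected activation is unchanged."""
    if not train or p_drop == 0.0:
        return h
    mask = rng.random(h.shape) >= p_drop   # True = keep the unit
    return h * mask / (1.0 - p_drop)

h = np.ones(1000)
out = dropout(h, p_drop=0.5)   # survivors become 2.0, the rest 0.0
```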

13

  • Convolutional Neural Networks
    • Convolution
    • Neurons as detectors
    • Pooling
    • Forward Propagation
    • Covariance of CNNs

Convolutional Neural Networks

Video

Recitation 6
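The convolution at the heart of a CNN layer (strictly, the cross-correlation most frameworks implement) is a dot product slid over the image; with an edge-detecting kernel, the "neurons as detectors" view becomes concrete. A minimal 'valid' sketch (illustrative, not course code):

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image
    and take a dot product at every position."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# A [-1, 1] kernel fires where intensity increases left-to-right.
img = np.zeros((5, 5))
img[:, 2:] = 1.0                       # dark left half, bright right half
edge = conv2d(img, np.array([[-1.0, 1.0]]))
```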

14

  • Variations of Convolutional Neural Networks - I
    • 1x1 Convolutions
    • Depthwise Separable Convolutions
    • Transposed Convolutions

Variations of Convolutional Neural Networks

Video

15

  • Variations of Convolutional Neural Networks - II
    • Unpooling
    • Fully Convolutional Networks
    • ResNet

Variations of Convolutional Neural Networks

Video

Recitation 7

16

  • Recurrent Neural Networks (RNN)
    • Static vs. Dynamic Inputs
    • Temporal, sequential and time-series data
    • Folding in space
    • Folding in time
    • Unfolding in time
    • Forward propagation in RNN

Recurrent Neural Networks

Video
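Unfolding in time is easiest to see in code: the same weight matrices are reused at every time step, and each hidden state feeds the next. A vanilla-RNN forward pass with arbitrary random weights (an illustrative sketch, not course code):

```python
import numpy as np

def rnn_forward(xs, Wxh, Whh, h0):
    """Forward propagation through time: h_t = tanh(Wxh x_t + Whh h_{t-1}).
    The same Wxh and Whh are shared across all time steps."""
    h = h0
    states = []
    for x in xs:                       # one update per time index
        h = np.tanh(Wxh @ x + Whh @ h)
        states.append(h)
    return states

rng = np.random.default_rng(0)
Wxh = rng.normal(size=(4, 3))          # input-to-hidden weights
Whh = rng.normal(size=(4, 4))          # hidden-to-hidden (recurrent) weights
xs = [rng.normal(size=3) for _ in range(5)]
states = rnn_forward(xs, Wxh, Whh, np.zeros(4))
```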

17

  • RNN variants, benefits and stability
    • Bidirectional RNN
    • Some problems are inherently recurrent
    • Exploding gradients

RNN variants, benefits and stability

Video

18

  • Long Short-Term Memory (LSTM)
    • RNN cell and its weakness
    • Building blocks of the LSTM cell
    • The LSTM cell
    • How does the LSTM cell remember the past?
    • Variants
      • Peephole connections
      • Coupled forget and input gates
      • Gated Recurrent Unit (GRU)

Long Short-Term Memory

Video

19

  • Language Modelling
    • Modelling input text as numeric vectors
    • Text generation
    • Language translation
    • Beam Search

Language Modelling

Video

20

  • Attention
    • Attention-based decoder for
      • Language translation
      • Image captioning
      • Handwritten text recognition

Attention

Video

21

  • Transformers
    • Encoding with attention
      • Self-attention
      • Residual connection
      • Layer-norm
      • Parallelism by removing recurrence
      • Multiheaded self-attention
      • Positional encoding
    • Self-attention based Decoder
      • Encoder-decoder attention

Transformers
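The self-attention at the core of the transformer encoder is softmax(QK^T / sqrt(d)) V, computed for all positions at once, which is what removes the recurrence and enables parallelism. A single-head sketch with arbitrary random projections (illustrative, not course code):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv       # queries, keys, values
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # all pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)   # stable softmax
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)     # each row is a distribution
    return A @ V, A

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                # 6 tokens, model width 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

Multi-headed attention simply runs several such maps with smaller widths in parallel and concatenates the outputs.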

22

  • Transformers - II
    • Self-attention based Decoder
      • Encoder-decoder attention

Transformers

No Recitation

23

  • Generative Adversarial Networks (GANs)
    • Generative vs. Discriminative Models
    • Adversarial Learning
    • Applications
    • GAN Training
      • Objective Functions
      • Training Procedure
      • Stability and Mode-Collapse
      • Tips & Tricks

Generative Adversarial Networks

Video

24

  • Graph Neural Networks (GNNs) - I
    • Euclidean vs. Non-Euclidean Domains
    • Permutation Invariance
    • Permutation Equivariance
    • Learning on Sets
    • Learning on Graphs

Graph Neural Networks

Video

Recitation 11

25

  • Graph Neural Networks (GNNs) - II
    • GNN Layers
      • Convolutional (GCN)
      • Attention (GAT)
      • Message Passing (MPN)
    • Multilayer GNN
    • Example: 3-layer vanilla GNN
      • Node Prediction
      • Graph Prediction

Graph Neural Networks

Video

26

  • Deep Q-Learning

Deep Q-Learning

27

  • Conclusion
    • What was covered?
    • What were the general principles?
    • What was not covered?

Conclusion

Video

  • Final Exam

Grading

          Assignments   Project   Quizzes   Midterm   Final
Weight    12%           8%        5%        35%       40%