CS568 Deep Learning
Spring 2020
Nazar Khan

The ability of biological brains to sense, perceive, analyse and recognise patterns can only be described as stunning. Furthermore, they can learn from new examples. Our understanding of exactly how biological brains operate is embarrassingly limited. However, numerous 'practical' techniques do exist that give machines the 'appearance' of being intelligent. This is the domain of statistical pattern recognition and machine learning. Instead of attempting to mimic the complex workings of a biological brain, this course aims to explain mathematically well-founded techniques for analysing patterns and learning from them.

Artificial Neural Networks, extremely simplified models of the human brain, have existed for almost 75 years. The last 25 years, however, have seen a tremendous unlocking of their potential, driven by a collection of network architectures and training techniques that have come to be known as Deep Learning. As a result, Deep Learning has overtaken its parent fields of Neural Networks, Machine Learning and Artificial Intelligence, and is quickly becoming must-have knowledge in many academic disciplines as well as in industry.

This course is a mathematically involved introduction to the wonderful world of deep learning. It will prepare students for further study and research in Pattern Recognition, Machine Learning, Computer Vision, Data Analysis, Natural Language Processing, Speech Recognition, Machine Translation, Autonomous Driving and other areas that tackle Artificial Intelligence (AI) problems.

CS 568 is a graduate course worth 3 credit hours.

TA: Arbish Akram
Lecture: Monday and Wednesday, 8:15 a.m. - 9:40 a.m. @ AKLT
Recitation: Friday, 8:15 a.m. - 9:40 a.m. @ AKLT
Office Hours: Wednesday, 2:00 p.m. - 3:00 p.m.

Prerequisites

  1. Python
  2. Basic Calculus (differentiation, chain rule)
  3. Linear Algebra
  4. Basic Probability

Books and Other Resources 

No single book will be followed as the primary text. Helpful online and offline resources include:

  1. Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2017. Available online
  2. Pattern Recognition and Machine Learning by Christopher Bishop, 2006
  3. Neural Networks and Deep Learning by Michael Nielsen, 2016. Available online
  4. Deep Learning with Python by J. Brownlee
  5. Deep Learning with Python by Francois Chollet

Grades

Grading sheet (Accessible only through your PUCIT email account)

Videos

YouTube Playlist

Lectures

Each entry below lists the lecture number and date, followed by its topics, slides, material, readings and assignments where available.

Lecture 1 (February 10)
Topics:
  • Course Details
  • Introduction to Machine Learning
Slides: Introduction to Deep Learning



Lecture 2 (February 12)
Topics:
  • Introduction to Neural Computations
  • Mathematical Modelling of Neural Computations
    • McCulloch & Pitts Neurons
    • Hebbian Learning
    • Rosenblatt's Perceptron
  • XOR Problem (illustrative sketch below)
  • Multilayer Perceptrons
Readings:
  • McCulloch and Pitts, ‘A logical calculus of the ideas immanent in nervous activity.’
  • Hebb, ‘The organization of behavior; a neuropsychological theory.’
  • Rosenblatt, ‘The perceptron: a probabilistic model for information storage and organization in the brain.’
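For concreteness, a small NumPy illustration of the XOR problem and its multilayer solution: a single threshold unit cannot compute XOR, but a two-layer perceptron with hand-chosen weights can. The particular weights below are just one workable choice, picked for illustration.

    import numpy as np

    def step(z):
        """Heaviside step activation used by classic threshold units."""
        return (z >= 0).astype(int)

    # Hidden layer computes OR and AND of the two inputs;
    # the output unit computes OR AND (NOT AND), i.e. XOR.
    W1 = np.array([[1.0, 1.0],    # OR unit weights
                   [1.0, 1.0]])   # AND unit weights
    b1 = np.array([-0.5, -1.5])   # OR threshold 0.5, AND threshold 1.5
    w2 = np.array([1.0, -1.0])    # combine: OR minus AND
    b2 = -0.5

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        h = step(W1 @ np.array(x) + b1)   # hidden activations [OR, AND]
        y = step(w2 @ h + b2)             # output = XOR
        print(x, "->", int(y))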


Lecture 3 (February 17)
Topics:
  • Universal Approximation Theorem for Multilayer Perceptrons
    • For Boolean functions
    • For classification boundaries
    • For continuous functions
Slides: MLPs and Universal Approximation Theorem



Lecture 4 (February 19)
Topics:
  • Universal Approximation Theorem continued ...



Lecture 5 (February 24)
Topics:
  • Training a perceptron (illustrative sketch below)
    • Minimization
    • Gradient Descent
    • Perceptron learning rule
Slides: Perceptron Training
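A minimal NumPy sketch of the perceptron learning rule covered in this lecture. The toy data, labels in {-1, +1}, learning rate and epoch count are arbitrary illustrative choices.

    import numpy as np

    def train_perceptron(X, y, epochs=20, lr=1.0):
        """Rosenblatt's perceptron rule: on a mistake, move w towards y*x."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                if y_i * (w @ x_i + b) <= 0:   # misclassified (or on the boundary)
                    w += lr * y_i * x_i        # perceptron update
                    b += lr * y_i
        return w, b

    # Toy linearly separable data (illustrative only)
    X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([+1, +1, -1, -1])
    w, b = train_perceptron(X, y)
    print("weights:", w, "bias:", b)
    print("predictions:", np.sign(X @ w + b))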



Lecture 6 (February 26)
Topics:
  • Neural Networks (illustrative sketch below)
    • Differentiable activation function: Logistic Sigmoid
    • Loss functions for regression and classification
Slides: Forward propagation
Material: Neural Network from scratch in Python
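To accompany the logistic sigmoid and forward propagation, a minimal forward pass through one hidden layer. The layer sizes and random weights are illustrative assumptions only.

    import numpy as np

    def sigmoid(z):
        """Logistic sigmoid activation."""
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, W1, b1, W2, b2):
        """Forward propagation through one hidden layer."""
        a1 = sigmoid(W1 @ x + b1)       # hidden activations
        y_hat = sigmoid(W2 @ a1 + b2)   # output (e.g. for binary classification)
        return a1, y_hat

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)                           # 3 input features
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # 4 hidden units
    W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # 1 output unit
    a1, y_hat = forward(x, W1, b1, W2, b2)
    print("hidden:", a1, "output:", y_hat)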



Lecture 7 (March 2)
Topics:
  • Training a Neural Network
    • Multivariate chain rule of differentiation
    • Backpropagation
    • Numerical derivative check (illustrative sketch below)
    • Vanishing gradients problem
    • Better activation functions
      • Tanh
      • ReLU
      • Leaky ReLU
      • ELU
Slides: Backpropagation
Readings:
  • LeCun, Yann et al. "Efficient BackProp." Neural Networks: Tricks of the Trade (1998)
  • Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted boltzmann machines." ICML. 2010.
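The numerical derivative check listed above can be done with central differences: every gradient coordinate produced by backpropagation should agree with (f(w + eps) - f(w - eps)) / (2 eps). A small self-contained illustration on a quadratic loss; the function and sizes are arbitrary.

    import numpy as np

    def numerical_grad(f, w, eps=1e-5):
        """Central-difference estimate of df/dw, one coordinate at a time."""
        grad = np.zeros_like(w)
        for i in range(w.size):
            w_plus, w_minus = w.copy(), w.copy()
            w_plus[i] += eps
            w_minus[i] -= eps
            grad[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
        return grad

    # Example: f(w) = ||A w - b||^2 has analytic gradient 2 A^T (A w - b)
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(5, 3)), rng.normal(size=5)
    f = lambda w: np.sum((A @ w - b) ** 2)
    w = rng.normal(size=3)

    analytic = 2 * A.T @ (A @ w - b)
    numeric = numerical_grad(f, w)
    print("max abs difference:", np.max(np.abs(analytic - numeric)))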


March 4: PUCIT Mini-Olympics. No lecture.





Lecture 8 (March 9)
Topics:
  • Backpropagation continued ...
Assignment 1: Backpropagation
Due before 4:00 pm on March 16, 2020

Lecture 9 (March 11)
Topics:
  • Automatic Differentiation (illustrative sketch below)
    • Computation Graph
    • Forward
    • Reverse
Slides: Automatic Differentiation
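A toy reverse-mode automatic differentiation sketch: each operation records its parents and local derivatives in a computation graph, and backward() accumulates gradients from the output back to the inputs. This is a bare-bones illustration, not the implementation used in the course.

    class Var:
        """A scalar node in a computation graph with reverse-mode autodiff."""
        def __init__(self, value, parents=()):
            self.value = value
            self.parents = parents   # list of (parent_node, local_gradient)
            self.grad = 0.0

        def __add__(self, other):
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

        def backward(self, seed=1.0):
            """Propagate d(output)/d(self) backwards through the graph.
            (A general graph needs a topological order; simple recursion
            suffices for the tree-shaped example below.)"""
            self.grad += seed
            for parent, local in self.parents:
                parent.backward(seed * local)

    # f(x, y) = (x + y) * y  =>  df/dx = y, df/dy = x + 2y
    x, y = Var(2.0), Var(3.0)
    f = (x + y) * y
    f.backward()
    print(f.value, x.grad, y.grad)   # 15.0, 3.0, 8.0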



March 16: Corona break. No lecture.
Material: Video: Loss Functions for ML




March 18: Corona break. No lecture.





Lecture 10 (March 23)
Topics:
  • Gradient Descent Variations
    • First-order methods
      • Rprop
    • Second-order methods
      • Quickprop
    • Momentum-based first-order methods
      • Momentum
      • Nesterov Accelerated Gradient
      • RMSprop
      • ADAM (illustrative sketch below)
Slides: Gradient Descent Variations
Material: Video, Messy notes
Readings:
  • Kingma, Diederik P., and Jimmy Ba. "Adam: A Method for Stochastic Optimization." ICLR 2015. arXiv:1412.6980.
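A minimal sketch of the ADAM update from the Kingma & Ba paper above, applied to a toy quadratic objective. The learning rate, objective and iteration count are illustrative choices.

    import numpy as np

    def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update: moment estimates, bias correction, parameter step."""
        m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g ** 2   # second moment (uncentered variance)
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    # Minimize f(w) = ||w - target||^2 (toy example)
    target = np.array([1.0, -2.0, 3.0])
    w = np.zeros(3)
    m, v = np.zeros(3), np.zeros(3)
    for t in range(1, 2001):
        g = 2 * (w - target)                   # gradient of the toy objective
        w, m, v = adam_step(w, g, m, v, t, lr=0.05)
    print("w after training:", w)              # close to target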


Lecture 11 (March 25)
Topics:
  • Gradient Descent Variations continued ...
Material: Video




Lecture 12 (March 30)
Topics:
  • Gradient Descent Variations continued ...
Material: Video




Lecture 13 (April 1)
Topics:
  • Regularization
Slides: Regularization
Material: Video, Messy notes


Lecture 14 (April 6)
Topics:
  • Regularization continued ... (dropout; illustrative sketch below)
Material: Video
Readings:
  • Srivastava, Nitish, et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research 15 (2014): 1929-1958.
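Inverted dropout, as described in the Srivastava et al. paper above, fits in a few lines. The keep probability of 0.8 is an arbitrary illustrative value.

    import numpy as np

    def dropout(a, keep_prob=0.8, training=True, rng=None):
        """Inverted dropout: randomly zero activations during training and
        rescale the survivors by 1/keep_prob, so no rescaling is needed at
        test time."""
        if not training:
            return a
        if rng is None:
            rng = np.random.default_rng(0)
        mask = (rng.random(a.shape) < keep_prob).astype(a.dtype)
        return a * mask / keep_prob

    a = np.ones((2, 5))
    print(dropout(a))                   # some entries zeroed, survivors scaled to 1.25
    print(dropout(a, training=False))   # unchanged at test time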


Lecture 15 (April 8)
Topics:
  • Regularization continued ... (batch normalization; illustrative sketch below)
Material: Video A, Video B
Readings:
  • Ioffe, Sergey, and Christian Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." International Conference on Machine Learning. 2015.
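A sketch of the training-time forward pass of batch normalization (Ioffe & Szegedy, above): normalize each feature over the mini-batch, then apply a learned scale (gamma) and shift (beta). The toy batch and parameter initialisation are illustrative.

    import numpy as np

    def batch_norm_forward(x, gamma, beta, eps=1e-5):
        """Batch norm (training mode): normalize each column over the batch,
        then scale by gamma and shift by beta."""
        mu = x.mean(axis=0)     # per-feature mean over the batch
        var = x.var(axis=0)     # per-feature variance over the batch
        x_hat = (x - mu) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(8, 4))  # batch of 8, 4 features
    gamma, beta = np.ones(4), np.zeros(4)
    out = batch_norm_forward(x, gamma, beta)
    print(out.mean(axis=0), out.std(axis=0))   # roughly 0 and 1 per feature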


Lecture 16 (April 13)
Topics:
  • Convolutional Neural Network (CNN)
    • Convolution (illustrative sketch below)
    • Shared Weights
    • Pooling
Slides: Convolutional Neural Network
Material: Video, Messy notes
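A direct, loop-based 'valid' 2D convolution over a single channel, the core operation of this lecture. Strictly speaking the code computes cross-correlation, which is what deep learning libraries implement under the name convolution; the edge-detecting kernel is an illustrative choice.

    import numpy as np

    def conv2d_valid(image, kernel):
        """'Valid' 2D cross-correlation (what DL frameworks call convolution)."""
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    image = np.zeros((6, 6))
    image[:, 3:] = 1.0                        # vertical edge in the middle
    sobel_x = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)
    print(conv2d_valid(image, sobel_x))       # strong response along the edge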



Lecture 17 (April 15)
Topics:
  • CNN continued ...
    • Backpropagation in CNNs
Material: Video
Assignment 2: Initialization, ADAM, Regularization
Due before 8:00 am on April 24, 2020

Lecture 18 (April 20)
Topics:
  • CNN continued ...
    • Backpropagation in CNNs continued ...
Material: Video



Lecture 19 (April 22)
Topics:
  • CNN Variations
    • 1x1 Convolution
    • Depthwise Separable Convolution (parameter-count sketch below)
    • Transposed Convolution
    • Unpooling
    • Fully Convolutional Networks
    • Inception Modules
    • Residual Blocks
Material: Video, Messy Notes
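A quick back-of-the-envelope parameter count showing why 1x1 and depthwise separable convolutions are attractive. The layer sizes (3x3 kernels, 256 input and 256 output channels) are arbitrary illustrative numbers.

    # Parameters of a standard conv layer vs. a depthwise separable one
    # (bias terms ignored for simplicity).
    k, c_in, c_out = 3, 256, 256

    standard = k * k * c_in * c_out      # one k x k filter per (input, output) channel pair
    depthwise = k * k * c_in             # one k x k filter per input channel
    pointwise = 1 * 1 * c_in * c_out     # 1x1 convolution mixes channels
    separable = depthwise + pointwise

    print(f"standard:  {standard:,} parameters")      # 589,824
    print(f"separable: {separable:,} parameters")     # 67,840
    print(f"reduction: {standard / separable:.1f}x")  # about 8.7x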



Lecture 20 (April 27)
Topics:
  • CNN Variations continued ...
    • Fully Convolutional Networks
    • Residual Blocks
Material: Video
Readings:
  • Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." CVPR. 2015.
  • Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR. 2015.
  • He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." CVPR. 2016.
  • Szegedy, Christian, Sergey Ioffe, Vincent Vanhoucke, and Alexander A. Alemi. "Inception-v4, Inception-ResNet and the impact of residual connections on learning." AAAI Conference on Artificial Intelligence. 2017.
Assignment 3: CNN from scratch
Due before 4:00 pm on May 11, 2020

Lecture 21 (April 29)
Topics:
  • Recurrent Neural Network (RNN)
    • Static vs. Dynamic Inputs
    • Temporal, sequential and time-series data
    • Folding in space
    • Folding in time
    • Unfolding in time
    • Forward propagation in RNN (illustrative sketch below)
    • Derivative of a vector with respect to a matrix
Slides: Recurrent Neural Networks
Material: Video, Messy Notes
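Forward propagation in a vanilla RNN, unfolded over a short sequence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) and y_t = W_hy h_t + b_y. The dimensions and random values below are illustrative.

    import numpy as np

    def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
        """Vanilla RNN unfolded in time over the input sequence xs."""
        h = np.zeros(W_hh.shape[0])
        hs, ys = [], []
        for x in xs:                                   # unfolding in time
            h = np.tanh(W_xh @ x + W_hh @ h + b_h)     # new hidden state
            hs.append(h)
            ys.append(W_hy @ h + b_y)                  # output at this time step
        return hs, ys

    rng = np.random.default_rng(0)
    d_in, d_h, d_out, T = 3, 5, 2, 4
    xs = rng.normal(size=(T, d_in))                    # a length-4 input sequence
    W_xh = rng.normal(size=(d_h, d_in)) * 0.1
    W_hh = rng.normal(size=(d_h, d_h)) * 0.1
    W_hy = rng.normal(size=(d_out, d_h)) * 0.1
    hs, ys = rnn_forward(xs, W_xh, W_hh, W_hy, np.zeros(d_h), np.zeros(d_out))
    print("last hidden state:", hs[-1])
    print("last output:", ys[-1])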



Lecture 22 (May 4)
Topics:
  • Recurrent Neural Network (RNN) continued ...
    • Backpropagation through time (BPTT)
Material: Video



Lecture 23 (May 6)
Topics:
  • Recurrent Neural Network (RNN) continued ...
    • Backpropagation through time (BPTT)
    • RNN is a very deep network (in time)
      • Vanishing information over forward time
      • Vanishing gradients over backward time
      • Information and gradients can explode over time as well
    • Benefit of recurrent modelling
      • N-bit addition
      • N-bit XOR
    • Bidirectional RNN
Material: Video



Lecture 24 (May 11)
Topics:
  • LSTM and GRU: RNNs with long-term memory
    • Long Short-Term Memory (LSTM) Cell (illustrative sketch below)
    • Gated Recurrent Unit (GRU)
Material: Video, Messy Notes
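One forward step of an LSTM cell using the standard forget, input and output gates and a candidate cell state. The weight layout (all four gates stacked into one matrix) and the toy dimensions are illustrative choices.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One LSTM step. W maps [h_prev; x] to the stacked pre-activations of
        the forget gate f, input gate i, candidate g, and output gate o."""
        d_h = h_prev.size
        z = W @ np.concatenate([h_prev, x]) + b
        f = sigmoid(z[0 * d_h:1 * d_h])      # forget gate
        i = sigmoid(z[1 * d_h:2 * d_h])      # input gate
        g = np.tanh(z[2 * d_h:3 * d_h])      # candidate cell state
        o = sigmoid(z[3 * d_h:4 * d_h])      # output gate
        c = f * c_prev + i * g               # long-term memory update
        h = o * np.tanh(c)                   # short-term (output) state
        return h, c

    rng = np.random.default_rng(0)
    d_in, d_h = 3, 4
    W = rng.normal(size=(4 * d_h, d_h + d_in)) * 0.1
    b = np.zeros(4 * d_h)
    h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, b)
    print("h:", h)
    print("c:", c)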



Lecture 25 (May 13)
Topics:
  • Language Modelling
    • Text Prediction
    • Language Translation
    • Encoder-Decoder
Material: Video, Messy Notes



Lecture 26 (May 18)
Topics:
  • Language Modelling continued ...
    • Language Translation continued ...
      • A better decoder
      • Beam search
Material: Video



Lecture 27 (May 20)
Topics:
  • Attention (illustrative sketch below)
    • Attention based decoder for
      • Language translation
      • Image captioning
Material: Video, Messy Notes
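A sketch of dot-product attention for a decoder: the current decoder state scores every encoder state, softmax turns the scores into weights, and the context vector is the weighted sum of encoder states. Sizes and values are illustrative; real attention-based decoders typically add learned projections.

    import numpy as np

    def softmax(z):
        z = z - z.max()                    # numerical stability
        e = np.exp(z)
        return e / e.sum()

    def attend(decoder_state, encoder_states):
        """Dot-product attention: score each encoder state against the decoder
        state, normalize with softmax, return the weighted context vector."""
        scores = encoder_states @ decoder_state    # one score per source position
        weights = softmax(scores)                  # attention distribution
        context = weights @ encoder_states         # weighted sum of encoder states
        return context, weights

    rng = np.random.default_rng(0)
    encoder_states = rng.normal(size=(6, 4))   # 6 source positions, 4-dim states
    decoder_state = rng.normal(size=4)
    context, weights = attend(decoder_state, encoder_states)
    print("attention weights:", weights)
    print("context vector:", context)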




August 5-12: Final Exams