CS568 Deep Learning
Spring 2021
Nazar Khan

The ability of biological brains to sense, perceive, analyse and recognise patterns can only be described as stunning. Furthermore, they have the ability to learn from new examples. Mankind's understanding of exactly how biological brains operate is embarrassingly limited. However, numerous 'practical' techniques exist that give machines the 'appearance' of being intelligent. This is the domain of statistical pattern recognition and machine learning. Instead of attempting to mimic the complex workings of a biological brain, this course explains mathematically well-founded techniques for analysing patterns and learning from them.

Artificial Neural Networks, as extremely simplified models of the human brain, have existed for over 75 years. However, the last 25 years have seen a tremendous unlocking of their potential, a direct result of a collection of network architectures and training techniques that have come to be known as Deep Learning. As a result, Deep Learning has overtaken its parent fields of Neural Networks, Machine Learning and Artificial Intelligence, and is quickly becoming must-have knowledge in many academic disciplines as well as in industry.

This course is a mathematically involved introduction to the wonderful world of deep learning. It will prepare students for further study and research in Pattern Recognition, Machine Learning, Computer Vision, Data Analysis, Natural Language Processing, Speech Recognition, Machine Translation, Autonomous Driving and other areas attempting to solve Artificial Intelligence (AI) problems.

CS 568 is a graduate course worth 3 credit hours.

Lectures: Monday and Wednesday, 8:30 a.m. - 9:55 a.m. @ https://meet.google.com/stg-pjvm-vnb
Office Hours: Monday, 2:00 p.m. - 3:00 p.m. @ https://meet.google.com/njc-gvuy-wtj

Recitations: Friday, 8:30 a.m. - 10:00 a.m. @ https://meet.google.com/kqu-rbny-acz
TA: Arbish Akram


Prerequisites

  1. Python
  2. Basic Calculus (Differentiation, Partial derivatives, Chain rule)
  3. Linear Algebra (Vectors, Matrices, Dot-product, Orthogonality, Eigenvectors)
  4. Basic Probability (Bernoulli, Binomial, Gaussian, Discrete, Continuous)

Books and Other Resources

No single book will be followed as the primary text. Helpful online and offline resources include:

  1. Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2017. Available online
  2. Pattern Recognition and Machine Learning by Christopher Bishop, 2006
  3. Neural Networks and Deep Learning by Michael Nielsen, 2016. Available online
  4. Deep Learning with Python by J. Brownlee
  5. Deep Learning with Python by Francois Chollet

Grades

Grading sheet (Accessible only through your PUCIT email account)

Lectures

Each entry below lists the lecture number and date, the topics covered, the slides, the lecture video(s), the associated recitation, readings, and miscellaneous items such as quizzes and assignments.

Lecture 1: January 18
Topics:
  • Course Details
  • Introduction to Machine Learning
  • Introduction to Neural Computations
Slides: Introduction to Deep Learning and Neural Computations
Video
Recitation: Friday, January 22: Recitation 0


Lecture 2: January 25
Topics:
  • Mathematical Modelling of Neural Computations
    • McCulloch & Pitts Neurons
    • Hebbian Learning
    • Rosenblatt's Perceptron
  • XOR Problem
  • Multilayer Perceptrons
Slides: History of Neural Computation
Video
Recitation: Friday, January 29: Recitation 1


Lecture 3: February 1
Topics:
  • Universal Approximation Theorem for Multilayer Perceptrons
    • For Boolean functions
    • For classification boundaries
    • For continuous functions
Slides: MLPs and Universal Approximation Theorem
Videos: Video 1, Video 2
Quiz 1

Lecture 4: February 3
Topics:
  • Training a perceptron (see the sketch below)
    • Minimization
    • Gradient Descent
    • Perceptron learning rule
Slides: Perceptron Training
Videos: Video 1, Video 2
Recitation: Friday, February 5: Recitation 2
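
As a concrete taste of this lecture, here is a minimal NumPy sketch of the perceptron learning rule. The function name, learning rate and toy AND dataset are illustrative assumptions, not course material.

    import numpy as np

    def train_perceptron(X, y, epochs=100, lr=1.0):
        """Perceptron learning rule: w += lr * (target - prediction) * x."""
        X = np.hstack([X, np.ones((X.shape[0], 1))])  # absorb bias into weights
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x, t in zip(X, y):
                pred = 1 if w @ x > 0 else 0
                w += lr * (t - pred) * x  # no update when prediction is correct
        return w

    # Toy linearly separable data: logical AND
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])
    print(train_perceptron(X, y))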



Lecture 5: February 8
Topics:
  • Loss Functions and Activation Functions (see the sketch below)
    • Loss Functions for Regression
      • Univariate
      • Multivariate
    • Loss Functions for Classification
      • Binary
      • Multiclass
    • Activation Functions
      • Linear
      • Logistic Sigmoid
      • Softmax
Slides: Loss Functions and Activation Functions for Machine Learning
Video
Quiz 2
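
For intuition, a minimal NumPy sketch of the softmax activation and the multiclass cross-entropy loss it is usually paired with. The max-subtraction trick is the standard numerical-stability device; the toy scores and target are assumptions.

    import numpy as np

    def softmax(z):
        z = z - z.max()               # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def cross_entropy(p, t):
        """p: predicted class probabilities, t: one-hot target."""
        return -np.sum(t * np.log(p + 1e-12))

    z = np.array([2.0, 1.0, 0.1])     # raw scores (logits)
    t = np.array([1.0, 0.0, 0.0])     # true class is the first one
    print(cross_entropy(softmax(z), t))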

Lecture 6: February 10
Topics:
  • Training Neural Networks (see the sketch below)
    • Forward Propagation
    • Backward Propagation
Slides: Training Neural Networks: Forward and Backward Propagation
Video
Recitation: Friday, February 12: Recitation 3
Assignment 1 (Assigned: Thurs. Feb 11; Due: Thurs. Feb 18)
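
A minimal sketch of one forward and one backward pass through a one-hidden-layer network with sigmoid units and squared-error loss. The layer sizes, random data and learning rate are assumptions made purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 1))          # input
    t = np.array([[1.0]])                # target
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))

    sig = lambda z: 1 / (1 + np.exp(-z))

    # Forward propagation
    h = sig(W1 @ x)                      # hidden activations
    y = sig(W2 @ h)                      # output
    loss = 0.5 * np.sum((y - t) ** 2)

    # Backward propagation (chain rule, layer by layer)
    delta2 = (y - t) * y * (1 - y)       # dL/d(output pre-activation)
    dW2 = delta2 @ h.T
    delta1 = (W2.T @ delta2) * h * (1 - h)
    dW1 = delta1 @ x.T

    lr = 0.1                             # gradient descent update
    W1 -= lr * dW1
    W2 -= lr * dW2
    print(loss)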

Lecture 7: February 15
Topics:
  • Backpropagation and Vanishing Gradients
    • Numerical derivative check (see the sketch below)
    • Efficiency of backpropagation
    • Vanishing gradient problem
    • Activation functions for Deep Learning
      • Tanh
      • ReLU
      • Leaky ReLU
      • ELU
Slides: Backpropagation and Vanishing Gradients
Video
Quiz 3
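
A minimal sketch of the central-difference numerical check used to validate analytic gradients. The toy loss and tolerance are assumptions.

    import numpy as np

    def numerical_grad(f, w, eps=1e-5):
        """Central differences: (f(w+eps) - f(w-eps)) / (2*eps), per coordinate."""
        g = np.zeros_like(w)
        for i in range(w.size):
            e = np.zeros_like(w)
            e[i] = eps
            g[i] = (f(w + e) - f(w - e)) / (2 * eps)
        return g

    f = lambda w: np.sum(w ** 2)          # toy loss with known gradient 2w
    w = np.array([1.0, -2.0, 3.0])
    print(np.allclose(numerical_grad(f, w), 2 * w, atol=1e-6))   # True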

Lecture 8: February 17
Topics:
  • Gradient Descent Variations - I
    • Problems with vanilla gradient descent
    • First-order methods
      • Resilient Propagation (Rprop)
    • Second-order methods
      • Taylor series approximation
      • Newton's Method for finding stationary points (see the sketch below)
      • Quickprop
Slides: Variations of Gradient Descent
Video
Recitation: Friday, February 19: Recitation 4
  • Practical Tips for training Neural Networks
  • Pytorch basics
  • Neural Network for binary classification in Pytorch
  • Video
  • Recitation4.zip
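
A minimal sketch of Newton's method for locating a stationary point of a 1-D function by iterating w ← w - f'(w)/f''(w). The quadratic example (and its one-step convergence) is an assumption chosen for clarity.

    # Newton's method: w <- w - f'(w) / f''(w) drives f'(w) to zero.
    def newton_stationary(fp, fpp, w, steps=10):
        for _ in range(steps):
            w = w - fp(w) / fpp(w)
        return w

    # Toy function f(w) = (w - 3)**2 + 1, minimum at w = 3
    fp  = lambda w: 2 * (w - 3)   # first derivative
    fpp = lambda w: 2.0           # second derivative
    print(newton_stationary(fp, fpp, w=0.0))   # 3.0 after a single step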

Lecture 9: February 22
Topics:
  • Gradient Descent Variations - II (see the sketch below)
    • Momentum-based first-order methods
      • Momentum
      • Nesterov Accelerated Gradient
      • RMSprop
      • ADAM
Slides: Momentum-based Gradient Descent
Video
Quiz 4
Assignment 2: Initialization and ADAM (Assigned: Sun. Feb 21; Due: Sun. Feb 28)
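
A minimal sketch of the ADAM update with the standard bias correction. The hyperparameters are the commonly used defaults; the toy objective and step count are assumptions.

    import numpy as np

    def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * g            # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g ** 2       # second-moment estimate
        m_hat = m / (1 - b1 ** t)            # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    w, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
    for t in range(1, 501):
        g = 2 * w                            # gradient of f(w) = w**2
        w, m, v = adam_step(w, g, m, v, t, lr=0.1)
    print(w)                                 # close to 0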

Lecture 10: February 24
Topics:
  • Regularization - I
    • Primer on ML
      • Capabilities of polynomials
      • Everything contains noise
      • Overfitting vs Generalisation
    • Regularization Methods
      • Weight Penalty (see the sketch below)
      • Early Stopping
      • Data Augmentation
      • Label Smoothing
Slides: Regularization
Video
Recitation: Friday, February 26: Recitation 5
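
A minimal sketch of an L2 weight penalty and its effect on the gradient update (weight decay). The regularization strength, toy loss and learning rate are assumptions.

    import numpy as np

    lam = 1e-2                                  # regularization strength (assumed)
    w = np.array([2.0, -1.0])

    def grad_data_loss(w):
        return 2 * (w - np.array([3.0, 1.0]))   # toy unregularized gradient

    # The penalty lam * ||w||^2 adds 2*lam*w to the gradient,
    # shrinking weights toward zero at every step ("weight decay").
    lr = 0.1
    for _ in range(100):
        w -= lr * (grad_data_loss(w) + 2 * lam * w)
    print(w)    # pulled slightly toward 0 relative to the optimum [3, 1]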

Lecture 11: March 1
Topics:
  • Regularization - II
    • Dropout (see the sketch below)
    • BatchNorm
Slides: Dropout and BatchNorm
Video
Quiz 5
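
A minimal sketch of inverted dropout at training time. The keep probability and the all-ones activation vector are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(h, p_keep=0.8, train=True):
        """Inverted dropout: scale by 1/p_keep so no rescaling is needed at test time."""
        if not train:
            return h
        mask = rng.random(h.shape) < p_keep
        return h * mask / p_keep

    h = np.ones(10)                     # some hidden activations
    print(dropout(h))                   # a few zeros, survivors scaled to 1.25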

Lecture 12: March 3
Topics:
  • Convolutional Neural Networks
    • Convolution (see the sketch below)
    • Neurons as detectors
    • Pooling
    • Forward Propagation
    • Invariance of CNNs
Slides: Convolutional Neural Networks
Video
Recitation: Friday, March 5: Recitation 6
Assignment 3: Regularization (Assigned: Wed. March 3; Due: Mon. March 8)
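
A minimal sketch of a 'valid' 2-D convolution, implemented (as in most deep-learning libraries) as cross-correlation. The toy image and kernel are assumptions.

    import numpy as np

    def conv2d(img, k):
        """Valid cross-correlation: the operation CNN layers actually compute."""
        H, W = img.shape
        kh, kw = k.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
        return out

    img = np.arange(16.0).reshape(4, 4)
    edge = np.array([[1.0, -1.0]])      # horizontal edge detector
    print(conv2d(img, edge))            # constant -1: image increases left to right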

Lecture 13: March 8
Topics:
  • Backpropagation in Convolutional Neural Networks
    • From FC to Subsampling Layer
    • From Subsampling to Conv Layer
    • From Conv Layer
    • Computing gradients in Conv Layer
Slides: Backprop in a CNN
Video
Quiz 6

Lecture 14: March 10
Topics:
  • Variations of Convolutional Neural Networks - I
    • 1x1 Convolutions
    • Depthwise Separable Convolutions
    • Transposed Convolutions
Slides: Variations of Convolutional Neural Networks
Video
Recitation: Friday, March 12: Recitation 7
Assignment 4: CNN (Assigned: Sat. March 13; Due: Thurs. March 25); Discussion

Lecture 15: March 15
Topics:
  • Variations of Convolutional Neural Networks - II
    • Unpooling
    • Fully Convolutional Networks
    • ResNet
Slides: Variations of Convolutional Neural Networks
Video

Lecture 16: March 17
Topics:
  • Recurrent Neural Networks (RNN)
    • Static vs. Dynamic Inputs
    • Temporal, sequential and time-series data
    • Folding in space
    • Folding in time
    • Unfolding in time
    • Forward propagation in RNN (see the sketch below)
    • Derivative of a vector with respect to a matrix
Slides: Recurrent Neural Networks
Video
Friday, March 19: No Recitation
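
A minimal sketch of forward propagation through a vanilla RNN unfolded in time, computing h_t = tanh(W h_{t-1} + U x_t). The dimensions and random inputs are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d_h, d_x, T = 4, 3, 5
    W = rng.normal(scale=0.5, size=(d_h, d_h))   # hidden-to-hidden weights
    U = rng.normal(scale=0.5, size=(d_h, d_x))   # input-to-hidden weights
    xs = rng.normal(size=(T, d_x))               # a length-T input sequence

    h = np.zeros(d_h)                            # initial hidden state
    for x in xs:                                 # unfolding in time
        h = np.tanh(W @ h + U @ x)               # same weights reused at every step
    print(h)                                     # final state summarises the sequence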

Lecture 17: March 22
Topics:
  • Backpropagation Through Time (BPTT)
    • Matrix and Vector Calculus
    • Derivative of a vector with respect to a matrix
    • 5 types of derivatives required for RNN training
Slides: Backpropagation Through Time
Video
Quiz 7 (delayed from previous week)

Lecture 18: March 24
Topics:
  • RNN variants, benefits and stability
    • Bidirectional RNN
    • Some problems are inherently recurrent
    • Exploding gradients (see the sketch below)
Slides: RNN variants, benefits and stability
Video
Recitation: Friday, March 26: Recitation 8
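
A minimal sketch of gradient-norm clipping, the standard remedy for exploding gradients. The threshold is an assumption.

    import numpy as np

    def clip_by_norm(g, max_norm=5.0):
        """Rescale the gradient if its L2 norm exceeds max_norm."""
        norm = np.linalg.norm(g)
        return g * (max_norm / norm) if norm > max_norm else g

    g = np.array([30.0, 40.0])          # norm 50, too large
    print(clip_by_norm(g))              # [3. 4.], norm 5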

Lecture 19: March 29
Topics:
  • Long Short-Term Memory (LSTM)
    • RNN cell and its weakness
    • Building blocks of the LSTM cell
    • The LSTM cell (see the sketch below)
    • How does the LSTM cell remember the past?
    • Variants
      • Peephole connections
      • Coupled forget and input gates
      • Gated Recurrent Unit (GRU)
Slides: Long Short-Term Memory
Video
Assignment 5: RNN (Assigned: Wed. March 31; Due: Thurs. April 8)
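
A minimal sketch of a single LSTM cell step, with biases omitted for brevity. The dimensions, weight initialisation and input sequence are assumptions.

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def lstm_step(x, h, c, Wf, Wi, Wo, Wg):
        """One LSTM step; each W maps the concatenated [h, x] to gate pre-activations."""
        z = np.concatenate([h, x])
        f = sigmoid(Wf @ z)              # forget gate: what to erase from the cell
        i = sigmoid(Wi @ z)              # input gate: what to write
        o = sigmoid(Wo @ z)              # output gate: what to reveal
        g = np.tanh(Wg @ z)              # candidate cell content
        c = f * c + i * g                # additive cell update preserves gradients
        h = o * np.tanh(c)
        return h, c

    rng = np.random.default_rng(0)
    d_h, d_x = 4, 3
    Ws = [rng.normal(scale=0.5, size=(d_h, d_h + d_x)) for _ in range(4)]
    h, c = np.zeros(d_h), np.zeros(d_h)
    for x in rng.normal(size=(5, d_x)):
        h, c = lstm_step(x, h, c, *Ws)
    print(h)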

Lecture 20: March 31
Topics:
  • Automatic Differentiation
    • Analytic vs Automatic Differentiation
    • Linear Regression via Automatic Differentiation (see the sketch below)
    • Logistic Regression via Automatic Differentiation
Slides: Automatic Differentiation
Notes
Video
Recitation: Friday, April 2: Recitation 9
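
Since the recitations use Pytorch, here is a minimal sketch of linear regression trained via automatic differentiation. The synthetic data, learning rate and step count are assumptions.

    import torch

    # Synthetic data from y = 2x + 1 plus noise
    torch.manual_seed(0)
    x = torch.linspace(0, 1, 100).unsqueeze(1)
    y = 2 * x + 1 + 0.05 * torch.randn_like(x)

    w = torch.zeros(1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)

    for _ in range(500):
        loss = ((w * x + b - y) ** 2).mean()   # squared-error loss
        loss.backward()                        # autograd computes dloss/dw, dloss/db
        with torch.no_grad():
            w -= 0.5 * w.grad
            b -= 0.5 * b.grad
            w.grad.zero_()
            b.grad.zero_()
    print(w.item(), b.item())                  # approximately 2 and 1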

Lecture 21: April 5
Topics:
  • Language Modelling
    • Modelling input text as numeric vectors
    • Text generation
    • Language translation
    • Beam Search
Slides: Language Modelling
Video

Lecture 22: April 7
Topics:
  • Attention
    • Attention-based decoder for
      • Language translation
      • Image captioning
      • Handwritten text recognition
Slides: Attention
Video
Recitation: Friday, April 9: Recitation 10

Lecture 23: April 12
Topics:
  • Transformers - I
    • Encoding with attention
      • Self-attention (see the sketch below)
      • Residual connection
      • Layer-norm
      • Parallelism by removing recurrence
      • Multiheaded self-attention
      • Positional encoding
Slides: Transformers
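
A minimal sketch of single-head scaled dot-product self-attention. The sequence length, model width and random projections are assumptions.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        """Scaled dot-product self-attention over a sequence X (one row per token)."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[1])   # similarity of every token pair
        return softmax(scores) @ V               # weighted mixture of values

    rng = np.random.default_rng(0)
    T, d = 5, 8                                   # sequence length, model width
    X = rng.normal(size=(T, d))
    Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8)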

Lecture 24: April 14
Topics:
  • Transformers - II
    • Self-attention based Decoder
      • Encoder-decoder-attention
Slides: Transformers
No Recitation

Lecture 25: April 19
Topics:
  • Generative Adversarial Networks (GANs)
    • Generative vs. Discriminative Models
    • Adversarial Learning
    • Applications
    • GAN Training
      • Objective Functions (see the objective below)
      • Training Procedure
      • Stability and Mode-Collapse
      • Tips & Tricks
Slides: Generative Adversarial Networks
Video
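
For reference, the standard minimax objective for GAN training (Goodfellow et al., 2014), where D is the discriminator, G the generator and p_z the noise prior, is

    \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]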

Lecture 26: April 21
Topics:
  • Graph Neural Networks (GNNs) - I
    • Euclidean vs. Non-Euclidean Domains
    • Permutation Invariance
    • Permutation Equivariance
    • Learning on Sets
    • Learning on Graphs
Slides: Graph Neural Networks
Video
Recitation: Friday, April 23: Recitation 11

Lecture 27: April 26
Topics:
  • Graph Neural Networks (GNNs) - II
    • GNN Layers
      • Convolutional (GCN) (see the sketch below)
      • Attention (GAT)
      • Message Passing (MPN)
    • Multilayer GNN
    • Example: 3-layer vanilla GNN
      • Node Prediction
      • Graph Prediction
Slides: Graph Neural Networks
Video
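
A minimal sketch of one graph-convolutional (GCN-style) layer, H' = ReLU(Â H W), where Â is the adjacency matrix with self-loops and symmetric degree normalization. The tiny path graph and feature sizes are assumptions.

    import numpy as np

    def gcn_layer(A, H, W):
        """One GCN-style layer: aggregate neighbour features, then transform."""
        A_hat = A + np.eye(A.shape[0])            # add self-loops
        d = A_hat.sum(axis=1)
        D_inv_sqrt = np.diag(d ** -0.5)
        A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric degree normalization
        return np.maximum(A_norm @ H @ W, 0)      # ReLU

    # A 4-node path graph: 0-1-2-3
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    rng = np.random.default_rng(0)
    H = rng.normal(size=(4, 3))                   # node features
    W = rng.normal(size=(3, 2))                   # learnable weights
    print(gcn_layer(A, H, W).shape)               # (4, 2)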

Lecture 28: April 28
Topics:
  • Conclusion
    • What was covered?
    • What were the general principles?
    • What was not covered?
Slides: Conclusion
Video

Final Exam: May 7