CS568 Deep Learning
Spring 2023
Nazar Khan

The ability of biological brains to sense, perceive, analyse and recognise patterns can only be described as stunning. Furthermore, they have the ability to learn from new examples. Mankind's understanding of exactly how biological brains operate is embarrassingly limited. However, there do exist numerous 'practical' techniques that give machines the 'appearance' of being intelligent. This is the domain of statistical pattern recognition and machine learning. Instead of attempting to mimic the complex workings of a biological brain, this course aims to explain mathematically well-founded techniques for analysing patterns and learning from them.

Artificial Neural Networks, as extremely simplified models of the human brain, have existed for almost 75 years. However, the last 25 years have seen a tremendous unlocking of their potential. This progress has been a direct result of a collection of network architectures and training techniques that have come to be known as Deep Learning. As a result, Deep Learning has taken over its parent fields of Neural Networks, Machine Learning and Artificial Intelligence. Deep Learning is quickly becoming must-have knowledge in many academic disciplines as well as in industry.

This course is a mathematically involved introduction to the wonderful world of deep learning. It will prepare students for further study and research in Pattern Recognition, Machine Learning, Computer Vision, Data Analysis, Natural Language Processing, Speech Recognition, Machine Translation, Autonomous Driving and other areas attempting to solve Artificial Intelligence (AI) type problems.

CS 568 is a graduate course worth 3 credit hours.

Lectures: Tuesday and Thursday, 10:00 a.m. - 11:30 a.m. in Room B4.
Office Hours: Tuesday and Thursday, 11:30 a.m. - 12:00 p.m. in Visiting Faculty Office.
Google Classroom: https://classroom.google.com/c/NjA4MjQyNjgzNTkx?cjc=d3paa5d
Online Quiz: Friday, 8:30 a.m. via Google Classroom.
Recitations: Friday, 8:40 a.m. - 10:10 a.m.


Prerequisites

  1. Python
  2. Basic Calculus (Differentiation, Partial derivatives, Chain rule)
  3. Linear Algebra (Vectors, Matrices, Dot-product, Orthogonality, Eigenvectors)
  4. Basic Probability (Bernoulli, Binomial, Gaussian, Discrete, Continuous)

Books and Other Resources

No single book will be followed as the primary text. Helpful online and offline resources include:

  1. Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2017. Available online
  2. Pattern Recognition and Machine Learning by Christopher Bishop, 2006
  3. Neural Networks and Deep Learning by Michael Nielsen, 2016. Available online
  4. Deep Learning with Python by J. Brownlee
  5. Deep Learning with Python by Francois Chollet

Grades

Grading sheet (Accessible only through your PUCIT email account)

Lectures

Each lecture entry below lists the lecture number and date, the topics covered, the slides and videos, and any associated recitation, readings and miscellaneous items (quizzes and assignments).

Lecture 1 (May 8)
  Topics:
    • Course Details
    • Introduction to Machine Learning
    • Introduction to Neural Computations
  Slides: Introduction to Deep Learning; Introduction to Neural Computations
  Videos: Video


Lecture 2 (June 5)
  Topics:
    • Mathematical Modelling of Neural Computations
      • McCulloch & Pitts Neurons
      • Hebbian Learning
      • Rosenblatt's Perceptron
    • XOR Problem
    • Multilayer Perceptrons
  Slides: History of Neural Computation
  Videos: Video
  Recitation: Friday, May 19: Recitation 1
  Miscellaneous: Quiz 1

Lecture 3 (June 7)
  Topics:
    • Universal Approximation Theorem for Multilayer Perceptrons
      • For Boolean functions
      • For classification boundaries
      • For continuous functions
  Slides: MLPs and Universal Approximation Theorem
  Videos: Video1, Video2


Lecture 4 (June 13)
  Topics:
    • Training a perceptron
      • Minimization
      • Gradient Descent
      • Perceptron learning rule
  Slides: Perceptron Training
  Videos: Video 1, Video 2
  Recitation: Friday, May 26: Recitation 2
  Miscellaneous: Quiz 2
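The perceptron learning rule covered in this lecture can be summarised in a few lines of NumPy. This is a minimal sketch rather than the exact formulation from the slides; the toy dataset, learning rate and bias handling are illustrative assumptions.

    import numpy as np

    def train_perceptron(X, y, lr=0.1, epochs=100):
        """Rosenblatt's perceptron learning rule.
        X: (N, d) inputs, y: (N,) labels in {-1, +1}."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                # Update only when the current sample is misclassified.
                if y_i * (np.dot(w, x_i) + b) <= 0:
                    w += lr * y_i * x_i
                    b += lr * y_i
        return w, b

    # Example: a linearly separable toy problem (logical AND with +/-1 labels).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([-1, -1, -1, 1])
    w, b = train_perceptron(X, y)
    print(np.sign(X @ w + b))   # reproduces y once the rule has converged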

Lecture 5 (June 15)
  Topics:
    • Loss Functions and Activation Functions
      • Loss Functions for Regression
        • Univariate
        • Multivariate
      • Loss Functions for Classification
        • Binary
        • Multiclass
      • Activation Functions
        • Linear
        • Logistic Sigmoid
        • Softmax
  Slides: Loss Functions and Activation Functions for Machine Learning
  Videos: Video
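As a quick reference for the activation functions and losses listed in this lecture, here is a minimal NumPy sketch. The small epsilon terms for numerical stability are implementation details, not something prescribed by the lecture.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def softmax(a):
        # Subtract the max for numerical stability; does not change the result.
        e = np.exp(a - np.max(a, axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def mse_loss(y_pred, y_true):
        # Squared-error loss for (multivariate) regression.
        return 0.5 * np.mean(np.sum((y_pred - y_true) ** 2, axis=-1))

    def binary_cross_entropy(p, y, eps=1e-12):
        # Loss for binary classification with sigmoid outputs p in (0, 1).
        p = np.clip(p, eps, 1 - eps)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def cross_entropy(p, y_onehot, eps=1e-12):
        # Loss for multiclass classification with softmax outputs.
        return -np.mean(np.sum(y_onehot * np.log(p + eps), axis=-1))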

Lecture 6 (June 20)
  Topics:
    • Training Neural Networks
      • Forward Propagation
      • Backward Propagation
  Slides: Training Neural Networks: Forward and Backward Propagation
  Videos: Video
  Recitation: Friday, June 2: Recitation 3
  Miscellaneous: Quiz 3
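To make the forward and backward passes concrete, the sketch below trains a one-hidden-layer MLP with sigmoid units and a squared-error loss on the XOR problem from Lecture 2. The architecture, learning rate and number of epochs are illustrative assumptions, not the lecture's choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Toy data: XOR, the classic problem a single perceptron cannot solve.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # One hidden layer with 8 units.
    W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
    W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

    lr = 1.0
    for epoch in range(5000):
        # Forward propagation.
        h1 = sigmoid(X @ W1 + b1)
        out = sigmoid(h1 @ W2 + b2)

        # Backward propagation of the squared-error loss 0.5 * (out - y)^2.
        d_out = (out - y) * out * (1 - out)      # delta at the output layer
        d_h1 = (d_out @ W2.T) * h1 * (1 - h1)    # delta at the hidden layer

        # Gradient-descent parameter updates.
        W2 -= lr * h1.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h1;   b1 -= lr * d_h1.sum(axis=0)

    print(out.round(3))   # typically approaches [0, 1, 1, 0]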

Lecture 7 (June 22)
  Topics:
    • Backpropagation and Vanishing Gradients
      • Numerical derivative check
      • Efficiency of backpropagation
      • Vanishing gradient problem
      • Activation functions for Deep Learning
        • Tanh
        • ReLU
        • Leaky ReLU
        • ELU
  Slides: Backpropagation and Vanishing Gradients
  Videos: Video
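The numerical derivative check mentioned above compares an analytic (backpropagated) gradient with central finite differences. A minimal sketch of the idea; the quadratic test function is only a stand-in for a network loss.

    import numpy as np

    def numerical_gradient(f, w, eps=1e-6):
        """Central-difference approximation of the gradient of f at w."""
        g = np.zeros_like(w)
        for i in range(w.size):
            w_plus, w_minus = w.copy(), w.copy()
            w_plus[i] += eps
            w_minus[i] -= eps
            g[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
        return g

    # Stand-in for a network loss; its analytic gradient is A w.
    A = np.diag([1.0, 2.0, 3.0])
    f = lambda w: 0.5 * w @ A @ w
    w = np.array([1.0, -2.0, 0.5])

    analytic = A @ w
    numeric = numerical_gradient(f, w)
    # The difference should be tiny if the analytic gradient is correct.
    print(np.max(np.abs(analytic - numeric)))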

Lecture 8 (June 27)
  Topics:
    • Gradient Descent Variations - I
      • Problems with vanilla gradient descent
      • First-order methods
        • Resilient Propagation (Rprop)
      • Second-order methods
        • Taylor series approximation
        • Newton's Method for finding stationary points
        • Quickprop
  Slides: Variations of Gradient Descent
  Videos: Video
  Recitation: Friday, June 9: Recitation 4
  Miscellaneous: Quiz 4; Assignment 1: Backpropagation for MLPs
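Of the second-order ideas listed above, Newton's method for finding stationary points is the easiest to illustrate: repeatedly jump to the stationary point of the local second-order Taylor approximation. A one-dimensional sketch on an assumed test function (not taken from the lecture):

    def newton_stationary_point(f_prime, f_double_prime, w, steps=10):
        """Newton updates w <- w - f'(w) / f''(w) for a 1-D function."""
        for _ in range(steps):
            w = w - f_prime(w) / f_double_prime(w)
        return w

    # Assumed test function f(w) = w^4 - 3w^2 + w.
    f_prime = lambda w: 4 * w**3 - 6 * w + 1
    f_double_prime = lambda w: 12 * w**2 - 6
    print(newton_stationary_point(f_prime, f_double_prime, w=2.0))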

Lecture 9 (June 29)
  Topics:
    • Gradient Descent Variations - II
      • Momentum-based first-order methods
        • Momentum
        • Nesterov Accelerated Gradient
        • RMSprop
        • ADAM
  Slides: Momentum-based Gradient Descent
  Videos: Video
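The momentum-based update rules listed above differ mainly in how they accumulate gradient statistics. Below is a minimal sketch of classical momentum and ADAM on a single parameter vector; the hyperparameter values are the commonly quoted defaults, used here as assumptions.

    import numpy as np

    def momentum_step(w, grad, v, lr=0.01, beta=0.9):
        """Classical momentum: accumulate a velocity and move along it."""
        v = beta * v - lr * grad
        return w + v, v

    def adam_step(w, grad, m, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        """ADAM: first and second moment estimates with bias correction."""
        m = beta1 * m + (1 - beta1) * grad
        s = beta2 * s + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**t)          # bias-corrected first moment
        s_hat = s / (1 - beta2**t)          # bias-corrected second moment
        w = w - lr * m_hat / (np.sqrt(s_hat) + eps)
        return w, m, s

    # Usage sketch: minimise f(w) = ||w||^2 whose gradient is 2w.
    w = np.array([1.0, -2.0])
    m = np.zeros_like(w); s = np.zeros_like(w)
    for t in range(1, 501):
        w, m, s = adam_step(w, 2 * w, m, s, t, lr=0.05)
    print(w)   # approximately the minimiser [0, 0]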

July 7: Mid-term Exam
Jul 10 till Sep 10: Summer Break

Lecture 10 (Sep 12)
  Topics:
    • Automatic Differentiation
      • Analytic vs Automatic Differentiation
      • Linear Regression via Automatic Differentiation
      • Logistic Regression via Automatic Differentiation
  Slides: Automatic Differentiation
  Readings: Notes
  Videos: Video
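This lecture contrasts analytic and automatic differentiation; the sketch below fits linear regression purely through an autodiff library's reverse-mode gradients. It assumes PyTorch, which is not prescribed by the course; any autodiff framework would serve equally well.

    import torch

    # Synthetic data from an assumed ground truth y = 3x + 1 plus noise.
    torch.manual_seed(0)
    x = torch.linspace(-1, 1, 100).unsqueeze(1)
    y = 3 * x + 1 + 0.1 * torch.randn_like(x)

    # Parameters whose gradients will be computed automatically.
    w = torch.zeros(1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)

    lr = 0.1
    for _ in range(500):
        loss = ((x * w + b - y) ** 2).mean()   # squared-error loss
        loss.backward()                        # reverse-mode automatic differentiation
        with torch.no_grad():
            w -= lr * w.grad
            b -= lr * b.grad
            w.grad.zero_()
            b.grad.zero_()

    print(w.item(), b.item())   # should be close to 3 and 1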

Lecture 11 (Sep 14)
  Topics:
    • Regularization - I
      • Primer on ML
        • Capabilities of polynomials
        • Everything contains noise
        • Overfitting vs Generalisation
      • Regularization Methods
        • Weight Penalty
        • Early Stopping
        • Data Augmentation
        • Label Smoothing
  Slides: Regularization
  Videos: Video
  Recitation: Friday, Sep 15: Recitation 5
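Two of the regularization methods listed above are simple enough to state in a couple of lines. This is a generic sketch rather than the exact formulation from the slides; the penalty coefficient and the smoothing factor 0.1 are assumed values.

    import numpy as np

    def l2_penalised_loss(data_loss, weights, lam=1e-4):
        # Weight penalty: add lambda * ||w||^2 to the data term.
        return data_loss + lam * sum(np.sum(w ** 2) for w in weights)

    def smooth_labels(y_onehot, eps=0.1):
        # Label smoothing: move eps of the probability mass from the true
        # class to a uniform distribution over all classes.
        k = y_onehot.shape[-1]
        return (1 - eps) * y_onehot + eps / k

    y = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)
    print(smooth_labels(y))   # first row becomes roughly [0.033, 0.033, 0.933]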

Lecture 12 (Sep 19)
  Topics:
    • Regularization - II
      • Dropout
      • BatchNorm
  Slides: Dropout and BatchNorm
  Videos: Video
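A minimal sketch of the training-time forward passes of (inverted) dropout and batch normalisation. Details such as the keep probability and the running statistics used at test time are assumptions for illustration, not taken from the slides.

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout_forward(h, p_keep=0.8, train=True):
        """Inverted dropout: scale at training time so no rescaling is needed at test time."""
        if not train:
            return h
        mask = rng.random(h.shape) < p_keep
        return h * mask / p_keep

    def batchnorm_forward(x, gamma, beta, eps=1e-5):
        """Normalise each feature over the mini-batch, then scale and shift."""
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mu) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    x = rng.normal(size=(4, 3))                  # a mini-batch of 4 examples, 3 features
    print(batchnorm_forward(x, gamma=np.ones(3), beta=np.zeros(3)))
    print(dropout_forward(x))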

Lecture 13 (Sep 21)
  Topics:
    • Convolutional Neural Networks
      • Convolution
      • Neurons as detectors
      • Pooling
      • Forward Propagation
      • Covariance of CNNs
  Slides: Convolutional Neural Networks
  Videos: Video
  Recitation: Friday, Sep 22: Recitation 6
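A naive NumPy sketch of the two building blocks listed above: 2-D convolution (implemented as cross-correlation, as is standard in CNN libraries) and max pooling. Stride, padding and multiple channels are omitted for brevity.

    import numpy as np

    def conv2d(image, kernel):
        """Valid 2-D cross-correlation of a single-channel image with one kernel."""
        H, W = image.shape
        kh, kw = kernel.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    def max_pool(x, size=2):
        """Non-overlapping max pooling with a size x size window."""
        H, W = x.shape
        out = np.zeros((H // size, W // size))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = x[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
        return out

    image = np.arange(36, dtype=float).reshape(6, 6)
    kernel = np.ones((3, 3)) / 9.0               # a 3x3 averaging filter
    print(max_pool(conv2d(image, kernel)))       # (6,6) -> (4,4) -> (2,2)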

Lecture 14 (Sep 26)
  Topics:
    • Variations of Convolutional Neural Networks - I
      • 1x1 Convolutions
      • Depthwise Separable Convolutions
      • Transposed Convolutions
  Slides: Variations of Convolutional Neural Networks
  Videos: Video

Lecture 15 (Sep 28)
  Topics:
    • Variations of Convolutional Neural Networks - II
      • Unpooling
      • Fully Convolutional Networks
      • ResNet
  Slides: Variations of Convolutional Neural Networks
  Videos: Video
  Recitation: Friday, Sep 29: Recitation 7

Lecture 16 (Oct 3)
  Topics:
    • Recurrent Neural Networks (RNN)
      • Static vs. Dynamic Inputs
      • Temporal, sequential and time-series data
      • Folding in space
      • Folding in time
      • Unfolding in time
      • Forward propagation in RNN
  Slides: Recurrent Neural Networks
  Videos: Video
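Forward propagation in a vanilla RNN is the same cell applied at every time step (unfolding in time). A minimal sketch with assumed dimensions and randomly initialised weights standing in for learnt parameters:

    import numpy as np

    rng = np.random.default_rng(0)

    def rnn_forward(xs, W_xh, W_hh, b_h):
        """Run a vanilla tanh RNN over a sequence xs of shape (T, input_dim)."""
        h = np.zeros(W_hh.shape[0])
        hs = []
        for x_t in xs:                       # unfolding in time: same weights each step
            h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
            hs.append(h)
        return np.stack(hs)                  # hidden state at every time step

    T, input_dim, hidden_dim = 5, 3, 4
    xs = rng.normal(size=(T, input_dim))
    W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
    W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
    b_h = np.zeros(hidden_dim)
    print(rnn_forward(xs, W_xh, W_hh, b_h).shape)   # (5, 4)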

Lecture 17 (Oct 5)
  Topics:
    • RNN variants, benefits and stability
      • Bidirectional RNN
      • Some problems are inherently recurrent
      • Exploding gradients
  Slides: RNN variants, benefits and stability
  Videos: Video

Lecture 18 (Oct 10)
  Topics:
    • Long Short-Term Memory (LSTM)
      • RNN cell and its weakness
      • Building blocks of the LSTM cell
      • The LSTM cell
      • How does the LSTM cell remember the past?
      • Variants
        • Peephole connections
        • Coupled forget and input gates
        • Gated Recurrent Unit (GRU)
  Slides: Long Short-Term Memory
  Videos: Video
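A minimal sketch of a single LSTM step, showing how the forget, input and output gates control the cell state. The weight layout, shapes and initialisation are assumptions for illustration.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One LSTM step. W has shape (4*hidden, input+hidden), b has shape (4*hidden,)."""
        hidden = h_prev.size
        z = W @ np.concatenate([x, h_prev]) + b
        f = sigmoid(z[0 * hidden:1 * hidden])        # forget gate
        i = sigmoid(z[1 * hidden:2 * hidden])        # input gate
        o = sigmoid(z[2 * hidden:3 * hidden])        # output gate
        g = np.tanh(z[3 * hidden:4 * hidden])        # candidate cell state
        c = f * c_prev + i * g                       # cell state carries the past forward
        h = o * np.tanh(c)                           # hidden state / output
        return h, c

    rng = np.random.default_rng(0)
    input_dim, hidden = 3, 4
    W = rng.normal(scale=0.1, size=(4 * hidden, input_dim + hidden))
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden); c = np.zeros(hidden)
    for x in rng.normal(size=(6, input_dim)):        # run over a length-6 sequence
        h, c = lstm_step(x, h, c, W, b)
    print(h)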

Lecture 19 (Oct 12)
  Topics:
    • Language Modelling
      • Modelling input text as numeric vectors
      • Text generation
      • Language translation
      • Beam Search
  Slides: Language Modelling
  Videos: Video

Lecture 20 (Oct 17)
  Topics:
    • Attention
      • Attention-based decoder for
        • Language translation
        • Image captioning
        • Handwritten text recognition
  Slides: Attention
  Videos: Video

Lecture 21 (Oct 19)
  Topics:
    • Transformers
      • Encoding with attention
        • Self-attention
        • Residual connection
        • Layer-norm
        • Parallelism by removing recurrence
        • Multiheaded self-attention
        • Positional encoding
      • Self-attention based Decoder
        • Encoder-decoder-attention
  Slides: Transformers
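Scaled dot-product self-attention, the core operation behind the encoder topics listed above, fits in a few lines of NumPy. Multiple heads, masking and training are omitted; the random projection matrices stand in for learnt parameters.

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def self_attention(X, W_q, W_k, W_v):
        """Single-head scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)      # similarity of every position with every other
        weights = softmax(scores)            # each row sums to 1
        return weights @ V                   # weighted mixture of value vectors

    rng = np.random.default_rng(0)
    T, d_model, d_k = 5, 8, 4
    X = rng.normal(size=(T, d_model))        # stand-in for embedded, positionally encoded tokens
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))
    print(self_attention(X, W_q, W_k, W_v).shape)   # (5, 4)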