Machine Learning -- CS 667, PUCIT

Advanced Machine Learning (CS 667)
Spring 2019

The ability of biological brains to sense, perceive, analyse and recognise patterns can only be described as stunning. Furthermore, they are able to learn from new examples. Yet mankind's understanding of exactly how biological brains operate is embarrassingly limited.

However, there do exist numerous 'practical' techniques that give machines the 'appearance' of being intelligent. This is the domain of statistical pattern recognition and machine learning. Instead of attempting to mimic the complex workings of a biological brain, this course aims to explain mathematically well-founded techniques for analysing patterns and learning from them.

This course is an extension of CS 567 -- Machine Learning and is therefore a mathematically involved introduction to the field of pattern recognition and machine learning. It will prepare students for further study/research in the areas of Pattern Recognition, Machine Learning, Computer Vision, Data Analysis and other areas attempting to solve Artificial Intelligence (AI) type problems.

Pre-requisite(s): CS 567 -- Machine Learning

Text:

(Required) Pattern Recognition and Machine Learning by Christopher M. Bishop (2006)
(Recommended) Pattern Classification by Duda, Hart and Stork (2001)

Lectures:

Monday 9:45 am - 11:15 am Al Khwarizmi Lecture Theater

Wednesday 9:45 am - 11:15 am Al Khwarizmi Lecture Theater

Office Hours:

Tuesday 12:00 pm - 01:00 pm

Programming Environment: MATLAB

MATLAB Resources (by Aykut Erdem):

Introduction to MATLAB, Danilo Scepanovic
MATLAB Tutorial, Stefan Roth
MATLAB Primer, MathWorks
Code Vectorization Guide, MathWorks
Writing Fast MATLAB code, Pascal Getreuer
MATLAB array manipulation tips and tricks, Peter J. Acklam

Grading Scheme/Criteria:

| Category    | Weight | Effective Weight* |
|-------------|--------|-------------------|
| Quizzes     | 5%     | 5%                |
| Assignments | 12%    | 30%               |
| Project     | 8%     | 15%               |
| Mid-Term    | 35%    | 20%               |
| Final       | 40%    | 30%               |

*The current grading scheme is a PU requirement that I do not agree with. Assignments and the project will actually constitute 45% of the course (30% and 15% respectively). This will be achieved by awarding 15 of the mid-term's 35% and 10 of the final's 40% on the basis of performance in the assignments and project. So the mid-term is effectively 20% of the grade and the final is effectively 30% of the grade.

Assignments

Logistic Regression

Assignment 1: Implement a binary Logistic Regression classifier and train it using the IRLS algorithm to recognise hand-written digits from 2 classes of the MNIST dataset. (Due: Monday, February 25th, 2019)
Assignment 2: Implement a multiclass Logistic Regression classifier and train it using SGD to recognise hand-written digits from the MNIST dataset. (Due: Monday, March 4th, 2019)
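For Assignment 1, IRLS is Newton-Raphson applied to the cross-entropy error: each step solves (ΦᵀRΦ) d = Φᵀ(y − t) with R = diag(y_n(1 − y_n)) and sets w ← w − d. The Python sketch below is a minimal illustration, assuming a two-dimensional basis φ(x) = (1, x) so that the 2x2 system can be solved by hand; the function names and the toy data are hypothetical, and the actual assignment works with MNIST images. The toy classes overlap on purpose, since for linearly separable data the maximum likelihood weights diverge.

```python
import math


def sigmoid(a):
    """Logistic sigmoid, split by sign so math.exp never overflows."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


def irls(xs, ts, iters=20):
    """Binary logistic regression with basis phi(x) = (1, x), trained by IRLS.

    Each iteration is one Newton-Raphson step  w <- w - H^{-1} grad,  where
    grad = Phi^T (y - t) and H = Phi^T R Phi with R = diag(y_n (1 - y_n)).
    """
    w0, w1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the cross-entropy error
        h00 = h01 = h11 = 0.0    # entries of the symmetric 2x2 Hessian
        for x, t in zip(xs, ts):
            y = sigmoid(w0 + w1 * x)
            r = y * (1.0 - y)    # weighting R_nn
            g0 += y - t
            g1 += (y - t) * x
            h00 += r
            h01 += r * x
            h11 += r * x * x
        det = h00 * h11 - h01 * h01
        # Solve H d = grad explicitly (2x2 case), then take the Newton step.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        w0, w1 = w0 - d0, w1 - d1
    return w0, w1


def cross_entropy(w, xs, ts):
    """Negative log-likelihood E(w) for weights w = (w0, w1)."""
    e = 0.0
    for x, t in zip(xs, ts):
        y = sigmoid(w[0] + w[1] * x)
        e -= t * math.log(y) + (1 - t) * math.log(1.0 - y)
    return e


if __name__ == "__main__":
    # Hypothetical toy data with overlapping classes (not linearly separable).
    xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    ts = [0, 0, 1, 0, 1, 1]
    w = irls(xs, ts)
    print("w =", w, " E(w) =", cross_entropy(w, xs, ts))
```

Because the Hessian is positive definite, a handful of Newton steps usually drives the gradient to rounding level; with the full MNIST basis the 2x2 solve would be replaced by a general linear solve.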

Neural Networks

Assignment 3:
- Part 1: Implement the backpropagation algorithm for MLP training and regenerate Figure 5.3 from Bishop's book.
- Part 2: Implement an MLP for multiclass classification and train it using SGD to recognise hand-written digits from the MNIST dataset.
(Due: Monday, March 18th, 2019)
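Part 1 of Assignment 3 concerns backpropagation. The Python sketch below assumes a 1-3-1 network with tanh hidden units, a linear output and the sum-of-squares error; it computes the gradient for one pattern by propagating δ = y − t from the output back to the hidden units as δ_j = (1 − z_j²) w_j δ. The function names, learning rate, epoch count and the x² toy target are illustrative assumptions, not the assignment's specification. Comparing the result with central finite differences is the usual way to confirm such a gradient.

```python
import math
import random


def init_params(hidden=3, seed=0):
    """Random initial parameters [w1, b1, w2, b2] for a 1-hidden-1 network."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(hidden)],  # input->hidden weights
            [rng.uniform(-1.0, 1.0) for _ in range(hidden)],  # hidden biases
            [rng.uniform(-1.0, 1.0) for _ in range(hidden)],  # hidden->output weights
            0.0]                                              # output bias


def forward(params, x):
    """Forward pass: tanh hidden activations z_j and a linear output y."""
    w1, b1, w2, b2 = params
    z = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    y = sum(v * zj for v, zj in zip(w2, z)) + b2
    return z, y


def backprop(params, x, t):
    """Gradients of E = 0.5 * (y - t)^2 for a single pattern (x, t)."""
    w1, b1, w2, b2 = params
    z, y = forward(params, x)
    delta_out = y - t                                   # linear output unit
    # Back-propagate: delta_j = h'(a_j) * w2_j * delta_out, with tanh' = 1 - z^2
    delta_hid = [(1.0 - zj * zj) * v * delta_out for zj, v in zip(z, w2)]
    g_w1 = [d * x for d in delta_hid]
    g_b1 = list(delta_hid)
    g_w2 = [delta_out * zj for zj in z]
    return g_w1, g_b1, g_w2, delta_out


def total_error(params, data):
    return sum(0.5 * (forward(params, x)[1] - t) ** 2 for x, t in data)


def train(params, data, lr=0.05, epochs=500):
    """Per-pattern (stochastic) gradient descent; returns new parameter lists."""
    for _ in range(epochs):
        for x, t in data:
            grads = backprop(params, x, t)
            # Update the three weight/bias lists, then the scalar output bias.
            lists = [[p - lr * g for p, g in zip(params[i], grads[i])] for i in range(3)]
            params = lists + [params[3] - lr * grads[3]]
    return params


if __name__ == "__main__":
    # Hypothetical data: 50 evenly spaced points of f(x) = x^2 on [-1, 1].
    data = [(x, x * x) for x in (-1.0 + 2.0 * n / 49 for n in range(50))]
    p0 = init_params()
    p1 = train(p0, data)
    print("error before:", total_error(p0, data), "after:", total_error(p1, data))
```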

Convolutional Neural Networks

Assignment 4: Implement a Convolutional Neural Network for classification and train it to recognise images of clothing items from the Fashion-MNIST dataset. (Due: Monday, April 8th, 2019)

PCA

Assignment 5: Implement Principal Component Analysis and regenerate Figures 12.3, 12.4 and 12.5 from Bishop's book. (Due: Wednesday, April 25th, 2019)
Assignment 6: Implement Principal Component Analysis for classification and use it to recognise hand-written digits from the MNIST dataset. (Due: Monday, 2019)

Density estimation via Gaussian Mixture Model (GMM)

Assignment 7: Write a generic implementation of learning a GMM via the EM algorithm and regenerate Figure 9.8 from Bishop's book. (Due: Monday, 2019)
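The EM loop for a GMM alternates two steps. The E-step computes responsibilities γ(z_nk) = π_k N(x_n | μ_k, σ_k²) / Σ_j π_j N(x_n | μ_j, σ_j²), and the M-step re-estimates μ_k, σ_k² and π_k = N_k / N in closed form. The Python sketch below is deliberately one-dimensional; Figure 9.8 uses two-dimensional data, so the assignment needs full covariance matrices. It records the log-likelihood after every iteration so that its monotone increase can be checked. The variance floor `min_var` is our own safeguard against collapsing components, and all names and data are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(xs, pis, mus, vars_):
    return sum(math.log(sum(p * normal_pdf(x, m, v) for p, m, v in zip(pis, mus, vars_)))
               for x in xs)


def em_gmm(xs, pis, mus, vars_, iters=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture; returns (pis, mus, vars_, log-likelihood history)."""
    K, N = len(pis), len(xs)
    history = [log_likelihood(xs, pis, mus, vars_)]
    for _ in range(iters):
        # E-step: responsibilities gamma[n][k] = p(z_nk = 1 | x_n) under current parameters.
        gamma = []
        for x in xs:
            w = [pis[k] * normal_pdf(x, mus[k], vars_[k]) for k in range(K)]
            s = sum(w)
            gamma.append([wk / s for wk in w])
        # M-step: closed-form re-estimates weighted by the responsibilities.
        Nk = [sum(g[k] for g in gamma) for k in range(K)]
        mus = [sum(g[k] * x for g, x in zip(gamma, xs)) / Nk[k] for k in range(K)]
        vars_ = [max(min_var, sum(g[k] * (x - mus[k]) ** 2 for g, x in zip(gamma, xs)) / Nk[k])
                 for k in range(K)]
        pis = [Nk[k] / N for k in range(K)]
        history.append(log_likelihood(xs, pis, mus, vars_))
    return pis, mus, vars_, history


if __name__ == "__main__":
    # Hypothetical data: two well-separated 1-D clusters around -2 and 3.
    xs = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8]
    pis, mus, vars_, hist = em_gmm(xs, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
    print("pi =", pis, "mu =", mus, "var =", vars_)
```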

~~Multimodal conditional density estimation via Mixture Density Network (MDN)~~

~~Assignment 8: Write a generic implementation of learning an MDN and regenerate Figures 5.19 and 5.21 from Bishop's book. (Due: Monday, 2019)~~

Grades:
Grading sheet (Accessible only through your PUCIT email account)

Content

Probabilistic Discriminative Models -- model posterior p(C_k|x) directly
- Logistic Sigmoid function and its derivative
- Softmax function and its derivative
- Positive Definite matrix
- Logistic Regression
- Positive definite Hessian implies strict convexity, which implies that any minimum is the unique, global minimum
- Newton-Raphson updates constitute the IRLS algorithm
- Multiclass Logistic Regression
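The derivative identities listed above, dσ/da = σ(1 − σ) for the logistic sigmoid and ∂y_k/∂a_j = y_k(I_kj − y_j) for the softmax, are what keep the logistic regression gradients compact. The short Python sketch below, assuming plain lists for vectors (the function names are our own), computes both so they can be checked against finite differences. Subtracting max(a) before exponentiating leaves the softmax unchanged but avoids overflow.

```python
import math


def sigmoid(a):
    """Logistic sigmoid, split by sign so math.exp never overflows."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)                   # d sigma / da = sigma (1 - sigma)


def softmax(a):
    m = max(a)                             # shift by max(a) for numerical stability
    e = [math.exp(ak - m) for ak in a]
    total = sum(e)
    return [ek / total for ek in e]


def softmax_jacobian(a):
    """J[k][j] = d y_k / d a_j = y_k * (I_kj - y_j)."""
    y = softmax(a)
    n = len(a)
    return [[y[k] * ((1.0 if k == j else 0.0) - y[j]) for j in range(n)]
            for k in range(n)]


if __name__ == "__main__":
    print(sigmoid_grad(0.0))               # 0.25, the maximum of sigma'
    print(softmax_jacobian([1.0, -0.5, 2.0]))
```

Every row and column of the softmax Jacobian sums to zero, because the outputs are constrained to sum to one.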
Neural Networks
- Mathematical model of a single neuron
- Learn optimal features φ* as well as weights w* for those features
- Multilayer Perceptrons
- Back-propagation
- Regularization Techniques
  - Weight decay
  - Per-layer weight decay
  - Early stopping
  - Training with transformed data
  - Dropout and DropConnect
  - Batch Normalization
  - Structural Invariance
- Deep Learning
  - Saturation
  - Vanishing Gradient Problem
  - Better Activation Functions
  - Better Weight Initializations
  - Better Normalization
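Saturation and the vanishing gradient problem can be made concrete with a scalar chain z_l = h(w z_{l−1}). By the chain rule dz_L/dz_0 = Π_l w h'(a_l). Because the logistic sigmoid satisfies σ'(a) ≤ 1/4, with |w| ≤ 1 the gradient shrinks at least as fast as 4^(−L), whereas a ReLU passes a factor of exactly w for positive pre-activations. The Python sketch below assumes a single unit per layer and one shared weight w, a deliberate simplification of a real deep network; the function names are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)          # never larger than 0.25 (attained at a = 0)


def relu(a):
    return max(0.0, a)


def relu_grad(a):
    return 1.0 if a > 0.0 else 0.0


def chain_gradient(x, depth, act, act_grad, w=1.0):
    """d z_L / d z_0 for the scalar chain z_l = act(w * z_{l-1}), z_0 = x."""
    z, grad = x, 1.0
    for _ in range(depth):
        a = w * z
        grad *= w * act_grad(a)   # chain rule: one factor w * h'(a_l) per layer
        z = act(a)
    return grad


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        print(depth,
              chain_gradient(0.5, depth, sigmoid, sigmoid_grad),
              chain_gradient(0.5, depth, relu, relu_grad))
```

The last check in a ReLU chain, a negative input, shows the flip side: a unit with a negative pre-activation passes no gradient at all.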
Convolutional Neural Networks
- Neurons as detectors
- Invariance
- Local correlation property of images
- Receptive field
- Feature maps
- Weight sharing
- Subsampling
- Backpropagation for CNN (Tutorial by Sania Ashraf)
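Several of the CNN ideas above (receptive fields, feature maps, weight sharing, subsampling) fit in a few lines. In the Python sketch below a single 2x2 kernel is slid over the whole image, so every unit in the feature map shares the same weights and looks at a 2x2 receptive field. As in most deep learning libraries, this is a 'valid' cross-correlation rather than a flipped convolution. For subsampling the sketch swaps in non-overlapping max pooling, whereas Bishop's description averages each 2x2 block and applies a trainable weight and bias; treat that choice, the function names and the toy image as assumptions.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """'Valid' 2-D cross-correlation: the same kernel (shared weights) is applied
    at every position, so each output unit sees a kh x kw receptive field."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            s = bias
            for u in range(kh):
                for v in range(kw):
                    s += kernel[u][v] * image[i + u][j + v]
            row.append(s)
        out.append(row)
    return out


def relu_map(fmap):
    """Element-wise nonlinearity applied to a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a whole
    window are dropped."""
    H, W = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + u][j * size + v] for u in range(size) for v in range(size))
             for j in range(W)] for i in range(H)]


if __name__ == "__main__":
    edge = [[-1, 1], [-1, 1]]                     # hypothetical vertical-edge detector
    img = [[0, 0, 1, 1, 1] for _ in range(5)]
    shifted = [[0, 0, 0, 1, 1] for _ in range(5)]
    print(conv2d_valid(img, edge))                # the edge response ...
    print(conv2d_valid(shifted, edge))            # ... moves with the edge
    print(max_pool(relu_map(conv2d_valid(img, edge))))
```

Shifting the input shifts the feature map by the same amount (equivariance); pooling then makes the output partly insensitive to small shifts.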
Generative Adversarial Networks
- Adversarial learning via a minimax game
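The minimax game is usually written, following Goodfellow et al. (2014), as the value function below, with a discriminator D and a generator G that maps noise z to samples. For a fixed G the inner maximisation has a closed-form solution, and substituting it shows that the outer minimisation drives the generator's distribution p_g towards p_data through the Jensen-Shannon divergence.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{data}}}\!\left[\ln D(\mathbf{x})\right]
  + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}\!\left[\ln\!\left(1 - D(G(\mathbf{z}))\right)\right]

D^{*}_{G}(\mathbf{x}) = \frac{p_{\mathrm{data}}(\mathbf{x})}{p_{\mathrm{data}}(\mathbf{x}) + p_{g}(\mathbf{x})},
\qquad
V(D^{*}_{G}, G) = -\ln 4 + 2\,\mathrm{JSD}\!\left(p_{\mathrm{data}} \,\Vert\, p_{g}\right)
```

The global minimum, −ln 4, is reached only when p_g = p_data, at which point D*(x) = 1/2 everywhere.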
Principal Component Analysis
- Dimensionality Reduction, Data Compression, Feature Extraction
- Maximum Variance Formulation of PCA
- PCA for high-dimensional data
- Whitening
- Classification via PCA
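The maximum variance formulation and whitening both reduce to an eigendecomposition of the sample covariance S = (1/N) Σ_n (x_n − x̄)(x_n − x̄)ᵀ. The first principal direction u_1 is the eigenvector with the largest eigenvalue; keeping only the coordinate u_1ᵀ(x_n − x̄) gives the one-dimensional projection, and whitening maps x_n to Λ^(−1/2) Uᵀ(x_n − x̄) so that the result has identity covariance. The Python sketch below is restricted, by assumption, to two-dimensional data so that the symmetric 2x2 eigenproblem has a closed form; a real implementation (and the MNIST assignment) would use a library eigensolver. Names and data are hypothetical.

```python
import math


def mean_cov_2d(data):
    """Sample mean and covariance S = (1/N) sum (x - mean)(x - mean)^T of 2-D points."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / n
    syy = sum((p[1] - my) ** 2 for p in data) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / n
    return (mx, my), [[sxx, sxy], [sxy, syy]]


def eig_sym_2x2(S):
    """Eigenvalues (largest first) and orthonormal eigenvectors of a symmetric 2x2 matrix."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    mid = 0.5 * (a + c)
    rad = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    l1, l2 = mid + rad, mid - rad
    if b != 0.0:
        v = (l1 - c, b)                     # satisfies (S - l1 I) v = 0
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    u1 = (v[0] / norm, v[1] / norm)
    u2 = (-u1[1], u1[0])                    # orthogonal to u1, hence the other eigenvector
    return (l1, l2), (u1, u2)


def pca_whiten(data):
    """Project onto the principal axes and rescale: y_n = Lambda^{-1/2} U^T (x_n - mean).
    Assumes both eigenvalues are strictly positive (non-degenerate data)."""
    mean, S = mean_cov_2d(data)
    (l1, l2), (u1, u2) = eig_sym_2x2(S)
    out = []
    for x, y in data:
        dx, dy = x - mean[0], y - mean[1]
        out.append(((u1[0] * dx + u1[1] * dy) / math.sqrt(l1),
                    (u2[0] * dx + u2[1] * dy) / math.sqrt(l2)))
    return out


if __name__ == "__main__":
    # Hypothetical data lying roughly along the line y = x.
    data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1)]
    mean, S = mean_cov_2d(data)
    print("first principal direction:", eig_sym_2x2(S)[1][0])
    print("whitened covariance:", mean_cov_2d(pca_whiten(data))[1])
```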
Python, Automatic Differentiation and TensorFlow
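The idea behind automatic differentiation, which TensorFlow applies to its computation graphs, can be illustrated with a toy reverse-mode implementation. Each operation records its inputs together with the local derivatives, and `backward()` visits the graph in reverse topological order, accumulating adjoints by the chain rule (the same computation as back-propagation). The `Var` class below is hypothetical, is not TensorFlow's API, and supports only `+`, `*` and `tanh` on scalars.

```python
import math


class Var:
    """A scalar node in a computation graph, for reverse-mode automatic differentiation."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # tuple of (parent node, d self / d parent)
        self.grad = 0.0             # adjoint d output / d self, filled in by backward()

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def tanh(self):
        t = math.tanh(self.value)
        return Var(t, ((self, 1.0 - t * t),))

    def backward(self):
        """Propagate adjoints from this node back to every node it depends on."""
        order, seen = [], set()

        def visit(node):            # depth-first topological sort
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):   # every consumer of a node is processed before it
            for parent, local in node.parents:
                parent.grad += local * node.grad


if __name__ == "__main__":
    x, y = Var(0.5), Var(-2.0)
    f = x * y + x.tanh()            # x is used twice; its gradient contributions add up
    f.backward()
    print(f.value, x.grad, y.grad)  # x.grad = y + 1 - tanh(x)^2, y.grad = x
```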
Support Vector Machines and Kernel Methods
- Maximising the margin -- hard constraints
- Lagrange Multipliers Method for Constrained Optimization
  - Maximization with equality constraint
  - Minimization with equality constraint
  - Maximization with inequality constraint
  - Minimization with inequality constraint
  - Optimization with multiple constraints
- Dual formulations
- Kernel Trick
- Improving generalisation -- soft constraints
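The kernel trick replaces inner products of feature vectors φ(x)ᵀφ(z) with a kernel k(x, z) evaluated in input space. For two-dimensional inputs, the polynomial kernel k(x, z) = (xᵀz)² corresponds to the explicit map φ(x) = (x₁², √2 x₁x₂, x₂²), which the Python sketch below checks numerically. It also writes the SVM prediction in dual form, y(x) = Σ_n a_n t_n k(x, x_n) + b. The multipliers a_n and the bias b would come from solving the dual problem, which is not shown; the function names are our own and the values used with them are made up.

```python
import math


def phi(x):
    """Explicit feature map whose inner products equal (x . z)^2 for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    # Evaluated directly in the 2-D input space; phi is never formed.
    return dot(x, z) ** 2


def dual_predict(x, support, alphas, targets, b, kernel=poly2_kernel):
    """Dual-form decision function y(x) = sum_n a_n t_n k(x, x_n) + b.
    The a_n and b are assumed to come from a QP solver that is not shown."""
    return sum(a * t * kernel(x, xn) for a, t, xn in zip(alphas, targets, support)) + b


if __name__ == "__main__":
    x, z = (1.0, 2.0), (3.0, -1.0)
    print(poly2_kernel(x, z), dot(phi(x), phi(z)))   # equal up to rounding
```

Only kernel evaluations appear in `dual_predict`, which is what lets the same code work with kernels whose feature space is very high- or infinite-dimensional.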
Latent Variable Models
- K-means Clustering -- alternating optimization
- Spectral Clustering
- Gaussian Mixture Models
- Expectation Maximisation (EM) Algorithm
  - EM performs maximum likelihood (ML) estimation for latent variable models
  - Extensions
    - MAP estimation
    - Generalized EM for 'difficult' M-step
  - Proof of convergence
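K-means is alternating optimization of the distortion J = Σ_n Σ_k r_nk ‖x_n − μ_k‖². With the centres fixed, J is minimised by assigning each point to its nearest centre; with the assignments fixed, it is minimised by moving each centre to the mean of its points. Neither step can increase J, and the loop in the Python sketch below stops once the assignments no longer change (or after `iters` rounds). The sketch assumes the caller supplies the initial centres and keeps a centre in place if its cluster becomes empty; both are our own choices, the names are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math


def kmeans(points, centres, iters=100):
    """Lloyd's algorithm for K-means; returns (centres, assignments)."""
    centres = [tuple(c) for c in centres]   # copy so the caller's list is untouched
    assign = None
    for _ in range(iters):
        # Step 1: with centres fixed, assign each point to its nearest centre.
        new_assign = [min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
                      for p in points]
        if new_assign == assign:
            break                           # assignments unchanged: converged
        assign = new_assign
        # Step 2: with assignments fixed, move each centre to its cluster mean.
        for k in range(len(centres)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:                     # an empty cluster keeps its old centre
                centres[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, assign


def distortion(points, centres, assign):
    """J = sum_n ||x_n - mu_{k(n)}||^2 for hard assignments."""
    return sum(math.dist(p, centres[a]) ** 2 for p, a in zip(points, assign))


if __name__ == "__main__":
    # Hypothetical data: two small groups of 2-D points.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    cs, asg = kmeans(pts, [(0.0, 0.0), (1.0, 0.0)])
    print(cs, asg, distortion(pts, cs, asg))
```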
Combining Models
- Committees, Bagging, Boosting
- Conditional Mixture Models
  - Mixtures of Linear Regression Models
  - Mixtures of Logistic Regression Models
- Mixture Density Networks
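Bagging trains M copies of a model on bootstrap data sets, each made of N points drawn with replacement from the original N, and the committee prediction is the average y_COM(x) = (1/M) Σ_m y_m(x). The Python sketch below assumes a least-squares straight line as the base model purely for brevity (bagging mainly pays off for high-variance models); the function names, defaults and data are hypothetical.

```python
import random


def fit_line(data):
    """Least-squares straight line y = w0 + w1 * x (the assumed base model)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    w1 = sxy / sxx if sxx > 0 else 0.0   # degenerate sample: fall back to a constant
    return my - w1 * mx, w1


def bagging(data, n_models=25, seed=0):
    """Fit one base model per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in data]   # N draws with replacement
        models.append(fit_line(boot))
    return models


def committee_predict(models, x):
    """Committee output y_COM(x) = (1/M) * sum_m y_m(x)."""
    return sum(w0 + w1 * x for w0, w1 in models) / len(models)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical noisy data around y = 1 + 2x.
    data = [(x / 10, 1 + 2 * x / 10 + rng.gauss(0.0, 0.3)) for x in range(20)]
    models = bagging(data)
    print("committee prediction at x = 0.5:", committee_predict(models, 0.5))
```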