Advanced Machine Learning (CS 667)
Spring 2019
Dr. Nazar Khan
The ability of biological brains to sense, perceive, analyse and recognise patterns can only be described as stunning. Furthermore, they can learn from new examples. Yet mankind's understanding of exactly how biological brains operate is embarrassingly limited.
However, there exist numerous 'practical' techniques that give machines the 'appearance' of being intelligent. This is the domain of statistical pattern recognition and machine learning. Instead of attempting to mimic the complex workings of a biological brain, this course aims to explain mathematically well-founded techniques for analysing patterns and learning from them.
This course is an extension of CS 567 (Machine Learning) and is therefore a mathematically involved introduction to the field of pattern recognition and machine learning. It will prepare students for further study and research in Pattern Recognition, Machine Learning, Computer Vision, Data Analysis and other areas that attempt to solve Artificial Intelligence (AI) problems.
Prerequisite(s): CS 567 (Machine Learning)
Text:
 (Required) Pattern Recognition and Machine Learning by Christopher M. Bishop (2006)
 (Recommended) Pattern Classification by Duda, Hart and Stork (2001)
Lectures:
Monday, 9:45 am - 11:15 am, Al Khwarizmi Lecture Theater
Wednesday, 9:45 am - 11:15 am, Al Khwarizmi Lecture Theater
Office Hours:
Tuesday, 12:00 pm - 1:00 pm
Programming Environment: MATLAB
 MATLAB Resources (by Aykut Erdem):
Grading Scheme/Criteria:
Category       Official Weight   Effective Weight*
Quizzes               5%                 5%
Assignments          12%                30%
Project               8%                15%
Midterm              35%                20%
Final                40%                30%
*The official grading scheme is a PU requirement that I do not agree with. In practice, the assignments will constitute 30% of the course and the project 15%. This will be achieved by awarding 15 of the midterm's 35% and 10 of the final's 40% based on performance in the assignments and project. The midterm is therefore effectively 20% of the grade and the final effectively 30%.
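The weight transfer between the two columns of the grading table can be checked with a few lines of arithmetic. The sketch below is in Python (an assumption; the course itself uses MATLAB). It copies both weight columns from the table and confirms that the 15 + 10 points taken from the exams equal the 18 + 7 points added to the assignments and project, so that coursework carries 45% of the effective grade. The `course_score` helper and any scores passed to it are hypothetical, included only to show how the effective weights would combine component scores.

```python
"""Sanity check of the grading arithmetic (all weights in percent)."""

# Official PU weights and effective weights, copied from the grading table.
OFFICIAL = {"Quizzes": 5, "Assignments": 12, "Project": 8, "Midterm": 35, "Final": 40}
EFFECTIVE = {"Quizzes": 5, "Assignments": 30, "Project": 15, "Midterm": 20, "Final": 30}


def transferred_points(official, effective):
    """Return (points taken from the exams, points given to assignments + project)."""
    from_exams = (official["Midterm"] - effective["Midterm"]) + (
        official["Final"] - effective["Final"]
    )
    to_coursework = (effective["Assignments"] - official["Assignments"]) + (
        effective["Project"] - official["Project"]
    )
    return from_exams, to_coursework


def course_score(scores, weights=EFFECTIVE):
    """Weighted course score (0-100) from per-component scores given in percent.

    Hypothetical helper: `scores` maps each category name to a score out of 100.
    """
    return sum(weights[c] * scores[c] / 100 for c in weights)


if __name__ == "__main__":
    print(transferred_points(OFFICIAL, EFFECTIVE))  # (25, 25)
    print(EFFECTIVE["Assignments"] + EFFECTIVE["Project"])  # 45
```

For a hypothetical student scoring 100 on all coursework and 50 on both exams, `course_score` gives 75 under the effective weights but 62.5 under the official ones, which is the shift in emphasis the footnote describes.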
Assignments
 Logistic Regression
 Assignment 1: Implement a binary Logistic Regression classifier and train it using the IRLS algorithm to distinguish handwritten digits from 2 classes of the MNIST dataset. (Due: Monday, February 25th, 2019)
 Assignment 2: Implement a multiclass Logistic Regression classifier and train it using SGD to recognise handwritten digits from the MNIST dataset. (Due: Monday, March 4th, 2019)
 Neural Networks
 Assignment 3:
 Part 1: Implement the backpropagation algorithm for MLP training and regenerate Figure 5.3 from Bishop's book.
 Part 2: Implement an MLP for multiclass classification and train it using SGD to recognise handwritten digits from the MNIST dataset.
(Due: Monday, March 18th, 2019)
 Convolutional Neural Networks
 Assignment 4: Implement a Convolutional Neural Network for classification and train it to recognise images of clothing items from the Fashion-MNIST dataset. (Due: Monday, April 8th, 2019)
 PCA
 Assignment 5: Implement Principal Component Analysis and regenerate Figures 12.3, 12.4 and 12.5 from Bishop's book. (Due: Wednesday, April 25th, 2019)
 Assignment 6: Implement Principal Component Analysis for classification and use it to recognise handwritten digits from the MNIST dataset. (Due: Monday, 2019)
 Density estimation via Gaussian Mixture Model (GMM)
 Assignment 7: Write a generic implementation of learning a GMM via the EM algorithm and regenerate Figure 9.8 from Bishop's book. (Due: Monday, 2019)
 Multimodal conditional density estimation via a Mixture Density Network (MDN)
 Assignment 8: Write a generic implementation of learning an MDN and regenerate Figures 5.19 and 5.21 from Bishop's book. (Due: Monday, 2019)
Grades:
Grading sheet (Accessible only through your PUCIT email account)
Content
 Probabilistic Discriminative Models: model the posterior p(C_k|x) directly
 Logistic Sigmoid function and its derivative
 Softmax function and its derivative
 Positive Definite matrix
 Logistic Regression
 Positive definite Hessian implies convexity, which implies a unique, global minimum
 Newton-Raphson updates constitute the IRLS algorithm
 Multiclass Logistic Regression
 Neural Networks
 Mathematical model of a single neuron
 Learn optimal features φ* as well as weights w* for those features
 Multilayer Perceptrons
 Backpropagation
 Regularization Techniques
 Weight decay
 Per-layer weight decay
 Early stopping
 Training with transformed data
 Dropout and DropConnect
 Batch Normalization
 Structural Invariance
 Deep Learning
 Saturation
 Vanishing Gradient Problem
 Better Activation Functions
 Better Weight Initializations
 Better Normalization
 Convolutional Neural Networks
 Neurons as detectors
 Invariance
 Local correlation property of images
 Receptive field
 Feature maps
 Weight sharing
 Subsampling
 Backpropagation for CNN (Tutorial by Sania Ashraf)
 Generative Adversarial Networks
 Adversarial learning via a min-max game
 Principal Component Analysis
 Dimensionality Reduction, Data Compression, Feature Extraction
 Maximum Variance Formulation of PCA
 PCA for high-dimensional data
 Whitening
 Classification via PCA
 Support Vector Machines and Kernel Methods
 Maximising the margin: hard constraints
 Lagrange Multipliers Method for Constrained Optimization
 Maximization with equality constraint
 Minimization with equality constraint
 Maximization with inequality constraint
 Minimization with inequality constraint
 Optimization with multiple constraints
 Dual formulations
 Kernel Trick
 Improving generalisation: soft constraints
 Latent Variable Models
 Combining Models
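The derivative identities listed under the logistic sigmoid and softmax topics, σ'(a) = σ(a)(1 - σ(a)) and ∂y_k/∂a_j = y_k(δ_kj - y_j), can be verified numerically. The sketch below is in plain Python rather than the course's MATLAB environment (an assumption made for portability). It compares each analytic derivative against a central finite-difference estimate. The function names are illustrative and are not taken from the course materials.

```python
import math


def sigmoid(a):
    """Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def sigmoid_grad(a):
    """Analytic derivative: d sigma / da = sigma(a) * (1 - sigma(a))."""
    s = sigmoid(a)
    return s * (1.0 - s)


def softmax(a):
    """Softmax y_k = exp(a_k) / sum_j exp(a_j), shifted by max(a) for numerical stability."""
    m = max(a)
    e = [math.exp(x - m) for x in a]
    z = sum(e)
    return [x / z for x in e]


def softmax_jacobian(a):
    """Analytic Jacobian: J[k][j] = dy_k / da_j = y_k * (delta_kj - y_j)."""
    y = softmax(a)
    n = len(a)
    return [[y[k] * ((1.0 if k == j else 0.0) - y[j]) for j in range(n)] for k in range(n)]


def numerical_softmax_jacobian(a, h=1e-6):
    """Central finite-difference approximation of the softmax Jacobian."""
    n = len(a)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus = list(a)
        plus[j] += h
        minus = list(a)
        minus[j] -= h
        yp, ym = softmax(plus), softmax(minus)
        for k in range(n):
            J[k][j] = (yp[k] - ym[k]) / (2 * h)
    return J


if __name__ == "__main__":
    a, h = 0.7, 1e-6
    fd = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)
    print(abs(fd - sigmoid_grad(a)) < 1e-7)
    v = [1.0, -0.5, 2.0]
    A, N = softmax_jacobian(v), numerical_softmax_jacobian(v)
    print(max(abs(A[k][j] - N[k][j]) for k in range(3) for j in range(3)) < 1e-7)
```

Each column of the softmax Jacobian sums to zero because the outputs always sum to one, which is a quick extra check when deriving the multiclass Logistic Regression gradient.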
