Machine Learning -- CS 567, PUCIT

Machine Learning (CS 567)
Fall 2017

The ability of biological brains to sense, perceive, analyse and recognise patterns can only be described as stunning. Furthermore, they have the ability to learn from new examples. Mankind's understanding of how biological brains operate exactly is embarrassingly limited.

However, there do exist numerous 'practical' techniques that give machines the 'appearance' of being intelligent. This is the domain of Statistical Pattern Recognition and Machine learning. Instead of attempting to mimic the complex workings of a biological brain, this course aims at explaining mathematically well-founded and empirically successful techniques for analysing patterns and learning from them.

Accordingly, this course is a mathematically involved introduction into the field of pattern recognition and machine learning. It will prepare students for further study/research in the areas of Pattern Recognition, Machine Learning, Computer Vision, Data Analysis and other areas attempting to solve Artificial Intelligence (AI) type problems.

Passing this course is necessary for students planning to undertake research with Dr. Nazar Khan.
Course Outline

Prerequisites:
The course is designed to be self-contained. So the required mathematical details will be covered in the lectures. However, this is a math-heavy course. Students are encouraged to brush up on their knowledge of

calculus (differentiation, partial derivatives)
linear algebra (vectors, matrices, dot-product, orthogonality, eigenvectors, SVD)
probability and statistics

The students should know that the only way to benefit from this course is to be prepared to spend lots of hours reading the text book and attempting its exercises (preferably) alone or with a class-fellow.

Text:

(Required) Pattern Recognition and Machine Learning by Christopher M. Bishop (2006)
(Recommended) Pattern Classification by Duda, Hart and Stork (2001)

Lectures:

Monday 8:15 am - 9:40 am Al Khwarizmi Lecture Theater

Wednesday 8:15 am - 9:40 am Al Khwarizmi Lecture Theater

Office Hours:

Monday 10:00 am - 12:00 pm

Teaching Assistant:

Programming Environment: MATLAB

MATLAB Resources (by Aykut Erdem):

Introduction to MATLAB, by Danilo Scepanovic
MATLAB Tutorial, Stefan Roth
MATLAB Primer, MathWorks
Code Vectorization Guide, MathWorks
Writing Fast MATLAB code, Pascal Getreuer
MATLAB array manipulation tips and tricks, Peter J. Acklam

Grading:

Assignments 20%

Quizzes 5%

Mid-Term 35%

Final 40%

To determine course grade, graduate students will be evaluated in a more rigorous manner.
Theoretical assignments have to be submitted before the lecture on the due date.
There will be no make-up for any missed quiz.
Make-up for a mid-term or final exam will be allowed only under exceptional circumstances provided that the instructor has been notified beforehand.
Instructor reserves the right to deny requests for any make-up quiz or exam.
Worst score on quizzes will be dropped.
Worst score on assignments will be dropped.

Assignments:

Assigned Due

Programming Assignment 1 Wednesday, December 20, 2017 Monday, January 01, 2018

Programming Assignment 2 Monday, January 15, 2018 Monday, January 22, 2018

Grades:
Grading sheet (Accessible only through your PUCIT email account)

Content:

Lecture 1: Introduction
- Introduction
Lectures 2 and 3: Curve Fitting
- Curve Fitting (Over-fitting vs. Generalization)
- Regularized Curve Fitting
Lecture 4: Probability
- Frequentist view
- Sum rule
- Product rule
- Bayes' theorem
Lecture 5: Maximum Likelihood (ML) Estimation
- Gaussian Distribution
- Fitting a Gaussian Distribution to Data
- Probabilistic Curve Fitting
  - Likelihood
  - Log-Likelihood
  - Maximum Likelihood (ML) Estimation
Lecture 6: Maximum Posterior (MAP) Estimation
- Posterior ∝ Likelihood * Prior
- Bayesian Curve Fitting
Lecture 7: Optimization
- Model Selection (Cross Validation)
- Calculus of variations
- Lagrange Multipliers
Lecture 8: Decision Theory
- Minimising number of misclassifications
- Minimising expected loss
- Benefits of knowing posterior distributions
- Generative vs Discriminative vs. Discriminant functions
- Loss functions for regression problems
Lecture 9: Information Theory
- Information ∝ 1/Probability
- Entropy = expected information (measure of uncertainty)
  - Maximum Entropy Discrete Distribution (Uniform)
  - Maximum Entropy Continuous Distribution (Gaussian)
- Jensen's Inequality
- Relative Entropy (KL divergence)
- Mutual Information
Lecture 10: Probability Distributions and Parametric Density Estimation
- Density Estimation is fundamentally ill-posed
- Parametric Density Estimation
- Probability Distributions
  - Bernoulli
  - Binomial
  - Beta
  - Multinomial
  - Dirichlet
- Sequential Learning via Conjugate Priors
Lecture 11: Gaussian Distribution -- A Second Look
Lecture 12: Non-Parametric Density Estimation
- Non-Parametric Density Estimation
  - Histogram based
  - Kernel estimators
  - Nearest neighbours
Lecture 13: Linear Models for Regression
- Equivalence of likelihood maximisation (ML) and SSE minimisation (Least Squares)
- Design matrix
- Pseudoinverse
- Regularized least-squares estimation
- Linear regression for multivariate targets
Lecture 14: Linear Models for Classification
- Generalized linear model
- Orthogonality of w and decision surface
- Discriminant Functions -- model the decision surface directly
  - Least Squares Classification -- y(x)=f(w'x)
  - Fisher's Linear Discriminant -- J(w)=w'*S_b*w / w'*S_w*w
  - Perceptron -- y(x)=step(w'φ(x))
    - Gradient Descent
      - Batch
      - Sequential
      - Stochastic -- Better avoidance of local minima
      - Mini-batch
- Probabilistic Generative Models -- model posterior p(C_k|x) via class-conditional p(x|C_k) and prior p(C_k)
Lecture 15: Probabilistic Discriminative Models -- model posterior p(C_k|x) directly
- Logistic Sigmoid function and its derivative
- Softmax function and its derivative
- Positive definite matrix
- Logistic Regression via Logistic Sigmoid
- Positive definite Hessian implies convexity which implies unique, global minimum
- Newton-Raphson updates constitute IRLS algorithm.
- Multiclass Logistic Regression via Softmax