About Me

I am currently an Assistant Professor at PUCIT. I obtained a Ph.D. in Computer Science from UCF under Marshall Tappen. Before UCF, I was a research associate at the Computer Vision Lab at LUMS. I completed my Master's degree in Computer Science at Universitaet des Saarlandes in Saarbruecken, Germany, and my Bachelor's in Computer Science at LUMS.

Phone: +92 111-923-923 Ext: 521
Email: nazarkhan at pucit.edu.pk
Ground Floor, Graduate Block
PUCIT, Punjab University, Old Campus,
The Mall, Lahore, Pakistan


The 4 Point Lecture

Probability and Statistics MA-250
Computer Vision CS-565
Machine Learning CS-567
Computer Vision SE-461
Advanced Machine Learning CS-667
Machine Learning CS-567
Computer Vision CS-565/SE-461
Advanced Machine Learning CS-667
Machine Learning CS-567
Computer Vision CS-565/CS-465
Linear Algebra MA-310
Advanced Machine Learning CS-667
Machine Learning CS-567
Computer Vision CS-565/CS-465
Probability and Statistics MA-250
Advanced Machine Learning CS-667
Machine Learning CS-567
Computer Vision CS-465
Computer Vision CS-565
Linear Algebra MA-110
Advanced Machine Learning CS-667
Probability and Statistics MA-120
Linear Algebra MA-110
Deep Learning CS-568


  1. Saadia Shahzad (Incremental Ellipse Detection). Co-supervised with Dr. Zubair Nawaz.
  2. Naila Hamid (Perceptual Line Segment Extraction).
  3. Tayaba Anjum (Learning for Handwritten Text Recognition).
  4. Tauseef Iftikhar (Probabilistic Graphical Models for Map Stitching).
  5. Rabia Sirhindi (Spectral Clustering for High-Noise Ellipse Detection).
  6. Arbish Akram (Expression Synthesis and Analysis of Facial Images).
  1. Adeela Islam (Learning to solve Jigsaw Puzzles)
  2. Asmat Batool (Tabular Structure Detection)
  3. Umar Kamal (Index-word detection and recognition in Hand-drawn Cadastral Maps)
  4. Abubakar Siddique (Detection, Recognition, and Spotting of Hand-drawn Metadata in Historical Maps)
  1. Naila Hamid (Road Condition Classification in Winter) -- 2014.
  2. Waqas Tariq (Click-free Video-based Document Capture) -- 2015.
  3. Sania Ashraf (Machine Learning for Expression Synthesis) -- 2016.
  4. Omer Farooq (Fast 1D Hough Transform for Ellipse Detection) -- 2017.
  5. Arbish Akram (Facial Expression Synthesis via Generative Adversarial Networks) -- Fall 2018.
  6. Ayesha Rafique (Generating Patterns by Adversarial Learning) -- Spring 2019.
  7. Hussnain Haider (Handwritten, Offline Mathematical Expression Recognition) -- Summer 2019.
  8. Sehrush Seemab (Answer Grading via Sentence Similarity) -- Spring 2020.


  1. Masked Linear Regression for Learning Local Receptive Fields for Facial Expression Synthesis, Information Technology University (ITU), Lahore, October 15, 2019
  2. Friday Presentation Series (FPS)

Current Research

Pixel-based Facial Expression Synthesis
with Arbish Akram

Facial expression synthesis has achieved remarkable advances with the advent of Generative Adversarial Networks (GANs). However, GAN-based approaches mostly generate photo-realistic results only as long as the testing data distribution is close to the training data distribution. The quality of GAN results degrades significantly when testing images are from a slightly different distribution. Moreover, recent work has shown that facial expressions can be synthesized by changing localized face regions. In this work, we propose a pixel-based facial expression synthesis method in which each output pixel observes only one input pixel. The proposed method achieves good generalization capability by leveraging only a few hundred training images. Experimental results demonstrate that the proposed method performs comparably to state-of-the-art GANs on in-dataset images and significantly better on out-of-dataset images. In addition, the proposed model is two orders of magnitude smaller, which makes it suitable for deployment on resource-constrained devices.
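To make the "each output pixel observes only one input pixel" idea concrete, here is a minimal sketch, assuming a plain per-pixel affine model y_i = w_i * x_i + b_i fitted by least squares. The function names, the optional ridge term lam and the synthetic data are illustrative assumptions; the published model may differ in its details.

```python
import numpy as np

def fit_pixelwise(X, Y, lam=1e-3):
    """Fit y_i = w_i * x_i + b_i independently for every pixel i.

    X, Y: arrays of shape (n_images, n_pixels) holding flattened input
    and target images. lam is a hypothetical ridge term (an assumption)
    that keeps the division stable for near-constant pixels.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # Closed-form 1D least squares per pixel: covariance / variance.
    w = (Xc * Yc).sum(axis=0) / ((Xc ** 2).sum(axis=0) + lam)
    b = y_mean - w * x_mean
    return w, b

def synthesize(x, w, b):
    """Map one flattened input image to an output image, pixel by pixel."""
    return w * x + b

# Synthetic stand-in for a training set of a few hundred image pairs.
rng = np.random.default_rng(0)
X = rng.random((200, 64))
true_w, true_b = rng.random(64), rng.random(64)
Y = true_w * X + true_b

w, b = fit_pixelwise(X, Y, lam=0.0)
```

Such a model stores only two numbers per pixel, which illustrates why a pixel-based mapping can be far smaller than a GAN generator.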

Arbish Akram and Nazar Khan, Pixel-based Facial Expression Synthesis, 25th International Conference on Pattern Recognition (ICPR 2020), Milan, Italy, January 10-15, 2021.
[Project page] [Manuscript] [arXiv] [Presentation] [Bib] [Code]
Resource-aware On-device Deep Learning for Supermarket Hazard Detection
with M. G. Sarwar Murshed, James J. Carroll and Faraz Hussain

Supermarkets need to implement safety measures to create a safe environment for shoppers and employees. Many injuries in supermarkets, such as those from falls, are caused by a lack of safety precautions. Such incidents are preventable by timely detection of hazardous conditions such as undesirable objects on supermarket floors. In this paper, we describe EdgeLite, a new lightweight deep learning model specifically designed for local and fast inference on edge devices which have limited memory and compute power. We show how EdgeLite was deployed on three different edge devices for detecting hazards in images of supermarket floors. On our dataset of supermarket floor hazards, EdgeLite outperformed six state-of-the-art object detection models in terms of accuracy when deployed on the three small devices. Our experiments also showed that the energy consumption, memory usage, and inference time of EdgeLite were comparable to those of the baseline models. Based on our experiments, we provide recommendations to practitioners for overcoming resource limitations and execution bottlenecks when deploying deep learning models in settings involving resource-constrained hardware.

M. G. Sarwar Murshed, James J. Carroll, Nazar Khan, Faraz Hussain. Resource-aware On-device Deep Learning for Supermarket Hazard Detection, 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, Dec. 2020.
[Manuscript] [Bib] [Code]
Improving Explainability of Image Classification in Scenarios with Class Overlap: Application to COVID-19 and Pneumonia
with Edward Verenich, Alvaro Velasquez and Faraz Hussain

Trust in predictions made by machine learning models is increased if the model generalizes well on previously unseen samples and when inference is accompanied by cogent explanations of the reasoning behind predictions. In the image classification domain, generalization can be assessed through accuracy, sensitivity, and specificity. Explainability can be assessed by how well the model localizes the object of interest within an image. However, both generalization and explainability through localization are degraded in scenarios with significant overlap between classes. We propose a method based on binary expert networks that enhances the explainability of image classifications through better localization by mitigating the model uncertainty induced by class overlap. Our technique performs discriminative localization on images that contain features with significant class overlap, without explicitly training for localization. Our method is particularly promising in real-world class overlap scenarios, such as COVID-19 and pneumonia, where expertly labeled data for localization is not readily available. This can be useful for early, rapid, and trustworthy screening for COVID-19.

Edward Verenich, Alvaro Velasquez, Nazar Khan, and Faraz Hussain. Improving Explainability of Image Classification in Scenarios with Class Overlap: Application to COVID-19 and Pneumonia, 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA) Special Session, IEEE, Dec. 2020.
[arXiv] [Bib]
Automatic recognition of handwritten Urdu sentences
with Tayaba Anjum

Compared to derivatives from Latin script, recognition of derivatives from Arabic handwritten script is a complex task due to the presence of two-dimensional structure, context-dependent shape of characters, high number of ligatures, overlap of characters, and placement of diacritics. While significant attempts exist for Latin and Arabic scripts, very few attempts have been made for offline, handwritten Urdu script. In this paper, we introduce a large, annotated dataset of handwritten Urdu sentences. We also present a methodology for the recognition of offline handwritten Urdu text lines. A deep learning based encoder/decoder framework with an attention mechanism is used to handle two-dimensional text structure. While existing approaches report only character level accuracy, the proposed model improves on the BLSTM-based state-of-the-art by a factor of 2 in terms of character level accuracy and by a factor of 37 in terms of word level accuracy. Incorporation of attention before a recurrent decoding framework helps the model in looking at appropriate locations before classifying the next character and therefore results in a higher word level accuracy.

Tayaba Anjum and Nazar Khan, An attention based method for offline handwritten Urdu text recognition, 17th International Conference on Frontiers in Handwriting Recognition (ICFHR 2020), Sep 7-10, 2020.
[Project page] [Manuscript] [Presentation] [Bib] [Code] [PUCIT-OHUL Dataset]
Masked Regression for Facial Expression Synthesis
with Arbish Akram, Arif Mehmood, Sania Ashraf and Kashif Murtaza

Compared to facial expression recognition, expression synthesis requires a very high-dimensional mapping. This problem is exacerbated by increasing image sizes and limits existing expression synthesis approaches to relatively small images. We observe that facial expressions often constitute sparsely distributed and locally correlated changes from one expression to another. By exploiting this observation, the number of parameters in an expression synthesis model can be significantly reduced. Therefore, we propose a constrained version of ridge regression that exploits the local and sparse structure of facial expressions. We consider this model as masked regression for learning local receptive fields. In contrast to the existing approaches, our proposed model can be efficiently trained on larger image sizes. Experiments using three publicly available datasets demonstrate that our model is significantly better than L0, L1 and L2-regression, SVD based approaches, and kernelized regression in terms of mean-squared-error, visual quality as well as computational and spatial complexities. The reduction in the number of parameters allows our method to generalize better even after training on smaller datasets. The proposed algorithm is also compared with state-of-the-art GANs including Pix2Pix, CycleGAN, StarGAN and GANimation. These GANs produce photo-realistic results as long as the testing and the training distributions are similar. In contrast, our results demonstrate significant generalization of the proposed algorithm over out-of-dataset human photographs, pencil sketches and even animal faces.
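The constrained ridge regression described above can be illustrated with a small sketch, assuming 1D signals instead of images and a fixed window as the local receptive field. The function name, the radius and lam parameters and the toy data are assumptions for illustration, not the released Matlab or Python code.

```python
import numpy as np

def fit_masked_ridge(X, Y, radius=1, lam=1e-2):
    """Ridge regression in which output pixel i only sees inputs i-radius..i+radius.

    X, Y: arrays of shape (n_samples, n_pixels).
    Returns W of shape (n_pixels, n_pixels); entries outside the mask stay zero.
    """
    n, d = X.shape
    W = np.zeros((d, d))  # row i holds the weights for output pixel i
    for i in range(d):
        lo, hi = max(0, i - radius), min(d, i + radius + 1)
        Xs = X[:, lo:hi]  # only the local receptive field of pixel i
        A = Xs.T @ Xs + lam * np.eye(hi - lo)
        W[i, lo:hi] = np.linalg.solve(A, Xs.T @ Y[:, i])
    return W

# Toy data generated by a banded (local) linear map.
rng = np.random.default_rng(0)
d = 10
X = rng.random((100, d))
W_true = np.triu(np.tril(rng.random((d, d)), 1), -1)  # keep only the 3 central diagonals
Y = X @ W_true.T

W = fit_masked_ridge(X, Y, radius=1, lam=1e-8)
```

Each output pixel now solves a tiny linear system of size at most 2 * radius + 1, so the number of learned parameters grows linearly with the number of pixels rather than quadratically.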

Nazar Khan, Arbish Akram, Arif Mehmood, Sania Ashraf and Kashif Murtaza, Masked Linear Regression for Learning Local Receptive Fields for Facial Expression Synthesis, International Journal of Computer Vision (IJCV), 128(5), 2020.
[Paper] [ReadCube version] [Manuscript] [Presentation] [Bib] [Matlab Code] [Python Code]
Canny and Hough on Hadoop and Spark
with Bilal Iqbal, Waheed Iqbal, Arif Mahmood and Abdelkarim Erradi

Nowadays, video cameras are increasingly used for surveillance, monitoring, and activity recording. These cameras generate high resolution image and video data at large scale. Processing such large scale video streams to extract useful information with time constraints is challenging. Traditional methods do not offer scalability to process large scale data. In this paper, we propose and evaluate cloud services for high resolution video streams in order to perform line detection using Canny edge detection followed by Hough transform. These algorithms are often used as preprocessing steps for various high level tasks including object, anomaly, and activity recognition. We implement and evaluate both Canny edge detector and Hough transform algorithms in Hadoop and Spark. Our experimental evaluation using Spark shows an excellent scalability and performance compared to Hadoop and standalone implementations for both Canny edge detection and Hough transform. We obtained a speedup of 10.8x and 9.3x for Canny edge detection and Hough transform respectively using Spark. These results demonstrate the effectiveness of parallel implementation of computer vision algorithms to achieve good scalability for real-world applications.
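For readers unfamiliar with the second stage of this pipeline, the following is a minimal single-machine sketch of a Hough transform for lines, assuming a binary edge map has already been produced (for example by a Canny detector). It is not the Hadoop or Spark implementation from the paper; the function name and the toy image are illustrative assumptions.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate votes for lines x*cos(theta) + y*sin(theta) = rho.

    edges: 2D boolean array, True at edge pixels.
    Returns the accumulator (rho index x theta index), the theta values
    in radians, and the offset that maps rho to a non-negative row index.
    """
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))  # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for t_idx, theta in enumerate(thetas):
        rho = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        acc[:, t_idx] += np.bincount(rho, minlength=2 * diag + 1)
    return acc, thetas, diag

# Toy edge map containing a single horizontal line at y = 20.
edges = np.zeros((50, 50), dtype=bool)
edges[20, :] = True

acc, thetas, diag = hough_lines(edges)
rho_idx, theta_idx = np.unravel_index(acc.argmax(), acc.shape)
```

Votes for different edge pixels, angles or frames are independent of one another, which is what makes this kind of computation a natural candidate for data-parallel frameworks.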

Bilal Iqbal, Waheed Iqbal, Nazar Khan, Arif Mahmood and Abdelkarim Erradi, Canny edge detection and Hough transform for high resolution video streams using Hadoop and Spark, Cluster Computing 23(1), 2020, pp 397-408.
[Paper] [Bib]
Adversarial Placement Vector Generation
with Ayesha Rafique and Tauseef Iftikhar

Automated jigsaw puzzle solving is a challenging problem with numerous scientific applications. We explore whether a Generative Adversarial Network (GAN) can output jigsaw piece placements. State-of-the-art GANs for image-to-image translation cannot solve the jigsaw problem in an exact fashion. Instead of learning image-to-image mappings, we propose a novel piece-to-location mapping problem and present a trainable generative model for producing output that can be interpreted as the placement of jigsaw pieces. This represents a first step in developing a complete learning-based generative model for piece-to-location mappings. We introduce four new evaluation measures for the quality of output locations and show that locations generated by our model perform favorably.

A. Rafique, T. Iftikhar and N. Khan, Adversarial Placement Vector Generation, 2nd International Conference on Advancements in Computational Sciences (ICACS), 2019.
[Paper] [Presentation] [Bib]
Incremental Ellipse Detection
with Saadia Shahzad, Zubair Nawaz and Claudio Ferrero

Projections of spherical/ellipsoidal objects appear as ellipses in 2-D images. Detection of these ellipses enables information about the objects to be extracted. In some applications, images contain ellipses with scattered data, i.e. portions of an ellipse can have significant gaps in-between. We initially group pixels to get small connected regions. Then we use an incremental algorithm to grow these scattered regions into ellipses. In our proposed algorithm, we start growing a region by selecting neighbours near this region and near the best-fit ellipse of this region. After merging the neighbours into the original region, a new ellipse is fitted again. This proceeds until convergence. We evaluate our method on the problem of detecting ellipses in X-ray diffraction images where diffraction patterns appear as so-called Debye-Scherrer rings. Detection of these rings allows calibration of the experimental setup.

S. Shahzad, N. Khan, Z. Nawaz and C. Ferrero, Automatic Debye-Scherrer elliptical ring extraction via a computer vision approach, Journal of Synchrotron Radiation, 25(2), 2018, 439--450.
[Paper] [Bib] [Code]
Click-free, Video-based Document Capture
with Waqas Tariq

We propose a click-free method for video-based digitization of multi-page documents. The work is targeted at the non-commercial, low-volume, home user. The document is viewed through a mounted camera and the user is only required to turn pages manually while the system automatically extracts the video frames representing stationary document pages. This is in contrast to traditional document conversion approaches such as photocopying and scanning which can be time-consuming, repetitive, redundant and can lead to document deterioration.
The main contributions of our work are i) a 3-step method for automatic extraction of unique, stable and clear document pages from video, ii) a manually annotated data set of 37 videos consisting of 726 page turn events covering a large variety of documents, and iii) a soft, quantitative evaluation criterion that is highly correlated with the hard F1-measure. The criterion is motivated by the need to counter the subjectivity in human marked ground truth for videos. On our data set, we report an F1-measure of 0.91 and a soft score of 0.94 for the page extraction task.

W. Tariq and N. Khan, Click-Free, Video-Based Document Capture -- Methodology and Evaluation, CBDAR 2017.
[Paper] [Presentation] [Bib] [Dataset] [Errata]
Word Pair Similarity
with Asma Shaukat

We present a novel approach for computing similarity of English word pairs. While many previous approaches compute cosine similarity of individually computed word embeddings, we compute a single embedding for the word pair that is suited for similarity computation. Such embeddings are then used to train a machine learning model. Testing results on MEN and WordSim-353 datasets demonstrate that for the task of word pair similarity, computing word pair embeddings is better than computing word embeddings only.
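The baseline mentioned above, cosine similarity of individually computed word embeddings, can be sketched as follows. The three-dimensional vectors are made-up toy values, not real embeddings, and the pair-level embedding proposed in the paper is not reproduced here.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy embeddings, for illustration only.
emb = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.0],
    "banana": [0.0, 0.1, 0.9],
}

sim_close = cosine_similarity(emb["car"], emb["automobile"])
sim_far = cosine_similarity(emb["car"], emb["banana"])
```

In this baseline each word is embedded without knowledge of the other word in the pair, which is the limitation that a single pair-level embedding is meant to address.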

A. Shaukat and N. Khan, New Word Pair Level Embeddings to Improve Word Pair Similarity, ICDAR, WML 2017.
[Paper] [Presentation] [Bib]
LSM: Perceptually Accurate Line Segment Merging
with Naila Hamid

Existing line segment detectors tend to break up perceptually distinct line segments into multiple segments. We propose an algorithm for merging such broken segments to recover the original perceptually accurate line segments. The algorithm proceeds by grouping line segments on the basis of angular and spatial proximity. Then those line segment pairs within each group that satisfy unique, adaptive mergeability criteria are successively merged to form a single line segment. This process is repeated until no more line segments can be merged. We also propose a method for quantitative comparison of line segment detection algorithms. Results on the York Urban dataset show that our merged line segments are closer to human-marked ground-truth line segments compared to state-of-the-art line segment detection algorithms.
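The group-then-merge loop described above can be sketched as follows, assuming fixed angular and spatial thresholds in place of the adaptive mergeability criteria of the actual method. All function names and threshold values are illustrative assumptions and do not come from the released code.

```python
import numpy as np
from itertools import combinations

def angle(seg):
    """Orientation of a segment (x1, y1, x2, y2) in [0, pi)."""
    x1, y1, x2, y2 = seg
    return np.arctan2(y2 - y1, x2 - x1) % np.pi

def mergeable(a, b, ang_tol, gap_tol):
    """Assumed criterion: similar orientation and a small endpoint gap."""
    d = abs(angle(a) - angle(b))
    d = min(d, np.pi - d)  # orientations wrap around at pi
    pa = np.array(a, dtype=float).reshape(2, 2)
    pb = np.array(b, dtype=float).reshape(2, 2)
    gap = min(np.linalg.norm(p - q) for p in pa for q in pb)
    return d <= ang_tol and gap <= gap_tol

def merge(a, b):
    """Replace two segments by the segment joining their two farthest endpoints."""
    pts = np.array([a[:2], a[2:], b[:2], b[2:]], dtype=float)
    p, q = max(combinations(pts, 2), key=lambda pq: np.linalg.norm(pq[0] - pq[1]))
    return tuple(float(v) for v in (*p, *q))

def merge_segments(segments, ang_tol=np.deg2rad(5), gap_tol=3.0):
    """Merge pairs repeatedly until no more segments can be merged."""
    segs = [tuple(map(float, s)) for s in segments]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(segs)), 2):
            if mergeable(segs[i], segs[j], ang_tol, gap_tol):
                new = merge(segs[i], segs[j])
                segs = [s for k, s in enumerate(segs) if k not in (i, j)] + [new]
                merged = True
                break
    return segs

# Two collinear fragments of one line plus an unrelated vertical segment.
out = merge_segments([(0, 0, 10, 0), (11, 0, 20, 0), (0, 50, 0, 60)])
```

Every merge reduces the number of segments by one, so the loop is guaranteed to terminate.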

N. Hamid and N. Khan, LSM: Perceptually Accurate Line Segment Merging, Journal of Electronic Imaging, 25(6), 2016
[Project page] [PDF] [Bib] [Code]
Video-based Vehicular Statistics Estimation
with Nausheen Qaiser

A method for automatically estimating vehicular statistics from video. Accurately locating vehicles in a video becomes a challenging task when the brightness is varying and when the vehicles are occluded by each other. We are looking towards:
  • accurate tracking of vehicles in different scenarios such as
    • slow/fast-moving traffic,
    • stop and go traffic, and
    • light variation and occlusion
  • real-time vehicle tracking and speed computation.
Currently, the algorithm gives 19.3% error in speed estimates for a video capturing 3 lanes and containing about 80 vehicles from fast-moving to slow-moving ones, and stop-and-go traffic.
Automated Rural Map Parsing For Land Record Digitization
with Tauseef Iftikhar

A framework for automated mauza-map stitching from digital images of Colonial-era, hand-drawn cadastral maps. The framework
  1. is automated,
  2. determines its own failure, and therefore
  3. transfers to a semi-automated system.
In order to assist the stitching process, we also automatically extract meta-data from the map. Initially funded by DAAD under the grant for the MARUP project.
Figure panels: Input, Previous 1D Method, Our Method
A Fast and Improved Hough Transform based Ellipse Detector using 1D Parametric Space
with Umar Farooq

There are many approaches to detecting ellipses in images. The standard Hough Transform (HT) based approach depends on the number of parameters and requires a five-dimensional accumulator array to gather votes for the five parameters of an ellipse. We propose a modified HT based ellipse detector which requires a 1D parametric space. It overcomes the weaknesses of previous 1D approaches, which include i) missed detections when multiple ellipses partially overlap, and ii) redundant and false detections. We overcome these weaknesses while also reducing the execution time of the algorithm by exploiting gradient information of edge pixels.
Automated Road Condition Monitoring
with Naila Hamid, Kashif Murtaza and Raqib Omer

A framework for automated road condition monitoring. The research emphasis is on simultaneous incorporation of chromo-geometric information. Accordingly, color and vanishing point based road detection is performed. The condition of the road area is then determined using a hierarchical classification scheme to deal with the non-robustness of using color as a feature.

Previous Research

Discriminative Dictionary Learning with Spatial Priors

While smoothness priors are ubiquitous in analysis of visual information, dictionary learning for image analysis has traditionally relied on local evidence only. We present a novel approach to discriminative dictionary learning with neighborhood constraints. This is achieved by embedding dictionaries in a Conditional Random Field (CRF) and imposing label-dependent smoothness constraints on the resulting sparse codes at adjacent sites. This way, a smoothness prior is used while learning the dictionaries and not just to make inference. This is in contrast with competing approaches that learn dictionaries without such a prior. Pixel-level classification results on the Graz02 bikes dataset demonstrate that dictionaries learned in our discriminative setting with neighborhood smoothness constraints can equal the state-of-the-art performance of bottom-up (i.e. superpixel-based) segmentation approaches.

N. Khan and M. F. Tappen, Discriminative Dictionary Learning with Spatial Priors, ICIP 2013. [Paper] [Presentation] (Top 10% paper)
Stable Discriminative Dictionary Learning via Discriminative Deviation

Discriminative learning of sparse-code based dictionaries tends to be inherently unstable. We show that using a discriminative version of the deviation function to learn such dictionaries leads to a more stable formulation that can handle the reconstruction/discrimination trade-off in a principled manner. Results on Graz02 and UCF Sports datasets validate the proposed formulation.

N. Khan and M. F. Tappen, Stable Discriminative Dictionary Learning via Discriminative Deviation, ICPR 2012. [Paper] [Presentation] (Acceptance Rate: 16.13%)
Correcting Cuboid Corruption for Action Recognition in Complex Environment

The success of recognizing periodic actions in single-person-simple-background datasets, such as Weizmann and KTH, has created a need for more difficult datasets to push the performance of action recognition systems. We identify a significant weakness in systems based on popular descriptors by creating a synthetic dataset using the Weizmann dataset. Experiments show that introducing complex backgrounds, stationary or dynamic, into the video causes a significant degradation in recognition performance. Moreover, this degradation cannot be fixed by fine-tuning the system or selecting better interest points. Instead, we show that the problem lies at the cuboid level and must be addressed by modifying cuboids.

S.Z. Masood, A. Nagaraja, N. Khan, J. Zhu, and M. F. Tappen. Correcting Cuboid Corruption for Action Recognition in Complex Environment. VECTaR2011 Workshop at ICCV 2011. [Paper]
Training Many-Parameter Shape-from-Shading Models Using a Surface Database

Shape-from-shading (SFS) methods tend to rely on models with few parameters because these parameters need to be hand-tuned. This limits the number of different cues that the SFS problem can exploit. In this paper, we show how machine learning can be applied to an SFS model with a large number of parameters. Our system learns a set of weighting parameters that use the intensity of each pixel in the image to gauge the importance of that pixel in the shape reconstruction process. We show empirically that this leads to a significant increase in the accuracy of the recovered surfaces. Our learning approach is novel in that the parameters are optimized with respect to actual surface output by the system. In the first, offline phase, a hemisphere is rendered using a known illumination direction. The isophotes in the resulting reflectance map are then modelled using Gaussian mixtures to obtain a parametric representation of the isophotes. This Gaussian parameterization is then used in the second phase to learn intensity-based weights using a database of 3D shapes. The weights can also be optimized for a particular input image.

N. Khan and M.F. Tappen, Training Many-Parameter Shape-from-Shading Models Using a Surface Database, 3DIM 2009 Workshop at ICCV 2009. [Paper] [Presentation]
3D Pose Estimation Using Implicit Algebraic Surfaces

2D-3D pose estimation deals with estimating the relative position and orientation of a known 3D model from a 2D image of the model. Common explicit approaches to the problem involve registering the 3D model points to image data in order to reveal the optimal pose parameters. In contrast, this work presents an implicit approach by representing the 3D model and the image silhouette as zero-sets of implicit polynomials and then minimising the distance between image outline pixels and the zero-set of the silhouette equation to reveal the optimal pose parameters. This work deals with representing 3D models as implicit polynomials, then computing silhouette equations using elimination theory and finally estimating pose parameters. (Work done under Bodo Rosenhahn at the Graphics department of Max-Planck Institute for Computer Science).

N. Khan, Implicit 2D-3D Pose Estimation, Masters Thesis, Universitaet des Saarlandes 2006. [PDF] [Thesis]