Click-free, Video-based Document Capture
with Waqas Tariq
We propose a click-free method for video-based digitization of multi-page documents. The work is targeted at the non-commercial, low-volume home user. The document is viewed through a mounted camera and the user is only required to turn pages manually while the system automatically extracts the video frames representing stationary document pages. This is in contrast to traditional document conversion approaches such as photocopying and scanning, which can be time-consuming, repetitive and redundant, and can lead to document deterioration. The main contributions of our work are i) a 3-step method for automatic extraction of unique, stable and clear document pages from video, ii) a manually annotated data set of 37 videos consisting of 763 page turn events covering a large variety of documents, and iii) a soft, quantitative evaluation criterion that is highly correlated with the hard F1-measure. The criterion is motivated by the need to counter the subjectivity in human-marked ground truth for videos. On our data set, we report an F1-measure of 0.91 and a soft score of 0.94 for the page extraction task.
W. Tariq and N. Khan, Click-Free, Video-Based Document Capture – Methodology and Evaluation, CBDAR 2017. [Paper] [Bib] [Dataset]
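As a reminder of what the hard F1 score measures, here is a minimal sketch in Python. It assumes that extracted frames have already been matched against the annotated page turn events, so that counts of true positives, false positives and false negatives are available; the function name and the matching step are illustrative assumptions, and the soft criterion from the paper is not reproduced here.

```python
def f1_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from match counts.

    The counts are assumed to come from some prior matching of
    extracted frames to ground-truth pages (not shown here).
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f1_measure(true_pos=8, false_pos=2, false_neg=2))
```

Because the hard score depends on a binary match decision per frame, it is sensitive to how annotators pick the representative frame, which is the subjectivity the soft criterion is meant to counter.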

Word Pair Similarity
with Asma Shaukat
We present a novel approach for computing similarity of English word pairs. While many previous approaches compute cosine similarity of individually computed word embeddings, we compute a single embedding for the word pair that is suited for similarity computation. Such embeddings are then used to train a machine learning model. Testing results on MEN and WordSim353 datasets demonstrate that for the task of word pair similarity, computing word pair embeddings is better than computing word embeddings only.
A. Shaukat and N. Khan, New Word Pair Level Embeddings to Improve Word Pair Similarity, ICDAR, WML 2017. [Paper] [Bib]
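For context, the baseline that this work contrasts itself with, cosine similarity between two individually computed word embeddings, can be sketched as follows. The vectors in the usage example are made-up toy values, not real embeddings, and the pair-level embedding proposed in the paper is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings", purely illustrative.
    cat = [0.9, 0.1, 0.3]
    dog = [0.8, 0.2, 0.4]
    print(cosine_similarity(cat, dog))
```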

LSM: Perceptually Accurate Line Segment Merging
with Naila Hamid
Existing line segment detectors tend to break up perceptually distinct line segments into multiple segments. We propose an algorithm for merging such broken segments to recover the original perceptually accurate line segments. The algorithm proceeds by grouping line segments on the basis of angular and spatial proximity. Then those line segment pairs within each group that satisfy unique, adaptive mergeability criteria are successively merged to form a single line segment. This process is repeated until no more line segments can be merged. We also propose a method for quantitative comparison of line segment detection algorithms. Results on the York Urban dataset show that our merged line segments are closer to human-marked ground-truth line segments compared to state-of-the-art line segment detection algorithms.
N. Hamid and N. Khan, LSM: Perceptually Accurate Line Segment Merging, Journal of Electronic Imaging, 25(6), 2016. [Project page] [PDF] [Bib] [Code]
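The merge-until-convergence loop described above can be illustrated with a simplified Python sketch. It is not the published algorithm: the fixed thresholds `max_angle` and `max_gap`, the endpoint-gap test, and the rule of keeping the two farthest endpoints are all assumptions made for illustration, whereas the paper uses adaptive mergeability criteria.

```python
import math
from itertools import combinations


def angle(seg):
    """Orientation of a segment in [0, pi), ignoring direction."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def angle_diff(a, b):
    """Smallest angle between the orientations of two segments."""
    d = abs(angle(a) - angle(b))
    return min(d, math.pi - d)


def endpoint_gap(a, b):
    """Smallest distance between an endpoint of a and an endpoint of b."""
    return min(math.dist(p, q) for p in a for q in b)


def merge_pair(a, b):
    """Replace two segments by the one joining their two farthest endpoints."""
    points = list(a) + list(b)
    return max(combinations(points, 2), key=lambda pq: math.dist(*pq))


def merge_segments(segments, max_angle=math.radians(5), max_gap=5.0):
    """Merge angularly and spatially close segments until nothing changes.

    max_angle and max_gap are fixed, illustrative thresholds.
    """
    segs = [tuple(map(tuple, s)) for s in segments]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(segs)), 2):
            if (angle_diff(segs[i], segs[j]) <= max_angle
                    and endpoint_gap(segs[i], segs[j]) <= max_gap):
                new_seg = merge_pair(segs[i], segs[j])
                segs = [s for k, s in enumerate(segs) if k not in (i, j)]
                segs.append(new_seg)
                merged = True
                break  # restart the scan with the updated list
    return segs


if __name__ == "__main__":
    broken = [((0, 0), (10, 0)), ((12, 0), (20, 0)), ((0, 50), (0, 60))]
    print(merge_segments(broken))
```

In the usage example, the two collinear horizontal pieces are joined into one segment, while the distant vertical segment is left untouched.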

Incremental Ellipse Detection
with Saadia Shahzad, Zubair Nawaz, Jerome Kieffer and Claudio Ferrero
Projections of spherical/ellipsoidal objects appear as ellipses in 2D images. Detection of these ellipses enables information about the objects to be extracted. In some applications, images contain ellipses with scattered data, i.e. portions of an ellipse can have significant gaps in between. We initially group pixels to get small connected regions. Then we use an incremental algorithm to grow these scattered regions into ellipses. In our proposed algorithm, we start growing a region by selecting neighbours near this region and near the best-fit ellipse of this region. After merging the neighbours into the original region, a new ellipse is fitted again. This proceeds until convergence. We evaluate our method on the problem of detecting ellipses in X-ray diffraction images where diffraction patterns appear as so-called Debye-Scherrer rings. Detection of these rings allows calibration of the experimental setup.
Manuscript submitted for publication, 2017

Video-based Vehicular Statistics Estimation
with Nausheen Qaiser
A method for automatically estimating vehicular statistics from video. Accurately locating vehicles in a video becomes a challenging task when the brightness is varying and when the vehicles are occluded by each other. We are looking towards:


Automated Rural Map Parsing For Land Record Digitization
with Malik Masood
A framework for automated mauza-map stitching from digital images of Colonial-era, hand-drawn cadastral maps. The framework



A Fast and Improved Hough Transform based Ellipse Detector using 1D Parametric Space
with Umar Farooq
There are many approaches to detecting ellipses in images. The standard Hough Transform (HT) based approach requires a five-dimensional accumulator array to gather votes for the five parameters of an ellipse. We propose a modified HT-based ellipse detector which requires only a 1D parametric space. It overcomes the weaknesses of previous 1D approaches, which include i) missed detections when multiple ellipses partially overlap, and ii) redundant and false detections. We overcome these weaknesses while also reducing the execution time of the algorithm by exploiting gradient information of edge pixels.

Automated Road Condition Monitoring
with Naila Hamid, Kashif Murtaza and Raqib Omer
A framework for automated road condition monitoring. The research emphasis is on simultaneous incorporation of chromo-geometric information. Accordingly, color- and vanishing-point-based road detection is performed. The condition of the road area is then determined using a hierarchical classification scheme to deal with the non-robustness of using color as a feature.

Discriminative Dictionary Learning with Spatial Priors
While smoothness priors are ubiquitous in analysis of visual information, dictionary learning for image analysis has traditionally relied on local evidence only. We present a novel approach to discriminative dictionary learning with neighborhood constraints. This is achieved by embedding dictionaries in a Conditional Random Field (CRF) and imposing label-dependent smoothness constraints on the resulting sparse codes at adjacent sites. This way, a smoothness prior is used while learning the dictionaries and not just to make inference. This is in contrast with competing approaches that learn dictionaries without such a prior. Pixel-level classification results on the Graz-02 bikes dataset demonstrate that dictionaries learned in our discriminative setting with neighborhood smoothness constraints can equal the state-of-the-art performance of bottom-up (i.e. superpixel-based) segmentation approaches.
N. Khan and M. F. Tappen, Discriminative Dictionary Learning with Spatial Priors, ICIP 2013. [Paper] [Presentation] (Top 10% paper)

Stable Discriminative Dictionary Learning via Discriminative Deviation
Discriminative learning of sparse-code based dictionaries tends to be inherently unstable. We show that using a discriminative version of the deviation function to learn such dictionaries leads to a more stable formulation that can handle the reconstruction/discrimination trade-off in a principled manner. Results on Graz-02 and UCF Sports datasets validate the proposed formulation.
N. Khan and M. F. Tappen, Stable Discriminative Dictionary Learning via Discriminative Deviation, ICPR 2012. [Paper] [Presentation] (Acceptance Rate: 16.13%)

Correcting Cuboid Corruption for Action Recognition in Complex Environment
The success of recognizing periodic actions in single-person, simple-background datasets, such as Weizmann and KTH, has created a need for more difficult datasets to push the performance of action recognition systems. We identify a significant weakness in systems based on popular descriptors by creating a synthetic dataset from the Weizmann dataset. Experiments show that introducing complex backgrounds, stationary or dynamic, into the video causes a significant degradation in recognition performance. Moreover, this degradation cannot be fixed by fine-tuning the system or selecting better interest points. Instead, we show that the problem lies at the cuboid level and must be addressed by modifying cuboids.
S.Z. Masood, A. Nagaraja, N. Khan, J. Zhu, and M. F. Tappen, Correcting Cuboid Corruption for Action Recognition in Complex Environment, VECTaR2011 Workshop at ICCV 2011. [Paper]


Training Many-Parameter Shape-from-Shading Models Using a Surface Database
Shape-from-shading (SFS) methods tend to rely on models with few parameters because these parameters need to be hand-tuned. This limits the number of different cues that the SFS problem can exploit. In this paper, we show how machine learning can be applied to an SFS model with a large number of parameters. Our system learns a set of weighting parameters that use the intensity of each pixel in the image to gauge the importance of that pixel in the shape reconstruction process. We show empirically that this leads to a significant increase in the accuracy of the recovered surfaces. Our learning approach is novel in that the parameters are optimized with respect to actual surface output by the system. In the first, offline phase, a hemisphere is rendered using a known illumination direction. The isophotes in the resulting reflectance map are then modelled using Gaussian mixtures to obtain a parametric representation of the isophotes. This Gaussian parameterization is then used in the second phase to learn intensity-based weights using a database of 3D shapes. The weights can also be optimized for a particular input image.
N. Khan and M.F. Tappen, Training Many-Parameter Shape-from-Shading Models Using a Surface Database, 3DIM 2009 Workshop at ICCV 2009. [Paper] [Presentation]

3D Pose Estimation Using Implicit Algebraic Surfaces
2D-3D pose estimation deals with estimating the relative position and orientation of a known 3D model from a 2D image of the model. Common explicit approaches to the problem involve registering the 3D model points to image data in order to reveal the optimal pose parameters. In contrast, this work presents an implicit approach by representing the 3D model and the image silhouette as zero sets of implicit polynomials and then minimising the distance between image outline pixels and the zero set of the silhouette equation to reveal the optimal pose parameters. This work deals with representing 3D models as implicit polynomials, then computing silhouette equations using elimination theory and finally estimating pose parameters. (Work done under Bodo Rosenhahn at the Graphics department of Max-Planck Institute for Computer Science.)
N. Khan, Implicit 2D-3D Pose Estimation, Masters Thesis, Universitaet des Saarlandes 2006. [PDF] [Thesis]