Watch-list screening using ensembles based on multiple face representations

Bashbaghi, Saman and Granger, Eric and Sabourin, Robert and Bilodeau, Guillaume-Alexandre

Proceedings of the International Conference on Pattern Recognition (ICPR), 2014

Abstract: Still-to-video face recognition (FR) is an important function in watch-list screening, where faces captured over a network of video surveillance cameras are matched against reference stills of target individuals. Recognizing faces in a watch list is a challenging problem in semi- and unconstrained surveillance environments because of the lack of control over capture and operational conditions, and the limited number of reference stills. This paper provides a performance baseline and guidelines for ensemble-based systems that use a single high-quality reference still per individual, as found in many watch-list screening applications. In particular, modular systems are considered, where an ensemble of template matchers based on multiple face representations is assigned to each individual of interest. During enrollment, multiple feature extraction (FE) techniques are applied to patches isolated in the reference still to generate diverse face-part representations that are robust to the nuisance factors (e.g., illumination and pose) encountered in video surveillance. The selection of relevant feature subsets, decision thresholds, and fusion functions for each ensemble is achieved using faces of non-target individuals selected from reference videos (forming a universal background model). During operations, a face tracker gradually regroups the faces captured from different people appearing in a scene, while each user-specific ensemble generates a decision per face capture. This leads to robust spatio-temporal FR when the accumulated ensemble predictions surpass a detection threshold. Simulation results obtained with the Chokepoint video dataset show a significant improvement in accuracy (1) when performing score-level fusion of matchers, where patch-based and FE-based representations generate ensemble diversity, (2) when defining feature subsets and decision thresholds for each individual matcher of an ensemble using non-target videos, and (3) when accumulating positive detections over multiple frames.
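
The abstract describes the operational pipeline at a high level: patch-level template matchers built from a single reference still, score-level fusion within a user-specific ensemble, decision thresholds calibrated on non-target faces, and accumulation of positive detections along a face track. The Python sketch below illustrates that structure only; the feature extractor (a plain intensity histogram), the average-rule fusion, the quantile-based threshold calibration, and all function names are simplifying assumptions, not the authors' implementation.

```python
import numpy as np


def extract_patches(face_image, grid=(3, 3)):
    """Split a grayscale face image (H x W array) into a uniform grid of patches."""
    h, w = face_image.shape[:2]
    ph, pw = h // grid[0], w // grid[1]
    return [face_image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(grid[0]) for c in range(grid[1])]


def histogram_features(patch, bins=16):
    """Stand-in feature extractor: a normalized intensity histogram per patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256), density=True)
    return hist


class IndividualEnsemble:
    """Ensemble of patch-level template matchers for one watch-list individual,
    built from a single reference still."""

    def __init__(self, reference_still):
        self.templates = [histogram_features(p)
                          for p in extract_patches(reference_still)]
        self.threshold = 0.0  # decision threshold on the fused score

    def _fused_score(self, probe_face):
        """Score-level fusion (average rule) over the patch matchers."""
        probe_feats = [histogram_features(p) for p in extract_patches(probe_face)]
        scores = [-np.linalg.norm(t - f)      # similarity = negative distance
                  for t, f in zip(self.templates, probe_feats)]
        return float(np.mean(scores))

    def calibrate(self, nontarget_faces, false_alarm_rate=0.05):
        """Set the decision threshold from faces of non-target individuals
        (a crude stand-in for the universal background model in the abstract)."""
        fused = [self._fused_score(f) for f in nontarget_faces]
        self.threshold = float(np.quantile(fused, 1.0 - false_alarm_rate))

    def match(self, probe_face):
        """Return (fused score, binary decision) for one face capture."""
        score = self._fused_score(probe_face)
        return score, score > self.threshold


def screen_track(ensemble, tracked_faces, detection_threshold=3):
    """Accumulate positive per-frame decisions along a face track and raise an
    alarm once the count surpasses the detection threshold."""
    positives = sum(int(ensemble.match(face)[1]) for face in tracked_faces)
    return positives >= detection_threshold
```

In the paper, ensemble diversity comes from applying multiple FE techniques to each patch, and the fusion function itself is selected using non-target reference videos; the sketch keeps a single extractor and a fixed average rule only to stay compact.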