Robust watch-list screening using dynamic ensembles of SVMs based on multiple face representations

Bashbaghi, Saman and Granger, Eric and Sabourin, Robert and Bilodeau, Guillaume Alexandre

Machine Vision and Applications 2017

Abstract: Still-to-video face recognition (FR) is an important function in video surveillance (VS), where faces captured over a network of video cameras are matched against reference stills of target individuals. Screening faces against a watch-list is a challenging VS application because the appearance of faces varies due to changing capture conditions and operational domains. The facial models used for matching may not be representative of faces captured with video cameras because they are typically designed a priori with only one reference still. In this paper, a multi-classifier framework is proposed for robust still-to-video FR based on multiple and diverse face representations of a single reference face still. During enrollment of a target individual, the single reference face still is modeled using an ensemble of SVM classifiers based on different patches and face descriptors. Multiple feature extraction techniques are applied to patches isolated in the reference still to generate a diverse SVM pool that provides robustness to common nuisance factors (e.g., variations in illumination and pose). The estimation of discriminant feature subsets, classifier parameters, decision thresholds, and ensemble fusion functions is achieved using the high-quality reference still and a large number of faces captured in lower-quality video of non-target individuals in the scene. During operations, the most competent subset of SVMs is dynamically selected according to capture conditions. Finally, a head-face tracker gradually regroups faces captured from different people appearing in a scene, while each individual-specific ensemble performs face matching. The accumulation of matching scores per face track leads to robust spatiotemporal FR when accumulated ensemble scores surpass a detection threshold. Experimental results obtained with the Chokepoint and COX-S2V datasets show a significant improvement in performance w.r.t. reference systems, especially when individual-specific ensembles (1) are designed using exemplar-SVMs rather than one-class SVMs and (2) exploit score-level fusion of local SVMs (trained using features extracted from each patch), rather than using either decision-level or feature-level fusion with a global SVM (trained by concatenating features extracted from patches).
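The two decision steps described in the abstract, score-level fusion of per-patch SVM scores and accumulation of the fused scores along a face track, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the fusion function, the accumulation rule, or any parameter values, so the averaging fusion, the trailing-window sum, and the names `fuse_scores`, `accumulate_over_track`, `detect`, `window`, and `threshold` are all assumptions made for illustration. The input scores stand in for the outputs of already-trained patch-wise SVMs.

```python
import numpy as np


def fuse_scores(patch_scores):
    """Score-level fusion: average the per-patch SVM scores of each frame.

    patch_scores has shape (n_frames, n_patches). Averaging is an
    assumption; the abstract does not name the fusion function.
    """
    return np.asarray(patch_scores, dtype=float).mean(axis=1)


def accumulate_over_track(fused, window=3):
    """Sum fused scores over a trailing window of frames in one face track.

    The window length is a hypothetical parameter chosen for illustration.
    """
    fused = np.asarray(fused, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(fused)))
    end = np.arange(1, len(fused) + 1)
    start = np.maximum(end - window, 0)
    return csum[end] - csum[start]


def detect(patch_scores, window=3, threshold=1.0):
    """Flag frames where the accumulated ensemble score exceeds the threshold."""
    accumulated = accumulate_over_track(fuse_scores(patch_scores), window)
    return accumulated, accumulated > threshold


# Made-up scores for one track: 4 frames, 2 patch-wise SVMs per frame.
track_scores = [[0.2, 0.4], [0.6, 0.4], [0.5, 0.5], [0.1, 0.3]]
accumulated, flags = detect(track_scores, window=3, threshold=1.0)
print(accumulated)
print(flags)
```

With these made-up scores, no single frame exceeds the threshold on its own, but the accumulated score does from the third frame onward, which is the intuition behind accumulating evidence per track rather than deciding frame by frame.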