Video Face Recognition Using Siamese Networks With Block-Sparsity Matching

Mokhayeri, Fania and Granger, Eric

IEEE Transactions on Biometrics, Behavior, and Identity Science 2019

Abstract: Deep learning models for still-to-video face recognition (FR) typically provide a low level of accuracy because faces captured in unconstrained videos are matched against a reference gallery comprising a single facial still per individual. For improved robustness to intra-class variations, deep Siamese networks have recently been used for pair-wise face matching. Although these networks can improve state-of-the-art accuracy, the absence of prior knowledge from the target domain means that many images must be collected to account for all possible capture conditions, which is not practical for many real-world surveillance applications. In this paper, we propose the deep SiamSRC network, which employs block-sparsity for face matching, while the reference gallery is augmented with a compact set of domain-specific facial images. Prior to deployment, clustering based on row sparsity is performed on unlabelled faces captured in videos from the target domain. Cluster centers discovered in the capture condition space (defined by, e.g., pose, scale and illumination) are used as rendering parameters with an off-the-shelf 3D face model, and a compact set of synthetic faces is thereby generated for each reference still based on representative intra-class information from the target domain. For pair-wise similarity matching with query facial images, the SiamSRC network exploits sparse representation-based classification with a block structure. Experimental results obtained with videos from the Chokepoint and COX-S2V datasets indicate that the proposed SiamSRC network can outperform state-of-the-art methods for still-to-video FR with a single sample per person, with only a moderate increase in computational complexity.
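
The block-structured matching described in the abstract can be illustrated with a minimal sketch. This is not the authors' SiamSRC implementation: it assumes that embeddings have already been extracted, that the gallery columns for each identity (the reference still plus its synthetic faces) form one block, and that the block-sparse code is found with a generic group-lasso solver (proximal gradient descent). The function names, the regularization weight `lam`, and the random toy data are all hypothetical.

```python
import numpy as np

def block_soft_threshold(x, blocks, tau):
    """Shrink each block of x toward zero; blocks with norm <= tau become zero."""
    out = np.zeros_like(x)
    for idx in blocks:
        norm = np.linalg.norm(x[idx])
        if norm > tau:
            out[idx] = (1.0 - tau / norm) * x[idx]
    return out

def block_sparse_classify(D, blocks, y, lam=0.1, n_iter=500):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam * sum_b ||x_b||_2
    by proximal gradient descent, then pick the identity whose block
    reconstructs the query y with the smallest residual."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = block_soft_threshold(x - step * grad, blocks, step * lam)
    residuals = [np.linalg.norm(y - D[:, idx] @ x[idx]) for idx in blocks]
    return int(np.argmin(residuals)), x

# Toy data (assumption): 3 identities, 4 gallery embeddings each, 32 dimensions.
rng = np.random.default_rng(0)
n_ids, per_id, dim = 3, 4, 32
centers = rng.normal(size=(n_ids, dim))

columns = []
for c in centers:
    for _ in range(per_id):
        v = c + 0.1 * rng.normal(size=dim)    # stands in for a still or synthetic face
        columns.append(v / np.linalg.norm(v))
D = np.stack(columns, axis=1)                 # dictionary of shape (dim, n_ids * per_id)
blocks = [np.arange(k * per_id, (k + 1) * per_id) for k in range(n_ids)]

query = centers[1] + 0.1 * rng.normal(size=dim)  # noisy probe of identity 1
query /= np.linalg.norm(query)

label, coeffs = block_sparse_classify(D, blocks, query)
print("predicted identity:", label)
```

Grouping the coefficients by identity means the penalty selects or discards whole identities rather than individual gallery images, which is the intuition behind using a block structure when each person is represented by several augmented faces.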