Progressive boosting for class imbalance and its application to face re-identification

Soleymani, Roghayeh and Granger, Eric and Fumera, Giorgio

Expert Systems with Applications 2018

Abstract: In practice, pattern recognition applications often suffer from imbalanced data distributions between classes, which may vary during operations w.r.t. the design data. For instance, in many video surveillance applications, e.g., face re-identification, the faces of individuals must be recognized over a distributed network of video cameras. An important challenge in such applications is class imbalance since the number of faces captured from an individual of interest is greatly outnumbered by those of others. Two-class classification systems designed using imbalanced data tend to recognize the majority (negative) class better, while the class of interest (positive class) often has the smaller number of samples. Several data-level techniques have been proposed to alleviate this issue, where classifier ensembles are designed with balanced data subsets by up-sampling positive samples or under-sampling negative samples. However, some informative samples may be neglected by random under-sampling, and adding synthetic positive samples through up-sampling adds to training complexity. In this paper, a new ensemble learning algorithm called Progressive Boosting (PBoost) is proposed that progressively inserts uncorrelated groups of samples into a Boosting procedure to avoid losing information while generating a diverse pool of classifiers. In many real-world recognition problems, the samples may be regrouped using some application-based contextual information. For example, in face re-identification applications, facial regions of the same person appearing in a camera field of view may be regrouped based on their trajectories found by a face tracker. From one iteration to the next, the PBoost algorithm accumulates these uncorrelated groups of samples into a set that grows gradually in size and imbalance. Base classifiers are trained on samples selected from this set and validated on the whole set.
Consequently, PBoost is more robust when the operational data may have unknown and variable levels of skew. In addition, the computational complexity of PBoost is lower than that of Boosting ensembles in the literature that use under-sampling for learning from imbalanced data because not all of the base classifiers are validated on all negative samples. The new loss factor used in PBoost avoids biasing performance towards the negative class. Using this loss factor, the weight update of samples and classifier contribution in final predictions are set according to the ability of classifiers to recognize both classes. The proposed approach was validated and compared using synthetic data and videos from the Faces In Action and COX datasets, which emulate face re-identification applications. Results show that PBoost outperforms state-of-the-art techniques in terms of both accuracy and complexity over different levels of imbalance and overlap between classes.
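
The progressive accumulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the base classifier is a one-dimensional threshold stump, each classifier is trained on all positives plus an equal-sized random draw of accumulated negatives, and the loss is taken as 1 minus the F-measure (one possible loss that reflects both classes; the abstract does not give the formula). The sample weight update is omitted for brevity, and all function names are hypothetical.

```python
import math
import random


def f_measure(y_true, y_pred):
    """F1 score for the positive class (+1); labels are +1 / -1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def train_stump(xs, ys):
    """Assumed base learner: pick threshold t for the rule 'x > t -> +1'."""
    best_t, best_correct = xs[0], -1
    for t in sorted(set(xs)):
        correct = sum(1 for x, y in zip(xs, ys) if (1 if x > t else -1) == y)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def progressive_boost_sketch(positives, negative_groups, seed=0):
    """Accumulate one negative group per iteration; train on a balanced
    subset, validate on the whole accumulated set."""
    rng = random.Random(seed)
    accumulated_neg = []
    ensemble = []          # list of (threshold, classifier weight)
    validation_sizes = []  # shows the set growing in size and imbalance
    for group in negative_groups:
        accumulated_neg.extend(group)
        # Balanced training subset (assumption: plain random under-sampling).
        k = min(len(positives), len(accumulated_neg))
        train_x = positives + rng.sample(accumulated_neg, k)
        train_y = [1] * len(positives) + [-1] * k
        t = train_stump(train_x, train_y)
        # Validation on all positives and all negatives accumulated so far.
        val_x = positives + accumulated_neg
        val_y = [1] * len(positives) + [-1] * len(accumulated_neg)
        val_pred = [1 if x > t else -1 for x in val_x]
        # Assumed loss: 1 - F-measure, clamped to keep the log finite.
        loss = min(max(1.0 - f_measure(val_y, val_pred), 1e-6), 1 - 1e-6)
        alpha = 0.5 * math.log((1 - loss) / loss)
        ensemble.append((t, alpha))
        validation_sizes.append(len(val_x))
    return ensemble, validation_sizes


def predict(ensemble, x):
    """Weighted vote of the stumps."""
    score = sum(alpha * (1 if x > t else -1) for t, alpha in ensemble)
    return 1 if score > 0 else -1
```

Calling `progressive_boost_sketch(positives, negative_groups)` with a list of positive feature values and a list of negative groups (for example, one group per face trajectory) returns the ensemble together with the size of the validation set at each iteration. Early classifiers are validated on only a few negatives, which is the source of the complexity saving the abstract mentions.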