CRIM’s content-based copy detection system for TRECVID

Héritier, Maguelonne and Gupta, Vishwa and Gagnon, Langis and Boulianne, Gilles and Foucher, Samuel and Cardinal, Patrick

2009 TREC Video Retrieval Evaluation Notebook Papers 2009

Abstract:

Approach we have tested in our submitted runs: For visual-based copy detection, we find links between video shot key-frames using a probabilistic latent space model over local matches between the key-frame images. This facilitates the extraction of significant groups of local matching descriptors that may represent common semantic elements of near-duplicate key-frames. For 2009, we worked on an optimal representation of the test database: we first select the discriminant local descriptors, then quantize the selected descriptors into a hierarchical structure. For audio-based copy detection, we give results with two different feature parameters: 15-bit energy-difference parameters similar to [1], and a feature-based mapping of test frames to query frames.

Differences we found among the runs: We submitted one run for the video-only copy detection task (the same run for Balanced and for NOFA). Four runs were submitted for the “audio only” copy detection task:
• CRIM.a.NOFA.EnNN2pass: energy-difference parameter search rescored with nearest-neighbor mapping.
• CRIM.a.NOFA.NN22para: search using nearest-neighbor mapping.
• CRIM.a.BALANCED.EnNN2pass: lower threshold than for the NOFA case.
• CRIM.a.BALANCED.EnNN22wt15: energy-difference parameter search (weight 15) fused with nearest-neighbor mapping search.
We fused the CRIM video submission with each of the four audio-only submissions to get four different submissions for the audio+video copy detection task. The threshold was adjusted based on the results of the 2008 a+v queries.

Relative contribution of each component of our approach: For visual-based copy detection, the probabilistic latent space model over local matches between the key-frame images produces a robust and accurate filtering of all possible local matches. It works well even if there are only a few local matches between the key-frames of the copied video in question. We have also introduced a new method for quantizing SIFT descriptors. It reduces computation time while keeping good precision for the SIFT representation. For audio-only copy detection, the fingerprints obtained by mapping each test frame to the nearest query frame (NN-based fingerprints) halved the minimal normalized detection cost rate (NDCR) relative to that obtained with energy-difference-based fingerprints.

What we learned about runs/approaches and the research question(s) that motivated them: Approaches based on local descriptor matching are effective for video copy detection but very time consuming. Our method is better suited to cases where there is very little common visual information to establish a link between two key-frames. Video copy detection may not need such high precision. For audio copy detection, mapping each test frame to the nearest query frame (NN-mapping) results in robust audio copy detection. The minimal NDCR for even the worst-case transformations is less than 0.03 for the 2008 queries, and less than 0.075 for the 2009 queries. The algorithm is easy to parallelize on a graphics processing unit (GPU), leading to a very fast search.
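
The NN-mapping idea described above can be illustrated with a minimal Python sketch. It is not taken from the paper. The squared Euclidean distance, the offset-voting step used to score a match, and the function names nearest_query_frame, nn_fingerprint and best_alignment are all assumptions made for illustration; the abstract only states that each test frame is mapped to its nearest query frame.

```python
from collections import Counter


def nearest_query_frame(test_frame, query_frames):
    """Return the index of the query frame closest to test_frame.

    Assumption: squared Euclidean distance between feature vectors.
    """
    best_index, best_dist = -1, float("inf")
    for j, query_frame in enumerate(query_frames):
        dist = sum((a - b) ** 2 for a, b in zip(test_frame, query_frame))
        if dist < best_dist:
            best_index, best_dist = j, dist
    return best_index


def nn_fingerprint(test_frames, query_frames):
    """Map every test frame to the index of its nearest query frame."""
    return [nearest_query_frame(t, query_frames) for t in test_frames]


def best_alignment(fingerprint):
    """Hypothetical scoring step: test frame i mapped to query frame j
    votes for the offset i - j; the most voted offset and its vote
    count are returned."""
    votes = Counter(i - j for i, j in enumerate(fingerprint))
    offset, count = votes.most_common(1)[0]
    return offset, count


# Toy data: the four query frames reappear, slightly distorted,
# in the test segment starting at test frame 2.
query = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0]]
test_frames = [[9.0, 9.0], [8.0, 7.0], [0.1, 0.0], [1.0, 0.1],
               [2.1, 1.0], [3.0, 2.9], [7.0, 9.0]]

fingerprint = nn_fingerprint(test_frames, query)
print(fingerprint)                  # [3, 3, 0, 1, 2, 3, 3]
print(best_alignment(fingerprint))  # (2, 4)
```

Each test frame is searched independently of the others in this sketch, which is consistent with the remark that the approach is easy to parallelize on a GPU.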