Document Image Retrieval Using Deep Features

Wiggers, Kelly L. and Britto, Alceu S. and Heutte, Laurent and Koerich, Alessandro L. and Oliveira, Luiz Eduardo S.

Proceedings of the International Joint Conference on Neural Networks 2018

Abstract: This paper proposes a novel approach for content-based graphical object retrieval in document images. The challenge is to search for occurrences of a queried graphical object in document images that can vary in color, shape, texture and quality, which considerably increases the difficulty of the retrieval process. To that end, manual feature engineering is avoided by learning the image representation for the retrieval task with a Convolutional Neural Network (CNN). However, such a representation should be as compact as possible to allow fast document image retrieval and efficient storage. Thus, a pretrained CNN model is used to cope with the lack of training data, and it is fine-tuned to achieve a compact yet discriminant representation of the graphical objects. From experiments conducted on the public Tobacco800 document image collection, we show that the proposed method compares favorably against state-of-the-art document image retrieval methods, reaching a mean average precision (mAP) of 0.72. In addition, an increase of 4 percentage points in the average precision is observed when using a compact deep representation in which the number of features is reduced by a factor of 16, which reduces the computation time of the image retrieval task by 47%.
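
The evaluation measure named in the abstract, mean average precision (mAP), can be illustrated with a minimal sketch. The code below ranks a gallery of feature vectors against each query and averages the per-query average precision. Cosine similarity as the ranking score, the toy 2-dimensional vectors, the relevance lists, and all function names are assumptions made for illustration; the abstract does not specify the matching function, and this is not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery indices sorted by decreasing similarity to the query."""
    scores = [(cosine_similarity(query, vec), idx) for idx, vec in enumerate(gallery)]
    # Ties are broken by gallery index so the ranking is deterministic.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores]


def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k where a relevant item occurs."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(ranked_ids, start=1):
        if idx in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(queries, gallery, relevance):
    """Mean of the per-query average precision values."""
    aps = [
        average_precision(rank_gallery(query, gallery), relevant)
        for query, relevant in zip(queries, relevance)
    ]
    return sum(aps) / len(aps)


# Toy data (assumed): four gallery descriptors and two queries.
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
queries = [[1.0, 0.0], [0.0, 1.0]]
relevance = [[0, 3], [2, 3]]  # relevant gallery indices for each query

if __name__ == "__main__":
    print(mean_average_precision(queries, gallery, relevance))
```

In a real system the vectors would be the compact CNN descriptors of the graphical objects, and a shorter descriptor directly lowers the cost of each similarity computation, which is in line with the computation-time reduction reported in the abstract.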