A low-cost parallel K-means VQ algorithm using cluster computing

Britto, Alceu De S. and De Souza, Paulo S.L. and Sabourin, Robert and De Souza, Simone R.S. and Borges, Díbio L.

Proceedings of the International Conference on Document Analysis and Recognition, ICDAR 2003

Abstract: In this paper we propose a parallel approach for the K-means Vector Quantization (VQ) algorithm used in a two-stage Hidden Markov Model (HMM)-based system for recognizing handwritten numeral strings. With this parallel algorithm, based on the master/slave paradigm, we overcome two drawbacks of the sequential version: a) the time taken to create the codebook; and b) the amount of memory necessary to work with large training databases. Distributing the training samples over the slaves’ local disks reduces the overhead associated with the communication process. In addition, models predicting computation and communication time have been developed. These models are useful to predict the optimal number of slaves taking into account the number of training samples and codebook size.
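
The master/slave scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say what the master and slaves exchange, so the sketch assumes that the master sends out the current codebook and that each slave returns per-codeword partial sums and counts computed on its own partition. The function names (`nearest`, `slave_step`, `master_update`, `parallel_kmeans`) are hypothetical, and the slaves are simulated sequentially in one process rather than run on a cluster.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)))


def slave_step(partition, codebook):
    """Assumed slave work: partial sums and counts over the local partition."""
    dim = len(codebook[0])
    sums = [[0.0] * dim for _ in codebook]
    counts = [0] * len(codebook)
    for x in partition:
        k = nearest(codebook, x)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += x[d]
    return sums, counts


def master_update(codebook, results):
    """Assumed master work: merge partial results and recompute each codeword."""
    dim = len(codebook[0])
    new_codebook = []
    for k, old in enumerate(codebook):
        n = sum(counts[k] for _, counts in results)
        if n == 0:
            # No sample was assigned to this codeword: keep it unchanged.
            new_codebook.append(list(old))
        else:
            new_codebook.append(
                [sum(sums[k][d] for sums, _ in results) / n for d in range(dim)])
    return new_codebook


def parallel_kmeans(partitions, codebook, iterations=10):
    """Run a fixed number of K-means iterations over partitioned samples."""
    for _ in range(iterations):
        # On a real cluster each call would run on a different slave;
        # here the slaves are simulated one after another.
        results = [slave_step(p, codebook) for p in partitions]
        codebook = master_update(codebook, results)
    return codebook


# Toy data: two well-separated groups of 2-D samples, split over two "slaves".
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
partitions = [samples[0::2], samples[1::2]]
initial_codebook = [[0.0, 0.0], [10.0, 10.0]]
codebook = parallel_kmeans(partitions, initial_codebook)
```

Under these assumptions, the data exchanged per iteration grows with the codebook size and the number of slaves, not with the number of training samples, which is consistent with the abstract's point that keeping samples on the slaves' local disks reduces communication overhead.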