Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera

Cardinal, Patrick and Ali, Ahmed and Dehak, Najim and Zhang, Yu and Al Hanai, Tuka and Zhang, Yifan and Glass, James and Vogel, Stephan

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2014

Abstract: This paper describes a detailed comparison of several state-of-the-art speech recognition techniques applied to a limited Arabic broadcast news dataset. All approaches were trained on 50 hours of transcribed audio from the Al-Jazeera news channel. The best results were obtained using i-vector-based speaker adaptation in a training scenario using the Minimum Phone Error (MPE) criterion combined with sequential Deep Neural Network (DNN) training. We report results for two different types of test data: broadcast news reports, with a best word error rate (WER) of 17.86%, and broadcast conversations, with a best WER of 29.85%. The overall WER on the full test set is 25.6%.
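For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the scoring tool or text normalization the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: tokens are split on whitespace, with no text
    normalization of the kind a real Arabic scoring pipeline would apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A corpus-level WER is normally obtained by summing errors and reference words over all utterances before dividing, so an overall figure is weighted by the amount of speech in each subset and is not simply the mean of the per-subset rates.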