Continuous speech emotion recognition with convolutional neural networks

Vryzas, Nikolaos and Vrysis, Lazaros and Matsiola, Maria and Kotsakis, Rigas and Dimoulas, Charalampos and Kalliris, George

AES: Journal of the Audio Engineering Society 2020

Abstract: A model for Speech Emotion Recognition based on a Convolutional Neural Network (CNN) architecture is proposed and evaluated. Recognition is performed on successive time frames of continuous speech. The model is trained and tested on the Acted Emotional Speech Dynamic Database (AESDD), and data augmentation techniques are also applied. Subjective evaluation experiments on the AESDD are presented to serve as a reference for human-level recognition accuracy. The proposed CNN architecture outperforms previous baseline machine learning models (Support Vector Machines) by 8.4% in accuracy and is also more efficient, since it bypasses the handcrafted feature extraction stage.
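
As an illustration of what recognition on successive time frames can look like in practice, the following minimal sketch splits a sampled signal into fixed-length, overlapping frames, each of which could then be passed to a classifier. The function name `split_into_frames`, the 1.0 s frame length, the 0.5 s hop, and the 16 kHz sample rate are assumptions chosen for the example; the abstract does not state the values used in the paper.

```python
def split_into_frames(samples, sample_rate, frame_seconds=1.0, hop_seconds=0.5):
    """Split a sequence of audio samples into successive frames.

    frame_seconds and hop_seconds are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    frame_len = int(round(frame_seconds * sample_rate))
    hop_len = int(round(hop_seconds * sample_rate))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop durations must be positive")

    frames = []
    # Step through the signal one hop at a time; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


# Usage: 3 seconds of placeholder samples at an assumed 16 kHz sample rate.
samples = list(range(48000))
frames = split_into_frames(samples, sample_rate=16000)
print(len(frames), len(frames[0]))
```

With a hop shorter than the frame length, consecutive frames overlap, so an emotion label can be produced more often than once per frame duration. A signal shorter than one frame yields no frames.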