Unsupervised feature learning for environmental sound classification using weighted cycle-consistent generative adversarial network

Unsupervised feature learning for environmental sound classification using weighted cycle-consistent generative adversarial network

Esmaeilpour, Mohammad and Cardinal, Patrick and Koerich, Alessandro L.

arXiv 2019

Abstract : In this paper we propose a novel environmental sound classification approach incorporating unsupervised feature learning via the spherical K-Means++ algorithm and a new architecture for high-level data augmentation. The audio signal is transformed into a 2D representation using a discrete wavelet transform (DWT). The DWT spectrograms are then augmented by a novel architecture for cycle-consistent generative adversarial network. This high-level augmentation bootstraps generated spectrograms in both intra- and inter-class manners by translating structural features from sample to sample. A codebook is built by coding the DWT spectrograms with the speeded-up robust feature detector and the K-Means++ algorithm. The Random forest is the final learning algorithm which learns the environmental sound classification task from the code vectors. Experimental results in four benchmarking environmental sound datasets (ESC-10, ESC-50, UrbanSound8k, and DCASE-2017) have shown that the proposed classification approach outperforms most of the state-of-the-art classifiers, including con-volutional neural networks such as AlexNet and GoogLeNet, improving the classification rate between 3.51% and 14.34%, depending on the dataset.