Background sound classification in speech audio segments
Date Issued
01-10-2019
Author(s)
Singh, Janvijay
Joshi, Raviraj
Abstract
Background sound classification is the task of identifying secondary sound sources in the surrounding environment. Real-time speech is always accompanied by a context, and this context can help improve the behavior of a variety of applications. Traditionally, audio classification has focused mainly on speech because of its wide applicability. Recent work has explored environmental scene classification using acoustic features, and the availability of datasets such as UrbanSound, ESC-50, and AudioSet has further aided such work. Previous work has mostly focused on classifying independently occurring acoustic events. In this work, we explore the classification of background sound in audio recordings that contain human speech. We prepare a new dataset, YBSS-200, from YouTube videos, in which each sample contains a distinct background sound and an accompanying foreground human voice. We present a convolutional-neural-network-based transfer learning approach that uses a VGG-like network to classify the context in such acoustic signals. Specific data augmentation techniques were used to improve the classification results.