Background sound classification in speech audio segments

Singh, Janvijay; Joshi, Raviraj

doi:10.1109/SPED.2019.8906597

Date Issued

01-10-2019

Author(s)

Singh, Janvijay

Joshi, Raviraj

DOI

10.1109/SPED.2019.8906597

Abstract

Background sound classification is the task of identifying secondary sound sources in the surrounding environment. Real-time speech is always accompanied by a context, and this context can help enhance the behavior of a variety of applications. Traditionally, audio classification tasks have focused mainly on speech because of its wide applicability. Recent works have explored environmental scene classification using acoustic features, and the availability of datasets such as UrbanSound, ESC-50, and AudioSet has further aided this work. Previous works have mostly focused on the classification of independently occurring acoustic events. In this work, we explore the classification of background sound in audio recordings containing human speech. We prepare a new dataset, YBSS-200, from YouTube videos, in which each sample contains a distinct background sound and an accompanying foreground human voice. We present a convolutional neural network-based transfer learning approach that uses a VGG-like network to classify context in such acoustic signals. Specific data augmentation techniques were used to improve the classification results.
