Acoustic unit discovery using transient and steady-state regions in speech and its applications
Date Issued
01-09-2021
Author(s)
Pandia, Karthik
Indian Institute of Technology, Madras
Abstract
Acoustic modelling in the absence of labelled audio is difficult in speech processing, especially for under-resourced languages. Ideas from theories of speech production and perception can aid acoustic modelling in such a setting. Several production- and perception-related studies have shown the importance of the dynamic nature of speech. In the present work, an attempt is made to discover and model the dynamic nature of the speech signal. Specifically, speech is modelled as a sequence of transient and steady-state units. Model initialisation, which is crucial for unsupervised acoustic modelling, is performed using the syllabic structure present in the speech signal. The proposed method has similarities with the distinctive region model (DRM) for speech production, where the dynamic regions are assumed to be contained within syllable-like segments. An analysis of the discovered units reveals that they take transient and steady-state forms. The steady-state units predominantly correspond to vowels, while the transient units correspond to nasal, approximant, fricative, and stop transients. Finally, the effectiveness of the proposed method is explored by applying the acoustic units to zero-resource text-to-speech synthesis and unsupervised keyword spotting tasks.
Volume
88