Acoustic unit discovery using transient and steady-state regions in speech and its applications

Pandia, Karthik; Hema A Murthy

doi:10.1016/j.wocn.2021.101081

Date Issued

01-09-2021

Author(s)

Pandia, Karthik

Hema A Murthy

Indian Institute of Technology, Madras

DOI

10.1016/j.wocn.2021.101081

Abstract

Acoustic modelling in the absence of labelled audio is difficult in speech processing, especially in under-resourced languages. Ideas from theories of speech production and perception can aid acoustic modelling in such a setting. Several production- and perception-related studies have shown the importance of the dynamic nature of speech. In the present work, an attempt is made to discover and model the dynamic nature of the speech signal. Specifically, speech is modelled as a sequence of transient and steady-state units. Model initialisation, which is crucial for unsupervised acoustic modelling, is performed using the syllabic structure present in the speech signal. The proposed method has similarities with the distinctive region model (DRM) for speech production, where the dynamic regions are assumed to be contained within syllable-like segments. An analysis of the discovered units reveals that they take transient and steady-state forms. The steady-state units predominantly correspond to vowels. The transient units correspond to nasal, approximant, fricative, and stop transients. Finally, the effectiveness of the proposed method is explored by applying the acoustic units to zero-resource text-to-speech synthesis and unsupervised keyword spotting tasks.

Volume

88
