Acoustic unit discovery using transient and steady-state regions in speech and its applications

Date Issued
01-09-2021
Author(s)
Pandia, Karthik
Murthy, Hema A.
Indian Institute of Technology, Madras
DOI
10.1016/j.wocn.2021.101081
Abstract
Acoustic modelling without labelled audio is a hard problem in speech processing, especially for under-resourced languages. Ideas from theories of speech production and perception can aid acoustic modelling in this setting. Several production- and perception-related studies have shown the importance of the dynamic nature of speech. The present work attempts to discover and model the dynamic nature of the speech signal. Specifically, speech is modelled as a sequence of transient and steady-state units. Model initialisation, which is crucial for unsupervised acoustic modelling, is performed using the syllabic structure present in the speech signal. The proposed method has similarities with the distinctive region model (DRM) of speech production, in which the dynamic regions are assumed to be contained within syllable-like segments. Analysis of the discovered units shows that they take transient and steady-state forms: the steady-state units predominantly correspond to vowels, while the transient units correspond to nasal, approximant, fricative, and stop transients. Finally, the effectiveness of the proposed method is explored by applying the acoustic units to zero-resource text-to-speech synthesis and unsupervised keyword spotting.
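As a rough illustration of the transient versus steady-state distinction described in the abstract, the sketch below labels frames of a feature sequence by how much they change from one frame to the next, then merges neighbouring frames with the same label into regions. This is a swapped-in thresholding heuristic, not the unsupervised modelling procedure of the paper; the function names, the toy two-dimensional features, and the threshold value are all assumptions made for illustration.

```python
import math


def frame_change(frames):
    """Euclidean distance between each frame and its predecessor.

    The first frame has no predecessor, so its change is set to 0.0.
    """
    if not frames:
        return []
    changes = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        changes.append(math.dist(prev, cur))
    return changes


def label_regions(frames, threshold):
    """Group frames into (label, start, end) runs, with end exclusive.

    A frame is called 'transient' when its change from the previous frame
    exceeds the (hypothetical) threshold, and 'steady' otherwise.
    """
    segments = []
    for i, change in enumerate(frame_change(frames)):
        label = "transient" if change > threshold else "steady"
        if segments and segments[-1][0] == label:
            # Extend the current run instead of opening a new one.
            segments[-1] = (label, segments[-1][1], i + 1)
        else:
            segments.append((label, i, i + 1))
    return segments


# Toy features: a stable stretch, a rapid glide, then another stable stretch.
toy = [
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.1],
    [1.0, 1.0], [2.0, 2.0],
    [2.0, 2.1], [2.1, 2.1],
]
segments = label_regions(toy, threshold=0.5)
for label, start, end in segments:
    print(f"{label:9s} frames {start}..{end - 1}")
```

On real speech the frames would be spectral features rather than toy vectors, and a fixed threshold would likely be too crude; the sketch is only meant to make the two unit types concrete.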
Volume
88
Subjects
  • Acoustics
  • Coarticulation
  • Field phonetics
  • First language acquis...
  • Morphology
  • Quechua
  • Speaking rate

Indian Institute of Technology Madras Knowledge Repository developed and maintained by the Library
