USS Directed E2E Speech Synthesis for Indian Languages

Date Issued
01-01-2022
Author(s)
Srivastava, Sudhanshu
Murthy, Hema A
Indian Institute of Technology, Madras
DOI
10.1109/SPCOM55316.2022.9840801
Abstract
State-of-the-art end-to-end (E2E) text-to-speech synthesis systems produce highly intelligible speech. However, they lack the timbre of Unit Selection Synthesis (USS) and do not perform well in low-resource environments. Moreover, the high synthesis quality of E2E is limited to read speech; for conversational speech synthesis, we observe missing words and the creation of artifacts. USS, on the other hand, not only produces speech that matches the text exactly but also preserves the timbre. Combining the advantages of USS with the continuity property of E2E, this paper proposes a technique that combines the classical USS system with the neural-network-based E2E system to develop a hybrid model for Indian languages. The proposed system guides the USS system using the E2E system. Syllable-based USS and character-based E2E TTS systems are built. Mel-spectrograms of syllable-like units generated in the USS and E2E frameworks are compared, and the mel-spectrogram of the better unit is passed to the WaveGlow vocoder. A dataset of 5 Indian languages is used for the experiments. DMOS scores are obtained with the hybrid system for conversational speech utterances that were improperly synthesized in the vanilla E2E and USS frameworks, and an average absolute improvement of 0.3 over the E2E system is observed.
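The unit-level selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the "better" unit is determined, so the scoring function is left as a caller-supplied parameter, and `mean_value` is only a hypothetical placeholder. The names `select_units` and `mean_value`, the tie-breaking rule in favour of USS, and the assumption that both systems yield the same number of aligned syllable-like units are all illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]  # one mel frame (one value per mel band)
Mel = List[Frame]    # mel-spectrogram of one syllable-like unit


def select_units(
    uss_units: Sequence[Mel],
    e2e_units: Sequence[Mel],
    score: Callable[[Mel], float],
) -> Tuple[Mel, List[str]]:
    """Pick, unit by unit, the higher-scoring mel-spectrogram.

    Assumes the two systems were segmented into the same sequence of
    syllable-like units. Ties go to USS (an arbitrary choice here).
    Returns the concatenated frames and the source chosen per unit.
    """
    if len(uss_units) != len(e2e_units):
        raise ValueError("USS and E2E must provide the same number of units")

    hybrid: Mel = []
    sources: List[str] = []
    for uss_mel, e2e_mel in zip(uss_units, e2e_units):
        if score(uss_mel) >= score(e2e_mel):
            chosen, source = uss_mel, "USS"
        else:
            chosen, source = e2e_mel, "E2E"
        hybrid.extend(chosen)  # concatenate along the time axis
        sources.append(source)
    return hybrid, sources


def mean_value(mel: Mel) -> float:
    """Hypothetical placeholder score: mean over all mel values."""
    count = sum(len(frame) for frame in mel)
    total = sum(sum(frame) for frame in mel)
    return total / count if count else 0.0


# Toy input: two units, two mel bands each (values are made up).
uss = [[[1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
e2e = [[[0.5, 0.5]], [[2.0, 2.0]]]
hybrid_mel, chosen_sources = select_units(uss, e2e, mean_value)
```

In a full pipeline, the concatenated mel-spectrogram would then be handed to the vocoder (WaveGlow in the paper); that step is omitted here because it depends on the trained model.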