Building speech synthesis systems for Indian languages

Date Issued
13-04-2015
Author(s)
Pradhan, Abhijit
Prakash, Anusha
Aswin Shanmugam, S.
Kasthuri, G. R.
Krishnan, Raghava
Murthy, Hema A.
Indian Institute of Technology, Madras
DOI
10.1109/NCC.2015.7084931
Abstract
This paper presents new efforts to build text-to-speech (TTS) synthesis systems for Indian languages. The synthesisers are built around both concatenative and statistical parametric speech synthesis frameworks. TTS systems require accurate segmentation, but obtaining accurate segmentation at the phone level is difficult: manual segmentation introduces human errors, while automatic segmentation using statistical approaches (hidden Markov model (HMM) based approaches) yields poor boundary information when the amount of training data is small. A semi-automatic, group delay based syllable segmentation tool is discussed first. The tool is semi-automatic because some of the boundaries it produces are inaccurate and have to be corrected manually. Next, a segmentation algorithm that combines HMM based segmentation with group delay based segmentation is used to obtain accurate boundaries automatically. The boundaries obtained are used in the syllable-based synthesiser for unit selection. In the statistical phone-based synthesiser, embedded re-estimation is performed at the phone level. Syllable-based and penta-phone based HMMs are used for building the synthesiser. TTS systems for 12 Indian languages, namely Tamil, Hindi, Marathi, Malayalam, Telugu, Rajasthani, Bengali, Odia, Assamese, Manipuri, Kannada and Gujarati, are built using semi-automatic segmentation, and synthesisers have been built for 7 Indian languages using automatic segmentation. Evaluation of the systems built with semi-automatic segmentation indicates that the mean opinion score (MOS) is above 3.0 for most of the languages. Pair comparison tests on semi-automatic vs. automatic segmentation show that automatic segmentation is preferred.
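The two evaluation measures named in the abstract, the mean opinion score and the pair comparison test, can be illustrated with a minimal sketch. It assumes the usual 1 to 5 listener rating scale for MOS and a simple vote count for the pair comparison; the function names and all ratings and votes below are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter
from statistics import mean


def mean_opinion_score(ratings):
    """Return the average of listener ratings, assumed to be on a 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in the range 1 to 5")
    return mean(ratings)


def preference_shares(votes):
    """Return the fraction of pair-comparison votes received by each system."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical listener ratings for one synthesiser (not from the paper).
ratings = [4, 3, 4, 2, 5, 3]
# Hypothetical pair-comparison votes (not from the paper).
votes = ["automatic", "automatic", "semi-automatic", "automatic"]

mos = mean_opinion_score(ratings)
shares = preference_shares(votes)
print(f"MOS: {mos:.2f}")
print(shares)
```

A system would meet the "above 3.0" threshold mentioned in the abstract when `mean_opinion_score` returns a value greater than 3.0 for its ratings. How the authors actually collected and aggregated their ratings is not stated in this record.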
Subjects
  • Indian languages
  • segmentation
  • statistical parametri...
  • syllable-based speech...
  • text-to-speech synthe...
  • unit selection synthe...
