A Common Attribute based Unified HTS framework for Speech Synthesis in Indian Languages
Date Issued
01-01-2013
Author(s)
Ramani, B.
Christina, S. Lilly
Rachel, G. Anushiya
Solomi, V. Sherlin
Nandwana, Mahesh Kumar
Prakash, Anusha
Aswin, Shanmugam S.
Krishnan, Raghava
Kishore, S. P.
Samudravijaya, K.
Vijayalakshmi, P.
Nagarajan, T.
Indian Institute of Technology, Madras
Abstract
State-of-the-art approaches to speech synthesis are unit selection based concatenative speech synthesis (USS) and hidden Markov model (HMM) based text-to-speech synthesis (HTS). The former concatenates the waveforms of subword units, while the latter generates an optimal parameter sequence from subword HMMs. The quality of an HMM-based synthesiser in the HTS framework depends crucially on an accurate description of the phone set and of the question set used to cluster the phones. Given the number of Indian languages, building an HTS system separately for every language is time-consuming. Exploiting the common properties of Indian languages, a unified HMM framework for building speech synthesisers is proposed. Apart from the speech and text data, the tasks involved in building a synthesis system can be made language-independent. A language-independent common phone set is first derived. Because similar sounds share similar articulatory descriptions, a common question set can likewise be derived. The common phone set and common question set are used to build HTS-based systems for six Indian languages, namely Hindi, Marathi, Bengali, Tamil, Telugu and Malayalam. The systems are evaluated using the mean opinion score (MOS). An average MOS of 3.0 for naturalness and 3.4 for intelligibility is obtained across all languages.
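To make the idea of a shared question set concrete, the sketch below generates decision-tree clustering questions from articulatory features attached to a phone set. Everything in it is an assumption for illustration: the phone labels (e.g. `tx` for a retroflex stop), the feature names, and the helper `make_questions` are hypothetical and are not the paper's common phone set or question set. The question syntax follows the `QS "name" {pattern,...}` form and the `p1^p2-p3+p4=p5@...` context layout used in the HTS demo scripts; a real system would match whatever full-context label format it actually uses.

```python
# Hypothetical subset of a language-independent common phone set.
# Phone labels and feature values are illustrative assumptions only.
COMMON_PHONES = {
    "a":  {"vowel"},
    "aa": {"vowel", "long"},
    "i":  {"vowel"},
    "k":  {"consonant", "stop", "velar", "unvoiced"},
    "g":  {"consonant", "stop", "velar", "voiced"},
    "t":  {"consonant", "stop", "dental", "unvoiced"},
    "tx": {"consonant", "stop", "retroflex", "unvoiced"},  # hypothetical retroflex label
    "m":  {"consonant", "nasal", "bilabial", "voiced"},
}

# Context positions, assuming the HTS demo label layout p1^p2-p3+p4=p5@...
CONTEXT_PATTERNS = {
    "LL": "{}^*",     # phone two to the left
    "L":  "*^{}-*",   # phone to the left
    "C":  "*-{}+*",   # current phone
    "R":  "*+{}=*",   # phone to the right
    "RR": "*={}@*",   # phone two to the right
}


def make_questions(phones):
    """Build one QS line per (context position, articulatory feature) pair."""
    features = sorted(set().union(*phones.values()))
    lines = []
    for pos, pattern in CONTEXT_PATTERNS.items():
        for feat in features:
            # All phones that carry this feature, in a stable order.
            members = sorted(p for p, fs in phones.items() if feat in fs)
            patterns = ",".join(pattern.format(p) for p in members)
            lines.append(f'QS "{pos}-{feat}" {{{patterns}}}')
    return lines


if __name__ == "__main__":
    for line in make_questions(COMMON_PHONES)[:3]:
        print(line)
```

Because the questions are derived only from the features, a language that uses a subset of the common phones can reuse the same function unchanged: passing `{p: COMMON_PHONES[p] for p in ("a", "k", "m")}` yields questions that cover only those phones.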