Analysis of Conversational Speech with Application to Voice Adaptation
Date Issued
01-01-2021
Author(s)
Abstract
Conversational speech has long been challenging for text-to-speech synthesis (TTS). Most speech synthesis systems are trained on read speech recorded in a studio environment, and their intelligibility degrades drastically when conversational speech is used. This work analyzes the issues that arise in dealing with conversational speech as compared to read speech. As an application, we attempt to dub lectures available in English into an Indian language (Hindi) in the original speaker's voice. The task is difficult because classroom lectures are extempore, vary in speaking rate, and contain speaker mannerisms that lead to disfluencies. We analyze the capability of end-to-end TTS systems to model lecture-based data. Based on this analysis, we attempt to adapt a read-speech TTS system using conversational speech data to produce lectures in the original speaker's voice.