Analysis of Conversational Speech with Application to Voice Adaptation

Mukherjee, Bhagyashree; Prakash, Anusha; Murthy, Hema A.

doi:10.1109/ASRU51503.2021.9688146

Date Issued

01-01-2021

Author(s)

Mukherjee, Bhagyashree

Prakash, Anusha

Murthy, Hema A.

Indian Institute of Technology, Madras

DOI

10.1109/ASRU51503.2021.9688146

Abstract

Conversational speech has always been challenging in the context of text-to-speech (TTS) synthesis. Most speech synthesis systems are trained on read speech recorded in a studio environment, and the intelligibility of TTS systems degrades drastically when conversational speech is used instead. This work presents an extensive analysis of the issues that arise with conversational speech compared to read speech. As an application, we attempt to dub lectures available in English into an Indian language (Hindi) in the original speaker's voice. The task is difficult because classroom lectures are extempore, vary in speaking rate, and contain speaker mannerisms that lead to disfluencies. We analyze the capability of end-to-end TTS systems to model lecture-based data. Based on this analysis, we attempt to adapt a read-speech TTS system using conversational speech data to produce lectures in the original speaker's voice.
