Language identification from short segments of speech
Date Issued
01-01-2000
Author(s)
Abstract
Automatic language identification (LID) from spoken utterances is a challenging problem. In this paper, we present an LID system for South Indian languages and Hindi. Each language is modeled using an approach based on Vector Quantisation [1]. The speech is segmented into consonant-vowel (CV) units, and the performance of the system on each type of segment is studied. Our studies indicate that, for each language, the presence of certain CVs is crucial. We also find that the same CV combination differs in sound quality from one language to another. We show that once the speech signal is segmented into CVs, it is possible to perform LID on very short segments of speech (100-150 ms).
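The abstract does not say how the Vector Quantisation models are built or compared. The sketch below shows one common VQ classification scheme and should not be read as the paper's method. It assumes that each language gets a codebook trained by k-means on frame-level feature vectors, and that a test segment is assigned to the language whose codebook gives the lowest average quantisation distortion. The function names (`train_codebook`, `average_distortion`, `identify_language`), the codebook size, the 12-dimensional features and the 10 ms frame hop are all hypothetical. The data are synthetic Gaussian frames, not speech, and the CV segmentation step is omitted.

```python
import numpy as np


def train_codebook(features, codebook_size, iterations=20, seed=0):
    """Train a VQ codebook with plain k-means (Lloyd's algorithm).

    features: array of shape (n_frames, n_dims); hypothetical frame-level
    feature vectors (e.g. cepstral coefficients), not the paper's features.
    """
    rng = np.random.default_rng(seed)
    # Initialise codewords with randomly chosen, distinct training frames.
    idx = rng.choice(len(features), size=codebook_size, replace=False)
    codebook = features[idx].astype(float).copy()
    for _ in range(iterations):
        # Assign each frame to its nearest codeword (squared Euclidean distance).
        dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each codeword to the mean of the frames assigned to it;
        # codewords with no assigned frames are left where they are.
        for k in range(codebook_size):
            members = features[labels == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook


def average_distortion(features, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).mean()


def identify_language(segment, codebooks):
    """Return the language whose codebook quantises the segment with least distortion."""
    scores = {lang: average_distortion(segment, cb) for lang, cb in codebooks.items()}
    return min(scores, key=scores.get)


# Usage with synthetic stand-ins for per-language training data (not real speech).
rng = np.random.default_rng(1)
train = {
    "lang_a": rng.normal(loc=0.0, scale=1.0, size=(500, 12)),
    "lang_b": rng.normal(loc=3.0, scale=1.0, size=(500, 12)),
}
codebooks = {lang: train_codebook(x, codebook_size=16) for lang, x in train.items()}

# A short test segment of 12 frames, i.e. about 120 ms at an assumed 10 ms hop.
test_segment = rng.normal(loc=3.0, scale=1.0, size=(12, 12))
print(identify_language(test_segment, codebooks))  # → lang_b
```

Scoring by average rather than total distortion keeps segments of different lengths comparable, which matters when test segments are as short as a few CV units.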