Continuous speech recognition using joint features derived from the modified group delay function and MFCC
Date Issued
01-01-2004
Author(s)
Abstract
Feature extraction and selection for continuous speech recognition is a complex task. State-of-the-art speech recognition systems use features that are derived by ignoring the Fourier transform phase. In our earlier studies, we showed the efficacy of the modified group delay feature (MODGDF), derived from the Fourier transform phase, for phoneme, syllable, and speaker recognition. In this paper, we use the MODGDF and the popular MFCC, derived from the Fourier transform magnitude, to compute joint features for continuous speech recognition of two Indian languages, Tamil and Telugu. A novel method is used in which the continuous speech signal is segmented into syllable-like units, followed by isolated-style recognition using HMMs. We further use an innovative technique that transforms the problem of detecting the correct string of syllabic units with maximum likelihood into one of finding an optimal state sequence locally. The recognition system does not use any language models. The MODGDF gave promising recognition performance for the two languages and compared well with the MFCC. Joint features derived using the MODGDF and MFCC gave a 10.6% improvement for both Tamil and Telugu. The improvement reinforces the hypothesis that the MODGDF captures information complementary to that of the MFCC and can be used along with the MFCC to capture the complete information in the speech signal at a functional level, helping to avoid heavy auditory and language models.
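
As a rough illustration of the phase-derived feature discussed in the abstract, the following sketch computes a modified group delay function for a single speech frame, converts it to cepstral coefficients with a DCT, and forms a joint vector with an externally supplied MFCC vector. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the parameter values (alpha, gamma, lifter, n_fft, n_ceps), and the use of simple concatenation to form the joint feature are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def modified_group_delay(frame, alpha=0.4, gamma=0.9, lifter=8, n_fft=512):
    """Modified group delay function of one frame (illustrative sketch).

    alpha, gamma, lifter and n_fft are assumed values, not taken from the abstract.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # spectrum of x(n)
    Y = np.fft.rfft(n * frame, n_fft)    # spectrum of n * x(n)

    # Cepstrally smoothed magnitude spectrum S(w): keep only low-quefrency terms.
    log_mag = np.log(np.abs(X) + 1e-10)
    cep = np.fft.irfft(log_mag, n_fft)
    window = np.zeros(n_fft)
    window[:lifter] = 1.0
    window[-(lifter - 1):] = 1.0         # mirrored half keeps the cepstrum symmetric
    S = np.exp(np.fft.rfft(cep * window, n_fft).real)

    # Group delay numerator over the smoothed spectrum raised to 2 * gamma.
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2.0 * gamma))
    # Compress the dynamic range while preserving the sign.
    return np.sign(tau) * np.abs(tau) ** alpha

def dct_ii(x, n_coef):
    """First n_coef coefficients of an unnormalised DCT-II."""
    N = len(x)
    k = np.arange(n_coef)[:, None]
    n = np.arange(N)[None, :]
    return (x[None, :] * np.cos(np.pi * k * (2 * n + 1) / (2 * N))).sum(axis=1)

def joint_features(frame, mfcc, n_ceps=13):
    """Concatenate MODGDF-style cepstra with an MFCC vector (assumed fusion rule)."""
    modgdf = dct_ii(modified_group_delay(frame), n_ceps)
    return np.concatenate([modgdf, np.asarray(mfcc, dtype=float)])
```

Here the MFCC vector is expected to come from whatever front end is already in use; for a 13-dimensional MFCC input and n_ceps=13, joint_features returns a 26-dimensional vector per frame.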