Umesh Srinivasan
Preferred name: Umesh Srinivasan
Official name: Umesh Srinivasan
Alternative names: Umesh, S.; Umesh, Srinivasan
6 results
- Overcoming data sparsity in acoustic modeling of low-resource language by borrowing data and model parameters from high-resource languages (01-01-2016)
  Authors: Abraham, Basil; Joy, Neethu Mariam
  Abstract: In this paper, we propose two techniques to improve the acoustic model of a low-resource language: (i) pooling data from closely related languages using a phoneme-mapping algorithm to build acoustic models such as the subspace Gaussian mixture model (SGMM), phone cluster adaptive training (Phone-CAT), deep neural networks (DNN) and convolutional neural networks (CNN), and then adapting the aforementioned models towards the low-resource language using its data; (ii) borrowing subspace model parameters from SGMM/Phone-CAT, or hidden layers from DNN/CNN, built on high-resource languages, and then estimating the language-specific parameters using the low-resource language data. The experiments were performed on four Indian languages, namely Assamese, Bengali, Hindi and Tamil. Relative improvements of 10 to 30% were obtained over the corresponding monolingual models in each case.

- Improving acoustic models in TORGO dysarthric speech database (01-03-2018)
  Authors: Joy, Neethu Mariam
  Abstract: Assistive speech-based technologies can improve the quality of life for people affected by dysarthria, a motor speech disorder. In this paper, we explore multiple ways to improve Gaussian mixture model and deep neural network (DNN) based hidden Markov model (HMM) automatic speech recognition systems for the TORGO dysarthric speech database. This work shows significant improvements over previous attempts at building such systems on TORGO. We trained speaker-specific acoustic models by tuning various acoustic model parameters, using speaker-normalized cepstral features and building complex DNN-HMM models with dropout and sequence-discrimination strategies. The DNN-HMM models for severe and severe-moderate dysarthric speakers were further improved by leveraging information specific to dysarthric speech in DNN models trained on audio from both dysarthric and normal speech, using the generalized distillation framework. To the best of our knowledge, this paper presents the best recognition accuracies for the TORGO database to date.

- Investigation of different acoustic modeling techniques for low resource Indian language data (13-04-2015)
  Authors: Sriranjani, R.; Murali Karthick, B.
  Abstract: In this paper, we investigate the performance of deep neural networks (DNN) and the subspace Gaussian mixture model (SGMM) under low-resource conditions. Even though DNNs outperform SGMMs and continuous density hidden Markov models (CDHMM) for high-resource data, their performance degrades when modeling low-resource data. Our experimental results show that SGMM outperforms DNN for limited transcribed data. To resolve this problem in DNN, we propose training a DNN containing a bottleneck layer in two stages: the first stage extracts bottleneck features; in the second stage, the extracted bottleneck features are used to train a DNN with a bottleneck layer. All our experiments are performed on two Indian languages (Tamil and Hindi) from the Mandi database. Our proposed method shows improved performance over the baseline SGMM and DNN models for limited training data.

- DNNs for unsupervised extraction of pseudo FMLLR features without explicit adaptation data (01-01-2016)
  Authors: Joy, Neethu Mariam; Baskar, Murali Karthick; Abraham, Basil
  Abstract: In this paper, we propose using deep neural networks (DNN) as a regression model to estimate feature-space maximum likelihood linear regression (FMLLR) features from unnormalized features. During training, pairs of unnormalized features as input and the corresponding FMLLR features as target are provided, and the network is optimized to reduce the mean-square error between the output and target FMLLR features. During test, unnormalized features are passed through this DNN feature extractor to obtain FMLLR-like features without any supervision or first-pass decode. Further, the FMLLR-like features are generated frame by frame, requiring no explicit adaptation data, unlike FMLLR or i-vector extraction. Our proposed approach is therefore suitable for scenarios with little adaptation data. It provides sizable improvements over basis-FMLLR and conventional FMLLR when normalization is done at the utterance level on the TIMIT and Switchboard-33hour data sets.

- On improving acoustic models for TORGO dysarthric speech database (01-01-2017)
  Authors: Joy, Neethu Mariam; Abraham, Basil
  Abstract: Assistive technologies based on speech have been shown to improve the quality of life of people affected by dysarthria, a motor speech disorder. This paper explores multiple ways to improve Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network (DNN) based automatic speech recognition (ASR) systems for the TORGO dysarthric speech database. Past attempts at developing ASR systems for the TORGO database were limited to training monophone models and doing speaker adaptation over them. Although a recent work attempted training triphone and neural network models, parameters such as the number of context-dependent states and the dimensionality of the principal component features were not properly tuned. This paper develops speaker-specific ASR models for each dysarthric speaker in the TORGO database by tuning the parameters of the GMM-HMM model and the number of layers and hidden nodes in the DNN. Employing dropout and sequence-discriminative training in the DNN also gave significant gains. Speaker-adapted features such as feature-space maximum likelihood linear regression (FMLLR) are used to pass speaker information to the DNNs. To the best of our knowledge, this paper presents the best recognition accuracies for the TORGO database to date.

- DNNs for unsupervised extraction of pseudo speaker-normalized features without explicit adaptation data (01-09-2017)
  Authors: Joy, Neethu Mariam; Baskar, Murali Karthick
  Abstract: In this paper, we propose using deep neural networks (DNN) as a regression model to estimate speaker-normalized features from un-normalized features. We consider three types of speaker-specific feature normalization techniques, viz., feature-space maximum likelihood linear regression (FMLLR), vocal tract length normalization (VTLN) and a combination of both. The un-normalized features considered were log filterbank features, Mel frequency cepstral coefficients (MFCC) and linear discriminant analysis (LDA) features. The DNN is trained using pairs of un-normalized features as input and the corresponding speaker-normalized features as target, and is optimized to reduce the mean-square error between the output and target speaker-normalized features. During test, un-normalized features are passed through this trained DNN to obtain pseudo speaker-normalized features without any supervision, adaptation data or first-pass decode. As the pseudo speaker-normalized features are generated frame by frame, the proposed method requires no explicit adaptation data, unlike FMLLR, VTLN or i-vector extraction, and is hence suitable for scenarios with very little adaptation data. The proposed approach provides significant improvements over conventional speaker-normalization techniques when normalization is done at the utterance level; experiments on TIMIT and on both a 33-hour subset and the entire 300 hours of the Switchboard corpus support this claim. With a large amount of training data, the proposed pseudo speaker-normalized features outperform conventional speaker-normalized features in the utterance-wise normalization scenario and give consistent marginal improvements over un-normalized features.
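The two pseudo-FMLLR/pseudo speaker-normalized papers above share one core idea: train a DNN by mean-square error to map un-normalized feature frames to speaker-normalized targets, then apply it frame by frame at test time with no adaptation data. The following is a minimal numpy sketch of that regression setup; the synthetic data, network size, learning rate and all variable names are illustrative assumptions, not taken from the papers.

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 13        # e.g. MFCC dimensionality (illustrative)
frames = 2000   # number of training frames (illustrative)

# Synthetic stand-ins: un-normalized features X, and "normalized" targets Y
# produced by a fixed affine transform Y = X A^T + b (FMLLR is affine).
X = rng.normal(size=(frames, dim))
A = rng.normal(scale=0.5, size=(dim, dim)) + np.eye(dim)
b = rng.normal(scale=0.1, size=dim)
Y = X @ A.T + b

# One-hidden-layer regression network, standing in for the deeper DNNs used
# in the papers.
hidden = 64
W1 = rng.normal(scale=0.1, size=(dim, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, dim)); b2 = np.zeros(dim)

def forward(x):
    """Return hidden activations and predicted normalized features."""
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

_, pred0 = forward(X)
initial_loss = mse(pred0, Y)

lr = 0.1
for step in range(500):
    h, pred = forward(X)
    grad_out = 2.0 * (pred - Y) / (frames * dim)  # d(MSE)/d(pred)
    # Backpropagate through the two layers.
    gW2 = h.T @ grad_out
    gb2 = grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (1.0 - h ** 2)   # tanh derivative
    gW1 = X.T @ grad_h
    gb1 = grad_h.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(X)
final_loss = mse(pred, Y)
print(f"MSE before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

At test time a new un-normalized frame would simply be pushed through `forward` to obtain a pseudo-normalized frame, which is what makes the approach usable without adaptation data or a first-pass decode.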