  • Publication
    Outerproduct of trajectory matrix for acoustic modeling using support vector machines
    (01-12-2004)
    Anitha, R.; Satish, D. Srikrishna
    In this paper, we address the issues in classification of varying duration segments of speech using support vector machines. Commonly used methods for mapping the varying duration segments into fixed dimension patterns may lead to loss of crucial information necessary for classification. We propose a method in which the representation of a segment of speech is considered as a trajectory in a multidimensional space. A fixed dimension pattern vector derived from the outerproduct operation on the matrix representation of a multidimensional trajectory is given as input to the support vector machines. For acoustic modeling of speech segments consisting of multiple phonemes, the outerproduct operation is carried out for the trajectory matrix of each phoneme. The effectiveness of the proposed methods is demonstrated in recognition of isolated utterances of the E-set of English alphabet. © 2004 IEEE.
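The fixed-dimension mapping described in the abstract above can be illustrated with a minimal sketch. It assumes the trajectory is stored as T frames of d-dimensional features and that the pattern vector is the flattened d x d sum of per-frame outer products (equivalently X Xᵀ for a d x T trajectory matrix X). The function name `outer_product_pattern` and this exact layout are illustrative assumptions, not details confirmed by the paper.

```python
def outer_product_pattern(frames):
    """Map a variable-length trajectory to a fixed-length pattern vector.

    frames: list of T feature vectors, each of length d (T may vary per segment).
    Returns a flat list of d*d values: the sum over frames of x x^T.
    The layout is an assumption made for illustration.
    """
    if not frames:
        raise ValueError("trajectory must contain at least one frame")
    d = len(frames[0])
    acc = [[0.0] * d for _ in range(d)]
    for x in frames:
        if len(x) != d:
            raise ValueError("all frames must have the same dimension")
        for i in range(d):
            for j in range(d):
                acc[i][j] += x[i] * x[j]
    # Flatten row by row so every segment yields a vector of length d*d.
    return [value for row in acc for value in row]


short_segment = [[1.0, 2.0]]                           # T = 1
long_segment = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # T = 3
print(len(outer_product_pattern(short_segment)),
      len(outer_product_pattern(long_segment)))
```

Both segments map to vectors of the same length regardless of duration, which is what allows them to be passed to a classifier that expects fixed-dimension input.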
  • Publication
    Local density estimation based clustering
    (01-12-2007)
    Pamudurthy, Sheetal Reddy; Chandrakala, S.
    In this paper we propose a density based clustering approach. A kernel based density estimation technique is used to estimate the density of the given data set using a Gaussian kernel. Generally, a fixed width parameter is used for all the Gaussians in such methods. Here, we propose a method to automatically determine the widths of the Gaussians by considering the information available locally at each data point. Cluster boundary information is subsequently extracted from the estimated density of the data. The performance of the proposed method is demonstrated on several data sets. Studies comparing the performance of the proposed method with that of DBSCAN and SVC are also presented. © 2007 IEEE.
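The idea of per-point Gaussian widths in the abstract above can be sketched as follows. The abstract does not say how the local information is turned into a width, so this sketch assumes, purely for illustration, that each point's width is its distance to its k-th nearest neighbour. The names `local_widths` and `density` and the parameter `k` are hypothetical.

```python
import math


def local_widths(points, k=2):
    """Assumed rule: width of each Gaussian = distance to the k-th nearest neighbour."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    widths = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        width = dists[k]  # dists[0] is the zero distance from p to itself
        if width == 0.0:
            raise ValueError("duplicate points give a zero width under this rule")
        widths.append(width)
    return widths


def density(x, points, widths):
    """Kernel density estimate at x with one isotropic Gaussian per data point."""
    dim = len(x)
    total = 0.0
    for p, h in zip(points, widths):
        norm = (2.0 * math.pi * h * h) ** (dim / 2.0)
        total += math.exp(-math.dist(x, p) ** 2 / (2.0 * h * h)) / norm
    return total / len(points)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
widths = local_widths(points, k=2)
print(density((0.3, 0.3), points, widths) > density((5, 5), points, widths))
```

With a fixed global width, a single value has to serve both dense and sparse regions; a locally chosen width lets the estimate adapt to each neighbourhood, which is the motivation the abstract gives.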
  • Publication
    Speaker recognition using pyramid match kernel based support vector machines
    (01-09-2012)
    Dileep, A. D.
    Gaussian mixture model (GMM) based approaches have been commonly used for speaker recognition tasks. Methods for estimation of parameters of GMMs include the expectation-maximization method, which is a non-discriminative learning based method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) based classifiers using dynamic kernels such as the generalized linear discriminant sequence kernel, probabilistic sequence kernel, GMM supervector kernel, GMM-UBM mean interval (GUMI) kernel and intermediate matching kernel. Recently, the pyramid match kernel (PMK) using grids in the feature space as histogram bins and the vocabulary-guided PMK (VGPMK) using clusters in the feature space as histogram bins have been proposed for recognition of objects in an image represented as a set of local feature vectors. In PMK, a set of feature vectors is mapped onto a multi-resolution histogram pyramid. The kernel is computed between a pair of examples by comparing the pyramids using a weighted histogram intersection function at each level of the pyramid. We propose to use the PMK-based SVM classifier for speaker identification and verification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the PMK-based SVM classifier is construction of a pyramid of histograms. We first propose to form hard clusters, using the k-means clustering method, with an increasing number of clusters at different levels of the pyramid to design the codebook-based PMK (CBPMK). Then we propose the GMM-based PMK (GMMPMK) that uses soft clustering. We compare the performance of the GMM-based approaches, and the PMK and other dynamic kernel SVM-based approaches to speaker identification and verification. The 2002 and 2003 NIST speaker recognition corpora are used in evaluation of different approaches to speaker identification and verification. Results of our studies show that the dynamic kernel SVM-based approaches give a significantly better performance than the state-of-the-art GMM-based approaches. For the speaker recognition task, the GMMPMK-based SVM gives a performance that is better than that of SVMs using many other dynamic kernels and comparable to that of SVMs using the state-of-the-art dynamic kernel, the GUMI kernel. The storage requirements of the GMMPMK-based SVMs are lower than those of SVMs using any other dynamic kernel. © 2012 Springer Science+Business Media, LLC.
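The weighted histogram intersection described in the abstract above can be sketched for the general pyramid match formulation. The sketch assumes the histograms are already computed, ordered from the finest level to the coarsest, with nested bins, and that each level contributes only the matches newly found at that level. The function names and the weight values are illustrative assumptions; how CBPMK and GMMPMK build their histograms is not reproduced here.

```python
def histogram_intersection(h1, h2):
    """Number of matches between two histograms with the same bins."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def pyramid_match(pyramid_x, pyramid_y, weights):
    """Weighted sum of the new matches found at each pyramid level.

    pyramid_x, pyramid_y: lists of histograms, finest level first.
    weights: one weight per level (assumed to decrease for coarser levels).
    """
    score = 0.0
    previous = 0.0
    for hist_x, hist_y, weight in zip(pyramid_x, pyramid_y, weights):
        matches = histogram_intersection(hist_x, hist_y)
        score += weight * (matches - previous)  # count only new matches
        previous = matches
    return score


# Two examples with two feature vectors each, binned at three resolutions.
pyramid_x = [[1, 0, 1, 0], [1, 1], [2]]
pyramid_y = [[0, 1, 1, 0], [1, 1], [2]]
weights = [1.0, 0.5, 0.25]  # hypothetical weights
print(pyramid_match(pyramid_x, pyramid_y, weights))
```

Matches found in finer bins receive a larger weight because vectors that fall in the same small bin are closer together than vectors that first coincide only in a coarse bin.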
  • Publication
    Signal processing based segmentation and HMM based acoustic clustering of syllable segments for low bit rate segment vocoder at 1.4 Kbps
    (01-12-2008)
    Chevireddy, Sadhana
    In this paper, we propose a novel approach for developing a segment-based vocoder at very low bit-rates. The segmental unit chosen for coding is a syllable. A signal processing technique called automatic group delay based segmentation is used to obtain syllable-like segments. The segment codebook is prepared by acoustically clustering the syllable segments using a Hidden Markov Model (HMM) based unsupervised and incremental training algorithm. When the residual is modeled using MELP, a bit-rate of 1.4 Kbps is achieved. The synthesized speech quality is compared with that of the standard MELP codec at 2.4 Kbps using the objective evaluation measure, PESQ. Copyright by EURASIP.
  • Publication
    Use of fuzzy mathematical concepts in character spotting for automatic recognition of continuous speech in Hindi
    (25-02-1992)
    Eswar, P.; Yegnanarayana, B.
    In this paper we describe the use of fuzzy mathematical concepts in spotting characters from continuous speech in the Indian language Hindi. The research effort reported here is concerned with identifying the phonetic features from the acoustic parameters of a speech signal and combining the features to spot a character in continuous speech. These functions are performed by using expert systems that use confidence values for grading the conclusions arrived at each stage of processing. This paper discusses how this grading is obtained by using numerical translation of the description of a character obtained from an expert phonetician. Confidence values are assigned to the premises of the rules obtained from an expert phonetician using fuzzy membership functions. Fuzzy relations are used to grade the conclusions derived from these premises. Results of applying these techniques to character spotting are given. © 1992.
  • Publication
    Kernel auto-regressive model with eXogenous inputs for nonlinear time series prediction
    (02-08-2007)
    Venkataramana Kim, B.
    In this paper we present a novel approach for nonlinear time series prediction using kernel methods. Kernel methods such as the Support Vector Machine (SVM) and Support Vector Regression (SVR) deal with nonlinear problems assuming independent and identically distributed (i.i.d.) data, without an explicit notion of time. However, the problem of prediction necessitates temporal information. In this regard, we propose a novel time series modeling technique, the Kernel Auto-Regressive model with eXogenous inputs (KARX), and associated estimation methods. Among the advantages of the KARX model over the widely used Nonlinear Auto-Regressive eXogenous (NARX) model, which is implemented using an Artificial Neural Network (ANN), are its implicit nonlinear mapping and better regularization capability. In this work, we make use of Kalman recursions instead of the quadratic programming that is generally used in kernel methods. Also, we employ online estimation schemes for estimating model noise parameters. The efficacy of the approach is demonstrated on artificial time series as well as real world time series acquired from aircraft engines. © 2007 IEEE.
  • Publication
    Spotting consonant-vowel units in continuous speech using autoassociative neural networks and support vector machines
    (01-12-2004)
    Gangashetty, Suryakanth V.; Yegnanarayana, B.
    In this paper, we propose an approach for continuous speech recognition by spotting consonant-vowel (CV) units. The main issues in spotting CV units are the location of anchor points and labelling the regions around these anchor points using suitable classifiers. The vowel onset points (VOPs) have been used as anchor points. The distribution capturing ability of autoassociative neural network (AANN) models is explored for detection of VOPs in continuous speech. We consider support vector machine (SVM) based classifiers due to their ability of generalisation from limited training data and also due to their inherent discriminative learning. The CV spotting approach for continuous speech recognition has been demonstrated for sentences in Indian languages. © 2004 IEEE.
  • Publication
    Reordering network as postprocessor in modular approach-based neural network architecture for recognition of consonant-vowel (CV) utterances
    (01-05-1999)
    Siva Rama Krishna Rao, J. Y.
    Recognition of consonant-vowel (CV) utterances in Indian languages is a challenging task because of the large number of classes and the high confusability among several classes. Modular approach based on artificial neural network models is considered for recognition of CV utterances. In this approach, the large number of classes is divided into sub-groups and a separate network is trained for each subgroup. Three different grouping criteria are considered and the performance of modular networks based on these criteria is studied. An improved performance is obtained by combining evidence from the three modular networks. Because of similarities among several classes, the class of a test utterance may not always have the strongest evidence. However, it may be among a small set of alternative classes with strong evidence. We propose to train another neural network to further discriminate among these classes and reorder the alternatives. A significant increase in the performance is obtained by using the reordering network as a postprocessor for recognition of isolated utterances of 65 CV classes in Indian languages.
  • Publication
    Clustering of nonlinearly separable data using spiking neural networks
    (01-01-2007)
    Panuku, Lakshmi Narayana
    In this paper, we study the clustering capabilities of spiking neural networks. We first study the working of spiking neural networks for clustering linearly separable data. Also, a biological interpretation has been given to the delay selection in spiking neural networks. We show that by varying the firing threshold of spiking neurons during the training, nonlinearly separable data like the ring data can be clustered. When a multi-layer spiking neural network is trained for clustering, subclusters are formed in the hidden layer and these subclusters are combined in the output layer, resulting in hierarchical clustering of the data. A spiking neural network with a hidden layer is generally trained by modifying the weights of the connections to the nodes in the hidden layer and the output layer simultaneously. We propose a two-stage learning method for training a spiking neural network model for clustering. In the proposed method, the weights for the connections to the nodes in the hidden layer are learnt first, and then the weights for the connections to the nodes in the output layer are learnt. We show that the proposed two-stage learning method can cluster complex data such as the interlocking cluster data, without using lateral connections. © Springer-Verlag Berlin Heidelberg 2007.
  • Publication
    Modular approach to recognition of strokes in Telugu script
    (01-12-2007)
    Jayaraman, Anitha
    In this paper, we address some issues in developing an online handwritten character recognition (HCR) system for an Indian language script, Telugu. The number of characters in this script is estimated to be around 5000. A character in this script is written as a sequence of strokes. The set of strokes in Telugu consists of 253 unique strokes. As the similarity among several strokes is high, we propose a modular approach for recognition of strokes. Based on the relative position of a stroke in a character, the stroke set has been divided into three subsets, namely, baseline strokes, bottom strokes and top strokes. Classifiers for the different subsets of strokes are built using support vector machines (SVMs). We study the performance of the classifiers for subsets of strokes and propose methods to improve their performance. A comparative study using hidden Markov models (HMMs) shows that the SVM-based approach gives a significantly better performance. © 2007 IEEE.