An SVD based approach for spoken language identification
Date Issued
01-07-2018
Author(s)
Abstract
In this paper, we revisit the classical Singular Value Decomposition (SVD) based approach for dimension reduction in Language Identification (LID). It is proposed as an alternative to the state-of-the-art total variability space (TVS) based framework. A Universal Background Model-Gaussian Mixture Model (UBM-GMM) is first built as in the state-of-the-art system. The training utterances are aligned with the UBM using MAP adaptation to yield supervectors. The training supervectors are stacked row-wise to form a matrix, and SVD is performed on this matrix. The issue of an ill-conditioned matrix is solved using a novel proxy projection technique. The supervectors are then projected along the top $\mathcal{L}$ singular vectors, and an SVM-based classifier is trained on the projected supervectors. During testing, the test supervector obtained by aligning with the UBM-GMM is projected along the same $\mathcal{L}$ directions, and the reduced-dimension test vector is classified using the SVM. The proposed system shows an absolute improvement of 8.4% over the best i-vector based LID system on 30-second utterances of the CallFriend dataset with 12 languages. The proxy projection technique gives ≥3% absolute improvement over ordinary projection. As the T-matrix obtained in the TVS does not have an orthogonal basis, the i-vectors are also projected onto an orthogonal basis through SVD, which gives an absolute improvement of 6.4%. The proposed approach scales well, reaching an accuracy of 93.87% on the Topcoder dataset with 176 languages.
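The following is a minimal sketch of the SVD-based projection and SVM classification pipeline outlined in the abstract. It uses synthetic data in place of real MAP-adapted GMM supervectors, and the dimensions, variable names, and use of NumPy and scikit-learn are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Assumed sizes: number of training utterances, supervector dimension,
# number of languages, and number of retained singular vectors L.
n_train, sv_dim, n_langs, L = 600, 2048, 12, 200

# Synthetic stand-ins for the row-stacked training supervectors and labels.
X_train = rng.standard_normal((n_train, sv_dim))
y_train = rng.integers(0, n_langs, size=n_train)

# SVD of the supervector matrix; the top-L right singular vectors
# span the reduced subspace used for projection.
_, _, Vt = np.linalg.svd(X_train, full_matrices=False)
V_L = Vt[:L].T                      # (sv_dim, L) projection basis

# Train an SVM on the dimension-reduced training supervectors.
X_train_red = X_train @ V_L
clf = SVC(kernel="linear").fit(X_train_red, y_train)

# At test time, the supervector from MAP adaptation is projected along
# the same L directions and classified with the trained SVM.
x_test = rng.standard_normal((1, sv_dim))
print(clf.predict(x_test @ V_L))
```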