Options
The relevance of NIST speaker recognition evaluations
Date Issued
12-12-2014
Author(s)
Asha, T.
Indian Institute of Technology, Madras
Abstract
Feature extraction and building of the Universal Background Model (UBM) are crucial for building speaker verification/identification systems in the total variability subspace (TVS) framework. The motivation of this study is to analyze the significance of various parameters involved in front end processing for different databases. A number of different parameters like energy threshold for voice activity detection, the number of filters, the warping of the frequency scale, the number of cepstral coefficients and the shape of the filter are studied. Three different databases namely, NIST 2003, NIST 2010 and NTIMIT are studied. The optimal front-end obtained using NIST 2003 is observed to function well for NIST 2010 as conditions involving similar data was evaluated for both the databases. On the other hand, it is shown that the same optimal front-end is not scalable for NTIMIT database which is collected from a different environment. The experiments performed in this paper indicate that the optimal front-end parameters are specific to a particular dataset. In addition, mismatch between development data and evaluation data is shown to result in a poor system. Given the results, the paper questions the relevance of the NIST Speaker Recognition evaluations in real environments.