Multi-layer perceptron based speech activity detection for speaker verification
Date Issued
19-12-2011
Author(s)
Ganapathy, Sriram
Rajan, Padmanabhan
Hermansky, Hynek
Abstract
In this paper, we present a speech activity detection (SAD) technique for speaker verification in noisy environments. The proposed SAD is based on phoneme posteriors derived from a multi-layer perceptron (MLP). The MLP is trained using modulation spectral features, where long temporal segments of the speech signal are analyzed in critical bands. In each sub-band, temporal envelopes are derived using an autoregressive modelling technique called frequency domain linear prediction (FDLP). Robustness of the sub-band envelopes is achieved with a minimum mean square envelope estimation technique. We also experiment with MFCC features processed with cepstral mean subtraction. The speech features are input to the trained MLP to estimate phoneme posterior probabilities. For SAD, all the speech phoneme probabilities are merged into a single speech class to derive speech/non-speech decisions. The proposed SAD is applied to a speaker verification task using noisy versions of the NIST 2008 speaker recognition evaluation (SRE) data, where it provides significant improvements (relative equal error rate (EER) improvement of about 9% in additive noise and about 19% in reverberant conditions). Furthermore, the improvements are consistent across the two front-ends considered here (FDLP and MFCC). © 2011 IEEE.
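The merging step described in the abstract can be illustrated with a minimal sketch. It assumes the MLP output is available as a per-frame matrix of class posteriors with one column reserved for non-speech. The function name `speech_decisions`, the column layout, and the 0.5 threshold are hypothetical choices made for illustration; the abstract does not state how the authors threshold or smooth the merged probability.

```python
import numpy as np

def speech_decisions(posteriors, nonspeech_idx=0, threshold=0.5):
    """Merge per-frame phoneme posteriors into speech/non-speech decisions.

    posteriors:    array of shape (n_frames, n_classes); each row sums to 1.
    nonspeech_idx: column index of the non-speech class (assumed layout).
    threshold:     decision threshold on the merged speech probability
                   (assumed value, not taken from the paper).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    # Select every column except the non-speech one.
    speech_mask = np.ones(posteriors.shape[1], dtype=bool)
    speech_mask[nonspeech_idx] = False
    # Sum all speech-phoneme posteriors into a single speech probability.
    speech_prob = posteriors[:, speech_mask].sum(axis=1)
    return speech_prob, speech_prob > threshold

# Toy posteriors for three frames: column 0 is non-speech,
# columns 1 and 2 stand in for two phoneme classes.
posteriors = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.50, 0.30],
    [0.60, 0.30, 0.10],
])
speech_prob, is_speech = speech_decisions(posteriors)
print(speech_prob, is_speech)
```

In this toy input only the second frame is labelled as speech, since it is the only one whose merged speech probability exceeds the assumed threshold. A practical detector would likely also smooth decisions over neighbouring frames, which this sketch omits.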