Phoneme selective speech enhancement using parametric estimators and the mixture maximum model: A unifying approach

Das, Amit; Hansen, John H.L.

doi:10.1109/TASL.2012.2201471

Date Issued

30-08-2012

Author(s)

Das, Amit

Hansen, John H.L.

DOI

10.1109/TASL.2012.2201471

Abstract

This study presents a ROVER speech enhancement algorithm that employs a series of prior enhanced utterances, each customized for a specific broad-level phoneme class, to generate a single composite utterance with improved overall objective quality across all classes. The noisy utterance is first partitioned into speech and non-speech regions using a voice activity detector. A mixture maximum (MIXMAX) model is then used to make probabilistic decisions in the speech regions to determine phoneme class weights. The prior enhanced utterances are weighted by these decisions and combined to form the final composite utterance. The enhancement system that generates the prior enhanced utterances comprises a family of parametric gain functions whose parameters are flexible and can be varied to achieve high enhancement levels per phoneme class. These parametric gain functions are derived by 1) using a weighted Euclidean distortion cost function and 2) modeling clean speech spectral magnitudes or discrete Fourier transform coefficients by Chi or two-sided Gamma priors, respectively. The special case estimators of these gain functions are the generalized spectral subtraction (GSS), minimum mean square error (MMSE), two-sided Gamma, or joint maximum a posteriori (MAP) estimators. Evaluations over two noise types and signal-to-noise ratios (SNRs) ranging from -5 dB to 10 dB suggest that the proposed ROVER algorithm outperforms not only the special case estimators but also the family of parametric estimators when all phoneme classes are jointly considered. © 2012 IEEE.
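
The combination step described in the abstract (weighting the prior enhanced utterances by per-class decisions and summing them) can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `combine_enhanced_frames`, the frame-based data layout, and the `nonspeech_index` fallback for frames that the voice activity detector marks as non-speech are all assumptions introduced here for illustration, and the class weights are taken as given rather than computed by a MIXMAX model.

```python
from typing import List, Sequence


def combine_enhanced_frames(
    enhanced: Sequence[Sequence[Sequence[float]]],
    class_weights: Sequence[Sequence[float]],
    is_speech: Sequence[bool],
    nonspeech_index: int = 0,
) -> List[List[float]]:
    """Blend per-class enhanced frames into one composite utterance.

    enhanced[c][t]      frame t of the utterance enhanced for phoneme class c
    class_weights[t][c] weight of class c at frame t (assumed to be supplied,
                        e.g. by a probabilistic classifier)
    is_speech[t]        voice activity decision for frame t
    nonspeech_index     ASSUMPTION: which enhanced utterance to copy in
                        non-speech frames; the abstract does not specify this
    """
    num_classes = len(enhanced)
    composite: List[List[float]] = []
    for t, speech in enumerate(is_speech):
        if speech:
            # Normalize the weights so they sum to one for this frame.
            total = sum(class_weights[t])
            if total <= 0:
                raise ValueError(f"weights for frame {t} must sum to a positive value")
            weights = [w / total for w in class_weights[t]]
        else:
            # Hypothetical fallback: use a single designated utterance.
            weights = [1.0 if c == nonspeech_index else 0.0 for c in range(num_classes)]
        frame_len = len(enhanced[0][t])
        # Weighted sum across classes, one output value per frame element.
        frame = [
            sum(weights[c] * enhanced[c][t][k] for c in range(num_classes))
            for k in range(frame_len)
        ]
        composite.append(frame)
    return composite
```

With two classes and two frames, a speech frame weighted 0.25 and 0.75 takes a quarter of its value from the first enhanced utterance and three quarters from the second, while a non-speech frame is copied from the utterance selected by `nonspeech_index`.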

Volume

20
