Improved phone-cluster adaptive training acoustic model
Date Issued
16-11-2016
Author(s)
Abstract
Phone-cluster adaptive training (Phone-CAT) is a subspace-based acoustic modeling technique inspired by cluster adaptive training (CAT) and the subspace Gaussian mixture model (SGMM). This paper explores three extensions to the basic Phone-CAT model to improve its recognition performance: increasing the phonetic subspace dimension, including sub-states, and including a speaker subspace. The latter two extensions are implemented much as they are in SGMM, since both acoustic models share a similar subspace framework. The first extension is not straightforward to implement, because the phonetic subspace dimension of Phone-CAT is constrained to be equal to the number of monophones. We propose a Two-stage Phone-CAT model in which the phonetic subspace dimension is increased to the number of monophone states. This model still retains the center-phone capturing property of the state-specific vectors in basic Phone-CAT. Experiments on a 33-hour training subset of the Switchboard database show that the proposed extensions improve the recognition performance of the basic Phone-CAT model.