April 1, 2007

Combining Discriminative Feature, Transform, and Model Training for Large Vocabulary Speech Recognition

Citation

J. Zheng, O. Cetin, M.-Y. Hwang, X. Lei, A. Stolcke and N. Morgan, “Combining Discriminative Feature, Transform, and Model Training for Large Vocabulary Speech Recognition,” 2007 IEEE International Conference on Acoustics, Speech and Signal Processing – ICASSP ’07, 2007, pp. IV-633–IV-636, doi: 10.1109/ICASSP.2007.366992.

Abstract

Recent developments in large vocabulary continuous speech recognition (LVCSR) have shown the effectiveness of discriminative training approaches, employing the following three representative techniques: discriminative Gaussian training using the minimum phone error (MPE) criterion; discriminatively trained features estimated by multilayer perceptrons (MLPs); and discriminative feature transforms such as feature-level MPE (fMPE). Although MLP features, MPE models, and fMPE transforms have each been shown to improve recognition accuracy, no previous work has applied all three in a single LVCSR system. This paper uses a state-of-the-art Mandarin recognition system as a platform to study the interaction of all three techniques. Experiments in the broadcast news and broadcast conversation domains show that the contribution of each technique is nonredundant, and that the full combination yields the best performance and generalizes well across domains.
