Noise-resistant Feature Extraction and Model Training for Robust Speech Recognition


Sankar, A., Stolcke, A., Chung, T., Neumeyer, L., Weintraub, M., Franco, H., & Beaufays, F. (1996, February). Noise-resistant feature extraction and model training for robust speech recognition. In Proceedings of the 1996 DARPA CSR Workshop, Ardenhouse, NY.


In this paper we report on our recent work on noise-robust feature extraction and model training to alleviate the mismatch caused by different microphones and ambient room noise. The work was carried out in the context of the 1995 DARPA-sponsored H3 benchmark test, which used the unlimited-vocabulary North American Business News (NABN) database. We present a novel noise-robust feature extraction algorithm that combines our previously developed minimum mean square error (MMSE) log-energy estimation algorithm with the probabilistic optimum filtering (POF) algorithm. We also studied an approach based on training the automatic speech recognition (ASR) system with previously collected noisy speech. Both approaches gave significant improvements, and combining them gave the best results. We also report on a new part-of-speech (POS) language modeling approach that makes it possible to train robust language models incorporating longer contexts than is possible with word-based language models. Preliminary results using this approach were encouraging.
