Zheng, J., Franco, H., & Stolcke, A. (2004). Effective acoustic modeling for rate-of-speech variation in large vocabulary conversational speech recognition. In Eighth International Conference on Spoken Language Processing.
We investigate several variants of speech-rate-dependent acoustic models for large-vocabulary conversational speech recognition, in the framework of combining rate-specific models in decoding to compensate for speech rate variation. We study two basic approaches to combining rate-specific models: one combines models at the pronunciation level and the other at the HMM state level. Furthermore, we investigate the influence of different numbers of rate-of-speech classes and different parameter tying schemes. Experiments on the Switchboard database, using SRI's DECIPHER recognition system, show that rate-dependent acoustic modeling resulted in a 2% relative word error rate reduction over a rate-independent baseline, and that the pronunciation-level constraint, Gaussian sharing between rate-specific models, and a well-chosen number of rate-of-speech classes are all important for best performance.