Stolcke, A., & Shriberg, E. (1996, May). Statistical language modeling for speech disfluencies. In 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings (Vol. 1, pp. 405-408). IEEE.
Speech disfluencies (such as filled pauses, repetitions, restarts) are among the characteristics distinguishing spontaneous speech from planned or read speech. We introduce a language model that predicts disfluencies probabilistically and uses an edited, fluent context to predict following words. The model is based on a generalization of the standard N-gram language model. It uses dynamic programming to compute the probability of a word sequence, taking into account possible hidden disfluency events. We analyze the model’s performance for various disfluency types on the Switchboard corpus. We find that the model reduces word perplexity in the neighborhood of disfluency events; however, overall differences are small and have no significant impact on recognition accuracy. We also note that for modeling of the most frequent type of disfluency, filled pauses, a segmentation of utterances into linguistic (rather than acoustic) units is required. Our analysis illustrates a generally useful technique for language model evaluation based on local perplexity comparisons.