Morgan, N., Chen, B. Y., Zhu, Q., & Stolcke, A. (2004, May). TRAPping conversational speech: Extending TRAP/Tandem approaches to conversational telephone speech recognition. In 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. I-537). IEEE.
TempoRAl Patterns (TRAPs) and Tandem MLP/HMM approaches incorporate feature streams computed over longer time intervals than conventional short-time analysis. These methods have been used for challenging small- and medium-vocabulary recognition tasks such as Aurora and SPINE. Conversational telephone speech recognition is a difficult large-vocabulary task: current systems have word error rates of 20-40%, depending on system complexity and test set. Training and test times for this task also tend to be relatively long, which makes rapid development difficult. In this paper we report experiments on a reduced conversational speech task that informed a number of engineering decisions in the design of an acoustic front end. We then describe results with this front end on a full-vocabulary conversational telephone speech task. In both cases the front end yielded significant improvements over the baseline.