Vergyri, D., Shafran, I., Stolcke, A., Gadde, V. R. R., Akbacak, M., Roark, B., & Wang, W. (2007, August). The SRI/OGI 2006 spoken term detection system. In Interspeech (pp. 2393-2396).
This paper describes the system developed jointly at SRI and OGI for participation in the 2006 NIST Spoken Term Detection (STD) evaluation. We participated in the three genres of the English track: Broadcast News (BN), Conversational Telephone Speech (CTS), and Conference Meetings (MTG). The system consists of two phases. First, audio indexing, an offline phase, converts the input speech waveform into a searchable index. Second, term retrieval, possibly an online phase, returns a ranked list of occurrences for each search term. We used a word-based indexing approach, with the index obtained from SRI’s large-vocabulary Speech-to-Text (STT) system.
Apart from describing the submitted system and its performance on the NIST evaluation metric, we study the tradeoffs between performance and system design. We examine detection performance versus indexing speed, the effect of different index ranking schemes on the NIST score, and the utility of approaches for handling out-of-vocabulary (OOV) terms.
Index Terms: spoken term detection, audio indexing
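
The two-phase design described above (offline indexing of word hypotheses, then term retrieval returning ranked occurrences) can be illustrated with a minimal sketch. This is not the SRI/OGI implementation: the `Hit` record, the `build_index` and `search` functions, the product-of-scores ranking, and the `max_gap` parameter are all hypothetical choices made for illustration, and the input tuples stand in for the output of an STT system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One hypothesized occurrence (hypothetical record layout)."""
    doc_id: str    # audio file identifier
    start: float   # start time in seconds
    end: float     # end time in seconds
    score: float   # recognizer confidence, assumed to lie in [0, 1]


def build_index(hypotheses):
    """Offline phase: map each recognized word to its occurrences.

    `hypotheses` is an iterable of (doc_id, word, start, end, score)
    tuples, a stand-in for word-level speech-to-text output.
    """
    index = defaultdict(list)
    for doc_id, word, start, end, score in hypotheses:
        index[word.lower()].append(Hit(doc_id, start, end, score))
    return index


def search(index, term, max_gap=0.5):
    """Online phase: return occurrences of `term`, best score first.

    Multi-word terms are matched by chaining hits for consecutive words
    in the same document separated by at most `max_gap` seconds. The
    chain score is the product of the word scores (an assumed ranking
    scheme, one of many possible).
    """
    words = term.lower().split()
    if not words:
        return []
    candidates = list(index.get(words[0], []))
    for word in words[1:]:
        extended = []
        for prev in candidates:
            for nxt in index.get(word, []):
                gap = nxt.start - prev.end
                if nxt.doc_id == prev.doc_id and 0.0 <= gap <= max_gap:
                    extended.append(
                        Hit(prev.doc_id, prev.start, nxt.end,
                            prev.score * nxt.score)
                    )
        candidates = extended
    return sorted(candidates, key=lambda h: h.score, reverse=True)


# Toy usage with made-up hypotheses.
hyps = [
    ("a.wav", "spoken", 1.00, 1.40, 0.9),
    ("a.wav", "term", 1.45, 1.80, 0.8),
    ("b.wav", "term", 3.00, 3.30, 0.6),
    ("b.wav", "Spoken", 10.0, 10.4, 0.5),
]
idx = build_index(hyps)
single = search(idx, "term")
phrase = search(idx, "spoken term")
missing = search(idx, "unseenword")
```

In this sketch a term that the recognizer never produced, such as `unseenword`, yields no hits at all, which is the out-of-vocabulary limitation of a purely word-based index that the abstract refers to.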