Recent Innovations in Speech-to-Text Transcription at SRI-ICSI-UW

Citation

Stolcke, A., Chen, B., Franco, H., Gadde, V. R. R., Graciarena, M., Hwang, M. Y., … & Zhu, Q. (2006). Recent innovations in speech-to-text transcription at SRI-ICSI-UW. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1729-1744.

Abstract

We summarize recent progress in automatic speechto-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system, from acoustic features, to acoustic modeling and adaptation, to language modeling. In the front end, we experimented with nonstandard features, including various measures of voicing, discriminative phone posterior features estimated by multilayer perceptrons, and a novel phone-level macro-averaging for cepstral normalization. Acoustic modeling was improved with combinations of front ends operating at multiple frame rates, as well as by modifications to the standard methods for discriminative Gaussian estimation. We show that acoustic adaptation can be improved by predicting the optimal regression class complexity for a given speaker. Language modeling innovations include the use of a syntaxmotivated almost-parsing language model, as well as principled vocabulary-selection techniques. Finally, we address portability issues, such as the use of imperfect training transcripts, and language-specific adjustments required for recognition of Arabic
and Mandarin.


Read more from SRI

  • An arid, rural Nevada landscape

    Can AI help us find valuable minerals?

    SRI’s machine learning-based geospatial analytics platform, already adopted by the USGS, is poised to make waves in the mining industry.

  • Two students in a computer lab

    Building a lab-to-market pipeline for education

    The SRI-led LEARN Network demonstrates how we can get the best evidence-based educational programs to classrooms and students.

  • Code reflected in a man's eyeglasses

    LLM risks from A to Z

    A new paper from SRI and Brazil’s Instituto Eldorado delivers a comprehensive update on the security risks to large language models.