Prosodic Knowledge Sources for Automatic Speech Recognition

Citation

Vergyri, D., Stolcke, A., Gadde, V. R. R., Ferrer, L., & Shriberg, E. (2003, April). Prosodic knowledge sources for automatic speech recognition. In 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP ’03) (Vol. 1, pp. I-I). IEEE.

Abstract

In this work, different prosodic knowledge sources are integrated into a state-of-the-art large vocabulary speech recognition system. Prosody manifests itself on different levels in the speech signal: within words, as changes in phone durations and pitch; between words, as variation in pause length; and beyond words, correlating with higher linguistic structures and nonlexical phenomena. We investigate three models, each exploiting a different level of prosodic information, to rescore N-best hypotheses according to how well the recognized words match the prosodic features of the utterance. Experiments on the Switchboard corpus show word accuracy improvements with each prosodic knowledge source. A further improvement is observed when all models are combined, demonstrating that each captures somewhat different prosodic characteristics of the speech signal.
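The N-best rescoring idea can be illustrated with a minimal sketch. It assumes each hypothesis carries log-domain scores from the recognizer (acoustic and language model) and from each prosodic model, and that these are combined as a weighted sum whose weights would be tuned on held-out data. The score names (`acoustic`, `lm`, `duration`, `pause`), the weights, and the weighted log-linear combination itself are illustrative assumptions, not the paper's stated implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hypothesis:
    words: List[str]
    # Log-domain scores keyed by knowledge source.
    # Hypothetical names: "acoustic", "lm", "duration", "pause", ...
    scores: Dict[str, float] = field(default_factory=dict)


def combined_score(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Weighted sum of log scores; sources without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * s for name, s in hyp.scores.items())


def rescore_nbest(nbest: List[Hypothesis],
                  weights: Dict[str, float]) -> List[Hypothesis]:
    """Return the N-best list reordered by combined score, best first."""
    return sorted(nbest, key=lambda h: combined_score(h, weights), reverse=True)


if __name__ == "__main__":
    # Made-up scores for two competing hypotheses (illustration only).
    h1 = Hypothesis(["hyp", "one"], {"acoustic": -100.0, "lm": -10.0, "duration": -8.0})
    h2 = Hypothesis(["hyp", "two"], {"acoustic": -101.0, "lm": -10.5, "duration": -5.0})

    baseline = {"acoustic": 1.0, "lm": 1.0}
    with_prosody = {"acoustic": 1.0, "lm": 1.0, "duration": 1.0}

    print(rescore_nbest([h1, h2], baseline)[0].words)
    print(rescore_nbest([h1, h2], with_prosody)[0].words)
```

With only recognizer scores the first hypothesis wins (-110 vs. -111.5); adding the hypothetical duration score flips the ranking (-118 vs. -116.5). Combining several prosodic sources then amounts to adding more weighted terms to the same sum.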
