Modeling Word-Level Rate-of-Speech Variation in Large Vocabulary Conversational Speech Recognition

Citation

Zheng, J., Franco, H., & Stolcke, A. (2003). Modeling word-level rate-of-speech variation in large vocabulary conversational speech recognition. Speech Communication, 41(2-3), 273-285.

Abstract

Variations in rate of speech (ROS) produce variations in both spectral features and word pronunciations that affect automatic speech recognition systems. To deal with these ROS effects, we propose to use a set of parallel rate-specific acoustic and pronunciation models. Rate switching is permitted at word boundaries, to allow within-sentence speech rate variation, which is common in conversational speech. Because of the parallel structure of rate-specific models and the maximum likelihood decoding method, our approach does not require ROS estimation before recognition, which is hard to achieve. We evaluate our models on a large-vocabulary conversational speech recognition task over the telephone.

Experiments on the NIST 2000 Hub-5 development set show that word-level ROS-dependent modeling results in a 2.2% absolute reduction in word error rate over a rate-independent baseline system. Relative to an enhanced baseline system that models crossword phonetic elision and reduction in a multiword dictionary, rate-dependent models achieve an absolute improvement of 1.5%. Furthermore, we introduce a novel method to modeling reduced pronunciations that are common in fast speech based on the approach of skipping short phones in the pronunciation models while preserving the phonetic context for the adjacent phones. This method is shown to also produce a small additional improvement on top of ROS-dependent acoustic modeling.


Read more from SRI

  • An arid, rural Nevada landscape

    Can AI help us find valuable minerals?

    SRI’s machine learning-based geospatial analytics platform, already adopted by the USGS, is poised to make waves in the mining industry.

  • Two students in a computer lab

    Building a lab-to-market pipeline for education

    The SRI-led LEARN Network demonstrates how we can get the best evidence-based educational programs to classrooms and students.

  • Code reflected in a man's eyeglasses

    LLM risks from A to Z

    A new paper from SRI and Brazil’s Instituto Eldorado delivers a comprehensive update on the security risks to large language models.