The Use of Word N-grams and Parts of Speech for Hierarchical Cluster Language Modeling

Citation

Wen Wang and D. Vergyri, “The Use of Word N-Grams and Parts of Speech for Hierarchical Cluster Language Modeling,” 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006, pp. I-I, doi: 10.1109/ICASSP.2006.1660206.

Abstract

We extend the backoff hierarchical class n-gram language modeling work of Zitouni et al. (2003) by studying the use of part-of-speech (POS) information in hierarchical word clustering. We propose two approaches. The first clusters words using the POS n-gram contextual distributions of each target word. The second generates a separate class tree for each group of words sharing the same POS. The resulting single class tree (first approach) or set of class trees (second approach) is then employed in hierarchical cluster language modeling. We evaluate both approaches on the SRI Arabic conversational telephone speech recognition system and show that building a set of POS-specific class trees achieves a 3% relative improvement in perplexity over the model of Zitouni et al. and an 8% relative improvement in perplexity over the baseline standard word n-gram model. When used for N-best rescoring, our approach also outperforms both the model of Zitouni et al. and the baseline, achieving significant word error rate (WER) reductions.
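The idea of building one class tree per POS group can be illustrated with a small sketch. The abstract does not specify the clustering criterion or the context features, so everything below is an assumption made for illustration: the toy corpus, the function names, the use of neighbouring POS tags as the contextual distribution, and the greedy bottom-up merge with an L1 distance are all hypothetical stand-ins, not the authors' method.

```python
from collections import Counter, defaultdict

# Toy POS-tagged corpus (hypothetical data, for illustration only).
TAGGED = [
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("cats", "NOUN"), ("run", "VERB")],
]


def context_vectors(sentences):
    """Count, for each word, the POS tags of its left and right neighbours."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        tags = ["<s>"] + [tag for _, tag in sent] + ["</s>"]
        for i, (word, _) in enumerate(sent):
            vecs[word][("L", tags[i])] += 1      # POS of the previous token
            vecs[word][("R", tags[i + 2])] += 1  # POS of the next token
    return vecs


def distance(c1, c2):
    """L1 distance between two normalised count distributions (assumed metric)."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    return sum(abs(c1[k] / n1 - c2[k] / n2) for k in set(c1) | set(c2))


def build_tree(words, vecs):
    """Greedy bottom-up merging; the binary tree is returned as nested tuples."""
    clusters = [(w, Counter(vecs[w])) for w in sorted(words)]
    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda p: distance(clusters[p[0]][1], clusters[p[1]][1]))
        (tree_i, cnt_i), (tree_j, cnt_j) = clusters[i], clusters[j]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((tree_i, tree_j), cnt_i + cnt_j)]
    return clusters[0][0]


def pos_specific_trees(sentences):
    """Build one class tree per POS tag, over the words seen with that tag."""
    vecs = context_vectors(sentences)
    by_pos = defaultdict(set)
    for sent in sentences:
        for word, tag in sent:
            by_pos[tag].add(word)
    return {tag: build_tree(words, vecs) for tag, words in by_pos.items()}


def leaves(tree):
    """Collect the words at the leaves of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


trees = pos_specific_trees(TAGGED)
```

In this sketch a word observed with several tags is placed in each corresponding tree, which is a simplification. Dropping the grouping step and calling `build_tree` once over the whole vocabulary would instead yield a single tree driven by POS contexts, loosely in the spirit of the first approach.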

