January 1, 2010

Leveraging speaker diarization for meeting recognition from distant microphones

Citation

A. Stolcke, G. Friedland, and D. Imseng, “Leveraging speaker diarization for meeting recognition from distant microphones,” in Proc. 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 4390–4393.

Abstract

We investigate using state-of-the-art speaker diarization output for speech recognition purposes. While it seems obvious that speech recognition could benefit from the output of speaker diarization (“Who spoke when”) for effective feature normalization and model adaptation, such benefits have remained elusive in the very challenging domain of meeting recognition from distant microphones. In this study, we show that recognition gains are possible by careful postprocessing of the diarization output. Still, recognition accuracy may suffer when the underlying diarization system performs worse than expected, even compared to far less sophisticated speaker-clustering techniques. We obtain a more accurate and robust overall system by combining recognition output with multiple speaker segmentations and clusterings. We evaluate our methods on data from the 2009 NIST Rich Transcription meeting recognition evaluation.

Keywords— speech processing, speaker diarization, meeting recognition, rich transcription, system combination.

↓ Download

Leveraging speaker diarization for meeting recognition from distant microphones

Abstract

Read more from SRI

Researchers develop materials that can take on the toughest conditions

Podcast: Re-imagining instructional quality and coaching

SRI’s Genome Explorer: Enhanced genome browser delivers better user experience