Multispeaker Speech Activity Detection for the ICSI Meeting Recorder

Citation

T. Pfau, D. P. W. Ellis, and A. Stolcke, “Multispeaker speech activity detection for the ICSI meeting recorder,” IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU ’01), 2001, pp. 107-110, doi: 10.1109/ASRU.2001.1034599.

Abstract

As part of a project on speech recognition in meeting environments, we have collected a corpus of multi-channel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in channel characteristics. Therefore, we have developed a more sophisticated approach for multichannel speech activity detection using a simple hidden Markov model (HMM).
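To make the idea of an HMM speech activity detector concrete, the following is a minimal sketch of a two-state (nonspeech/speech) HMM decoded with the Viterbi algorithm over a single per-frame feature. It is not the detector described in the paper: the single-Gaussian emissions, the feature (a normalized log energy), and all parameter values (`means`, `variances`, `stay_prob`) are illustrative assumptions.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_speech_activity(log_energy, means=(-2.0, 2.0),
                            variances=(1.0, 1.0), stay_prob=0.9):
    """Return one label per frame: 0 = nonspeech, 1 = speech.

    All parameters are hypothetical defaults chosen for illustration.
    """
    if not log_energy:
        return []
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)

    # Uniform prior over the two states for the first frame.
    score = [math.log(0.5) + gaussian_logpdf(log_energy[0], means[s], variances[s])
             for s in (0, 1)]
    backptr = []

    for x in log_energy[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            # Best predecessor state for landing in state s.
            cands = [score[p] + (log_stay if p == s else log_switch)
                     for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            new_score.append(cands[best] + gaussian_logpdf(x, means[s], variances[s]))
            ptr.append(best)
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

The self-transition probability acts as a smoother: a sustained rise in energy flips the state to speech, while a single ambiguous frame is not enough to pay the cost of switching twice.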

A baseline HMM speech activity detector has been extended to use mixtures of Gaussians to achieve robustness for different speakers under different conditions. Feature normalization and crosscorrelation processing are used to increase the channel independence and to detect crosstalk. The use of both energy normalization and crosscorrelation based post-processing results in a 35% relative reduction of the frame error rate.
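One way to picture crosscorrelation-based crosstalk detection is sketched below. This is an illustration only, not the paper's post-processing: the rule (flag a frame as crosstalk when it is weaker than, yet strongly correlated with, another speaker's channel), the function names, and the `max_lag` and `corr_threshold` values are all assumptions.

```python
import math


def peak_normalized_xcorr(a, b, max_lag):
    """Peak absolute normalized cross-correlation of two equal-length
    frames over integer lags in [-max_lag, max_lag]."""
    n = len(a)
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # a silent frame correlates with nothing
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += a[i] * b[j]
        best = max(best, abs(total) / norm)
    return best


def is_crosstalk(frame, other_frame, max_lag=8, corr_threshold=0.6):
    """Hypothetical rule: the frame is crosstalk if it carries less energy
    than the other channel and is strongly correlated with it."""
    energy = sum(x * x for x in frame)
    other_energy = sum(x * x for x in other_frame)
    if energy >= other_energy:
        return False
    return peak_normalized_xcorr(frame, other_frame, max_lag) >= corr_threshold
```

Searching over a small range of lags accounts for the acoustic delay between a talker's own microphone and a neighbor's microphone, and normalizing by the frame energies keeps the score independent of channel gain.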

Speech recognition experiments show that it is beneficial in this multi-speaker setting to use the output of the speech activity detector for pre-segmenting the recognizer input, achieving word error rates within 10% of those achieved with manual turn labeling.

