Softsad: Integrated frame-based speech confidence for speaker recognition

Citation

M. McLaren, M. Graciarena and Y. Lei, “Softsad: Integrated frame-based speech confidence for speaker recognition,” in Proc. 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

Abstract

In this paper we propose softSAD: the direct integration of speech posteriors into a speaker recognition system instead of using speech activity detection (SAD). SoftSAD improves the generalization of speech/non-speech models to unseen conditions by removing the need to make binary speech/non-speech decisions based on a threshold. Instead, softSAD explicitly integrates a per-frame speech posterior into the Baum-Welch statistics. We demonstrate the benefits of softSAD over SAD in severely mismatched conditions by evaluating a system developed for the National Institute of Standards and Technology (NIST) 2012 speaker recognition evaluation (SRE) on the channel-degraded Defense Advanced Research Projects Agency (DARPA) Robust Automatic Transcription of Speech (RATS) speaker identification task, and vice versa. We also show that softSAD provides benefits over SAD in matched conditions.
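The core idea of weighting Baum-Welch statistics by per-frame speech posteriors, rather than masking frames with a binary SAD decision, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, array shapes, and use of NumPy are assumptions for the example.

```python
import numpy as np

def baum_welch_stats(features, gamma, speech_post):
    """Posterior-weighted zeroth- and first-order Baum-Welch statistics.

    features:    (T, D) frame-level acoustic features
    gamma:       (T, C) per-frame UBM component responsibilities
    speech_post: (T,) per-frame speech posteriors in [0, 1]

    Hard SAD would multiply gamma by a binary mask
    (speech_post > threshold); the softSAD idea is to use
    the posterior itself as a soft frame weight.
    """
    w = gamma * speech_post[:, None]   # soft-weight responsibilities per frame
    N = w.sum(axis=0)                  # zeroth-order stats, shape (C,)
    F = w.T @ features                 # first-order stats, shape (C, D)
    return N, F
```

With all posteriors equal to 1, this reduces to the standard unweighted statistics, so the soft weighting degrades gracefully when the speech/non-speech model is uninformative.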

Index Terms— Speech activity detection, speaker identification, unseen conditions, mismatched conditions.
