Robust Features and System Fusion for Reverberation-robust Speech Recognition


Mitra, V., Wang, W., Lei, Y., Kathol, A., Sivaraman, G., & Espy-Wilson, C. (2014). Robust features and system fusion for reverberation-robust speech recognition. Proc. of REVERB Challenge.


Reverberation in speech degrades the performance of speech recognition systems, leading to higher word error rates. Human listeners can often ignore reverberation, indicating that the auditory system compensates for reverberation degradations in some way. In this work, we present robust acoustic features motivated by knowledge of human speech perception and production, and we demonstrate that these features provide reasonable robustness to reverberation effects compared to traditional mel-filterbank-based features. Using a single-feature system trained on the data distributed through the REVERB 2014 challenge on automatic speech recognition, we obtain modest relative reductions in word error rate (WER) of 12% and 0.2% over the mel-scale-feature-based baseline system for the simulated and real reverberation conditions, respectively. The reduction is more pronounced when three systems are combined, yielding relative WER reductions of 20% for the simulated reverberation condition and 11.7% for the real reverberation condition compared to the mel-scale-feature-based baseline system. The WER decreased further as more systems trained with robust acoustic features were added. We also explored an HLDA transform of the features and MLLR adaptation on speaker clusters, and both were found to improve recognition performance under reverberant conditions.
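The results above are reported as relative reductions in WER. As a minimal sketch, assuming the conventional definition (word-level Levenshtein distance divided by the number of reference words), the two quantities can be computed as follows. This is an illustration only, not the REVERB challenge's official scoring tool. The function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
                dp[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Hypothetical helper: fraction by which system_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - system_wer) / baseline_wer
```

Under this definition, a 20% relative reduction means the combined system's WER is 80% of the baseline's WER. It does not mean a 20-point absolute drop.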

Index Terms— feature combination, robust speech recognition, reverberation robustness, robust features.
