Deep convolutional nets and robust features for reverberation-robust speech recognition


Mitra, V., Wang, W. and Franco, H., “Deep convolutional nets and robust features for reverberation-robust speech recognition,” In Proc. IEEE Spoken Language Technology Workshop ’14, 2014, pp. 548–553.


While human listeners can understand speech in reverberant conditions, indicating that the auditory system is robust to such degradations, reverberation leads to high word error rates for automatic speech recognition (ASR) systems. In this work, we present robust acoustic features motivated by human speech perception for use in a convolutional deep neural network (CDNN)-based acoustic model for recognizing continuous speech in reverberant conditions. Using a single-feature system trained with the single-channel data distributed through the REVERB 2014 challenge on ASR in reverberant conditions, we show a substantial relative reduction in word error rates (WERs) compared to conventional filterbank energy-based features for single-channel simulated and real reverberation conditions. The reduction is more pronounced when multiple features and systems are combined. The proposed system outperforms the best system reported in the REVERB 2014 challenge for the single-channel full-batch processing task.
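For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate and a relative WER reduction are typically computed. It is not the scoring tool used in the paper or the challenge; the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and the sentences and percentages in the usage lines are made-up illustrations, not results from this work.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (not taken from the paper).
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
example_reduction = relative_wer_reduction(0.40, 0.30)
```

A relative reduction is measured against the baseline system's WER, so a drop from 40% to 30% in this made-up example is a 25% relative reduction, even though the absolute difference is 10 points.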

Index Terms—deep convolutional networks, feature combination, robust speech recognition, reverberation robustness, robust features.
