Damped oscillator cepstral coefficients for robust speech recognition



V. Mitra, H. Franco, and M. Graciarena, “Damped oscillator cepstral coefficients for robust speech recognition,” in Proc. Interspeech, 2013, pp. 886–890.


This paper presents a new signal-processing technique motivated by the physiology of the human auditory system. In this approach, auditory hair cells are modeled as damped oscillators that are stimulated by band-limited time-domain speech signals acting as forcing functions. Oscillation synchrony is induced by time-aligning the forcing functions and coupling them three ways across the individual bands, such that a given oscillator is driven not only by its own critical band's forcing function but also by those of its two neighboring bands. We present two separate features: the Damped Oscillator Cepstral Coefficient (DOCC), which uses the damped oscillator response to the forcing functions without synchrony, and the Synchronized Damped Oscillator Cepstral Coefficient (SyDOCC), which uses the damped oscillator response to a time-synchronized forcing function. The proposed features are evaluated on the Aurora4 noise- and channel-degraded speech recognition task, and the results indicate that they improve speech-recognition performance in all conditions compared to the baseline mel-cepstral feature and other published noise-robust features.

Index Terms—robust speech recognition, damped oscillators, modulation features, noise and channel degradation.
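
The core idea of the abstract, a damped oscillator driven by a band-limited forcing function, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the damping ratio `zeta`, the semi-implicit Euler integration, and the use of simple summation for the three-way coupling are all assumptions made for illustration, since the abstract does not specify the oscillator parameters or the coupling rule.

```python
import math


def damped_oscillator_response(forcing, fs, f0, zeta=0.1):
    """Displacement of a forced damped oscillator (hypothetical helper).

    Integrates x'' + 2*zeta*w0*x' + w0**2 * x = f(t) with a semi-implicit
    Euler step. `zeta` (damping ratio) is an assumed value, not taken
    from the paper.
    """
    w0 = 2.0 * math.pi * f0   # natural angular frequency of the oscillator
    dt = 1.0 / fs             # sampling interval
    x, v = 0.0, 0.0           # oscillator starts at rest
    out = []
    for f in forcing:
        a = f - 2.0 * zeta * w0 * v - w0 * w0 * x  # acceleration
        v += a * dt           # update velocity first ...
        x += v * dt           # ... then position with the new velocity
        out.append(x)
    return out


def couple_three_way(bands):
    """Combine each band's forcing function with its adjacent bands.

    Summation is an assumption; the abstract only states that each
    oscillator is driven by its own band and its two neighbors.
    Edge bands have a single neighbor.
    """
    n = len(bands)
    coupled = []
    for k in range(n):
        group = bands[max(k - 1, 0):min(k + 2, n)]
        coupled.append([sum(samples) for samples in zip(*group)])
    return coupled
```

In this sketch, driving an oscillator tuned to `f0` with a sinusoid at `f0` yields a much larger response than driving it with a sinusoid far from `f0`, which is the frequency-selective behavior that makes the oscillator response usable as a per-band feature. An actual front end would still need the band-limiting filterbank, time alignment, and cepstral steps described in the paper.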
