
Conference Paper | June 1, 2016

Noise and reverberation effects on depression detection from speech

SRI Authors: Andreas Tsiartas

Citation


V. Mitra, A. Tsiartas, and E. Shriberg, “Noise and reverberation effects on depression detection from speech,” in Proc. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5795-5799.

Abstract

Speech-based depression detection has gained importance in recent years, but most research has used relatively quiet conditions or examined a single corpus per study. Little is thus known about the robustness of speech cues in the wild. This study compares the effect of noise and reverberation on depression prediction using 1) standard mel-frequency cepstral coefficients (MFCCs), and 2) features designed for noise robustness, damped oscillator cepstral coefficients (DOCCs). Data come from the 2014 Audio-Visual Emotion Recognition Challenge (AVEC). Results using additive noise and reverberation reveal a consistent pattern of findings for multiple evaluation metrics under both matched and mismatched conditions. First and most notably: standard MFCC features suffer dramatically under test/train mismatch for both noise and reverberation; DOCC features are far more robust. Second, including higher-order cepstral coefficients is generally beneficial. Third, artificial neural networks tend to outperform support vector regression. Fourth, spontaneous speech appears to offer better robustness than read speech. Finally, a cross-corpus (and cross-language) experiment reveals better noise and reverberation robustness for DOCCs than for MFCCs. Implications and future directions for real-world robust depression detection are discussed.
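The train/test mismatch condition probed in the paper can be simulated with off-the-shelf tools. The sketch below is a minimal illustration, not the authors' pipeline: it assumes librosa for MFCC extraction and adds white Gaussian noise at a chosen SNR to create a mismatched test condition. DOCC extraction is not available in standard libraries, so only the MFCC baseline is shown; the file path and SNR value are illustrative.

```python
# Minimal sketch (not the authors' pipeline): extract MFCCs from clean and
# noise-corrupted versions of an utterance to illustrate train/test mismatch.
import numpy as np
import librosa

def add_white_noise(y, snr_db):
    """Corrupt a waveform with white Gaussian noise at the given SNR (dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise

# Hypothetical input file; the paper's setup uses AVEC 2014 audio.
y, sr = librosa.load("utterance.wav", sr=16000)

# Clean ("matched") features: 20 cepstral coefficients per frame.
mfcc_clean = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

# Noisy ("mismatched") features at 10 dB SNR.
mfcc_noisy = librosa.feature.mfcc(y=add_white_noise(y, snr_db=10.0),
                                  sr=sr, n_mfcc=20)

# A model trained on mfcc_clean but tested on mfcc_noisy exhibits the
# kind of mismatch degradation the paper reports for standard MFCCs.
print(mfcc_clean.shape, mfcc_noisy.shape)
```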
