M. H. Sanchez, D. Vergyri, L. Ferrer, C. Richey, P. Garcia, B. Knoth and W. Jarrold, “Using prosodic and spectral features in detecting depression in elderly males,” in Proc. Interspeech, 2011, pp. 3001–3004.
As research in speech processing has matured, there has been growing interest in paralinguistic problems, including assessing a speaker's mental and psychological health. In this study, we focus on speech features that can indicate the speaker's emotional health, i.e., whether or not the speaker is depressed. We use prosodic measurements, such as pitch and energy, together with spectral features, such as formants and spectral tilt, and compute statistics of these features over different regions of the speech signal. These statistics serve as input features to a discriminative classifier that predicts the speaker's depression state. Using N-fold leave-one-out cross-validation, we achieve a prediction accuracy of 81.3%, compared with a chance level of 50%.
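The general pipeline described above has three stages: region-level statistics of frame-level features, a discriminative classifier, and leave-one-out evaluation. A minimal Python sketch of those stages follows. It is not the authors' implementation, and several choices in it are assumptions made for illustration. The particular statistics (mean, standard deviation, minimum, maximum, range) and the helper names `region_stats`, `feature_vector`, `train_logreg` and `loo_accuracy` are hypothetical. Plain logistic regression stands in for the discriminative classifier, and the study's actual classifier and feature set may differ. Frame-level tracks (e.g. pitch over voiced frames in one region) are assumed to have been extracted beforehand.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Vector = List[float]


def region_stats(track: Sequence[float]) -> Vector:
    """Mean, std, min, max and range of one frame-level feature track
    (e.g. pitch or energy) restricted to one region of the signal."""
    lo, hi = min(track), max(track)
    return [statistics.fmean(track), statistics.pstdev(track), lo, hi, hi - lo]


def feature_vector(tracks: Sequence[Sequence[float]]) -> Vector:
    """Concatenate the statistics of several (feature, region) tracks
    into one fixed-length vector per speaker (hypothetical layout)."""
    vec: Vector = []
    for track in tracks:
        vec.extend(region_stats(track))
    return vec


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _fit_scaler(X: Sequence[Vector]) -> Tuple[Vector, Vector]:
    cols = list(zip(*X))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return mu, sd


def _scale(x: Vector, mu: Vector, sd: Vector) -> Vector:
    return [(v - m) / s for v, m, s in zip(x, mu, sd)]


def train_logreg(X: Sequence[Vector], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500) -> Tuple[Vector, float]:
    """Logistic regression (assumed stand-in classifier) trained by
    batch gradient descent on the mean logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    # 1 = depressed, 0 = not depressed (label convention assumed).
    return int(_sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5)


def loo_accuracy(X: List[Vector], y: List[int]) -> float:
    """Leave-one-out: train on all speakers but one, test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        mu, sd = _fit_scaler(train_X)  # normalization fit on the training fold only
        w, b = train_logreg([_scale(x, mu, sd) for x in train_X], train_y)
        correct += predict(w, b, _scale(X[i], mu, sd)) == y[i]
    return correct / len(X)
```

With N speakers, leave-one-out trains N models, each on N − 1 speakers. The normalization statistics are recomputed inside every fold so that the held-out speaker's features never influence training. Accuracy is then the fraction of held-out speakers classified correctly, to be compared against the 50% chance level for a balanced two-class task.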