D. V. M. Sanchez, A. Lawson, and H. Bratt, “Multi-system fusion of extended context prosodic and cepstral features for paralinguistic speaker trait classification,” in Proc. Interspeech, 2012, pp. 514–517.
Abstract
As automatic speech processing has matured, research attention has expanded to paralinguistic speech problems that aim to detect beyond-the-words information. This paper focuses on the identification of seven speaker trait categories from the Interspeech Speaker Trait Challenge: likeability, intelligibility, openness, conscientiousness, extraversion, agreeableness, and neuroticism. Our approach combines multiple features including prosodic, cepstral, shifted-delta cepstral, and a reduced set of the OpenSMILE features. Our classification approaches included GMM-UBM, eigenchannel, support vector machines, and distance-based classifiers. Optimized feature reduction and logistic regression-based score calibration and fusion led to results that perform competitively against the challenge baseline in all categories.