We propose a novel staged hybrid model for emotion detection in speech. Hybrid models exploit the strength of discriminative classifiers along with the representational power of generative models.
Lexical Stress Classification for Language Learning Using Spectral and Segmental Features
We present a system for detecting lexical stress in English words spoken by English learners. The system uses both spectral and segmental features to detect three levels of stress for each syllable in a word.
Articulatory trajectories for large-vocabulary speech recognition
We present a neural network model that estimates articulatory trajectories from speech signals. The model was trained on synthetic speech signals generated by Haskins Laboratories’ task-dynamic model of speech production.
Detecting Leadership and Cohesion in Spoken Interactions
We present a system for detecting leadership and group cohesion in multiparty dialogs and broadcast conversations in English and Mandarin.
Using Prosodic and Spectral Features in Detecting Depression in Elderly Males
In this study, we focus on speech features that can identify the speaker’s emotional health, i.e., whether or not the speaker is depressed.
Detection of agreement and disagreement in broadcast conversations
We present approaches based on Conditional Random Fields for detecting agreement/disagreement between speakers in English broadcast conversation shows.
Automatic identification of speaker role and agreement/disagreement in broadcast conversation
We present supervised approaches for detecting speaker roles and agreement/disagreement between speakers in broadcast conversation shows in three languages: English, Arabic, and Mandarin.
Acoustic data sharing for Afghan and Persian languages
In this work, we compare several known approaches to multilingual acoustic modeling for three languages of recent geopolitical interest: Dari, Farsi, and Pashto.
Improving language recognition with multilingual phone recognition and speaker adaptation transforms
We investigate a variety of methods for improving language recognition accuracy, based on techniques from speech recognition and, in some cases, borrowed from speaker recognition.