We propose a novel staged hybrid model for emotion detection in speech. Hybrid models exploit the strength of discriminative classifiers along with the representational power of generative models.
We present a system for detecting lexical stress in English words spoken by English learners. The system uses both spectral and segmental features to detect three levels of stress for each syllable in a word.
We present a neural network model that estimates articulatory trajectories from speech signals. The model was trained on synthetic speech signals generated by Haskins Laboratories’ task-dynamic model of speech production.
We present a system for detecting leadership and group cohesion in multiparty dialogs and broadcast conversations in English and Mandarin.
In this study, we focus on speech features that can identify the speaker’s emotional health, i.e., whether the speaker is depressed or not.
We present approaches based on Conditional Random Fields for detecting agreement/disagreement between speakers in English broadcast conversation shows.
We present supervised approaches for detecting speaker roles and agreement/disagreement between speakers in broadcast conversation shows in three languages: English, Arabic, and Mandarin.
In this work, we compare several known approaches to multilingual acoustic modeling for three languages of recent geopolitical interest: Dari, Farsi, and Pashto.
Improving language recognition with multilingual phone recognition and speaker adaptation transforms
We investigate a variety of methods for improving language recognition accuracy, drawing on techniques from speech recognition and, in some cases, from speaker recognition.