We present a system for detecting lexical stress in English words spoken by English learners. The system combines spectral and segmental features to classify each syllable of a word into one of three stress levels.
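As a rough illustration of the classification setup described above (not the paper's system), the sketch below assigns one of three stress levels to each syllable from a feature vector that concatenates spectral and segmental cues; the feature dimensions and the classifier choice are assumptions.

```python
# Minimal sketch (illustrative, not the paper's system): classify each
# syllable's stress level (0 = unstressed, 1 = primary, 2 = secondary)
# from a vector of concatenated spectral and segmental features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical per-syllable features: 13 spectral (e.g., mean MFCCs)
# plus 3 segmental (e.g., syllable duration, mean energy, mean F0).
X_train = rng.normal(size=(200, 16))
y_train = rng.integers(0, 3, size=200)   # three stress levels

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Predict a stress level for each syllable of a new word.
word_syllables = rng.normal(size=(3, 16))  # e.g., a 3-syllable word
print(clf.predict(word_syllables))
```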
Emotion Detection in Speech Using Deep Networks
We propose a novel staged hybrid model for emotion detection in speech. Hybrid models combine the strengths of discriminative classifiers with the representational power of generative models.
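The abstract names the design but not the architecture; the sketch below shows one common staged generative/discriminative pattern, assumed here purely for illustration: per-class GMMs (generative stage) produce log-likelihood features that feed an SVM (discriminative stage).

```python
# A minimal sketch of one staged generative/discriminative hybrid
# (assumed pattern, not necessarily the paper's architecture): fit one
# GMM per emotion class, then feed per-class log-likelihoods to an SVM.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))       # hypothetical acoustic features
y = rng.integers(0, 4, size=300)     # e.g., 4 emotion classes

# Stage 1 (generative): one GMM per class models that class's features.
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(X[y == c])
        for c in np.unique(y)}

# Stage 2 (discriminative): the vector of per-class log-likelihoods
# becomes the representation for an SVM.
def loglik_features(X):
    return np.column_stack([gmms[c].score_samples(X) for c in sorted(gmms)])

svm = SVC().fit(loglik_features(X), y)
print(svm.predict(loglik_features(X[:5])))
```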
Articulatory trajectories for large-vocabulary speech recognition
We present a neural network model that estimates articulatory trajectories from speech signals; the model is trained on synthetic speech generated by Haskins Laboratories’ task-dynamic model of speech production.
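The mapping described is a frame-level regression from acoustic features to articulatory parameters. The sketch below shows a minimal network of that shape; the layer sizes, dimensions, and training data are stand-in assumptions, not the paper's configuration.

```python
# Minimal sketch (assumed architecture): a feedforward network regressing
# articulatory parameters from acoustic feature frames. Dimensions are
# illustrative; the paper trains on synthetic speech from Haskins
# Laboratories' task-dynamic model.
import torch
import torch.nn as nn

n_acoustic, n_articulatory = 39, 8   # hypothetical dimensions

model = nn.Sequential(
    nn.Linear(n_acoustic, 128), nn.Tanh(),
    nn.Linear(128, n_articulatory),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy training step on random stand-in data.
x = torch.randn(64, n_acoustic)        # acoustic frames
t = torch.randn(64, n_articulatory)    # target trajectory values
opt.zero_grad()
loss = loss_fn(model(x), t)
loss.backward()
opt.step()
print(float(loss))
```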
Detecting Leadership and Cohesion in Spoken Interactions
We present a system for detecting leadership and group cohesion in multiparty dialogs and broadcast conversations in English and Mandarin.
Using Prosodic and Spectral Features in Detecting Depression in Elderly Males
In this study, we focus on speech features that can indicate the speaker’s emotional health, i.e., whether or not the speaker is depressed.
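To make the prosodic/spectral distinction in the title concrete, the sketch below computes one feature vector of each kind with librosa; the specific features and the synthetic stand-in signal are assumptions, not the study's feature set.

```python
# Minimal sketch of the kinds of features involved (not the study's exact
# feature set): prosodic summaries (F0, energy) and spectral summaries
# (MFCCs) per utterance, computed on a synthetic stand-in signal.
import numpy as np
import librosa

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = (0.5 * np.sin(2 * np.pi * 150 * t)).astype(np.float32)  # 150 Hz tone

# Prosodic: fundamental frequency (pYIN) and RMS energy.
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
rms = librosa.feature.rms(y=y)[0]

# Spectral: MFCCs.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# One utterance-level vector: mean of each stream (NaNs mark unvoiced frames).
features = np.concatenate([[np.nanmean(f0)], [rms.mean()], mfcc.mean(axis=1)])
print(features.shape)  # (15,)
```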
Detection of agreement and disagreement in broadcast conversations
We present Conditional Random Field-based approaches for detecting agreement and disagreement between speakers in English broadcast conversation shows.
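A CRF here labels a sequence of speaker turns jointly rather than classifying each turn in isolation. The sketch below shows that setup in miniature; sklearn-crfsuite and the toy lexical features are stand-in assumptions, not the paper's toolkit or feature set.

```python
# Minimal sketch of a CRF sequence labeler over dialog turns (assumed
# stand-in features and toolkit, not the paper's).
import sklearn_crfsuite

# One toy conversation: a feature dict per speaker turn, a tag per turn.
X_train = [[
    {"first_word": "yeah",   "speaker_change": 1.0},
    {"first_word": "no",     "speaker_change": 1.0},
    {"first_word": "anyway", "speaker_change": 0.0},
]]
y_train = [["AGREE", "DISAGREE", "OTHER"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))
```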
Automatic identification of speaker role and agreement/disagreement in broadcast conversation
We present supervised approaches for detecting speaker roles and agreement/disagreement between speakers in broadcast conversation shows in three languages: English, Arabic, and Mandarin.
Acoustic data sharing for Afghan and Persian languages
In this work, we compare several known approaches to multilingual acoustic modeling for three languages: Dari, Farsi, and Pashto.
Improving language recognition with multilingual phone recognition and speaker adaptation transforms
We investigate a variety of methods for improving language recognition accuracy based on techniques from speech recognition and, in some cases, borrowed from speaker recognition. First, we look at the question of language-dependent versus language-independent phone recognition for phonotactic (PRLM) language recognizers, and find that language-independent recognizers give superior performance in both PRLM and PPRLM systems. We then investigate ways to use speaker adaptation (MLLR) transforms as a complementary feature for language characterization. Borrowing from speech recognition, we find that both PRLM and MLLR systems can be improved by including discriminatively trained multilayer perceptrons as front ends. Finally, we compare language models with support vector machines as a modeling approach for phonotactic language recognition, and find the latter to be potentially superior and surprisingly complementary.
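For readers unfamiliar with PRLM (phone recognition followed by language modeling), the sketch below shows the core idea in miniature: a single phone recognizer emits a phone string, and one phone n-gram model per target language scores it. The toy phone strings and the add-alpha bigram model are illustrative assumptions, not the paper's systems.

```python
# Minimal sketch of the PRLM idea: per-language phone-bigram models score
# the phone string from one language-independent phone recognizer. The
# phone strings below are toy stand-ins for real recognizer output.
from collections import Counter
import math

def bigram_model(phone_strings, alpha=1.0):
    """Add-alpha smoothed bigram log-probabilities over phone tokens."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for s in phone_strings:
        toks = s.split()
        vocab.update(toks)
        contexts.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    V = len(vocab)
    def logprob(toks):
        return sum(math.log((bigrams[(a, b)] + alpha) /
                            (contexts[a] + alpha * V))
                   for a, b in zip(toks, toks[1:]))
    return logprob

# Toy training "phone transcripts" per language.
models = {
    "english":  bigram_model(["dh ax k ae t", "ih t ih z dh ax"]),
    "mandarin": bigram_model(["n i h ao m a", "w o m en h ao"]),
}

test = "dh ax d ao g".split()
scores = {lang: m(test) for lang, m in models.items()}
print(max(scores, key=scores.get))  # language with the highest LM score
```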