December 1, 1997

Automatic Detection of Discourse Structure for Speech Recognition and Understanding

Citation

D. Jurafsky et al., “Automatic detection of discourse structure for speech recognition and understanding,” 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, 1997, pp. 88-95, doi: 10.1109/ASRU.1997.658992.

Abstract

We describe a new approach for statistical modeling and detection of discourse structure for natural conversational speech. Our model is based on 42 “Dialog Acts” (DAs), such as question, answer, backchannel, agreement, disagreement, and apology. We labeled 1155 conversations from the Switchboard (SWBD) database (Godfrey et al. 1992) of human-to-human telephone conversations with these 42 types and trained a Dialog Act detector based on three distinct knowledge sources: sequences of words which characterize a dialog act, prosodic features which characterize a dialog act, and a statistical Discourse Grammar. Our combined detector, although still in preliminary stages, already achieves a 65 percent Dialog Act detection rate based on acoustic waveforms, and a 72 percent accuracy based on word transcripts. Using this detector to switch among the 42 Dialog-Act-Specific trigram LMs also gave us an encouraging but not statistically significant reduction in SWBD word error.
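The abstract does not say how the three knowledge sources are combined, so the following is only an illustrative sketch of one common approach: treating the Discourse Grammar as bigram transition probabilities between dialog acts and decoding the most likely act sequence with the Viterbi algorithm. The function name, the two-act inventory, and all probabilities below are hypothetical toy values, not figures from the paper. The per-utterance likelihoods are assumed to already merge word and prosodic evidence (for example, by adding their log-likelihoods under an independence assumption).

```python
import math


def viterbi_dialog_acts(log_likelihoods, log_trans, log_prior):
    """Return the most likely dialog-act sequence for a conversation.

    log_likelihoods: one dict per utterance, {act: log P(evidence | act)}
    log_trans:       {(prev_act, act): log P(act | prev_act)}  (toy discourse grammar)
    log_prior:       {act: log P(act)} for the first utterance
    """
    if not log_likelihoods:
        return []
    acts = list(log_prior)
    # best[t][act] = best log score of any path ending in `act` at utterance t
    best = [{a: log_prior[a] + log_likelihoods[0][a] for a in acts}]
    back = []  # back[t-1][act] = best predecessor of `act` at utterance t
    for obs in log_likelihoods[1:]:
        scores, pointers = {}, {}
        for cur in acts:
            prev = max(acts, key=lambda p: best[-1][p] + log_trans[(p, cur)])
            scores[cur] = best[-1][prev] + log_trans[(prev, cur)] + obs[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final act.
    path = [max(acts, key=lambda a: best[-1][a])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy numbers: two acts, two utterances.
prior = {"question": math.log(0.5), "answer": math.log(0.5)}
trans = {
    ("question", "question"): math.log(0.1),
    ("question", "answer"): math.log(0.9),
    ("answer", "question"): math.log(0.5),
    ("answer", "answer"): math.log(0.5),
}
evidence = [
    {"question": math.log(0.9), "answer": math.log(0.1)},
    {"question": math.log(0.6), "answer": math.log(0.4)},  # ambiguous utterance
]

# Labeling each utterance on its own evidence ignores the discourse grammar.
greedy = [max(e, key=e.get) for e in evidence]
decoded = viterbi_dialog_acts(evidence, trans, prior)
print(greedy)   # ['question', 'question']
print(decoded)  # ['question', 'answer']
```

In this toy case the second utterance looks slightly more like a question on its own, but the grammar's strong preference for an answer after a question changes the decoded label. That is the kind of effect a sequence model over dialog acts is meant to capture.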


