D. Jurafsky et al., “Automatic detection of discourse structure for speech recognition and understanding,” 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, 1997, pp. 88-95, doi: 10.1109/ASRU.1997.658992.
We describe a new approach for statistical modeling and detection of discourse structure for natural conversational speech. Our model is based on 42 “Dialog Acts” (DAs), such as question, answer, backchannel, agreement, disagreement, and apology. We labeled 1155 conversations from the Switchboard (SWBD) database (Godfrey et al. 1992) of human-to-human telephone conversations with these 42 types and trained a Dialog Act detector based on three distinct knowledge sources: sequences of words which characterize a dialog act, prosodic features which characterize a dialog act, and a statistical Discourse Grammar. Our combined detector, although still in preliminary stages, already achieves a 65 percent Dialog Act detection rate based on acoustic waveforms, and a 72 percent accuracy based on word transcripts. Using this detector to switch among the 42 Dialog-Act-specific trigram LMs also gave us an encouraging but not statistically significant reduction in SWBD word error.
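As an illustration only, the following minimal sketch shows one way three such knowledge sources could be combined. The abstract does not specify the combination method, so everything here is an assumption: the word and prosodic evidence are treated as independent per-utterance log-likelihoods given the DA, the Discourse Grammar is reduced to a bigram over DA labels, and the best label sequence is found with Viterbi decoding. The function names, the `prosody_weight` parameter, and all probabilities are hypothetical toy values, not figures from the paper.

```python
import math


def combine_evidence(word_loglikes, prosody_loglikes, prosody_weight=1.0):
    """Sum word and prosody log-likelihoods per DA label.

    Assumption (not from the abstract): the two sources are conditionally
    independent given the DA, so their log-likelihoods add.
    """
    return {
        da: word_loglikes[da] + prosody_weight * prosody_loglikes[da]
        for da in word_loglikes
    }


def viterbi_dialog_acts(utterance_loglikes, bigram_logprobs, initial_logprobs):
    """Return the most likely DA sequence for a conversation.

    utterance_loglikes: one dict per utterance, DA -> log P(evidence | DA)
    bigram_logprobs:    (previous DA, DA) -> log P(DA | previous DA)
    initial_logprobs:   DA -> log P(DA) for the first utterance
    """
    labels = list(initial_logprobs)
    # best[d] is the best log score of any label path ending in d.
    best = {d: initial_logprobs[d] + utterance_loglikes[0][d] for d in labels}
    backpointers = []
    for loglikes in utterance_loglikes[1:]:
        new_best, pointers = {}, {}
        for d in labels:
            prev = max(labels, key=lambda p: best[p] + bigram_logprobs[(p, d)])
            new_best[d] = best[prev] + bigram_logprobs[(prev, d)] + loglikes[d]
            pointers[d] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda d: best[d])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example with two hypothetical DA labels and made-up probabilities.
log = math.log
initial = {"question": log(0.5), "answer": log(0.5)}
grammar = {
    ("question", "question"): log(0.2),
    ("question", "answer"): log(0.8),
    ("answer", "question"): log(0.5),
    ("answer", "answer"): log(0.5),
}
# First utterance: word evidence favors "question", prosody is uninformative.
first = combine_evidence(
    {"question": log(0.7), "answer": log(0.3)},
    {"question": log(0.5), "answer": log(0.5)},
)
# Second utterance: both sources are ambiguous, so the grammar decides.
second = combine_evidence(
    {"question": log(0.5), "answer": log(0.5)},
    {"question": log(0.5), "answer": log(0.5)},
)
print(viterbi_dialog_acts([first, second], grammar, initial))
```

In this toy setting the second utterance is ambiguous on its own, and the bigram prior that answers tend to follow questions resolves it. That is the kind of contribution a statistical discourse grammar can make on top of lexical and prosodic cues.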