Enhanced end-of-turn detection for speech to a personal assistant


H. Arsikere, E. Shriberg, and U. Ozertem, “Enhanced End-of-Turn Detection for Speech to a Personal Assistant,” 2015 AAAI Spring Symposium Series, 2015.


Speech to personal assistants (e.g., reminders, calendar entries, messaging, voice search) is often uttered under cognitive load, causing nonfinal pausing that can result in premature recognition cut-offs. Prior research suggests that prepausal features can discriminate final from nonfinal pauses, but it does not reveal how speakers would behave if given longer to pause. To this end, we collected and compared two elicitation corpora differing in naturalness and task complexity. The Template Corpus (4409 nonfinal pauses) uses keyword-based prompts; the Freeform Corpus (8061 nonfinal pauses) elicits open-ended speech. While nonfinal pauses are longer and twice as frequent in the Freeform data, prepausal feature modelling is roughly equally effective in both corpora. At a response latency of 100 ms, prepausal features modelled by an SVM reduced cut-off rates from 100% to 20% for both corpora. Results have implications for enhancing turn-taking efficiency and naturalness in personal-assistant technology.
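To make the cut-off metric concrete, the following is a minimal sketch of one plausible reading of it, not the authors' evaluation code. It assumes that a pause-only endpointer ends the turn at every pause that reaches the response latency (which would explain a 100% baseline), and that the enhanced system ends the turn only when a classifier over prepausal features predicts a final pause. The `Pause` record, the `cutoff_rate` helper, and the toy values are all hypothetical; the classifier decisions are hard-coded here rather than produced by an SVM.

```python
from dataclasses import dataclass


@dataclass
class Pause:
    duration_ms: float     # length of the silence
    is_final: bool         # ground truth: did the speaker finish the turn?
    predicted_final: bool  # hypothetical classifier decision from prepausal features


def cutoff_rate(pauses, latency_ms, use_classifier):
    """Fraction of nonfinal pauses reaching latency_ms at which the turn is ended."""
    eligible = [p for p in pauses
                if not p.is_final and p.duration_ms >= latency_ms]
    if not eligible:
        return 0.0
    if use_classifier:
        # Enhanced system: cut off only where the classifier wrongly says "final".
        cut = sum(1 for p in eligible if p.predicted_final)
    else:
        # Pause-only baseline: every eligible nonfinal pause triggers a cut-off.
        cut = len(eligible)
    return cut / len(eligible)


# Toy data (made up for illustration, not taken from either corpus).
EXAMPLE_PAUSES = [
    Pause(400, is_final=False, predicted_final=False),
    Pause(250, is_final=False, predicted_final=True),   # misclassified
    Pause(600, is_final=False, predicted_final=False),
    Pause(150, is_final=False, predicted_final=False),
    Pause(800, is_final=True, predicted_final=True),
]

baseline = cutoff_rate(EXAMPLE_PAUSES, latency_ms=100, use_classifier=False)
enhanced = cutoff_rate(EXAMPLE_PAUSES, latency_ms=100, use_classifier=True)
print(baseline, enhanced)  # 1.0 0.25
```

Under these assumptions, the baseline rate is 100% by construction at any latency shorter than the nonfinal pauses, so any reduction comes entirely from the classifier correctly labelling pauses as nonfinal.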
