Speech: A Privileged Modality


Julia, L. E., & Cheyer, A. (1997). Speech: a privileged modality. In EUROSPEECH.


Ever since the publication of Bolt’s ground-breaking “Put-That-There” paper, providing multiple modalities as a means of easing the interaction between humans and computers has been a desirable attribute of user interface design. In Bolt’s early approach, the style of modality combination required the user to conform to a rigid order when entering spoken and gestural commands. In the early 1990s, the idea of synergistic multimodal combination began to emerge, although actual implemented systems (generally using keyboard and mouse) remained far from synergistic. Next-generation approaches used time-stamped events to reason about the fusion of multimodal input arriving in a given time window, but these systems were hindered by time-consuming matching algorithms. To overcome this limitation, we proposed [JULIA 93] a truly synergistic application and a distributed architecture for flexible interaction that reduces the need for explicit time stamping. Our slot-based approach is command directed, making it suitable for applications using speech as a primary modality. In this article, we use our interaction model to demonstrate that during multimodal fusion, speech should be a privileged modality, driving the interpretation of a query, and that in certain cases, speech has even more power to override and modify the combination of other modalities than previously believed.
