The “Mother of All Demos” on December 9, 1968, was a seminal event. Doug Engelbart and his SRI team introduced to the world forms of human-computer interaction that are now ubiquitous: a screen divided into windows, typing integrated with a pointing device, hypertext, and shared-screen teleconferencing. While these innovations have had enormous impact, they are just the beginning of Engelbart’s vision: computers as tools for augmenting human intelligence.
As the concept of computers has expanded to encompass smartphones and the Internet of Things, the opportunities for augmentation have increased dramatically. Here are a few examples:
Getting things done through conversation
Engelbart interacted with his system through typing and pointing. Since the arrival of Siri, which also began at SRI, we have become accustomed to interacting with computers by speaking or texting. Now the world is full of devices that we can talk to, but current virtual assistants can augment only a few of our simple problem-solving activities. They can help us find information, make calendar appointments, play music, and handle a few other tasks. I am not saying that the technology of understanding natural language intent and finding the right information is simple – far from it. But the *kind* of problem-solving is very limited: almost entirely question-and-answer.
We need to go beyond this even for everyday problem-solving: buying something, handling finances, getting healthcare advice. We need a workflow, a connected set of actions that leads to the desired outcome. Current computer systems can augment us, but only if we follow their pre-defined workflow, usually quite rigidly. Getting clarifications, finding alternatives, making changes – anything that deviates from the pre-defined flow – can be difficult or impossible.
In our daily life we mostly rely on human-human augmentation to help solve our problems. We engage other humans through the medium of conversation, a special form of interaction that is full of mechanisms for flexibly handling workflows. Sometimes we answer a question with a question, introducing a new sub-workflow, but rarely disturbing the overall flow. We have vocal techniques (and a few textual ones) for indicating that we want to keep speaking or that we are expecting a response. We continually express our engagement, disengagement, agreement, disagreement, confusion, skepticism.
To get to the next level of augmentation, computers need to handle conversation better. A few systems can do that in specialized areas – a great example is SRI spinoff Kasisto and its conversational AI for banking. But these systems are just the beginning of what is likely to become a fundamental mechanism for computer-human augmentation.
Integrating machine learning and human knowledge
The latest generation of data-driven machine learning is achieving very impressive results. Machine learning is at the core of powerful human augmentation systems ranging from machine translation to autonomous driving. The pattern-finding ability of these systems is dramatically improving core augmentation technologies like speech and vision. Applied to massive data, it is providing insights that can improve healthcare, logistics, and many other areas.
But this approach has some critical limitations. Since it is based on sophisticated algorithms operating over complex mathematical representations, its operation is difficult to understand, explain, or predict in human terms. This makes it hard to guide – and hard to trust.
A truly Engelbartian approach to addressing these limitations is to create a synergy of human knowledge and machine learning, using the strengths of each to form a powerful augmentation system. Several research groups, including SRI, are exploring this synergy, encoding symbolic knowledge in neural net form to guide system behavior.
Augmentation via the environment
The physical environment around us is becoming more and more computerized – our cars, homes, stores, farms – based on the proliferation of low-cost, low-power sensing and computing devices. These systems are already augmenting us through, for example, lane-keeping, home security, product tracking, and pest prediction. But the Internet of Things enables much greater augmentation of human-computer and human-human collaboration.
Almost all current computer-human interfaces leverage only a tiny portion of our communication capability. Our gestures, posture, expressions, and between-the-words speech (pauses, changes in intonation) are ignored. This can be acceptable when we are communicating through special-purpose devices like smartphones and laptops, because the communication patterns (and limitations) are well established. But in open environments, interaction through computers is not well defined.
The act of walking into a room or getting into a car has become a human-computer interaction. Soon, looking confused in front of the stove, reacting physically to a visitor, and sounding tired will be human-computer interactions. Computerized spaces also create new opportunities for computer-mediated human-human interaction. When more than one person occupies a space, they almost always interact with each other, usually through speech. Now the computer can join a conversation, automatically find pertinent information, and make suggestions – guided and controlled by the human participants, or maybe taking a more active role as the “room manager.”
Engelbart’s vision continues to excite – perhaps now more than ever as we expand the boundaries of computation and its role in our lives.