75 Years of Innovation: Computer vision

Blurring the lines between humans and computers.

The 75 Years of Innovation series highlights the groundbreaking innovations spanning from SRI’s founding in 1946 to today. Each week, SRI will release an innovation, leading up to its 75th anniversary in November 2021.

“SRI continues to innovate at a fast pace, taking full advantage of machine learning to incorporate high-level learning-based semantic information (recognition of objects and scene layouts) into dynamic maps and scene graphs.”
— SRI International

SRI International has contributed to the blurring of the boundaries between humans and computers. To do so, SRI has created cutting-edge answers to facilitate the human-computer interface (HCI). From the humble computer mouse to augmented reality and computer vision, SRI has made computing more human, more compelling, and widened the scope of its use for the good of humanity. Here are a few of the innovations that gave computers the vision to enhance human interactions.

(Computer) visions of the past

To see the breadth of involvement of SRI in modern computing and human interaction with computer systems, we must look to the past:

The Computer Mouse

The computer mouse, so-called because it resembled a mouse, placed the human operator at the center of computer interaction. Being able to point and click is easier said than done when interacting with a digital interface. In the 1960s, SRI scientist Douglas Engelbart published a seminal paper, “Augmenting Human Intellect: A Conceptual Framework,” that formed the framework for developing the computer mouse later in the decade.

Eye Tracker

Tracking eye movement was a joint venture between SRI and NASA in the 1960s that helped improve flight safety by alleviating the visual blurring experienced by pilots flying at high speed and low altitudes. The innovation has since found many more applications in areas as diverse as medicine and web design.

Failure analysis

SRI International developed FRActure Surface Topography Analysis (FRASTA) in the 1980s. The system offers a mechanism to reconstruct a fracture event, providing details of its history, trajectory, and even possible causes, to help prevent failure in materials such as metal.

Visual advertising insertion

The first-down yellow line, synonymous with U.S. field sport, made augmented reality real for sports fans in 1998. This magic line was based on pattern recognition algorithms that fit the exacting needs of live-action TV.

Ground- and foliage-penetrating radar

Being able to “see the wood for the trees” was the goal of SRI’s Foliage-Penetrating Radar (FOLPEN) technologies. FOLPEN, developed in the 1990s, allowed military operatives to see objects hidden in deep foliage. SRI applied the well-understood physical property of permittivity to help protect military operations and save lives. From this initial innovation, SRI built further technologies that used FOLPEN, namely the FOLPEN Radar SAR aircraft.

Iris Biometrics

In 2006, Sarnoff Corporation (now part of SRI) developed Iris on the Move (IoM), a biometric solution that used iris recognition in a manner that made the technique much more commercially viable.

Video Mosaicking / Video Brush

SRI developed real-time video mosaicking technologies and used them for various government, commercial and consumer applications (Video Brush). SRI received an R&D 100 award for its video mosaicking technology in 1998. SRI’s Video Brush software was licensed to Google for use in Android smartphones in 2011.

TerraSight®

SRI has brought some ground-breaking innovations to the world using its vast knowledge of broadcast cameras and imaging. In the late 2000s, SRI released the TerraSight® software as an innovation in air and ground video surveillance. The TerraSight® software, used by the U.S. military, provides advanced image processing to generate a composite common operating picture (COP) in real time from a wide variety of input variables.

Augmented reality binoculars

SRI began research into using augmented reality (AR) within a set of binoculars back in 2010. Originally viewed as a military technology, its applications soon expanded to civilian use cases. One of the biggest hurdles that SRI had to overcome was managing continuous hand and body movement to create a precise, jitter-free and drift-free insertion of AR images into the viewer’s eye.

Computer Vision: into the future

SRI’s major innovations in computer vision and related areas like augmented reality have led to further solutions.

Pyramid Processing

Over the past 30 years, SRI has designed, developed and implemented many real-time vision systems, enabling a range of applications for government and commercial use. Solutions and systems in this area are based on SRI’s Pyramid Processing Architecture (PPA). PPA provides a framework to support the invention of various low-power, high-performance vision processing chips for US government vehicles, warfighters and robotic systems.
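As a rough illustration of the multiresolution image pyramid that gives this architecture its name, the sketch below builds a Gaussian pyramid in plain Python. It is a generic textbook construction, not SRI’s PPA implementation: the 5-tap binomial kernel, the edge handling, and the function names (reduce_level, gaussian_pyramid) are assumptions chosen for the example.

```python
from typing import List

Image = List[List[float]]

# 5-tap binomial kernel, a common approximation of a Gaussian low-pass filter.
# (Assumed for illustration; real systems may use other filters.)
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def _blur_rows(img: Image) -> Image:
    """Low-pass filter each row, replicating edge pixels at the borders."""
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(w * row[min(max(x + k - 2, 0), n - 1)]
                for k, w in enumerate(KERNEL))
            for x in range(n)
        ])
    return out


def _transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def reduce_level(img: Image) -> Image:
    """Blur horizontally and vertically, then keep every second row and column."""
    blurred = _transpose(_blur_rows(_transpose(_blur_rows(img))))
    return [row[::2] for row in blurred[::2]]


def gaussian_pyramid(img: Image, levels: int) -> List[Image]:
    """Return `levels` images: full resolution, then half, quarter, and so on."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


if __name__ == "__main__":
    image = [[float(x + y) for x in range(8)] for y in range(8)]
    for level in gaussian_pyramid(image, 3):
        print(len(level), "x", len(level[0]))
```

Each level holds a quarter of the pixels of the one before it, so coarse levels are cheap to process. That is one reason pyramid representations suit low-power, real-time vision hardware.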

Image enhancement and multi-spectral image fusion are key goals for SRI vision systems. Innovations such as the Acadia® Vision Processor provide real-time video stabilization, mosaicking, video fusion, motion-stereo and video enhancement. This capability enables military and civilian applications that would otherwise be impossible for 24/7 operational needs in a degraded visual environment, e.g., one where visibility is reduced by dust, snow or rain.

Skin monitoring

Even human skin can be enhanced using computer vision. Using novel image processing and artificial intelligence algorithms, SRI has developed world-leading image analysis to monitor the health of human skin from a mobile device. Using a smartphone app, SRI can precisely illuminate, image and sense skin moisture and other characteristics to determine the skin’s health. This capability has been used commercially to support recommendations for cosmetics. SRI is currently evaluating the same technology for use in self-analysis of possible cancerous skin conditions.

Change Detection: Peek, breast cancer, IED detection

SRI has developed a change detection algorithm for a variety of applications, such as detecting cancerous tumors by comparing pre- and post-contrast MRI scans. Besides health applications, the change detection algorithm can be used to detect cars at intersections for traffic light control. Military applications include using algorithms to look at multiple passes of aerial video data to detect change signatures of buried roadside improvised explosive devices.
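The simplest form of change detection is pixel-wise differencing of two aligned images, sketched below. This is only a toy baseline and not SRI’s algorithm: it assumes the two images are already registered to each other, and the threshold value and function names are hypothetical. Practical systems add registration, noise modeling and filtering of spurious changes.

```python
from typing import List, Tuple

Image = List[List[float]]


def change_mask(before: Image, after: Image, threshold: float) -> List[List[bool]]:
    """Mark pixels whose absolute intensity difference exceeds `threshold`.

    Assumes both images are already aligned (registered) and the same size.
    """
    if len(before) != len(after) or any(
        len(a) != len(b) for a, b in zip(before, after)
    ):
        raise ValueError("images must have the same shape")
    return [
        [abs(b - a) > threshold for a, b in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def changed_pixels(mask: List[List[bool]]) -> List[Tuple[int, int]]:
    """Return (row, column) coordinates of every flagged pixel."""
    return [(y, x) for y, row in enumerate(mask) for x, flag in enumerate(row) if flag]


if __name__ == "__main__":
    first_pass = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
    second_pass = [[10.0, 12.0, 10.0], [10.0, 10.0, 40.0]]
    print(changed_pixels(change_mask(first_pass, second_pass, threshold=5.0)))
```

In this toy input the small difference of 2 is treated as noise, and only the pixel that jumps from 10 to 40 is flagged.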

Navigation and mapping: CamSlam, GPS-denied navigation

SRI has developed multi-sensor navigation systems for wide-area augmented-reality robot navigation. SRI’s vision-based tracking has been used to provide GPS-denied localization for humans and mobile platforms operating in air, ground, naval, and subterranean environments.

Human behavior measurement

Having a deeper understanding of human behavior is vital to assess human emotional states. The Toyota 2020 concept car uses SRI’s emotional AI-based technology — Vision AI — to observe drivers and monitor their emotional and physical state.

Future visions of the Center for Vision Technologies

Projects continue to emerge from the wealth of computer vision technologies that SRI has developed:

SRI’s Center for Vision Technologies lab (CVT)

CVT is the center for innovation in computer vision. CVT creates fundamental computer vision solutions based on leading-edge technologies. The team at CVT works across several key areas in computer vision and leverages a wide variety of sensors and computation platforms. CVT develops and applies its algorithms and hardware for:

· Computational Sensing to develop advanced vision

· 2D/3D Reasoning to understand a scene

· Interactive Intelligent Systems to understand and interact with humans

· Collaborative Autonomy to develop team frameworks

· Multi-modal Data Analytics to mine big data

· Machine Learning to continuously learn

CVT carries out both early-stage research and developmental work to build prototype solutions for government and commercial markets, including defense, healthcare, automotive, and others. Like many other successful areas of SRI, CVT technology successes give birth to spin-off ventures to further commercialize SRI discoveries and inventions.

Machine Learning (ML) at the computing edge

SRI is working on novel solutions that allow machine learning applications to operate on edge devices by dynamically learning and reconfiguring their algorithms without a connection to the cloud. Because the approach has very low power needs, it should allow further advances in wearable machine learning devices.

Semantic navigation

SRI has developed a strong portfolio of 2D-3D reasoning. This includes navigation and mapping using 2D and 3D sensors such as video and LIDAR. In recent years, machine learning has significantly improved the semantic understanding of 2D and 3D data. Incorporating semantics enables a new class of algorithms for navigation, Simultaneous Localization and Mapping (SLAM), geo-registration, wide-area search, augmented reality, data compression, 3D modeling, and surveillance.

Long-range, wide-area, augmented reality: SRI has combined the localization and geo-registration methods described above with low-powered, compact, ruggedized hardware to create wide-area augmented reality applications. SRI has extended its augmented reality capabilities to work over multiple square kilometers while in GPS-challenged environments. This also includes long-range 3D occlusion-reasoning for augmented reality applications.

Enhancing human-machine synergy: The world is rapidly integrating autonomy into vehicles, drones, and other robotic platforms, resulting in an increased demand for autonomous platforms that cooperate with one another and with humans to achieve more complex tasks. SRI is developing core methods for different multi-machine, multi-human systems across numerous DARPA, commercial, and IRAD programs.

Multi-robot and multi-human collaborative planning: Effective and efficient planning for teams of robots and humans towards a desired goal requires the ability to adapt and respond smoothly and collaboratively to dynamic situations. SRI created novel human-machine collaborative planning capabilities by extracting and using semantic information to enable advanced interaction and robot autonomy.

Vision and language navigation (VLN): VLN requires an autonomous robot to follow natural language instructions in unseen environments. Existing learning-based methods struggle with this as they focus mostly on raw visual observation and lack the semantic reasoning capabilities that are crucial in generalizing to new environments. To overcome these limitations, SRI creates a temporal memory by building a dynamic semantic map and performs cross-modal grounding to align map and language modalities, enabling more effective VLN results.

3D scene classification and 3D compression

SRI has developed extremely robust 3D scene classification methods over the last decade. These methods have now transitioned to Department of Defense (DoD) programs of record and commercially available software packages. Working with the Office of Naval Research (ONR), the U.S. Army, and the National Geospatial-Intelligence Agency (NGA), SRI is now developing the next-generation 3D scene-understanding methods using machine learning. These methods incorporate top-down and bottom-up contextual reasoning and human-specified geographic rules within the learning process. The robust scene-understanding methods have enabled SRI to revisit the 3D compression methods widely available today. By incorporating knowledge about different feature classes (such as ground, building and foliage), SRI achieves significantly better bit rates in the compression of 3D data.

Human behavior understanding

SRI’s human behavior modeling centers on three aims: 1) human behavior monitoring, 2) human interaction/communication and 3) facilitation. SRI has developed a layered approach to human behavior analytics that enabled development of the core technologies that address all three aims.

SRI has developed systems for driver behavior analytics under a project with Toyota Motor Corporation, including techniques for assessing driver emotional states as well as drowsiness. These were enabled by gaze tracking, facial expression extraction, and blink rate extraction, despite the formidable lighting and pose variation challenges of automotive environments.

SRI has been working towards a solution to monitor collaboration in classrooms by further developing its layered behavior analytics. SRI has built on the low-level behaviors captured by MIBA to build more levels, starting from individual behaviors such as problem solving, group dynamics, and role-playing, and arriving at an overall assessment of the group’s collaborative state. Such a layered architecture enables fine-grained feedback to the teacher and their students to improve collaboration. This work has been featured in the National Science Foundation (NSF) showcase.

Multi-media Analytics

SRI has developed the Computer Vision AI Search Tool (CVAST) for rapidly building searchable-image and user-annotation AI training databases. The CVAST tool supports ingestion and rapid object clustering and annotation of common scene object features. A flexible image/attribute database allows users to search for related features within a vast collection of image sets.

Under the DARPA Social Media in Strategic Communication (SMISC), Computational Simulation of Online Social Behavior (SocSim) (SBIR M3I system), ONR CEROSS, and AFRL Multimedia-Enhanced Social Media Analytics (MESA) programs, SRI has developed social media content analytics for seamless multi-way cross-platform retrieval between images, videos, text, and users, using multimodal embedding of users and content in the same geometric space. Furthermore, SRI has developed a system called MatchStax that can detect the intent behind social media postings. This work provides a framework for tracking the propagation of influence in social media. SRI’s MatchStax system has also been licensed to an SRI venture, Vitrina.
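Once users and content live in the same geometric space, cross-modal retrieval reduces to a nearest-neighbor search over vectors. The sketch below ranks items by cosine similarity. It is a generic illustration, not SRI’s system: the three-dimensional vectors, the item names and the retrieve function are all made up for the example, and in practice the embeddings would come from trained models.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the `top_k` indexed items most similar to `query`, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical embeddings: images, text and users share one vector space.
    index = {
        "image:beach": [0.9, 0.1, 0.0],
        "text:surf report": [0.8, 0.2, 0.1],
        "user:chess_fan": [0.0, 0.1, 0.9],
    }
    for name, score in retrieve([1.0, 0.0, 0.0], index, top_k=2):
        print(f"{name}: {score:.3f}")
```

Because every modality is scored with the same similarity measure, a query from one modality (for example a text embedding) can return images, videos or users without any modality-specific matching code.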

Machine Learning

SRI has a rich history of R&D in machine learning, including enabling computer image sensors to sense, learn, and adapt to capture actionable information. SRI’s recent work has focused on deep learning and reinforcement learning for several applications, including Explainable Artificial Intelligence (XAI), Learning with Less Labels (LwLL), the Science of Artificial Intelligence and Learning for Open World Novelty (SAIL ON), Competency Aware Machine Learning (CAML), Lifelong Learning, Creative Artificial Intelligence, Approximate Computing, and Robust Artificial Intelligence.

SRI’s focus is on extending the state of the art in machine learning beyond supervised learning with large training datasets, and on producing human-intelligible explanations of machine learning techniques. In the DARPA XAI program, SRI developed new visual attention-based techniques to display where a machine learning-based question answering system has gone wrong and why. In the DARPA LwLL program, SRI developed techniques for learning from very little data. In the SAIL ON program, SRI developed predictive coding-based techniques for detection of novelties in new worlds. In the CAML program, SRI developed new calibration techniques that help accurately predict actual performance by a machine learning algorithm in a new domain.
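To make the idea of calibration concrete, the sketch below computes expected calibration error (ECE), a widely used measure of how far a model’s stated confidence is from its actual accuracy. It is offered only as background on what calibration means, not as the technique SRI developed under CAML. The equal-width binning and the function name are assumptions.

```python
from typing import List, Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin.

    `confidences` are predicted probabilities in [0, 1]; `correct` says whether
    each corresponding prediction was right. Returns 0.0 for empty input.
    """
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, conf in enumerate(confidences):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        bins[min(int(conf * n_bins), n_bins - 1)].append(i)

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # A model that says "90% sure" and is right 9 times out of 10 is well calibrated.
    well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
    # The same confidence with every answer wrong is badly calibrated.
    overconfident = expected_calibration_error([0.9] * 10, [False] * 10)
    print(f"{well_calibrated:.3f} {overconfident:.3f}")
```

A well-calibrated model’s confidence can serve as a forecast of its own accuracy, which is the property needed to predict performance before deploying in a new domain.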

Lifelong Learning

SRI is working on novel machine learning (ML) algorithms that enable systems to be trained to perform tasks. The novelty in this design is that the algorithms continue to learn as they perform tasks. This feedback loop allows the ML-enabled systems to apply this knowledge to take on new and related tasks.

SRI is a visionary in more ways than one. These visions have come to fruition through the vast amount of foundational work carried out over the last seven decades. SRI continues to build on previous generations of innovations to create better, more efficient technologies that cross the human-computer boundary.

Resources

The Computer Mouse: https://medium.com/dish/75-years-of-innovation-the-computer-mouse-fef5161ba45d

Visual Advertising Insertion (Augmented Reality): https://medium.com/dish/75-years-of-innovation-virtual-advertising-insertion-augmented-reality-363cc5a291b0

TerraSight® software: https://medium.com/dish/75-years-of-innovation-terrasight-software-e700519bc849

Ground-and-Foliage Penetrating Radar: https://medium.com/dish/75-years-of-innovation-foliage-penetrating-radar-technologies-folpen-39e31e101a57

Failure Analysis (FRASTA, FRActure Surface Topography Analysis): https://medium.com/dish/75-years-of-innovation-frasta-fracture-surface-topography-analysis-b4c6dc9ae4ab

Eye Tracker: https://medium.com/dish/75-years-of-innovation-eyetracker-d28b32608430

AR Binoculars: https://medium.com/dish/75-years-of-innovation-augmented-reality-binoculars-a341086029ee

Iris Biometrics: https://medium.com/dish/75-years-of-innovation-iris-recognition-201f7bacde61

GPS-Denied Navigation: https://medium.com/dish/75-years-of-innovation-gps-denied-navigation-1c70d35500cb

Van der Wal, G.S., Burt, P.J. A VLSI pyramid chip for multiresolution image analysis. Int J Comput Vision 8, 177–189 (1992). https://doi.org/10.1007/BF00055150

The Acadia® Vision Processor: https://www.researchgate.net/publication/2438269_The_Acadia_Vision_Processor

Emerging trends for cosmetic applications on mobile devices: https://www.teknoscienze.com/Contents/Riviste/Sfogliatore/HPC2_2018/59/#zoom=z

SRI International Debuts “Emotional AI” Vision Technology to Advance the Driving Experience: https://www.sri.com/press-release/sri-international-debuts-emotional-ai-vision-technology-to-advance-the-driving-experience/