SRI pushes the frontiers of computer vision to enable machines to see, understand, and remember. Our end-to-end video and image processing technologies make computer vision work in real-world applications, including robotics, vehicles, and people-worn systems.
The Center for Vision Technologies
SRI’s Center for Vision Technologies creates fundamental computer vision solutions based on leading-edge technologies, leveraging a wide variety of sensors and computation platforms.
Recent developments from the Center for Vision Technologies
The Center for Vision Technologies (CVT) develops and applies algorithms and hardware that see better with computational sensing, understand scenes using 2D/3D reasoning, understand and interact with humans using interactive intelligent systems, support teamwork through collaborative autonomy, mine big data with multi-modal data analytics, and continuously learn through machine learning. CVT does both early-stage research and development work to build prototype solutions that impact government and commercial markets, including defense, healthcare, automotive, and more. Numerous companies have been spun off from CVT technology successes.
Recent developments from CVT include core machine learning algorithms in areas such as learning with fewer labels, predictive machine learning for handling surprise and novel situations, lifelong learning, reinforcement learning using semantics, and robust, explainable artificial intelligence.
SmartVision imaging systems combine semantic processing, multi-modal sensing, and embedded low-power machine learning to automatically adapt and capture good-quality imagery and information streams in challenging and degraded visual environments.
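The closed loop behind such adaptive capture can be sketched in a few lines. The quality metric, camera interface, and gain schedule below are illustrative assumptions for exposition, not SRI's SmartVision design:

```python
import numpy as np

class SimulatedCamera:
    """Toy stand-in for a real sensor: scene radiance scaled by exposure."""
    def __init__(self, radiance=2.0, exposure=0.1):
        self.radiance, self.exposure = radiance, exposure

    def capture(self, exposure):
        self.exposure = exposure
        frame = np.random.rand(64, 64) * self.radiance * exposure
        return np.clip(frame, 0.0, 1.0)

def estimate_quality(frame):
    """Crude quality proxy standing in for a learned quality model:
    rewards mid-gray exposure and healthy contrast."""
    exposure_term = 1.0 - abs(frame.mean() - 0.5) * 2.0
    contrast_term = min(frame.std() / 0.25, 1.0)
    return 0.5 * exposure_term + 0.5 * contrast_term

def adapt_exposure(camera, target_quality=0.8, gain=0.5, steps=10):
    """Closed-loop capture: nudge exposure until quality is good enough."""
    exposure = camera.exposure
    for _ in range(steps):
        frame = camera.capture(exposure)
        if estimate_quality(frame) >= target_quality:
            break
        # Brighten when under-exposed, darken when over-exposed.
        exposure = float(np.clip(exposure * (1.0 + gain * (0.5 - frame.mean())),
                                 1e-4, 1.0))
    return exposure

print(adapt_exposure(SimulatedCamera()))
```

In a real system, the handcrafted quality proxy would be replaced by the learned, semantics-aware model the paragraph describes.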
Multi-sensor navigation systems are used for wide-area augmented reality and provide GPS-denied localization for humans and mobile platforms operating in air, ground, naval, and subterranean environments. CVT has extended its navigation and 3D modeling work to include semantic reasoning, making it more robust to changes in the scene. Collaborative autonomy systems can use semantic reasoning, enabling platforms to efficiently exchange dynamic scene information with one another and allowing a single user to control many robotic platforms using high-level directives.
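As a minimal illustration of the fusion idea (not CVT's actual navigation stack), the sketch below dead-reckons with inertial velocity estimates and corrects the accumulated drift with sparse absolute fixes, in the style of a complementary filter; all names and values are assumptions:

```python
import numpy as np

def fuse_position(imu_vel, vo_fixes, dt=0.1, blend=0.2):
    """Complementary-filter-style fusion for GPS-denied localization.

    imu_vel:  (T, 2) velocity estimates from inertial sensing
    vo_fixes: dict {t: (x, y)} sparse absolute fixes from visual odometry
    blend:    how strongly a fix pulls the dead-reckoned estimate
    """
    pos = np.zeros(2)
    track = []
    for t, v in enumerate(imu_vel):
        pos = pos + v * dt                  # IMU dead reckoning (drifts)
        if t in vo_fixes:                   # visual fix: correct the drift
            pos = (1 - blend) * pos + blend * np.asarray(vo_fixes[t])
        track.append(pos.copy())
    return np.array(track)

# Toy usage: straight-line motion with a noisy IMU and occasional VO fixes.
rng = np.random.default_rng(0)
true_vel = np.tile([1.0, 0.0], (50, 1))
imu_vel = true_vel + rng.normal(0, 0.2, true_vel.shape)   # drifting IMU
vo_fixes = {t: (t * 0.1, 0.0) for t in range(0, 50, 10)}  # sparse VO fixes
track = fuse_position(imu_vel, vo_fixes)
```

A production system would use a full state estimator (for example, an error-state Kalman filter) over many more sensor channels; the point here is only that sparse absolute fixes bound the drift of high-rate dead reckoning.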
Human behavior understanding is used to assess human state and emotions (e.g., in the Toyota 2020 concept car) and to build full-body, multi-modal (speech, gesture, gaze, etc.) human-computer interaction systems.
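A common building block in such multi-modal systems is late fusion of per-modality estimates. The sketch below is a generic, hypothetical illustration with made-up states and weights, not SRI's method:

```python
import numpy as np

def late_fusion(modality_probs, weights):
    """Weighted late fusion: combine per-modality class posteriors.

    modality_probs: dict modality -> probability vector over states
    weights:        dict modality -> reliability weight
    """
    fused = None
    for m, p in modality_probs.items():
        p = np.asarray(p, dtype=float)
        fused = weights[m] * p if fused is None else fused + weights[m] * p
    return fused / fused.sum()   # renormalize to a distribution

# States: [attentive, drowsy, distracted] -- illustrative only.
probs = {
    "gaze":    [0.2, 0.6, 0.2],
    "speech":  [0.5, 0.3, 0.2],
    "gesture": [0.3, 0.4, 0.3],
}
weights = {"gaze": 0.5, "speech": 0.3, "gesture": 0.2}
print(late_fusion(probs, weights))
```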
Core technologies and applications
SRI’s Center for Vision Technologies (CVT) tackles data acquisition and exploitation challenges across a broad range of applications and industries. Our researchers work in cross-disciplinary teams, including robotics and artificial intelligence, to advance, combine, and customize technologies in areas including computational sensing, 2D/3D reasoning, collaborative autonomy, human behavior modeling, vision analytics, and machine learning.
Computational sensing and low-power processing
In this paper, we present Hyper-Dimensional Reconfigurable Analytics at the Tactical Edge, using low-SWaP embedded hardware that can perform real-time reconfiguration at the edge by leveraging non-MAC deep neural networks (DNNs) combined with hyperdimensional (HD) computing accelerators.
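For readers unfamiliar with HD computing, here is a minimal sketch of the primitives such accelerators implement: random bipolar hypervectors, binding (elementwise product), bundling (elementwise majority), and nearest-class lookup by cosine similarity. The toy "action" classes are illustrative, not from the paper:

```python
import numpy as np

D = 10_000                      # hypervector dimensionality
rng = np.random.default_rng(0)

def random_hv():
    """Random bipolar hypervector in {-1, +1}^D."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding: elementwise product associates two hypervectors."""
    return a * b

def bundle(hvs):
    """Bundling (superposition): elementwise majority vote."""
    return np.sign(np.sum(hvs, axis=0))

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Encode two toy "action" classes as bundles of feature hypervectors,
# then classify a query by nearest cosine similarity.
features = {name: random_hv() for name in ["run", "wave", "fall", "walk"]}
classes = {
    "benign": bundle([features["walk"], features["wave"]]),
    "alert":  bundle([features["run"], features["fall"]]),
}
query = features["fall"]
print(max(classes, key=lambda c: cosine(query, classes[c])))  # -> "alert"
```

Because these operations are comparisons and sign flips rather than multiply-accumulates, they map naturally onto the non-MAC, low-SWaP hardware the abstract describes.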
2D/3D reasoning and augmented reality
Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation.
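One common remedy is to re-weight the loss toward classes the model currently recalls poorly. The PyTorch sketch below is written in the spirit of the "Recall Loss" paper listed further down, but it is a simplified assumption, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def recall_weighted_ce(logits, target, num_classes, eps=1e-6):
    """Cross-entropy with per-class weights set to (1 - batch recall).

    logits: (N, C, H, W) raw scores; target: (N, H, W) class indices.
    Classes the model currently misses get up-weighted.
    """
    with torch.no_grad():
        pred = logits.argmax(dim=1)
        weights = torch.ones(num_classes, device=logits.device)
        for c in range(num_classes):
            gt_c = target == c
            if gt_c.sum() > 0:
                recall = (pred[gt_c] == c).float().mean()
                weights[c] = 1.0 - recall + eps
    return F.cross_entropy(logits, target, weight=weights)

# Toy usage on random data.
logits = torch.randn(2, 5, 32, 32)
target = torch.randint(0, 5, (2, 32, 32))
loss = recall_weighted_ce(logits, target, num_classes=5)
```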
Collaborative human-robot autonomy
We propose a method to train an autonomous agent to accumulate a 3D scene graph representation of its environment while simultaneously learning to navigate through that environment.
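Setting the learned policy aside, the accumulation step itself amounts to merging each step's detections and relations into a persistent graph. A minimal data-structure sketch, with hypothetical identifiers and relations:

```python
from collections import defaultdict

class SceneGraph:
    """Incrementally accumulated scene graph: nodes are objects,
    edges are relations observed while navigating."""
    def __init__(self):
        self.nodes = {}                   # object_id -> attribute dict
        self.edges = defaultdict(set)     # object_id -> {(relation, object_id)}

    def observe(self, detections, relations):
        """Merge one step's detections into the graph.
        detections: dict object_id -> attributes (e.g., class, position)
        relations:  iterable of (subject_id, relation, object_id)
        """
        for obj_id, attrs in detections.items():
            self.nodes.setdefault(obj_id, {}).update(attrs)  # merge, don't duplicate
        for subj, rel, obj in relations:
            self.edges[subj].add((rel, obj))

# Two steps of a toy trajectory: the graph grows as the agent moves.
g = SceneGraph()
g.observe({"chair_1": {"class": "chair"}, "table_1": {"class": "table"}},
          [("chair_1", "next_to", "table_1")])
g.observe({"door_1": {"class": "door"}},
          [("table_1", "near", "door_1")])
print(len(g.nodes), sum(len(e) for e in g.edges.values()))   # 3 nodes, 2 edges
```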
Human behavior modeling
Automated Student Group Collaboration Assessment and Recommendation System Using Individual Role and Behavioral Cues
Early development of specific skills can help students succeed in fields like Science, Technology, Engineering, and Mathematics. Different education standards consider “Collaboration” a required and necessary skill that can help students excel in these fields. Instruction-based methods are the most common approach adopted by teachers to instill collaborative skills. However, it is difficult […]
Multi-modal data analytics
This paper presents a new 3D time-space detector for small ships in single look complex (SLC) synthetic aperture radar (SAR) imagery, optimized for small targets around 5-15 m long that are unfocused due to target motion induced by ocean surface waves.
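The core idea can be caricatured in a few lines: treat the stack of SAR magnitude frames as a 3D time-space volume, flag pixels that exceed local clutter statistics, and require persistence across frames so that single-frame speckle peaks are rejected. This sketch is illustrative only, not the paper's detector:

```python
import numpy as np

def timespace_detect(stack, z=4.0, min_frames=3):
    """Toy 3D time-space detector over a stack of SAR magnitude frames.

    stack: (T, H, W) magnitudes. A pixel is a candidate when it exceeds
    the per-frame background mean by z standard deviations; a detection
    must persist in at least `min_frames` frames.
    """
    hits = np.zeros(stack.shape[1:], dtype=int)
    for frame in stack:
        mu, sigma = frame.mean(), frame.std()
        hits += (frame > mu + z * sigma).astype(int)
    return hits >= min_frames

# Toy usage: sea clutter plus a weak but persistent point target.
rng = np.random.default_rng(0)
stack = rng.rayleigh(1.0, size=(8, 64, 64))    # Rayleigh-like clutter
stack[:, 20, 30] += 8.0                        # persistent target return
print(np.argwhere(timespace_detect(stack)))    # -> [[20 30]]
```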
In this paper, we show that leveraging neural architecture search (NAS) for incremental learning results in strong performance gains for classification tasks.
Featured publications
Striking the Right Balance: Recall Loss for Semantic Segmentation
Graph Mapper: Efficient Visual Navigation by Scene Graph Generation
SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments
Head-Worn Markerless Augmented Reality Inside a Moving Vehicle
Hyper-Dimensional Analytics of Video Action at the Tactical Edge
Long-Range Augmented Reality with Dynamic Occlusion Rendering
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
Semantically-Aware Attentive Neural Embeddings for 2D Long-Term Visual Localization
Multi-Sensor Fusion for Motion Estimation in Visually-Degraded Environments
Augmented Reality Driving Using Semantic Geo-Registration
“SRI offers a unique blend of academia and industry, which offers an opportunity to work on problems that involve research and are practically relevant.”
Computer Scientist, Information & Computing Sciences