The Center for Vision Technologies
SRI’s Center for Vision Technologies creates fundamental computer vision solutions based on leading-edge technologies, leveraging a wide variety of sensors and computation platforms.
Recent developments from the Center for Vision Technologies
The Center for Vision Technologies (CVT) develops and applies algorithms and hardware to see better with computational sensing, understand scenes using 2D/3D reasoning, understand and interact with humans through interactive intelligent systems, support teamwork through collaborative autonomy, mine big data with multi-modal data analytics, and learn continuously through machine learning. CVT conducts both early-stage research and development work to build prototype solutions that impact government and commercial markets, including defense, healthcare, and automotive. Numerous companies have been spun off from CVT technology successes.
Recent developments from CVT include core machine learning algorithms in areas such as learning with fewer labels, predictive machine learning for handling surprise and novel situations, lifelong learning, reinforcement learning using semantics, and robust, explainable artificial intelligence.
SmartVision imaging systems combine semantic processing, multi-modal sensing, and embedded low-power machine learning to automatically adapt and capture good-quality imagery and information streams in challenging and degraded visual environments.
Multi-sensor navigation systems are used for wide-area augmented reality and provide GPS-denied localization for humans and mobile platforms operating in air, ground, naval, and subterranean environments. CVT has extended its navigation and 3D modeling work to include semantic reasoning, making it more robust to changes in the scene. Collaborative autonomy systems can use semantic reasoning, enabling platforms to efficiently exchange dynamic scene information with each other and allow a single user to control many robotic platforms using high-level directives.
Human behavior understanding is used to assess human state and emotions (e.g., in the Toyota 2020 concept car) and to build full-body, multi-modal (speech, gesture, gaze, etc.) human-computer interaction systems.
Multi-modal data analytics systems are used for fine-grained object recognition, activity and change detection, and search in cluttered environments.
Our work
New welding helmet from SRI delivers crisp high definition, 3D, real-time views
SRI has licensed the technology to Kawada Technologies Inc. to commercialize the helmet around the world.
Karan Sikka is helping create tools to improve media discourse
Sikka brings expertise in deep learning and multi-modal learning to improve how social media operates, including how we use and communicate on it.
SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments
SayNav is a novel planning framework that leverages human knowledge from Large Language Models (LLMs) to dynamically generate step-by-step instructions for autonomous agents performing complicated navigation tasks in unknown large-scale environments.
Core technologies and applications
SRI’s Center for Vision Technologies (CVT) tackles data acquisition and exploitation challenges across a broad range of applications and industries. Our researchers work in cross-disciplinary teams, including robotics and artificial intelligence, to advance, combine and customize technologies in areas including computational sensing, 2D-3D reasoning, collaborative autonomy, human behavior modeling, vision analytics, and machine learning.
Recent publications by research area
Computational sensing and low-power processing
Low-Power In-Pixel Computing with Current-Modulated Switched Capacitors
We present a scalable in-pixel processing architecture that can reduce the data throughput by 10X and consume less than 30 mW per megapixel at the imager frontend.
2D/3D reasoning and augmented reality
Vision based Navigation using Cross-View Geo-registration for Outdoor Augmented Reality and Navigation Applications
In this work, we present a new vision-based cross-view geo-localization solution that matches camera images to a 2D satellite/overhead reference image database. We present solutions for both coarse search (cold start) and fine alignment (continuous refinement).
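A common formulation of cross-view matching (the function names and toy embeddings below are illustrative assumptions, not SRI's implementation) embeds ground and overhead images into a shared space and, for the coarse cold-start search, retrieves the reference tile most similar to the query:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Unit-normalize embeddings so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def coarse_search(query_emb, tile_embs):
    """Cold-start step: rank all overhead tiles by cosine similarity
    to the ground-camera embedding and return the best tile index."""
    sims = l2_normalize(tile_embs) @ l2_normalize(query_emb)
    return int(np.argmax(sims)), sims

# Toy example: 4 reference tiles with 3-D embeddings (illustrative).
tiles = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.7, 0.7, 0.0],
                  [0.0, 0.0, 1.0]])
query = np.array([0.6, 0.8, 0.0])

best, sims = coarse_search(query, tiles)
print(best)  # -> 2: the tile most aligned with the query direction
```

Fine alignment would then refine the camera pose against the selected tile; that step depends on the geometry model and is omitted here.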
Collaborative human-robot autonomy
Ranging-Aided Ground Robot Navigation Using UWB Nodes at Unknown Locations
This paper describes a new ranging-aided navigation approach that does not require the locations of ranging radios.
Human behavior modeling
Towards Understanding Confusion and Affective States Under Communication Failures in Voice-Based Human-Machine Interaction
We present two studies conducted to understand users' affective states during voice-based human-machine interactions.
Multi-modal data analytics
Time-Space Processing for Small Ship Detection in SAR
This paper presents a new 3D time-space detector for small ships in single look complex (SLC) synthetic aperture radar (SAR) imagery, optimized for small targets around 5-15 m long that are unfocused due to target motion induced by ocean surface waves.
Machine learning
C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation
We propose C-SFDA, a curriculum learning aided self-training framework for SFDA that adapts efficiently and reliably to changes across domains based on selective pseudo-labeling. Specifically, we employ a curriculum learning scheme to promote learning from a restricted amount of pseudo labels selected based on their reliabilities.
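A minimal sketch of the selective pseudo-labeling idea (the thresholds, schedule, and function names are illustrative assumptions, not the paper's exact procedure): keep only target samples whose predicted class confidence exceeds a curriculum threshold that relaxes over rounds, so training starts from the most reliable pseudo labels:

```python
import numpy as np

def select_pseudo_labels(probs, threshold):
    """Keep only samples whose maximum class probability meets the
    reliability threshold; return (kept indices, pseudo labels)."""
    conf = probs.max(axis=1)
    keep = np.where(conf >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)

def curriculum_thresholds(start=0.95, end=0.70, rounds=4):
    # Relax the threshold over rounds: confident (easy) samples first.
    return np.linspace(start, end, rounds)

# Toy softmax outputs on 4 unlabeled target samples (illustrative).
probs = np.array([[0.97, 0.02, 0.01],
                  [0.40, 0.35, 0.25],
                  [0.10, 0.85, 0.05],
                  [0.60, 0.30, 0.10]])

for t in curriculum_thresholds():
    keep, labels = select_pseudo_labels(probs, t)
    print(round(float(t), 2), keep.tolist(), labels.tolist())
```

In this sketch the low-confidence samples (rows 1 and 3) are never pseudo-labeled, which is the point of selective self-training: unreliable labels are excluded rather than allowed to reinforce errors.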
Publications
Night-Time GPS-Denied Navigation and Situational Understanding Using Vision-Enhanced Low-Light Imager
In this presentation, we describe and demonstrate a novel vision-enhanced low-light imager system to provide GPS-denied navigation and ML-based visual scene understanding capabilities for both day and night operations.
Cross-View Visual Geo-Localization for Outdoor Augmented Reality
We address the problem of geo-pose estimation by cross-view matching of query ground images to a geo-referenced aerial satellite image database. Recently, neural network-based methods have shown state-of-the-art performance in cross-view matching.
On auxiliary latitudes
The auxiliary latitudes are essential tools in cartography. This paper summarizes methods for converting between them with an emphasis on providing full double-precision accuracy.
Autonomous Docking Using Learning-Based Scene Segmentation in Underground Mine Environments
This paper describes a vision-based autonomous docking solution that moves a coalmine shuttle car to the continuous miner in GPS-denied underground environments.
Unpacking Large Language Models with Conceptual Consistency
We propose conceptual consistency to measure an LLM's understanding of relevant concepts. This novel metric characterizes a model by measuring how consistent its responses are to queries about conceptually relevant background knowledge.
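The metric can be illustrated (with hypothetical probes and a simplified scoring rule, not the paper's exact protocol) as the fraction of background-knowledge probes the model answers in line with the knowledge implied by the concept:

```python
def conceptual_consistency(probe_answers, expected):
    """Fraction of probes about related background concepts on which
    the model's answer matches the answer implied by the concept.
    `expected` maps each probe question to its implied answer."""
    agree = sum(1 for probe, implied in expected.items()
                if probe_answers.get(probe) == implied)
    return agree / len(expected)

# Hypothetical probes for the query "Is a whale a mammal?" -> "yes".
expected = {"do whales breathe air?": "yes",
            "do whales lay eggs?": "no",
            "are whales warm-blooded?": "yes"}
model_answers = {"do whales breathe air?": "yes",
                 "do whales lay eggs?": "yes",   # inconsistent response
                 "are whales warm-blooded?": "yes"}

score = conceptual_consistency(model_answers, expected)
print(score)  # 2 of 3 probes consistent
```

A model that answers the main query correctly but fails such probes gets credit for the answer without demonstrating the underlying concept, which is the gap this kind of metric is designed to expose.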
Our team

“SRI offers a unique blend of academia and industry, which offers an opportunity to work on problems that involve research and are practically relevant.”
Karan Sikka
Computer Scientist, Information & Computing Sciences