Center for Vision Technologies

Fundamental computer vision solutions based on leading-edge technologies, leveraging a variety of sensors and computation platforms

The Center for Vision Technologies conducts both early-stage research and development work to build prototype solutions that impact government and commercial markets, including defense, healthcare, automotive and more. Numerous companies have been spun off from CVT technology successes.

The Center for Vision Technologies (CVT) develops and applies algorithms and hardware to see better with computational sensing, understand scenes using 2D/3D reasoning, understand and interact with humans through interactive intelligent systems, support teamwork through collaborative autonomy, mine big data with multi-modal data analytics, and learn continuously through machine learning.

Recent developments from CVT include core machine learning algorithms in various areas such as learning with fewer labels, predictive machine learning for handling surprise and novel situations, lifelong learning, reinforcement learning using semantics and robust/explainable artificial intelligence.

SmartVision imaging systems use semantic processing/multi-modal sensing and embedded low-power processing for machine learning to automatically adapt and capture good quality imagery and information streams in challenging and degraded visual environments.

Multi-sensor navigation systems are used for wide-area augmented reality and provide GPS-denied localization for humans and mobile platforms operating in air, ground, naval, and subterranean environments. CVT has extended its navigation and 3D modeling work to include semantic reasoning, making it more robust to changes in the scene. Collaborative autonomy systems can also use semantic reasoning, which enables platforms to efficiently exchange dynamic scene information with each other and allows a single user to control many robotic platforms using high-level directives.

Human behavior understanding is used to assess human state and emotions (e.g., in the Toyota 2020 concept car) and to build full-body, multi-modal (speech, gesture, gaze, etc.) human-computer interaction systems.

Multi-modal data analytics systems are used for fine-grained object recognition, activity and change detection, and search in cluttered environments.

Our work

May 12, 2026

SRI’s Industrial Vision technology for facility management

SRI’s AI-powered vision system serves as a digital bridge, transforming today’s legacy solution into modern enterprise intelligence.

May 11, 2026

SRI addresses the rural healthcare gap with AI innovation

SRI researchers are building an AI-driven system designed to upskill clinicians and transform rural healthcare delivery.

April 27, 2026

Designing better human-machine teams

SRI is discovering how AI can organize humans and machines into complex, collaborative, high-functioning units.

Core technologies and applications

SRI’s Center for Vision Technologies (CVT) tackles data acquisition and exploitation challenges across a broad range of applications and industries. Our researchers work in cross-disciplinary teams, including robotics and artificial intelligence, to advance, combine and customize technologies in areas including computational sensing, 2D-3D reasoning, collaborative autonomy, human behavior modeling, vision analytics, and machine learning.

Publications by research area

Computational sensing and low power processing

October 14, 2022

Low-Power In-Pixel Computing with Current-Modulated Switched Capacitors

We present a scalable in-pixel processing architecture that can reduce the data throughput by 10X and consume less than 30 mW per megapixel at the imager frontend.

2D-3D reasoning and augmented reality

June 7, 2026

GeoSURGE: Geo-localization using semantic fusion with hierarchy of geographic embeddings

Worldwide visual geo-localization aims to determine the geographic location of an image anywhere on Earth using only its visual content. Despite recent progress, learning expressive representations of geographic space…

Collaborative human-robot autonomy

June 20, 2026

Concepts for autonomous navigation of underground mine face haulage equipment using depth cameras

This paper describes the development of a software pipeline for autonomous navigation of underground mine haulage equipment. The approach uses depth-sensing cameras mounted on the haulage vehicle to capture visual and depth information. Semantic segmentation is used to identify various objects in the scene, e.g., roof, ribs, floor, people, and mining equipment, and the…

Human behavior modeling

July 15, 2022

Towards Understanding Confusion and Affective States Under Communication Failures in Voice-Based Human-Machine Interaction

We present a series of two studies conducted to understand users’ affective states during voice-based human-machine interactions.

Multi-modal data analytics

May 27, 2022

Time-Space Processing for Small Ship Detection in SAR

This paper presents a new 3D time-space detector for small ships in single look complex (SLC) synthetic aperture radar (SAR) imagery, optimized for small targets around 5-15 m long that…

Machine learning

March 16, 2026

DUDA: Distilled unsupervised domain adaptation for lightweight semantic segmentation

Unsupervised Domain Adaptation (UDA) is essential for enabling semantic segmentation in new domains without requiring costly pixel-wise annotations. State-of-the-art (SOTA) UDA methods primarily use self-training with architecturally identical teacher…

Publications

June 20, 2026

Han-Pang Chiu

Concepts for autonomous navigation of underground mine face haulage equipment using depth cameras

This paper describes the development of a software pipeline for autonomous navigation of underground mine haulage equipment. The approach uses depth-sensing cameras mounted on the haulage vehicle to capture…

June 20, 2026

Medical Procedure Tracking using Abductive Planning – A PARADIGM Shift

This paper presents PARACHUTE, a Prolog-based framework for abductive trace completion and projection for real-time medical procedure tracking and guidance, developed as part of the AI-powered AMIRA system in…

June 7, 2026

Han-Pang Chiu

GeoSURGE: Geo-localization using semantic fusion with hierarchy of geographic embeddings

Worldwide visual geo-localization aims to determine the geographic location of an image anywhere on Earth using only its visual content. Despite recent progress, learning expressive representations of geographic space…

Computer vision leadership

September 8, 2021

William Mark

Senior Technology Advisor, Commercialization

September 8, 2021

Rakesh “Teddy” Kumar

Vice President, Information and Computing Sciences and Director, Center for Vision Technologies

Our team

September 8, 2021

Rakesh “Teddy” Kumar

Vice President, Information and Computing Sciences and Director, Center for Vision Technologies

November 29, 2021

Supun Samarasekera

Senior Technical Director, Vision and Robotics Laboratory, Center for Vision Technologies

September 8, 2021

Michael Piacentino

Senior Technical Director, Vision Systems Laboratory, Center for Vision Technologies

September 8, 2021

Han-Pang Chiu

Technical Director, Vision and Robotics Laboratory, Center for Vision Technologies

September 8, 2021

Bogdan Matei

Technical Director, Vision and Robotics Laboratory, Center for Vision Technologies

Core technologies and applications

Computational sensing and low power processing

2D-3D reasoning and augmented reality

Collaborative human-robot autonomy

Human behavior modeling

Multi-modal data analytics

Machine learning