
Computer vision

SRI pushes the frontiers of computer vision to enable machines to see, understand and remember. Our end-to-end video and processing technologies make computer vision work in the real world in applications including robotics, vehicles and people-worn systems.


The Center for Vision Technologies

SRI’s Center for Vision Technologies creates fundamental computer vision solutions based on leading-edge technologies, leveraging a wide variety of sensors and computation platforms.

Recent developments from the Center for Vision Technologies 

The Center for Vision Technologies (CVT) develops and applies algorithms and hardware across six thrusts: seeing better with computational sensing, understanding scenes through 2D/3D reasoning, understanding and interacting with humans through interactive intelligent systems, supporting teamwork through collaborative autonomy, mining big data with multi-modal analytics, and learning continuously through machine learning. CVT conducts both early-stage research and development work, building prototype solutions that impact government and commercial markets, including defense, healthcare, and automotive. Numerous companies have been spun off from CVT technology successes.

Recent developments from CVT include core machine learning algorithms in areas such as learning with fewer labels, predictive machine learning for handling surprise and novel situations, lifelong learning, reinforcement learning using semantics, and robust, explainable artificial intelligence.

SmartVision imaging systems combine semantic processing, multi-modal sensing, and embedded low-power machine learning to automatically adapt and capture high-quality imagery and information streams in challenging and degraded visual environments.

Multi-sensor navigation systems support wide-area augmented reality and provide GPS-denied localization for humans and mobile platforms operating in air, ground, naval, and subterranean environments. CVT has extended its navigation and 3D modeling work with semantic reasoning, making it more robust to changes in the scene. Collaborative autonomy systems use semantic reasoning to let platforms efficiently exchange dynamic scene information with each other and to allow a single user to control many robotic platforms through high-level directives.
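The fusion idea behind GPS-denied localization can be sketched with a minimal one-dimensional Kalman filter, where odometry drives the prediction step and a visual landmark fix drives the correction. All numbers and noise parameters below are illustrative assumptions, not SRI's actual navigation system.

```python
# Minimal 1D Kalman filter illustrating multi-sensor fusion for
# GPS-denied localization: odometry propagates the state, and a
# camera-derived landmark fix corrects accumulated drift.

def predict(x, p, odom_delta, odom_var):
    """Propagate the position estimate using an odometry increment."""
    return x + odom_delta, p + odom_var

def correct(x, p, z, meas_var):
    """Fuse a landmark-based position measurement."""
    k = p / (p + meas_var)                 # Kalman gain
    return x + k * (z - x), (1.0 - k) * p

x, p = 0.0, 1.0
# Robot moves ~1 m per step; odometry variance grows the uncertainty.
for _ in range(5):
    x, p = predict(x, p, odom_delta=1.0, odom_var=0.04)
# A visual landmark fix near 5.2 m collapses the uncertainty.
x, p = correct(x, p, z=5.2, meas_var=0.01)
print(round(x, 2), round(p, 4))
```

The correction step pulls the estimate almost entirely toward the low-variance landmark measurement, which is why even sparse visual fixes keep dead-reckoning drift bounded.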

Human behavior understanding is used to assess human state and emotions (e.g., in the Toyota 2020 concept car) and to build full-body, multi-modal (speech, gesture, gaze, etc.) human-computer interaction systems. 

Multi-modal data analytics systems are used for fine-grain object recognition, activity, and change detection and search in cluttered environments.

Our work


  • SRI’s Center for Vision Technologies is working to make social media more civil
    December 12, 2022


    With funding from DARPA, researchers are building an AI technology designed to work alongside humans to promote more prosocial online behavior

  • Pedestrian Detection from Moving Unmanned Ground Vehicles
    May 20, 2022


    SRI’s vision-based systems enable safe operations of moving unmanned ground vehicles around stationary and moving people in urban/cluttered environments. Under the Navy Explosive Ordnance Disposal project, SRI has developed a real-time, fused-sensor system that significantly improves stationary and dynamic object detection, pedestrian classification, and tracking capabilities from a moving unmanned ground vehicle (UGV). The system […]

  • Vision and Language Navigation
    May 18, 2022


    SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments. SRI International has developed a new learning-based approach that gives a mobile robot human-like semantic understanding: the robot employs semantic scene structures to reason about the world and pays particular attention to relevant semantic landmarks to develop navigation strategies. […]

Core technologies and applications

SRI’s Center for Vision Technologies (CVT) tackles data acquisition and exploitation challenges across a broad range of applications and industries. Our researchers work in cross-disciplinary teams, including robotics and artificial intelligence, to advance, combine and customize technologies in areas including computational sensing, 2D-3D reasoning, collaborative autonomy, human behavior modeling, vision analytics, and machine learning. 

Computational sensing and low power processing

  • Smart vision
  • Multi-sensor fusion
  • Embedded, low-power processing

2D-3D reasoning and augmented reality

  • Semantic and GPS-denied navigation
  • Augmented reality
  • 3D scene classification and modeling
  • Surveillance

Collaborative human-robot autonomy

  • Human-robot collaboration testbeds
  • Collaboration across multiple robot platforms and humans

Human behavior modeling

  • Human state assessment
  • Assessment of collaboration
  • Communicating with computers
  • Intelligent AR-based interactive systems

Multi-modal data analytics

  • Image and video search
  • Activity recognition
  • Fine-grain recognition
  • Social media analytics

Machine learning

  • Explainable AI
  • Lifelong learning
  • Approximate computing
  • Robust AI

Recent publications by research area


  • Computational sensing and low-power processing

    Real-Time Hyper-Dimensional Reconfiguration at the Edge using Hardware Accelerators

    In this paper we present Hyper-Dimensional Reconfigurable Analytics at the Tactical Edge using low-SWaP embedded hardware that can perform real-time reconfiguration at the edge leveraging non-MAC deep neural nets (DNN) combined with hyperdimensional (HD) computing accelerators.

  • 2D-3D reasoning and augmented reality

    Striking the Right Balance: Recall Loss for Semantic Segmentation

    Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation.

  • Collaborative human-robot autonomy

    Graph Mapper: Efficient Visual Navigation by Scene Graph Generation

    We propose a method to train an autonomous agent to learn to accumulate a 3D scene graph representation of its environment by simultaneously learning to navigate through said environment.

  • Human behavior modeling

    Towards Understanding Confusion and Affective States Under Communication Failures in Voice-Based Human-Machine Interaction

    We present a series of two studies conducted to understand users’ affective states during voice-based human-machine interactions. Emphasis is placed on cases of communication errors or failures.

  • Multi-modal data analytics

    Time-Space Processing for Small Ship Detection in SAR

    This paper presents a new 3D time-space detector for small ships in single look complex (SLC) synthetic aperture radar (SAR) imagery, optimized for small targets around 5-15 m long that are unfocused due to target motion induced by ocean surface waves.

  • Machine learning

    Sensor Control for Information Gain in Dynamic, Sparse and Partially Observed Environments

    We present an approach for autonomous sensor control for information gathering under partially observable, dynamic and sparsely sampled environments.
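The hyperdimensional (HD) computing that the edge-reconfiguration paper above leverages can be illustrated with a toy bipolar-hypervector example: binding by elementwise multiplication, bundling by majority vote, and similarity-based recall. Everything below is an illustrative sketch, not the paper's accelerator design.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality; high D makes random vectors near-orthogonal

def hv():
    """Random bipolar hypervector."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding by elementwise multiplication; self-inverse for bipolar vectors."""
    return a * b

def bundle(*vs):
    """Bundling by elementwise majority vote (odd count avoids ties)."""
    return np.sign(np.sum(vs, axis=0))

def sim(a, b):
    """Normalized dot-product similarity in [-1, 1]."""
    return float(a @ b) / D

key, val = hv(), hv()
memory = bundle(bind(key, val), hv(), hv())  # store the pair amid two distractors
recovered = bind(memory, key)                # unbinding with the key recovers val
print(sim(recovered, val))
```

Because binding is self-inverse and bundled distractors act as near-orthogonal noise, the recovered vector stays strongly correlated with `val` (similarity around 0.5 here), which is the property HD accelerators exploit for robust, low-power lookup.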

Publications


  • Towards Understanding Confusion and Affective States Under Communication Failures in Voice-Based Human-Machine Interaction

    July 15, 2022

    We present a series of two studies conducted to understand users’ affective states during voice-based human-machine interactions. Emphasis is placed on cases of communication errors or failures.

  • Incremental Learning with Differentiable Architecture and Forgetting Search

    July 14, 2022

    In this paper, we show that leveraging NAS for incremental learning results in strong performance gains for classification tasks.

  • Saccade Mechanisms for Image Classification, Object Detection and Tracking

    June 19, 2022

    We examine how the saccade mechanism from biological vision can be used to make deep neural networks more efficient for classification and object detection problems. Our proposed approach is based on the ideas of attention-driven visual processing and saccades, miniature eye movements influenced by attention.

  • Conformal Prediction Intervals for Markov Decision Process Trajectories

    June 8, 2022

    This paper extends previous work on conformal prediction for functional data and conformalized quantile regression…

  • Time-Space Processing for Small Ship Detection in SAR

    May 27, 2022

    This paper presents a new 3D time-space detector for small ships in single look complex (SLC) synthetic aperture radar (SAR) imagery, optimized for small targets around 5-15 m long that are unfocused due to target motion induced by ocean surface waves.

  • Striking the Right Balance: Recall Loss for Semantic Segmentation

    May 18, 2022

    Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation.

  • Graph Mapper: Efficient Visual Navigation by Scene Graph Generation

    May 18, 2022

    We propose a method to train an autonomous agent to learn to accumulate a 3D scene graph representation of its environment by simultaneously learning to navigate through said environment.

  • SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments

    May 18, 2022

    This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments.

  • Broadening AI Ethics Narratives: An Indic Arts View

    April 8, 2022

    In this paper, we investigate uncovering the unique socio-cultural perspectives embedded in human-made art, which in turn, can be valuable in expanding the horizon of AI ethics.
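The split-conformal recipe that the conformal-prediction entry above builds on can be sketched in a few lines: calibrate on held-out absolute residuals, then widen the point prediction by a finite-sample-corrected quantile. The data, `alpha`, and function name below are illustrative assumptions; the paper itself extends this idea to MDP trajectories.

```python
import numpy as np

def conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Split-conformal prediction interval with ~(1 - alpha) coverage."""
    residuals = np.abs(cal_targets - cal_preds)
    n = len(residuals)
    # Finite-sample-corrected quantile of calibration residuals.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, level)
    return test_pred - q, test_pred + q

rng = np.random.default_rng(0)
cal_preds = rng.normal(size=200)
cal_targets = cal_preds + rng.normal(scale=0.5, size=200)  # noisy truth
lo, hi = conformal_interval(cal_preds, cal_targets, test_pred=1.0)
print(lo < 1.0 < hi)
```

The coverage guarantee is distribution-free: it relies only on exchangeability of the calibration and test residuals, not on the model being correct.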

Computer vision leadership


William Mark

President, Information and Computing Sciences


Rakesh “Teddy” Kumar

Vice President, Information and Computing Sciences; Director, Center for Vision Technologies

Our team


Rakesh “Teddy” Kumar

Vice President, Information and Computing Sciences; Director, Center for Vision Technologies


Supun Samarasekera

Senior Technical Director, Vision and Robotics Laboratory, Center for Vision Technologies


Ajay Divakaran

Senior Technical Director, Vision and Learning Laboratory, Center for Vision Technologies


Michael Piacentino

Senior Technical Director, Vision Systems Laboratory, Center for Vision Technologies


Han-Pang Chiu

Technical Director, Vision and Robotics Laboratory, Center for Vision Technologies


Bogdan Matei

Technical Director, Vision and Robotics Laboratory, Center for Vision Technologies


Yi Yao

Technical Director, Vision and Learning Laboratory, Center for Vision Technologies


“SRI offers a unique blend of academia and industry, which offers an opportunity to work on problems that involve research and are practically relevant.”

Karan Sikka

Computer Scientist, Information & Computing Sciences


Copyright © 2022 SRI International