Project May 18, 2022

Vision and Language Navigation

SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments 

SRI International has developed a new learning-based approach that gives mobile robots human-like capabilities in semantic understanding. The robot employs semantic scene structure to reason about the world, paying particular attention to relevant semantic landmarks when developing navigation strategies. It also learns efficiently from past experience when exploring new environments, for example by discovering common semantic scene entities that generalize to similar but previously unseen places.

This approach can be adapted to a variety of applications. SRI has developed novel deep reinforcement learning (DRL) techniques that incorporate real-time semantic scene information to improve learning-based autonomous navigation in new, unknown environments. DRL techniques have the potential to outperform classical solutions, but at a significantly increased computational load. By encoding explicit scene semantics into a map [3] or a graph representation [2], SRI’s approach achieves results better than or comparable to existing learning-based solutions under a fixed time or computational budget.
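To make the map-encoding idea concrete, here is a minimal sketch of how per-cell semantic labels can be packed into a multi-channel egocentric map tensor that a navigation policy network could consume. The class count, grid size, and function name are illustrative assumptions, not SRI's actual implementation.

```python
import numpy as np

NUM_CLASSES = 5   # e.g. floor, wall, door, chair, table (assumed taxonomy)
MAP_SIZE = 8      # coarse 8x8 egocentric grid (assumed resolution)

def semantic_map(labels: np.ndarray) -> np.ndarray:
    """One-hot encode a (MAP_SIZE, MAP_SIZE) grid of semantic labels into a
    (NUM_CLASSES, MAP_SIZE, MAP_SIZE) tensor, one channel per class."""
    one_hot = np.zeros((NUM_CLASSES, MAP_SIZE, MAP_SIZE), dtype=np.float32)
    rows, cols = np.indices(labels.shape)
    one_hot[labels, rows, cols] = 1.0  # set the class channel for each cell
    return one_hot

labels = np.random.randint(0, NUM_CLASSES, size=(MAP_SIZE, MAP_SIZE))
m = semantic_map(labels)
assert m.shape == (NUM_CLASSES, MAP_SIZE, MAP_SIZE)
assert np.all(m.sum(axis=0) == 1.0)  # exactly one class active per cell
```

A stacked tensor like this keeps the semantic input at a fixed size regardless of scene complexity, which is what makes a bounded computational budget feasible.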

SRI has applied this approach to create a new vision-and-language navigation (VLN) framework [1] called SASRA (Semantically-Aware Spatio-temporal Reasoning Agent). VLN requires an autonomous robot to follow natural-language instructions from humans in unseen environments. Existing learning-based methods struggle because they focus primarily on raw visual observations and lack the semantic reasoning capabilities crucial for generalizing to new environments. To overcome these limitations, SASRA maintains a temporal memory by building a dynamic semantic map and performs cross-modal grounding to align the map and language modalities, enabling more effective VLN.
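Cross-modal grounding of the kind described above is commonly realized as attention between the two modalities. The sketch below shows one plausible form: instruction token features attend over flattened semantic-map cell features via scaled dot-product attention. All dimensions and names are illustrative assumptions, not the SASRA architecture itself.

```python
import numpy as np

def cross_modal_attention(lang: np.ndarray, map_feats: np.ndarray) -> np.ndarray:
    """lang: (T, D) instruction token features; map_feats: (C, D) map cell
    features. Returns (T, D) language features grounded in the map."""
    d = lang.shape[-1]
    scores = lang @ map_feats.T / np.sqrt(d)           # (T, C) similarity
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over map cells
    return weights @ map_feats                         # attended map features

rng = np.random.default_rng(0)
lang = rng.standard_normal((6, 16))        # 6 instruction tokens (assumed)
map_feats = rng.standard_normal((64, 16))  # 8x8 map flattened to 64 cells
grounded = cross_modal_attention(lang, map_feats)
assert grounded.shape == (6, 16)
```

Each instruction token ends up as a weighted mixture of map-cell features, so language like "past the chair" can concentrate its weight on cells whose semantics match.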

Resources

2022
International Conference on Pattern Recognition (ICPR)

SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments

2022
International Conference on Pattern Recognition (ICPR)

GraphMapper: Efficient Visual Navigation by Scene Graph Generation

2021
IEEE International Conference on Robotics and Automation (ICRA)

MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation
