
Computer vision publications

2D-3D reasoning and augmented reality | May 18, 2022 | Conference Paper

Striking the Right Balance: Recall Loss for Semantic Segmentation

SRI International | May 18, 2022

SRI author: Han-Pang Chiu

Abstract: Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation. Specifically, uneven class distributions in a training dataset often result in unsatisfactory performance on under-represented classes. Many works have proposed to weight the standard cross entropy loss function with pre-computed weights based on class statistics, such […]
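
A minimal sketch of the general idea in this abstract, re-weighting the cross-entropy loss using per-class statistics computed from predictions; it is not the paper's exact recall loss formulation, and the function name, the (1 - recall) weighting, and the tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def recall_weighted_ce(logits, targets, num_classes, eps=1e-6):
    """Cross entropy re-weighted by per-class (1 - recall) on the current batch.

    logits:  (N, C, H, W) raw scores from a segmentation network
    targets: (N, H, W) integer class labels
    """
    preds = logits.argmax(dim=1)  # hard per-pixel predictions
    weights = torch.ones(num_classes, device=logits.device)
    for c in range(num_classes):
        gt_c = (targets == c)
        if gt_c.any():
            # recall_c = correctly predicted pixels / ground-truth pixels of class c
            recall_c = (preds[gt_c] == c).float().mean()
            # under-recalled (often under-represented) classes get a larger weight
            weights[c] = 1.0 - recall_c + eps
    return F.cross_entropy(logits, targets, weight=weights)
```

Unlike fixed weights pre-computed from dataset statistics, weights derived from the model's current predictions adapt as training progresses.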

Collaborative autonomy | May 18, 2022 | Conference Paper

Graph Mapper: Efficient Visual Navigation by Scene Graph Generation

SRI International | May 18, 2022

SRI authors: Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

Abstract: Understanding the geometric relationships between objects in a scene is a core capability in enabling both humans and autonomous agents to navigate in new environments. A sparse, unified representation of the scene topology will allow agents to act efficiently to move through their environment, communicate the […]
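
Purely as an illustration of the kind of sparse topological representation the abstract refers to, here is a hypothetical sketch of a scene graph whose nodes are observed objects and whose edges store pairwise spatial relations; the classes, fields, and relation names are invented for the example, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    label: str        # e.g. "couch", "door"
    position: tuple   # (x, y) in the agent's map frame

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)  # (i, j) -> relation string

    def add_object(self, node):
        self.nodes.append(node)
        return len(self.nodes) - 1

    def relate(self, i, j, relation):
        # e.g. "left_of", "near", "behind"
        self.edges[(i, j)] = relation

# Toy example: record that the door is to the right of the couch.
g = SceneGraph()
couch = g.add_object(ObjectNode("couch", (1.0, 2.0)))
door = g.add_object(ObjectNode("door", (3.5, 2.1)))
g.relate(door, couch, "right_of")
```

A handful of nodes and relations like these is far more compact than a dense metric map, which is what makes a sparse topological representation attractive for efficient navigation.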

Publication | May 18, 2022 | Conference Paper

SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments

SRI International | May 18, 2022

SRI authors: Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

Abstract: This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments, which requires an autonomous agent to follow natural language instructions in unseen environments. Existing end-to-end learning-based VLN methods struggle at this task as they focus mostly on utilizing raw visual […]

2D-3D reasoning and augmented reality | March 12, 2022 | Conference Paper

Head-Worn Markerless Augmented Reality Inside a Moving Vehicle

SRI International | March 12, 2022

SRI authors: Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

Abstract: This paper describes a system that provides general head-worn outdoor augmented reality (AR) capability for the user inside a moving vehicle. Our system follows the concept of combining pose estimation from both the vehicle navigation system and wearable sensors to address the failure of commercial AR devices […]
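
As a rough sketch of the pose-combination concept mentioned in the abstract (the paper's actual fusion is more involved), the head pose in the world frame can be obtained by chaining the vehicle's pose from its navigation system with the head's pose relative to the vehicle from wearable sensors; the 4x4 homogeneous matrices and frame names are assumptions for illustration.

```python
import numpy as np

def compose(T_world_vehicle, T_vehicle_head):
    """Chain two rigid-body transforms (4x4 homogeneous matrices).

    T_world_vehicle: vehicle pose in the world frame (e.g. from the vehicle nav system)
    T_vehicle_head:  head pose in the vehicle frame (e.g. from wearable sensors)
    Returns the head pose in the world frame, which outdoor AR rendering needs.
    """
    return T_world_vehicle @ T_vehicle_head

# Toy example: vehicle 10 m east of the world origin, head 1.2 m above the vehicle origin.
T_wv = np.eye(4); T_wv[0, 3] = 10.0
T_vh = np.eye(4); T_vh[2, 3] = 1.2
T_wh = compose(T_wv, T_vh)  # translation column is (10.0, 0.0, 1.2)
```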

2D-3D reasoning and augmented reality | January 24, 2022 | Conference Paper

SIGNAV: Semantically-Informed GPS-Denied Navigation and Mapping in Visually-Degraded Environments

SRI International | January 24, 2022

SRI authors: Han-Pang Chiu, Supun Samarasekera

Abstract: Understanding the perceived scene during navigation enables intelligent robot behaviors. Current vision-based semantic SLAM (Simultaneous Localization and Mapping) systems provide these capabilities. However, their performance decreases in visually-degraded environments, which are common in critical robotic applications such as search and rescue missions. In this paper, we present […]

Computer vision publications | October 1, 2021 | Conference Paper

Resilient Data Augmentation Approaches to Multimodal Verification in the News Domain

Martin Graciarena | October 1, 2021

With the advent of generative adversarial networks and misinformation in social media, there has been increased interest in multimodal verification. Image-text verification typically involves determining whether a caption and an image correspond with each other. Building on multimodal embedding techniques, we show that data augmentation via two distinct approaches improves results: entity linking and cross-domain local similarity scaling. We refer to the approaches as resilient because we show state-of-the-art results against manipulations specifically designed to thwart the exact multimodal embeddings we are using as the basis for all of our features.
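
As a hedged illustration of the image-text verification task described above, and not the authors' pipeline, a caption and an image can each be mapped into a shared embedding space and the pair scored by cosine similarity against a threshold; the embeddings and threshold below are placeholders.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def verify_pair(image_embedding, caption_embedding, threshold=0.5):
    """Declare an image/caption pair 'corresponding' if their embeddings are close.

    Both embeddings are assumed to come from a multimodal model that maps images
    and text into the same space (a placeholder assumption in this sketch).
    """
    return cosine_similarity(image_embedding, caption_embedding) >= threshold

# Toy usage with random vectors standing in for real multimodal embeddings.
rng = np.random.default_rng(0)
print(verify_pair(rng.normal(size=512), rng.normal(size=512)))
```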

Computer vision publications | August 27, 2021 | Journal Article

Long-Range Augmented Reality with Dynamic Occlusion Rendering

Supun Samarasekera, Han-Pang Chiu, Rakesh “Teddy” Kumar | August 27, 2021

Proper occlusion-based rendering is very important to achieve realism in all indoor and outdoor Augmented Reality (AR) applications. This paper addresses the problem of fast and accurate dynamic occlusion reasoning by real objects in the scene for large-scale outdoor AR applications. Conceptually, proper occlusion reasoning requires an estimate of depth for every point in the augmented scene, which is technically hard to achieve for outdoor scenarios, especially in the presence of moving objects. We propose a method to detect and automatically infer the depth of real objects in the scene without explicit detailed scene modeling and depth sensing (e.g., without using sensors such as 3D LiDAR). Specifically, we employ instance segmentation of color image data to detect real dynamic objects in the scene and use either a top-down terrain elevation model or a deep-learning-based monocular depth estimation model to infer their metric distance from the camera for proper occlusion reasoning in real time. The realized solution is implemented in a low-latency real-time framework for video-see-through AR and is directly extendable to optical-see-through AR. We minimize latency in depth reasoning and occlusion rendering by performing semantic object tracking and prediction in video frames.
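
A minimal sketch of the per-pixel occlusion test implied by this abstract, assuming a depth buffer for the rendered virtual content and an inferred depth map for real objects (e.g. from instance segmentation plus a terrain model or monocular depth network) are already available; it is not the paper's implementation.

```python
import numpy as np

def composite_with_occlusion(frame, virtual_rgb, virtual_alpha, virtual_depth, real_depth):
    """Overlay rendered virtual content on a camera frame with depth-based occlusion.

    frame:         (H, W, 3) camera image, float values in [0, 1]
    virtual_rgb:   (H, W, 3) rendered virtual content
    virtual_alpha: (H, W) opacity of the virtual content
    virtual_depth: (H, W) per-pixel depth of the virtual content
    real_depth:    (H, W) inferred per-pixel depth of real objects
    """
    visible = virtual_depth < real_depth          # virtual pixel lies in front of the real scene
    alpha = (virtual_alpha * visible)[..., None]  # zero opacity where a real object occludes
    return (1.0 - alpha) * frame + alpha * virtual_rgb
```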

Computer vision publications | May 30, 2021 | Conference Paper

MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation

Han-Pang Chiu, Supun Samarasekera, Rakesh “Teddy” Kumar | May 30, 2021

Visual navigation for autonomous agents is a core task in the fields of computer vision and robotics. Learning-based methods, such as deep reinforcement learning, have the potential to outperform the classical solutions developed for this task; however, they come at a significantly increased computational load. Through this work, we design a novel approach that focuses on performing better than, or comparably to, existing learning-based solutions under a clear time/computational budget. To this end, we propose a method to encode vital scene semantics, such as traversable paths, unexplored areas, and observed scene objects, alongside raw visual streams such as RGB, depth, and semantic segmentation masks, into a semantically informed, top-down egocentric map representation. Further, to enable the effective use of this information, we introduce a novel 2-D map attention mechanism, based on the successful multi-layer Transformer networks. We conduct experiments on 3-D reconstructed indoor PointGoal visual navigation and demonstrate the effectiveness of our approach. We show that by using our novel attention schema and auxiliary rewards to better utilize scene semantics, we outperform multiple baselines trained with only raw inputs or implicit semantic information while operating with an 80% decrease in the agent’s experience.
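
A hedged sketch of the "2-D map attention" idea: a top-down egocentric semantic map can be split into patch tokens and passed through a standard multi-head self-attention layer. The sketch uses stock PyTorch modules with illustrative dimensions and is not the MaAST architecture itself.

```python
import torch
import torch.nn as nn

class MapAttention(nn.Module):
    """Self-attention over patch tokens of a top-down semantic map (illustrative)."""

    def __init__(self, channels, patch=4, dim=128, heads=4):
        super().__init__()
        # Each non-overlapping patch of the map becomes one token.
        self.to_tokens = nn.Conv2d(channels, dim, kernel_size=patch, stride=patch)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, map2d):
        # map2d: (B, C, H, W) egocentric map with semantic channels
        tokens = self.to_tokens(map2d).flatten(2).transpose(1, 2)  # (B, T, dim)
        attended, _ = self.attn(tokens, tokens, tokens)
        return attended.mean(dim=1)  # pooled map feature for the agent's policy

# Toy usage: a 64x64 map with 8 semantic channels.
feature = MapAttention(channels=8)(torch.randn(2, 8, 64, 64))
```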

Computer vision publications | October 12, 2020 | Conference Paper

RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization

Han-Pang Chiu, Supun Samarasekera, Rakesh “Teddy” Kumar | October 12, 2020

We study an important, yet largely unexplored, problem of large-scale cross-modal visual localization by matching ground RGB images to a geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior works were demonstrated on small datasets and did not lend themselves to scaling up for large-scale applications. To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs (covering a 143 km² area) of RGB and aerial LIDAR depth images. We propose a novel joint embedding based method that effectively combines the appearance and semantic cues from both modalities to handle drastic cross-modal variations. Experiments on the proposed dataset show that our model achieves a strong result of a median rank of 5 in matching across a large test set of 50K location pairs collected from a 14 km² area. This represents a significant advancement over prior works in performance and scale. We conclude with qualitative results to highlight the challenging nature of this task and the benefits of the proposed model. Our work provides a foundation for further research in cross-modal visual localization.
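
As a hedged illustration of the reported retrieval metric, the median rank can be computed by scoring each query RGB embedding against all reference LIDAR-depth embeddings and locating the true match in the sorted list; the random vectors in the usage lines stand in for the paper's learned joint embedding.

```python
import numpy as np

def median_rank(query_emb, ref_emb):
    """Median rank of the ground-truth match for each query (lower is better).

    query_emb: (N, D) embeddings of ground RGB images
    ref_emb:   (N, D) embeddings of aerial LIDAR depth renders, where ref_emb[i]
               is the true match for query_emb[i]
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sims = q @ r.T                         # cosine similarity matrix
    order = np.argsort(-sims, axis=1)      # best candidate first
    ranks = np.argmax(order == np.arange(len(q))[:, None], axis=1) + 1
    return float(np.median(ranks))

# Toy usage with random embeddings standing in for the learned joint space.
rng = np.random.default_rng(0)
print(median_rank(rng.normal(size=(100, 64)), rng.normal(size=(100, 64))))
```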
