We propose a method to train an autonomous agent to accumulate a 3D scene graph representation of its environment while simultaneously learning to navigate through that environment.
SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments
This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments.
Head-Worn Markerless Augmented Reality Inside a Moving Vehicle
This paper describes a system that provides general head-worn outdoor AR capability for the user inside a moving vehicle.
Global Heading Estimation for Wide Area Augmented Reality Using Road Semantics for Geo-referencing
We present a method to estimate global camera heading by associating directional information from road segments in the camera view with annotated satellite imagery.
Long-Range Augmented Reality with Dynamic Occlusion Rendering
This paper addresses the problem of fast and accurate reasoning about dynamic occlusion by real objects in the scene for large-scale outdoor AR applications.
MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation
By using our novel attention schema and auxiliary rewards to better utilize scene semantics, we outperform multiple baselines trained with only raw inputs or implicit semantic information while operating with an 80% decrease in the agent’s experience.
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
We study an important, yet largely unexplored problem of large-scale cross-modal visual localization by matching ground RGB images to a geo-referenced aerial LIDAR 3D point cloud.
Semantically-Aware Attentive Neural Embeddings for 2D Long-Term Visual Localization
We present an approach that combines appearance and semantic information for 2D image-based localization (2D-VL) across large perceptual changes and time lags.
Evaluating Visual-Semantic Explanations using a Collaborative Image Guessing Game
While there have been many proposals on making AI algorithms explainable, few have attempted to evaluate the impact of AI-generated explanations on human performance in conducting human-AI collaborative tasks. To bridge the gap, we propose a Twenty-Questions style collaborative image retrieval game, Explanation-assisted Guess Which (ExAG), as a method of evaluating the efficacy of […]