
2D-3D reasoning and augmented reality

SRI has a strong portfolio in 2D-3D reasoning, including navigation and mapping with 2D and 3D sensors such as video and LIDAR.

In recent years, machine learning has significantly improved semantic understanding of 2D and 3D data. Incorporating semantics enables a new class of algorithms for navigation, Simultaneous Localization and Mapping (SLAM), geo-registration, wide-area search, augmented reality, data compression, 3D modeling, and surveillance.

Semantic and GPS-denied navigation

CVT has developed highly efficient low-drift localization and mapping methods that exploit visual and inertial sensors. SRI has supported a large portfolio of programs and spin-offs using this technology. CVT has also incorporated high-level learning-based semantic information (recognition of objects and scene layouts) into dynamic maps and scene graphs, improving accuracy, efficiency, and robustness in our state-of-the-art navigation systems.


Changes in lighting and weather significantly affect vision algorithms. CVT has developed a novel deep-embedding approach that projects image data into a high-dimensional feature space with geo-spatial coherence, learning features that are invariant to weather and time of day. Using this approach, CVT processed two million images from thousands of webcams worldwide to learn how scenes change over time (i.e., across day, night, and seasonal changes). These learned embeddings incorporate scene semantics for contextual reasoning, enabling highly reliable image retrieval across extremely large reference image databases.
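The retrieval step behind such a system can be sketched as a nearest-neighbor search over embedding vectors. The minimal example below is an illustrative assumption, not CVT's implementation: the embedding network is taken as given, and small hand-made vectors stand in for its outputs.

```python
# Hypothetical sketch: retrieving a geo-tagged reference image by cosine
# similarity in a learned embedding space. The vectors below stand in for
# the outputs of an (assumed) weather- and time-invariant embedding network.
import numpy as np

def cosine_retrieve(query: np.ndarray, references: np.ndarray) -> int:
    """Return the index of the reference embedding closest to the query."""
    q = query / np.linalg.norm(query)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    return int(np.argmax(r @ q))

# Toy database of three 4-D "embeddings" (real systems use hundreds of dims).
refs = np.array([
    [1.0, 0.0, 0.0, 0.0],   # scene A, daytime
    [0.0, 1.0, 0.0, 0.0],   # scene B
    [0.9, 0.1, 0.0, 0.0],   # scene A, nighttime (nearby in the space)
])
query = np.array([1.0, 0.02, 0.0, 0.0])  # new view of scene A
best = cosine_retrieve(query, refs)
```

Because the learned space places day and night views of the same scene close together, the query resolves to scene A regardless of capture conditions.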

Geo-registration is the process of matching video to previously geo-referenced data sources such as satellite imagery or LIDAR. CVT has worked across multiple government programs to perform high-precision geo-registration with and without GPS for aerial and ground platforms. CVT has also leveraged recent advances in machine learning to extract semantic features that can be matched across large viewpoint variations and changes in sensing modalities.

Long-range, wide-area, augmented reality

CVT has combined the localization and geo-registration methods described above with low-powered, compact, ruggedized hardware to create wide-area augmented reality applications. CVT has extended its augmented reality capabilities to work over multiple square kilometers while in GPS-challenged environments. This also includes long-range 3D occlusion-reasoning for augmented reality applications.

3D scene classification and modeling

CVT has developed extremely robust 3D scene classification methods over the last decade. These methods have now transitioned to Department of Defense (DoD) programs of record and commercially available software packages. Working with the Office of Naval Research (ONR), the U.S. Army and the National Geospatial-Intelligence Agency (NGA), CVT is now developing the next-generation 3D scene-understanding methods using machine learning. These methods incorporate top-down and bottom-up contextual reasoning and human-specified geographic rules within the learning process.



These robust scene-understanding methods have enabled us to revisit the 3D compression methods that are widely available today. By incorporating knowledge of feature classes (such as ground, building, and foliage), we can compress 3D data at significantly lower bit rates.
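One simple way class knowledge can reduce bit rate is class-dependent quantization: flat ground tolerates a coarser grid than building edges, so fewer bits are spent where detail matters least. The sketch below is a hedged illustration of that idea; the class labels and step sizes are assumptions, not parameters from any deployed system.

```python
# Hypothetical sketch of class-aware quantization for 3-D point compression:
# each point is snapped to a grid whose resolution depends on its semantic
# class, so smooth classes (ground) cost fewer bits than detailed ones.
import numpy as np

STEP = {"ground": 0.5, "building": 0.05, "foliage": 0.25}  # metres (illustrative)

def quantize(points: np.ndarray, labels: list[str]) -> np.ndarray:
    """Snap each point to its class-dependent grid; coarser grids need fewer bits."""
    steps = np.array([STEP[l] for l in labels])[:, None]
    return np.round(points / steps) * steps

pts = np.array([[10.23, 4.87, 0.31],    # labeled ground
                [12.01, 5.55, 8.42]])   # labeled building
coded = quantize(pts, ["ground", "building"])
```

The building point survives almost unchanged while the ground point collapses onto a half-metre grid; an entropy coder applied afterwards then spends correspondingly fewer bits on the ground class.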

Surveillance

CVT’s work in change detection supports the detection of buried improvised explosive devices (IEDs) in deployed systems. These algorithms compare multiple passes of video data to detect the change signatures of buried roadside IEDs. The recent integration of machine-learning-based road-detection methods has significantly improved our change-detection performance. SRI is developing novel anomaly-detection and anomaly-guided change-detection methods for next-generation systems. Specifically, CVT is developing a transformer-based joint spatiotemporal model that spans multiple spatial and temporal resolutions. Transformer networks retain the properties of diverse data modalities (geography, weather, seasonal variations, and knowledge of typical events and activities of interest), providing modularity that enables multimodality, scalability, and explainability.

CVT has developed an end-to-end pipeline that fuses multi-modal data in deep embedding space for specific tasks—such as target detection and recognition—by directly optimizing target metrics and learning the optimal contribution/control of each mode to the results. This pipeline has been applied to different modalities, including electro-optic/infrared images, hyperspectral imaging, and LIDAR/RADAR data. CVT has also incorporated scene-contextual information to further improve performance of the target task.
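Late fusion in a shared embedding space can be sketched as a gated weighted sum of per-modality embeddings. The snippet below is a minimal stand-in for such a pipeline: the gate weights are fixed constants here, whereas a real system would learn them by directly optimizing the target metric, as described above.

```python
# Hypothetical sketch of multi-modal fusion in embedding space: each modality
# (e.g. EO, IR, LIDAR) produces an embedding, and a gate weights its
# contribution. The gate logits are fixed stand-ins for learned values.
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse(embeddings: np.ndarray, gate_logits: np.ndarray) -> np.ndarray:
    """Weighted combination of per-modality embeddings: (n,) @ (n, d) -> (d,)."""
    return softmax(gate_logits) @ embeddings

eo, ir, lidar = np.eye(3)                       # toy per-modality embeddings
fused = fuse(np.stack([eo, ir, lidar]),
             np.array([2.0, 0.0, 0.0]))         # gate favors the EO channel
```

Because the gate is a softmax, the fused vector remains a convex combination of its inputs, and inspecting the weights shows how much each modality contributed, which supports the explainability goal mentioned above.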

CVT created a new approach to continually acquire, fine-tune and transfer knowledge to optimize tasks such as target classification. Our approach advances state-of-the-art transfer learning and continual learning methods to create an in-situ algorithm-training environment to streamline the training of classifiers to new, unknown sensor data in real time.
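One lightweight way to adapt a classifier to new sensor data in real time is to maintain class prototypes that are updated online as labeled samples stream in. The nearest-class-mean sketch below is an illustrative assumption about how such in-situ training could look, not CVT's actual algorithm.

```python
# Hypothetical sketch of in-situ adaptation: a nearest-class-mean classifier
# whose prototypes are updated incrementally from streaming labeled samples,
# approximating continual fine-tuning without full retraining.
import numpy as np

class StreamingNCM:
    def __init__(self) -> None:
        self.means: dict[str, np.ndarray] = {}   # class -> running mean embedding
        self.counts: dict[str, int] = {}

    def update(self, x: np.ndarray, label: str) -> None:
        """Fold one labeled sample into that class's running mean."""
        n = self.counts.get(label, 0)
        mean = self.means.get(label, np.zeros_like(x))
        self.means[label] = (mean * n + x) / (n + 1)
        self.counts[label] = n + 1

    def predict(self, x: np.ndarray) -> str:
        """Assign the class whose prototype is nearest in embedding space."""
        return min(self.means, key=lambda c: np.linalg.norm(x - self.means[c]))

clf = StreamingNCM()
clf.update(np.array([0.0, 0.0]), "clutter")
clf.update(np.array([1.0, 1.0]), "target")
```

Each update is O(dim), so the classifier can absorb samples from a new, unknown sensor as fast as they arrive.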

Our work

  • A new augmented reality system delivers a smoother, more immersive experience

    By combining ground and aerial views with computer-generated elements, users on the ground view a more accurate augmented reality experience.

  • A modern approach to building inspections

    Using augmented reality and mobile technology to reduce construction overhead.

  • 75 Years of Innovation: augmented reality binoculars

    The first mobile, precision, non-jitter, augmented reality binoculars
