Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation.
Graph Mapper: Efficient Visual Navigation by Scene Graph Generation
We propose a method for training an autonomous agent to accumulate a 3D scene graph representation of its environment while simultaneously learning to navigate through that environment.
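To make the idea of an accumulated scene graph concrete, below is a minimal sketch in Python of such a representation: nodes hold object labels and 3D positions, and edges hold spatial relations. This is not the paper's implementation; the class names, fields, and relation strings are assumptions made purely for illustration.

```python
# Illustrative sketch only (not the paper's data structure): a 3D scene graph an agent
# might accumulate while navigating. Node/edge fields are assumptions for this example.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneNode:
    node_id: int
    label: str                              # e.g. "chair", "doorway" (assumed labels)
    position: Tuple[float, float, float]    # 3D position in the map frame


@dataclass
class SceneGraph:
    nodes: Dict[int, SceneNode] = field(default_factory=dict)
    edges: List[Tuple[int, int, str]] = field(default_factory=list)  # (src, dst, relation)

    def add_observation(self, node: SceneNode) -> None:
        # Accumulate newly observed objects as the agent moves through the environment.
        self.nodes[node.node_id] = node

    def relate(self, src: int, dst: int, relation: str) -> None:
        self.edges.append((src, dst, relation))


if __name__ == "__main__":
    g = SceneGraph()
    g.add_observation(SceneNode(0, "sofa", (1.2, 0.0, 3.4)))
    g.add_observation(SceneNode(1, "table", (1.0, 0.0, 2.1)))
    g.relate(0, 1, "next_to")
    print(len(g.nodes), len(g.edges))  # 2 1
```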
SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments
This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments.
Head-Worn Markerless Augmented Reality Inside a Moving Vehicle
This paper describes a system that provides general head-worn outdoor AR capability for the user inside a moving vehicle.
SIGNAV: Semantically-Informed GPS-Denied Navigation and Mapping in Visually-Degraded Environments
We present SIGNAV, a real-time semantic SLAM system designed to operate in perceptually challenging situations.
MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation
By using our novel attention schema and auxiliary rewards to better exploit scene semantics, we outperform multiple baselines trained with only raw inputs or implicit semantic information, while requiring 80% less agent experience.
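As an illustration of what attention over scene semantics can look like, the sketch below applies a small transformer encoder to the cells of an egocentric semantic top-down map and pools the result into a compact feature a navigation policy could consume. This is not the MaAST implementation; the map size, class count, embedding width, and mean-pooling are all assumptions chosen for this example.

```python
# A minimal sketch, assuming a one-hot semantic top-down map as input; not the paper's code.
import torch
import torch.nn as nn


class SemanticMapAttention(nn.Module):
    def __init__(self, num_classes: int = 16, map_size: int = 32, d_model: int = 64):
        super().__init__()
        # Embed each map cell's semantic one-hot vector into a token.
        self.token_proj = nn.Linear(num_classes, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(map_size * map_size, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, sem_map: torch.Tensor) -> torch.Tensor:
        # sem_map: (B, H, W, num_classes) one-hot semantic occupancy map.
        b, h, w, c = sem_map.shape
        tokens = self.token_proj(sem_map.view(b, h * w, c)) + self.pos_emb
        attended = self.encoder(tokens)   # self-attention over map cells
        return attended.mean(dim=1)       # (B, d_model) feature for a navigation policy


if __name__ == "__main__":
    model = SemanticMapAttention()
    sem_map = torch.zeros(1, 32, 32, 16)
    sem_map[..., 0] = 1.0                 # everything labeled as the first class
    print(model(sem_map).shape)           # torch.Size([1, 64])
```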
Hyper-Dimensional Analytics of Video Action at the Tactical Edge
We review HyDRATE, a low-SWaP (size, weight, and power) reconfigurable neural network architecture developed under the DARPA AIE HyDDENN (Hyper-Dimensional Data Enabled Neural Network) program.
Wideband Spectral Monitoring Using Deep Learning
We present a system that performs spectral monitoring of a 666.5 MHz-wide band, located within a 6 GHz range of radio frequency (RF) bandwidth, using state-of-the-art deep learning approaches.
Semantically-Aware Attentive Neural Embeddings for 2D Long-Term Visual Localization
We present an approach that combines appearance and semantic information for 2D image-based localization (2D-VL) across large perceptual changes and time lags. Compared to appearance features, the semantic layout of a scene is generally more invariant to appearance variations. Building on this intuition, we propose a novel end-to-end deep attention-based framework that utilizes multimodal cues to generate robust embeddings for 2D-VL. The proposed attention module predicts a shared channel attention and modality-specific spatial attentions to guide the embeddings to focus on more reliable image regions. We evaluate our model against state-of-the-art (SOTA) methods on three challenging localization datasets. We report an average (absolute) improvement of 19% over the current SOTA for 2D-VL. Furthermore, we present an extensive study demonstrating the contribution of each component of our model, showing 8–15% and 4% improvements from adding semantic information and our proposed attention module, respectively. Finally, we show that the predicted attention maps offer useful insights into our model.
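The sketch below illustrates one plausible reading of the described attention module: a shared channel attention predicted from both modalities, plus a separate spatial attention per modality, applied to appearance and semantic feature maps before pooling them into a localization embedding. It is not the authors' code; the tensor shapes, layer sizes, and concatenation-based fusion are assumptions made for illustration.

```python
# A minimal sketch, assuming (B, C, H, W) appearance and semantic feature maps from a CNN;
# not the paper's implementation.
import torch
import torch.nn as nn


class MultimodalAttention(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # Shared channel attention: pool both modalities, predict one weight per channel.
        self.channel_fc = nn.Sequential(
            nn.Linear(2 * channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels),
            nn.Sigmoid(),
        )
        # Modality-specific spatial attentions: one 1x1 conv head per modality.
        self.spatial_app = nn.Conv2d(channels, 1, kernel_size=1)
        self.spatial_sem = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, app_feat: torch.Tensor, sem_feat: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = app_feat.shape
        pooled = torch.cat(
            [app_feat.mean(dim=(2, 3)), sem_feat.mean(dim=(2, 3))], dim=1
        )                                                    # (B, 2C)
        ch_att = self.channel_fc(pooled).view(b, c, 1, 1)    # shared channel weights

        sp_app = torch.sigmoid(self.spatial_app(app_feat))   # (B, 1, H, W)
        sp_sem = torch.sigmoid(self.spatial_sem(sem_feat))   # (B, 1, H, W)

        app = app_feat * ch_att * sp_app
        sem = sem_feat * ch_att * sp_sem
        # Pool attended maps into a single localization embedding (concatenation assumed).
        embedding = torch.cat([app.mean(dim=(2, 3)), sem.mean(dim=(2, 3))], dim=1)
        return nn.functional.normalize(embedding, dim=1)


if __name__ == "__main__":
    att = MultimodalAttention(channels=256)
    app = torch.randn(2, 256, 14, 14)    # stand-in for appearance features
    sem = torch.randn(2, 256, 14, 14)    # stand-in for semantic features
    print(att(app, sem).shape)           # torch.Size([2, 512])
```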