Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation.
We propose a method to train an autonomous agent to accumulate a 3D scene graph representation of its environment while simultaneously learning to navigate through that environment.
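To make the representation concrete, the sketch below shows one common way to store such a 3D scene graph: typed object nodes with 3D positions, plus labeled relation edges, updated as the agent explores. The node and edge fields are illustrative assumptions, not the paper's actual data structure.

```python
# Illustrative sketch only; the abstract does not specify this data structure.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SceneNode:
    node_id: int
    label: str                              # e.g. "chair", "kitchen"
    position: Tuple[float, float, float]    # 3D location in a world frame

@dataclass
class SceneGraph:
    nodes: Dict[int, SceneNode] = field(default_factory=dict)
    edges: List[Tuple[int, int, str]] = field(default_factory=list)  # (src, dst, relation)

    def add_node(self, node: SceneNode) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: int, dst: int, relation: str) -> None:
        self.edges.append((src, dst, relation))

# As the agent navigates, each new observation would add or update nodes and edges:
graph = SceneGraph()
graph.add_node(SceneNode(0, "table", (1.2, 0.0, 3.4)))
graph.add_node(SceneNode(1, "mug", (1.3, 0.8, 3.5)))
graph.add_edge(1, 0, "on")
```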
SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments
This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments.
This paper describes a system that provides general head-worn outdoor augmented reality (AR) capability for users inside a moving vehicle.
We present SIGNAV, a real-time semantic SLAM system designed to operate in perceptually challenging situations.
With the advent of generative adversarial networks and the spread of misinformation on social media, there has been increased interest in multimodal verification. Image-text verification typically involves determining whether a caption and an image correspond to each other. Building on multimodal embedding techniques, we show that data augmentation via two distinct approaches improves results: entity linking and cross-domain local similarity scaling. We refer to these approaches as resilient because we show state-of-the-art results against manipulations specifically designed to thwart the very multimodal embeddings we use as the basis for all of our features.
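As a reference point for the task itself, the sketch below shows a minimal embedding-based verification step: score a caption-image pair by the cosine similarity of their embeddings and threshold the score. The encoder outputs and the 0.5 threshold are placeholders; the paper's entity linking and cross-domain local similarity scaling would operate on top of such embedding features, not replace them.

```python
# Minimal sketch of a baseline verification step, not the paper's pipeline.
# image_emb / caption_emb stand in for whatever multimodal encoder is used;
# the 0.5 threshold is purely illustrative.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_pair(image_emb: np.ndarray, caption_emb: np.ndarray,
                threshold: float = 0.5) -> bool:
    """Return True if the caption is judged to correspond to the image."""
    return cosine_similarity(image_emb, caption_emb) >= threshold
```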
We describe work in progress towards deriving a unification algorithm automatically from a declarative specification using deductive methods. The specification is phrased as a logical theorem and the program is extracted from the proof. The theorem is proved using Stickel’s system SNARK, operating over a theory of expressions and substitutions. The theory has been formulated to allow a simpler specification of the algorithm, and the theorem prover has discovered novelties in the implementation. It is hoped that the same techniques may enable the discovery of previously unknown unification algorithms for specific theories.
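For readers unfamiliar with the target problem, the following is a conventional, hand-written syntactic unification routine with an occurs check. It is only a reference sketch of what a unification algorithm computes, not the program extracted by SNARK, and the term representation (strings prefixed with "?" as variables, tuples as compound terms) is an assumption made for illustration.

```python
# Conventional first-order syntactic unification for reference; the paper's
# algorithm is instead extracted from a proof over a theory of expressions
# and substitutions.

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    # Follow variable bindings until an unbound variable or non-variable is reached.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    # True if var occurs inside t under the current substitution.
    t = walk(t, subst)
    if t == var:
        return True
    if isinstance(t, tuple):
        return any(occurs(var, arg, subst) for arg in t[1:])
    return False

def unify(s, t, subst=None):
    """Return a most general unifier as a dict, or None if none exists."""
    subst = {} if subst is None else dict(subst)
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if is_var(s):
        if occurs(s, t, subst):
            return None
        subst[s] = t
        return subst
    if is_var(t):
        return unify(t, s, subst)
    if isinstance(s, tuple) and isinstance(t, tuple) and s[0] == t[0] and len(s) == len(t):
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

# Example: unify f(?x, b) with f(a, ?y)  ->  {'?x': 'a', '?y': 'b'}
print(unify(("f", "?x", "b"), ("f", "a", "?y")))
```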
By using our novel attention schema and auxiliary rewards to better exploit scene semantics, we outperform multiple baselines trained with only raw inputs or implicit semantic information, while requiring 80% less agent experience.
We review HyDRATE, a low size, weight, and power (low-SWaP) reconfigurable neural network architecture developed under the DARPA AIE HyDDENN (Hyper-Dimensional Data Enabled Neural Network) program.