Accurate motion estimation using low-cost sensors for autonomous robots in visually-degraded environments is critical to applications such as infrastructure inspection and indoor rescue missions. This paper analyzes the feasibility of utilizing multiple low-cost on-board sensors for ground robots or drones navigating in visually-degraded environments. We select four low-cost and small-size sensors for evaluation: IMU, EO stereo cameras with LED lights, active IR cameras, and 2D LiDAR. We adapt and extend state-of-the-art multi-sensor motion estimation techniques, including a factor graph framework for sensor fusion, under poor illumination conditions. We evaluate different sensor combinations using the factor graph framework, and benchmark each combination with its accuracy for two representative datasets acquired in totally dark environments. Our results show the potential of this sensor fusion approach towards an improved ego-motion solution in challenging dark environments.
We propose a new approach that utilizes semantic information to register 2D monocular video frames to the world using 3D georeferenced data, for augmented reality driving applications. The geo-registration process uses our predicted vehicle pose to generate a rendered depth map for each frame, allowing 3D graphics to be convincingly blended with the real world view. We also estimate absolute depth values for dynamic objects, up to 120 meters, based on the rendered depth map and update the rendered depth map to reflect scene changes over time. This process also creates opportunistic global heading measurements, which are fused with other sensors, to improve estimates of the 6 degrees-of-freedom global pose of the vehicle over state-of-the-art outdoor augmented reality systems. We evaluate the navigation accuracy and depth map quality of our system on a driving vehicle within various large-scale environments for producing realistic augmentations.
This paper presents a new approach for integrating semantic information for vision-based vehicle navigation. Although vision-based vehicle navigation systems using pre-mapped visual landmarks are capable of achieving submeter level accuracy in large-scale urban environment, a typical error source in this type of systems comes from the presence of visual landmarks or features from temporal objects in the environment, such as cars and pedestrians. We propose a gated factor graph framework to use semantic information associated with visual features to make decisions on outlier/ inlier computation from three perspectives: the feature tracking process, the geo-referenced map building process, and the navigation system using pre-mapped landmarks. The class category that the visual feature belongs to is extracted from a pre-trained deep learning network trained for semantic segmentation. The feasibility and generality of our approach is demonstrated by our implementations on top of two vision-based navigation systems. Experimental evaluations validate that the injection of semantic information associated with visual landmarks using our approach achieves substantial improvements in accuracy on GPS-denied navigation solutions for large-scale urban scenarios.
This paper presents a vehicle navigation system that is capable of achieving sub-meter GPS-denied navigation accuracy in large-scale urban environments, using pre-mapped visual landmarks. Our navigation system tightly couples IMU data with local feature track measurements, and fuses each observation of a pre-mapped visual landmark as a single global measurement. This approach propagates precise 3D global pose estimates for longer periods. Our mapping pipeline leverages a dual-layer architecture to construct high-quality pre-mapped visual landmarks in real time. Experimental results demonstrate that our approach provides sub-meter GPS-denied navigation solutions in large-scale urban scenarios.
This paper introduces a user-worn Augmented Reality (AR) based first-person weapon shooting system (AR-Weapon), suitable for both training and gaming. Different from existing AR-based first-person shooting systems, AR-Weapon does not use fiducial markers placed in the scene for tracking. Instead it uses natural scene features observed by the tracking camera from the live view of the world. The AR-Weapon system estimates 6-degrees of freedom orientation and location of the weapon and of the user operating it, thus allowing the weapon to fire simulated projectiles for both direct fire and non-line of sight during live runs. In addition, stereo cameras are used to compute depth and provide dynamic occlusion reasoning. Using the 6-DOF head and weapon tracking, dynamic occlusion reasoning and a terrain model of the environment, the fully virtual projectiles and synthetic avatars are displayed on the user’s head mounted Optical-See-Through (OST) display overlaid over the live view of the real world. Since the projectiles, weapon characteristics and virtual enemy combatants are all simulated they can easily be changed to vary scenarios, new projectile types and future weapons. In this paper, we present the technical algorithms, system design and experiment results for a prototype AR-Weapon system.
AR-Mentor is a wearable real time Augmented Reality (AR) mentoring system that is configured to assist in maintenance and repair tasks of complex machinery, such as vehicles, appliances, and industrial machinery. The system combines a wearable Optical-See-Through (OST) display device with high precision 6-Degree-Of-Freedom (DOF) pose tracking and a virtual personal assistant (VPA) with natural language, verbal conversational interaction, providing guidance to the user in the form of visual, audio and locational cues. The system is designed to be heads-up and hands-free allowing the user to freely move about the maintenance or training environment and receive globally aligned and context aware visual and audio instructions (animations, symbolic icons, text, multimedia content, speech). The user can interact with the system, ask questions and get clarifications and specific guidance for the task at hand. A pilot application with AR-Mentor was successfully built to instruct a novice to perform an advanced 33-step maintenance task on a training vehicle. The initial live training tests demonstrate that AR-Mentor is able to help and serve as an assistant to an instructor, freeing him/her to cover more students and to focus on higher-order teaching.
In this paper, we expand our previous work on augmented reality (AR) binoculars to support wider range of user motion – up to thousand square meters compared to only a few square meters as before. We present our latest improvements and additions to our pose estimation pipeline and demonstrate stable registration of objects on the real world scenery while the binoculars are undergoing significant amount of parallax-inducing translation.
This paper proposes a novel vision-aided navigation approach that continuously estimates precise 3D absolute pose for aerial vehicles, using only inertial measurements and monocular camera observations. Our approach is able to provide accurate navigation solutions under long-term GPS outage, by tightly incorporating absolute geo-registered information into two kinds of visual measurements: 2D-3D tie-points, and geo-registered feature tracks. 2D-3D tie-points are established by finding feature correspondences to align an aerial video frame to a 2D geo-referenced image rendered from the 3D terrain database. These measurements provide global information to correct accumulated error in navigation estimation. Geo-registered feature tracks are generated by associating features across consecutive frames. They enable the propagation of 3D geo-referenced values to further improve the pose estimation. All sensor measurements are fully optimized in a smoother-based inference framework, which achieves efficient relinearization and real-time estimation of navigation states and their covariances over a constant-length of sliding window. Experimental results demonstrate that our approach provides accurate and consistent aerial navigation solutions on several large-scale GPS-denied scenarios.
This paper proposes a real-time navigation approach that is able to integrate many sensor types while fulfilling performance needs and system constraints. Our approach uses a plug-and-play factor graph framework, which extends factor graph formulation to encode sensor measurements with different frequencies, latencies, and noise distributions. It provides a flexible foundation for plug-and-play sensing, and can incorporate new evolving sensors. A novel constrained optimal selection mechanism is presented to identify the optimal subset of active sensors to use, during initialization and when any sensor condition changes. This mechanism constructs candidate subsets of sensors based on heuristic rules and a ternary tree expansion algorithm. It quickly decides the optimal subset among candidates by maximizing observability coverage on state variables, while satisfying resource constraints and accuracy demands. Experimental results demonstrate that our approach selects subsets of sensors to provide satisfactory navigation solutions under various conditions, on large-scale real data sets using many sensors.
In this paper we present an augmented reality binocular system to allow long range high precision augmentation of live telescopic imagery with aerial and terrain based synthetic objects, vehicles, people and effects. The inserted objects must appear stable in the display and must not jitter and drift as the user pans around and examines the scene with the binoculars. The design of the system is based on using two different cameras with wide field of view, and narrow field of view lenses enclosed in a binocular shaped shell. Using the wide field of view gives us context and enables us to recover the 3D location and orientation of the binoculars much more robustly, whereas the narrow field of view is used for the actual augmentation as well as to increase precision in tracking. We present our navigation algorithm that uses the two cameras in combination with an IMU and GPS in an Extended Kalman Filter (EKF) and provides jitter free, robust and real-time pose estimation for precise augmentation. We have demonstrated successful use of our system as part of a live simulated training system for observer training, in which fixed and rotary wing aircrafts, ground vehicles, and weapon effects are combined with real world scenes.
Abstract This paper proposes a navigation algorithm that provides a low-latency solution while estimating the full nonlinear navigation state. Our approach uses Sliding-Window Factor Graphs, which extend existing incremental smoothing methods to operate on the subset of measurements and states that exist inside a sliding time window. We split the estimation into a fast short-term […]
There is a need within the military to enhance its training capability to provide more realistic and timely training, but without incurring excessive costs in time and infrastructure. This is especially true in preparing for urban combat. Unfortunately the creation of facility based training centers that provide sufficient realism is time consuming and costly. Many supporting actors are needed to provide opponent forces and civilians. Elaborate infrastructure is needed to create a range of training scenarios, and record and review training sessions. In this paper we describe the technical methods and experimental results on building an Augmented Reality Training system for training dismounts doing maneuver operations that addresses the above shortcomings. The augmented reality system uses computer graphics and special head mounted displays to insert virtual actors and objects into the scene as viewed by each trainee wearing augmented reality eyewear. The virtual actors respond in realistic ways to actions of the Warfighters, taking cover, firing back, or milling as crowds.
Perhaps most importantly, the system is designed to be infrastructure free. The primary hardware needed to implement augmented reality is worn by the individual trainees. The system worn by a trainee includes helmet mounted sensors, see through eye-wear, and a compact computer in his backpack. The augmented reality system tracks the actions, locations and head and weapon poses of each trainee in detail so the system can appropriately position virtual objects in his field of view. Synthetic actors, objects and effects are rendered by a game engine on the eyewear display. Stereo based 3D reasoning is used to occlude all or parts of synthetic entities obscured by real world three dimensional structures based on the location of the synthetic. We present implementation details for each of the modules and experimental results for both day time and night time operations.
Multi-Sensor Navigation Algorithm Using Monocular Camera, IMU and GPS for Large Scale Augmented Reality
Camera tracking system for augmented reality applications that can operate both indoors and outdoors is described. The system uses a monocular camera, a MEMS-type inertial measurement unit (IMU) with 3-axis gyroscopes and accelerometers, and GPS unit to accurately and robustly track the camera motion in 6 degrees of freedom (with correct scale) in arbitrary indoor or outdoor scenes. IMU and camera fusion is performed in a tightly coupled manner by an error-state extended Kalman filter (EKF) such that each visually tracked feature contributes as an individual measurement as opposed to the more traditional approaches where camera pose estimates are first extracted by means of feature tracking and then used as measurement updates in a filter framework. Robustness in feature tracking and hence in visual measurement generation is achieved by IMU aided feature matching and a two-point relative pose estimation method, to remove outliers from the raw feature point matches. Landmark matching to contain long-term drift in orientation via on the fly user generated geo-tiepoint mechanism is described.
We present an augmented reality system based on Kinect for on-line handbag shopping. The users can virtually try on different handbags on a TV screen at home. They can interact with the virtual handbags naturally, such as sliding a handbag to different positions on their arms and rotating a handbag to see it from different angles. The users can also see how the handbags fit them in different virtual environments other than the current real background. We describe the technical details for building such a system and demonstrate the experimental results.