Rakesh Kumar, Supun Samarasekera, Niluthpol Mithun, Kshitij Minhas, Taragay Oskiper, Kevin Kaighn, Mikhail Sizintsev, Han-Pang Chiu, Vision based Navigation using Cross-View Geo-Registration for Outdoor Augmented Reality and Navigation Applications, 2023 Joint Navigation Conference, Institute of Navigation
Estimating the precise geo-location of ground imagery and video streams is crucial to many applications, including wide-area augmented reality, dismount tracking, navigation for autonomous vehicles, and robotics. Visual-inertial odometry and SLAM (simultaneous localization and mapping) systems are often used for navigation in these applications. Visual odometry systems have become increasingly accurate and can achieve 0.1% drift with respect to distance travelled (about 1 m of error per kilometer). To reset the drift, input images are matched against a geo-referenced landmark database. Most prior work treats the problem as matching input image queries against a pre-built database of geo-referenced ground images or video streams collected from similar ground viewpoints and the same sensor modality. However, collecting ground images over a large area is time-consuming and may not be feasible in many cases. To overcome this limitation, there has been significant recent interest in geo-localizing ground imagery against an overhead reference image database. Due to its wide availability, ease of acquisition, and dense coverage, 2D satellite data has become a very attractive reference data source.
In this work, we present a new vision-based cross-view geo-localization solution that matches camera images to a 2D satellite/overhead reference image database. We present solutions for both coarse search for cold start and fine alignment for continuous refinement. The geo-localization solution is based on a neural-network framework that performs both location and orientation estimation through cross-view matching. We have developed solutions using both convolutional neural networks and transformers, and we compare the results for each case. We also present an approach that extends single-image cross-view geo-localization by exploiting temporal information across video frames for continuous and consistent geo-localization, meeting the demanding requirements of navigation applications. Our cross-view geo-localization approach can augment existing navigation methods as an additional sensor measurement. The cross-view matching neural network model is optimized to run on low-cost embedded smartphone processors. We present methods to optimize the neural network and compare the performance achieved on an MSI VR backpack with a powerful GTX 1070 GPU versus a Qualcomm RB5 smartphone processor with an embedded GPU.

Specifically, we develop and present a navigation system that continuously estimates 6-DoF (degrees of freedom) camera geo-poses for outdoor navigation applications. The system consists of a helmet-mounted sensor platform (including cameras, an IMU, a magnetometer, and GPS when available), an embedded computer (e.g., a Qualcomm RB5) mounted on the backpack, and a video see-through head-mounted display (HMD). The backbone of our navigation system is a tightly coupled error-state Extended Kalman Filter (EKF) based sensor fusion for visual-inertial navigation. The tightly coupled visual-inertial odometry module produces 6-DoF platform pose updates at 15-30 Hz. However, these pose updates drift over time.
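The coarse-search (cold-start) step described above can be illustrated as a nearest-neighbor search between a ground-image descriptor and descriptors of geo-referenced satellite tiles. The sketch below is a simplified illustration, not the paper's implementation: the function name and data shapes are hypothetical, and in the actual system the descriptors would come from the trained CNN or transformer branches rather than random vectors.

```python
import numpy as np

def coarse_search(ground_desc, tile_descs, tile_latlons):
    """Coarse localization: match a ground-image descriptor against
    descriptors of geo-referenced satellite tiles (cold start)."""
    # Cosine similarity between the query and every satellite tile.
    g = ground_desc / np.linalg.norm(ground_desc)
    t = tile_descs / np.linalg.norm(tile_descs, axis=1, keepdims=True)
    sims = t @ g
    best = int(np.argmax(sims))
    return tile_latlons[best], float(sims[best])

# Toy example with random descriptors; in practice these are produced
# by cross-view matching networks trained so that a ground image and
# its corresponding satellite tile embed close together.
rng = np.random.default_rng(0)
tiles = rng.normal(size=(100, 128))            # 100 satellite tiles
latlons = rng.uniform([-90, -180], [90, 180], size=(100, 2))
query = tiles[42] + 0.01 * rng.normal(size=128)  # query near tile 42
loc, score = coarse_search(query, tiles, latlons)
```

Fine alignment would then refine the pose within the best-matching tile; orientation can be estimated similarly by comparing the query against rotated variants of the tile descriptor.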
To correct the drift, our novel cross-view visual geo-localization solution estimates a 3-DoF (latitude, longitude, and heading) camera pose by matching camera images to satellite images. In addition to the relative measurements from frame-to-frame feature tracks used for odometry, our error-state EKF framework can fuse the estimates from the cross-view geo-registration model, along with global measurements from GPS, for heading and location corrections that counter the visual odometry drift accumulated over time. The visual geo-localization solution provides both the initial global heading and location (cold-start procedure) and continuous global heading refinement over time to the navigation system. We show experimental result videos demonstrating navigation and augmented reality performance, along with accuracy results with and without GPS.
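To make the fusion step concrete, the following is a minimal sketch of how a 3-DoF cross-view fix (position plus heading) could be folded into a Kalman filter measurement update. It is a standard EKF update under an assumed state layout, not the paper's error-state formulation: the state ordering, function name, and noise values are hypothetical.

```python
import numpy as np

def geo_update(x, P, z, R):
    """Kalman measurement update fusing a 3-DoF cross-view fix
    z = [east, north, heading] into a navigation state x whose
    first three components are [east, north, heading]."""
    H = np.zeros((3, len(x)))
    H[:3, :3] = np.eye(3)                        # measure first 3 states
    y = z - H @ x                                # innovation
    y[2] = (y[2] + np.pi) % (2 * np.pi) - np.pi  # wrap heading error
    S = H @ P @ H.T + R                          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Drifted VIO state (east, north, heading, 2 velocity states) gets
# pulled toward the cross-view geo-registration measurement.
x = np.array([10.0, 5.0, 0.2, 1.0, 0.0])
P = np.eye(5) * 4.0            # large uncertainty from drift
z = np.array([12.0, 4.0, 0.1])  # cross-view fix
R = np.eye(3) * 0.25            # measurement noise
x2, P2 = geo_update(x, P, z, R)
```

Because the cross-view fix is a global (absolute) measurement, each update both moves the state toward the fix and shrinks the position/heading covariance, which is what arrests the odometry drift.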