Cross-View and Cross-Modal Visual Geo-Localization for Augmented Reality and Robot/Vehicle Navigation Applications


Rakesh (Teddy) Kumar, Supun Samarasekera, Han Pang Chiu, Niluthpol Mithun, Kshitij Minhas, Taragay Oskiper, SRI International. 2022 Joint Navigation Conference, Institute of Navigation (June 9, 2022)


Image-based geo-localization is the problem of estimating the precise geo-location of a newly captured image by searching for and matching this image against a geo-referenced 2D-3D database. Localizing a ground image within a large-scale environment is crucial to many applications, including wide-area augmented reality, autonomous vehicles, and robotics. It typically involves two steps: (1) coarse search (or geo-tagging) of the 2D input image to find a set of candidate matches in the database, and (2) fine alignment, which performs 2D-3D verification for each candidate and returns the best match with a refined 3D pose. Most existing work addresses this problem by matching the input image to a database collected from similar ground viewpoints with the same sensor modality (camera). Although these methods show good localization performance, their applications are limited by the difficulty of collecting and updating reference ground images that cover a large area from all ground viewpoints and under all weather, time-of-day, and seasonal conditions.
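The two-step pipeline above can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes each database entry is a geo-tag paired with a fixed-length global descriptor from some learned encoder (not specified in the talk), and it uses cosine similarity for the coarse search. The names `coarse_search`, `fine_alignment`, and the `verify` callback standing in for 2D-3D verification are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical database entry: (geo_tag, global_descriptor).
Entry = Tuple[str, Sequence[float]]
# Hypothetical verifier: maps a candidate geo-tag to (alignment_score, refined_pose).
Verifier = Callable[[str], Tuple[float, object]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def coarse_search(query: Sequence[float], database: List[Entry], k: int) -> List[str]:
    """Step 1: rank database entries by descriptor similarity and keep the top-k geo-tags."""
    scored = [(cosine_similarity(query, desc), tag) for tag, desc in database]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [tag for _, tag in scored[:k]]


def fine_alignment(candidates: List[str], verify: Verifier) -> Optional[Tuple[str, float, object]]:
    """Step 2: run the (hypothetical) 2D-3D verifier on each candidate; return the best one."""
    best: Optional[Tuple[str, float, object]] = None
    for tag in candidates:
        score, pose = verify(tag)
        if best is None or score > best[1]:
            best = (tag, score, pose)
    return best


if __name__ == "__main__":
    db = [("tile_A", [1.0, 0.0]), ("tile_B", [0.0, 1.0]), ("tile_C", [0.7, 0.7])]
    shortlist = coarse_search([1.0, 0.1], db, k=2)
    print(shortlist)  # ['tile_A', 'tile_C']
    # Toy verifier: pretend tile_C aligns better geometrically than tile_A.
    toy_scores = {"tile_A": 0.4, "tile_C": 0.9}
    print(fine_alignment(shortlist, lambda t: (toy_scores[t], f"pose_{t}")))  # ('tile_C', 0.9, 'pose_tile_C')
```

Keeping the shortlist size `k` small is what makes the more expensive fine-alignment step tractable, and the toy run shows how verification can overturn the coarse ranking. In a cross-view or cross-modal system, the descriptor encoder and verifier would be models matching ground images to aerial or satellite data, which this sketch does not attempt to reproduce.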

Cross-view and cross-modal visual geo-localization has become a new research field that addresses this problem. The real world can be represented in many data modalities, sensed by disparate devices from different viewpoints. For example, the same scene perceived by a ground camera can also be captured as a 2D reference RGB image or as a 3D point cloud, acquired from an aerial vehicle using LIDAR or motion imagery, or from a satellite. Matching a ground image to aerial/overhead reference data, which is simpler and faster to acquire, also opens more opportunities for defense, industrial, and consumer applications. However, cross-view and cross-modal visual geo-localization brings additional technical challenges because of dramatic changes in appearance between the ground image and the aerial database, which capture the same scene from different viewpoints and/or with different sensor modalities.

In this talk, we will present methods and results for estimating the geo-location and/or orientation of dismounts and platforms (robots and vehicles) for wide-area, outdoor augmented reality (AR) and other applications under GPS-denied or GPS-challenged conditions. Precise estimation of global heading, alongside the correct geodetic position (latitude, longitude, and height above the ellipsoid), is crucial for accurate placement of AR information, so that the overlaid graphics coincide with their true locations in the environment. For long-range targets hundreds of meters away from the user, heading accuracy plays an even more important role, because the horizontal insertion error is dominated by heading error through the large lever-arm effect: a small angular error is multiplied by the long distance to the target.
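The lever-arm argument can be made concrete with a small calculation. Assuming a simplified flat, local geometry, a heading error Δψ displaces an overlay for a target at range d by about d·tan(Δψ) horizontally, whereas a position error shifts it by roughly the position error itself, independent of range. The function names and the example numbers (0.5° heading error, 1 m position error) are illustrative assumptions, not results from the talk.

```python
import math


def heading_insertion_error(range_m: float, heading_err_deg: float) -> float:
    """Horizontal overlay offset (m) caused by a heading error at a given target range.

    Simplified flat-geometry model (an assumption): offset = range * tan(heading error).
    """
    return range_m * math.tan(math.radians(heading_err_deg))


def dominant_error_source(range_m: float, heading_err_deg: float, position_err_m: float) -> str:
    """Report which term dominates, treating position error as a range-independent offset."""
    heading_term = heading_insertion_error(range_m, heading_err_deg)
    return "heading" if heading_term > position_err_m else "position"


if __name__ == "__main__":
    # Illustrative numbers only: 0.5 degree heading error, 1 m position error.
    for d in (10.0, 100.0, 500.0):
        print(d, round(heading_insertion_error(d, 0.5), 2), dominant_error_source(d, 0.5, 1.0))
```

With these assumed values, the heading term overtakes a 1 m position error at roughly 115 m (1 / tan 0.5°), and at 500 m it reaches about 4.4 m, which illustrates why heading accuracy matters most for targets hundreds of meters away.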
