

Cross-view and cross-modal visual geo-localization has emerged as a new research field addressing the problem of image-based geo-localization. We describe state-of-the-art methods utilizing hand-designed feature descriptors, pre-trained CNN-based features, and learning-based approaches.

A new research field to address image-based geo-localization

Image-based geo-localization is the problem of estimating the precise geo-location of a newly captured image by searching and matching this image against a geo-referenced 2D-3D database. Localizing a ground image within a large-scale environment is crucial to many applications, including autonomous vehicles, robotics, and wide-area augmented reality. It typically involves two steps: (1) coarse search (or geo-tagging) of the 2D input image to find a set of candidate matches from the database, and (2) fine alignment, which performs 2D-3D verification for each candidate and returns the best match with a refined 3D pose. Most works approach this problem by matching the input image to a database collected from similar ground viewpoints and the same sensor modality (camera). Although these methods show good localization performance, their applications are limited by the difficulty of collecting and updating reference ground images that cover a large area from all ground viewpoints and for all weather, time-of-day, and seasonal conditions.
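The coarse-search step described above can be sketched as a nearest-neighbor lookup over global image descriptors. The sketch below is a minimal, hedged illustration in NumPy: the descriptors are toy vectors invented for the example, and descriptor extraction itself (e.g. from a CNN) is assumed to happen elsewhere.

```python
import numpy as np

def coarse_search(query_desc, db_descs, k=5):
    """Step 1 (coarse search): rank geo-referenced database entries by
    cosine similarity of global image descriptors, returning the top-k
    candidate matches for later 2D-3D verification."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                 # cosine similarity to each reference
    return np.argsort(-sims)[:k]  # indices of the k best candidates

# Toy usage: 4 reference descriptors; the query is closest to entry 2.
db = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.4], [-1.0, 0.0]])
query = np.array([0.9, 0.35])
candidates = coarse_search(query, db, k=2)
# Step 2 (fine alignment) would then run 2D-3D verification on each
# candidate and return the best match with a refined 3D pose.
```

In practice the database holds millions of descriptors, so the brute-force ranking above is replaced by an approximate nearest-neighbor index, but the two-step structure stays the same.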

Cross-view and cross-modal visual geo-localization has emerged as a new research field to address this problem. The real world can be represented in many data modalities, sensed by disparate devices from different viewpoints. For example, the same scene perceived from a ground camera can be captured as an RGB image, or as a 3D point cloud from an aerial vehicle using LiDAR or motion imagery, or from a satellite. Localizing a ground image against an aerial/overhead geo-referenced database has gained noticeable momentum in recent years, due to significant growth in the availability of public aerial/overhead data in multiple modalities (such as aerial images from Google Maps, USGS 2D and 3D data, aerial LiDAR data, and satellite 3D data). Matching a ground image to aerial/overhead data, whose acquisition is simpler and faster, also opens more opportunities for industrial and consumer applications. However, cross-view and cross-modal visual geo-localization comes with additional technical challenges due to dramatic changes in appearance between the ground image and the aerial database, which capture the same scene from different viewpoints and/or with different sensor modalities.

This tutorial will offer (a) an overview of cross-view and cross-modal visual geo-localization, and will stress the related algorithmic aspects, including (b) ground-to-aerial image matching and (c) image-to-3D coarse search and fine alignment. For each topic, the tutorial will describe state-of-the-art methods utilizing hand-designed feature descriptors, pre-trained CNN-based features, and learning-based approaches (such as deep embeddings, graph-based matching techniques, semantic segmentation, and Generative Adversarial Networks). For practical applications, vision-based geo-localization techniques must work for images taken at different times of the day and year, and across different seasons and weather conditions.

Program agenda

TUTORIALS TO TAKE PLACE ON JUNE 20th, 2021

Cross-view and Cross-modal Visual Geo-Localization

2:00 PM – 2:15 PM USA EST

Visual Localization Cross time CVPR 2021 Tutorial

2:15 PM – 3:30 PM USA EST

Cross-View Geo-Localization: Ground-to-Aerial Image Matching

3:30 PM – 4:15 PM USA EST

Large Scale Cross View Image Geo-localization

4:15 PM – 4:45 PM USA EST

Cross-Modal Geo-Localization: Image-to-3D Coarse Search & Fine Alignment

4:45 PM – 6:00 PM USA EST

Tutorials

The tutorials will take place on June 20th, 2021, and will consist of 5 lectures, as detailed below:

CVPR 2021 tutorial on Cross-view and Cross-modal Visual Geo-Localization

2:00 PM – 2:15 PM USA EST

Abstract: This lecture will provide the general context for this new and emerging topic. It presents the aims of image-based geo-localization, and the challenges and important issues to be tackled when matching images to geo-referenced databases across viewpoints, weather and long-term time differences, and modalities. It will also describe a typical visual geo-localization system in detail, for both industrial and consumer applications such as autonomous navigation and wide-area, outdoor augmented reality.

Speakers: Han-Pang Chiu and Rakesh (Teddy) Kumar

View video
View presentation

Visual Localization Cross time CVPR 2021 Tutorial

2:15 PM – 3:30 PM USA EST

Abstract: This lecture will discuss techniques for long-term visual localization in real-world environments, which must operate under (1) extreme perceptual changes such as weather and illumination (e.g. matching day images to night-time references), and (2) dynamic scene changes and occlusions, such as from moving vehicles and people. The lecture will focus on state-of-the-art matching techniques using deep learning based methods, including semantic scene segmentation, attention networks, triplet loss, and embeddings.
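The triplet loss mentioned in this abstract can be illustrated in a few lines of NumPy. This is a hedged sketch, not the lecture's implementation: the margin value and the toy embeddings below are invented for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss: pull the anchor embedding toward a
    matching (positive) reference and push it away from a non-matching
    (negative) one by at least `margin` in Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings: e.g. a day image (anchor), the same place at night
# (positive), and a different place (negative).
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([-1.0, 0.2])
loss = triplet_loss(a, p, n)  # zero: positive already far closer than negative
```

Training an embedding network with this loss drives images of the same place (across day/night or weather changes) closer together than images of different places, which is what makes nearest-neighbor retrieval robust to appearance change.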

Speaker: Rakesh (Teddy) Kumar

View video
View presentation

Cross-View Geo-Localization: Ground-to-Aerial Image Matching

3:30 PM – 4:15 PM USA EST

Abstract: This lecture covers the essential knowledge of how to estimate the geo-location of a query street-view image against a structured database of city-wide aerial reference images with known GPS coordinates. It will start with a brief review of traditional geo-localization involving DOQ, DEM, and metadata. It will then present computer vision methods employing either traditional hand-crafted features or learned view-invariant descriptors for ground-to-aerial image matching. It will end with recent methods employing deep learning for cross-view image geo-localization, including Generative Adversarial Networks.

Speaker: Mubarak Shah

View video
View presentation

Large Scale Cross View Image Geo-localization

4:15 PM – 4:45 PM USA EST

Abstract: Image-based geo-localization aims to provide image-level GPS location by matching a query street/ground image against GPS-tagged images in a reference dataset. I will present our recent work on cross-view image geo-localization using advanced deep learning algorithms, along with a visual explanation method to interpret the learning outcome for cross-view image matching. I will then present cross-view image geo-localization in a more realistic and challenging setting that breaks the one-to-one retrieval assumption of existing datasets. To bridge the gap between this realistic setting and existing datasets, we propose a new large-scale benchmark, VIGOR (cross-View Image Geo-localization beyond One-to-one Retrieval). We benchmark existing state-of-the-art methods and present a novel end-to-end framework that localizes the query in a coarse-to-fine manner.

Speaker: Chen Chen

View video
View presentation



Rakesh “Teddy” Kumar

Vice President, Information and Computing Sciences and Director of the Center for Vision Technologies

Teddy is responsible for leading research and development of innovative end-to-end vision solutions from image capture to situational understanding that translate into real-world applications such as robotics, intelligence extraction and human computer interaction.

View profile


Mubarak Shah

Trustee Chair Professor of Computer Science, founding director of the Center for Research in Computer Vision at UCF

Mubarak’s research interests include video surveillance, visual tracking, human activity recognition, visual analysis of crowded scenes, video registration, and UAV video analysis.

View profile


Han-Pang Chiu

Senior Technical Manager of the Center for Vision Technologies

Han-Pang leads a research group to develop innovative solutions for real-world applications to navigation, mobile augmented reality and robotics.

View profile


Chen Chen

Department of Electrical and Computer Engineering, University of North Carolina at Charlotte

Dr. Chen Chen’s research interests include signal and image processing, computer vision, and deep learning. He has published over 80 papers in refereed journals and conferences in these areas.

View profile

Virtual CVPR | June 19-25th, 2021

Relevant publications