
CVPR 2021 tutorial on Cross-view and Cross-modal Visual Geo-Localization


Cross-view and cross-modal visual geo-localization has become a new research field addressing the problem of image-based geo-localization. We describe state-of-the-art methods utilizing hand-designed feature descriptors, pre-trained CNN-based features, and learning-based approaches.

A new research field to address image-based geo-localization

Image-based geo-localization is the problem of estimating the precise geo-location of a newly captured image by searching and matching it against a geo-referenced 2D-3D database. Localizing a ground image within a large-scale environment is crucial to many applications, including autonomous vehicles, robotics, and wide-area augmented reality. It typically involves two steps: (1) coarse search (or geo-tagging), which finds a set of candidate matches for the 2D input image in the database, and (2) fine alignment, which performs 2D-3D verification for each candidate and returns the best match with a refined 3D pose. Most works approach this problem by matching the input image to a database collected from similar ground viewpoints and the same sensor modality (camera). Although these methods show good localization performance, their applications are limited by the difficulty of collecting and updating reference ground images that cover a large area from all ground viewpoints and under all weather, time-of-day, and seasonal conditions.
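The two-step pipeline above, coarse search followed by fine alignment, can be sketched in miniature. The descriptors, database entries, and the stubbed verification step below are hypothetical placeholders, not from any system covered in the tutorial; a real system would replace the re-scoring in `fine_alignment` with 2D-3D geometric verification and pose refinement.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def coarse_search(query_desc, database, k=2):
    """Step 1: rank geo-referenced entries by descriptor similarity
    and keep the top-k candidates."""
    scored = [(cosine_similarity(query_desc, entry["desc"]), entry)
              for entry in database]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [entry for _, entry in scored[:k]]

def fine_alignment(query_desc, candidates):
    """Step 2: verify each candidate and return the best match.
    Verification is stubbed here as re-scoring by similarity."""
    return max(candidates, key=lambda e: cosine_similarity(query_desc, e["desc"]))

# Toy geo-referenced database: descriptors tagged with (lat, lon).
db = [
    {"desc": [0.9, 0.1, 0.0], "geo": (37.45, -122.18)},
    {"desc": [0.1, 0.9, 0.1], "geo": (37.46, -122.17)},
    {"desc": [0.8, 0.2, 0.1], "geo": (37.44, -122.19)},
]
query = [1.0, 0.0, 0.0]
best = fine_alignment(query, coarse_search(query, db))
print(best["geo"])  # → (37.45, -122.18)
```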

Cross-view and cross-modal visual geo-localization has emerged as a new research field to address this problem. The real world can be represented in many data modalities, sensed by disparate devices from different viewpoints. For example, the same scene perceived by a ground camera can also be captured as an RGB image or a 3D point cloud from an aerial vehicle, using LIDAR or motion imagery, or from a satellite. Localizing a ground image against an aerial/overhead geo-referenced database has gained noticeable momentum in recent years, due to significant growth in the availability of public aerial/overhead data in multiple modalities (such as aerial images from Google Maps, USGS 2D and 3D data, aerial LiDAR data, and satellite 3D data). Matching a ground image to aerial/overhead data, whose acquisition is simpler and faster, also opens more opportunities for industrial and consumer applications. However, cross-view and cross-modal visual geo-localization comes with additional technical challenges due to dramatic changes in appearance between the ground image and the aerial database, which capture the same scene from different viewpoints and/or with different sensor modalities.

This tutorial will offer (a) an overview of cross-view and cross-modal visual geo-localization, and will stress the related algorithmic aspects: (b) ground-to-aerial image matching, and (c) image-to-3D coarse search and fine alignment. For each topic, it will describe state-of-the-art methods utilizing hand-designed feature descriptors, pre-trained CNN-based features, and learning-based approaches (such as deep embeddings, graph-based matching techniques, semantic segmentation, and Generative Adversarial Networks). For practical applications, vision-based geo-localization techniques must work for images taken at different times of day and year, and across different seasons and weather conditions.

Program agenda

TUTORIALS TO TAKE PLACE ON JUNE 20th, 2021

Cross-view and Cross-modal Visual Geo-Localization

2:00 PM – 2:15 PM USA EST

Visual Localization Cross Time

2:15 PM – 3:30 PM USA EST

Cross-View Geo-Localization: Ground-to-Aerial Image Matching

3:30 PM – 4:15 PM USA EST

Large Scale Cross View Image Geo-localization

4:15 PM – 4:45 PM USA EST

Cross-Modal Geo-Localization: Image-to-3D Coarse Search & Fine Alignment

4:45 PM – 6:00 PM USA EST

Tutorials

The tutorials will take place on June 20th, 2021, and will consist of 5 lectures, as detailed below:

CVPR 2021 tutorial on Cross-view and Cross-modal Visual Geo-Localization

2:00 PM – 2:15 PM USA EST

Abstract: This lecture will provide the general context for this new and emerging topic. It presents the aims of image-based geo-localization, the challenges, and the important issues to be tackled in matching images to geo-referenced databases across viewpoints, weather and long-term time differences, and modalities. It will also describe a typical visual geo-localization system in detail, for both industrial and consumer applications such as autonomous navigation and wide-area, outdoor augmented reality.

Speakers: Han-Pang Chiu and Rakesh (Teddy) Kumar


Visual Localization Cross time CVPR 2021 Tutorial

2:15 PM – 3:30 PM USA EST

Abstract: This lecture will discuss techniques for long-term visual localization in real-world environments, which must operate under (1) extreme perceptual changes such as weather and illumination (e.g., matching day images to night references), and (2) dynamic scene changes and occlusions, such as those caused by moving vehicles and people. The lecture will focus on state-of-the-art matching techniques using deep learning-based methods, including semantic scene segmentation, attention networks, triplet loss, and embeddings.
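The triplet loss mentioned above trains an embedding so that images of the same place land closer together than images of different places. A minimal illustration follows; the toy 2-D vectors stand in for network embeddings and are not outputs of any real model.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss pushing the anchor-positive distance below the
    anchor-negative distance by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Anchor: embedding of a day image; positive: the same place at night;
# negatives: other places at different difficulty levels.
anchor, positive = [0.0, 1.0], [0.1, 0.9]
easy_negative, hard_negative = [1.0, 0.0], [0.2, 0.8]

print(triplet_loss(anchor, positive, easy_negative))            # 0.0 (margin already satisfied)
print(round(triplet_loss(anchor, positive, hard_negative), 4))  # 0.0586 (gradient would push apart)
```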

Speaker: Rakesh (Teddy) Kumar


Cross-View Geo-Localization: Ground-to-Aerial Image Matching

3:30 PM – 4:15 PM USA EST

Abstract: This lecture covers the essential knowledge of how to estimate the geo-location of a query street-view image within a structured database of city-wide aerial reference images with known GPS coordinates. It will start with a brief review of traditional geo-localization involving DOQ, DEM, and metadata. Then it will present computer vision methods employing either traditional hand-crafted features or learned features with view-invariant descriptors for ground-to-aerial image matching. It will end with recent methods employing deep learning for cross-view image geo-localization, including Generative Adversarial Networks.
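A view-invariant descriptor produces the same signature for a scene regardless of the viewpoint it was captured from. As a toy illustration of the idea only (not any method from the lecture), the sketch below uses an intensity histogram, which discards spatial arrangement and is therefore unchanged under a rotated view; the grids and bin count are made up.

```python
def rotate90(grid):
    """Rotate a square grid 90 degrees clockwise (a stand-in for a view change)."""
    n = len(grid)
    return [[grid[n - 1 - c][r] for c in range(n)] for r in range(n)]

def descriptor(grid, bins=4):
    """Toy view-invariant descriptor: an intensity histogram.
    It ignores pixel arrangement, so any rotation of the grid
    yields the same descriptor."""
    hist = [0] * bins
    for row in grid:
        for v in row:
            hist[min(int(v * bins), bins - 1)] += 1
    return hist

ground = [[0.1, 0.8], [0.6, 0.3]]
aerial = rotate90(ground)  # the same scene under a 90-degree view change
print(descriptor(ground) == descriptor(aerial))  # → True
```

Real systems learn such invariances with deep networks rather than hand-picking an invariant statistic, but the matching principle is the same: map both views to a representation where viewpoint differences vanish, then compare.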

Speaker: Mubarak Shah


Large Scale Cross View Image Geo-localization

4:15 PM – 4:45 PM USA EST

Abstract: Image-based geo-localization aims at providing image-level GPS locations by matching a query street/ground image with GPS-tagged images in a reference dataset. I will present our recent efforts on cross-view image geo-localization using advanced deep learning algorithms, and provide a visual explanation method to interpret the learning outcome for cross-view image matching. Then I will present cross-view image geo-localization in a more realistic and challenging setting that breaks the one-to-one retrieval assumption of existing datasets. To bridge the gap between this realistic setting and existing datasets, we propose a new large-scale benchmark, VIGOR, for cross-View Image Geo-localization beyond One-to-one Retrieval. We benchmark existing state-of-the-art methods and present a novel end-to-end framework that localizes the query in a coarse-to-fine manner.

Speaker: Chen Chen


Cross-Modal Geo-Localization: Image-to-3D Coarse Search & Fine Alignment

4:45 PM – 6:00 PM USA EST

Abstract: Matching an image to different data modalities that are simpler to collect over a large area opens more opportunities and possibilities for visual geo-localization. This lecture will focus on a practical data choice for cross-modal geo-localization: matching ground RGB images to aerial/satellite-based geo-referenced 3D data. It will describe both the coarse search and fine alignment steps of this problem. For coarse search, it will illustrate both traditional hand-designed feature methods and new learning-based methods, including recent methods employing deep joint embeddings for cross-modal database retrieval. For fine alignment, it will go through details of various methods using different kinds of low-level appearance features and high-level semantic information.
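The deep joint embedding mentioned above maps each modality into a shared space where cross-modal neighbors can be retrieved directly. The sketch below substitutes hypothetical linear projections for learned image and 3D encoders; all vectors and matrices are made up for illustration.

```python
def project(vec, matrix):
    """Project a modality-specific descriptor into the shared space."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def retrieve(query_emb, db_embs):
    """Return the index of the nearest database embedding (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(db_embs)), key=lambda i: sq_dist(query_emb, db_embs[i]))

# Hypothetical "learned" projections for each modality (2-D toys).
W_image = [[1.0, 0.0], [0.0, 1.0]]  # RGB image encoder: identity
W_lidar = [[0.0, 1.0], [1.0, 0.0]]  # 3D/LIDAR encoder: swaps axes

image_query = project([0.9, 0.1], W_image)
lidar_db = [project(d, W_lidar) for d in
            [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]]

print(retrieve(image_query, lidar_db))  # → 0: first 3D entry matches the image
```

In an actual system the two encoders are trained jointly (e.g., with a ranking loss) so that an image and the 3D data of the same place embed nearby; the retrieval step is then identical to the nearest-neighbor search shown here.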

Speaker: Han-Pang Chiu


Rakesh “Teddy” Kumar

Vice President, Information and Computing Sciences and Director of the Center for Vision Technologies

Teddy is responsible for leading research and development of innovative end-to-end vision solutions, from image capture to situational understanding, that translate into real-world applications such as robotics, intelligence extraction, and human-computer interaction.


Mubarak Shah

Trustee Chair Professor of Computer Science and Founding Director of the Center for Research in Computer Vision at UCF

Mubarak’s research interests include: video surveillance, visual tracking, human activity recognition, visual analysis of crowded scenes, video registration, UAV video analysis, etc.


Han-Pang Chiu

Senior Technical Manager of the Center for Vision Technologies

Han-Pang leads a research group to develop innovative solutions for real-world applications to navigation, mobile augmented reality and robotics.


Chen Chen

Department of Electrical and Computer Engineering, University of North Carolina at Charlotte

Dr. Chen Chen’s research interests include signal and image processing, computer vision, and deep learning. He has published over 80 papers in refereed journals and conferences in these areas.



Virtual CVPR | June 19-25th, 2021

Relevant publications

ACM Multimedia Conference (MM), 2020, Best Paper finalist
October 12, 2020

RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization

Authors: Niluthpol Mithun, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

International Conference on Computer Vision (ICCV)
April 24, 2019

Bridging the Domain Gap for Ground-to-Aerial Image Matching

Authors: Krishna Regmi and Mubarak Shah

IEEE Transactions on Pattern Analysis and Machine Intelligence
February 4th, 2017

Large-scale image geo-localization using dominant sets

Authors: Eyasu Zemene, Yonatan Tariku Tesfaye, Haroon Idrees, Andrea Prati, Marcello Pelillo, and Mubarak Shah

2019
British Machine Vision Conference (BMVC)

Semantically-Aware Attentive Neural Embeddings for Long-Term 2D Visual Localization

Authors: Zachary Seymour, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

2018
IEEE International Conference on Virtual Reality (VR)

Augmented Reality Driving Using Semantic Geo-Registration

Authors: Han-Pang Chiu, Varun Murali, Ryan Villamil, G. Drew Kessler, Supun Samarasekera, Rakesh Kumar

2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Cross-view image matching for geo-localization in urban environments

Authors: Yicong Tian, Chen Chen, and Mubarak Shah

2017
IEEE International Conference on Intelligent Transportation Systems (ITSC)

Utilizing semantic visual landmarks for precise vehicle navigation

Authors: Varun Murali, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

2016
IEEE International Conference on Intelligent Transportation Systems (ITSC)

Sub-meter vehicle navigation using efficient pre-mapped visual landmarks

Authors: Han-Pang Chiu, Mikhail Sizintsev, Xun S. Zhou, Philip Miller, Supun Samarasekera, Rakesh Kumar

2014
IEEE Transactions on Pattern Analysis and Machine Intelligence

Image geo-localization based on multiple nearest neighbor feature matching using generalized graphs

Authors: Amir Roshan Zamir and Mubarak Shah

2011
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

High-precision localization using visual landmarks fused with range data

Authors: Zhiwei Zhu, Han-Pang Chiu, Taragay Oskiper, Saad Ali, Raia Hadsell, Supun Samarasekera, Rakesh Kumar

Best Paper Award
2011
IEEE International Conference on Virtual Reality (VR)

Stable vision-aided navigation for large-scale augmented reality

Authors: Taragay Oskiper, Han-Pang Chiu, Zhiwei Zhu, Supun Samarasekera, Rakesh Kumar

2010
European Conference on Computer Vision (ECCV)

Accurate image localization based on google maps street view

Authors: Amir Roshan Zamir and Mubarak Shah

2008
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Building segmentation for densely built urban regions using aerial LIDAR data

Authors: Bogdan C. Matei, Harpreet S. Sawhney, Supun Samarasekera, Janet Kim, Rakesh Kumar

2006
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

3D Building Detection and Modeling from Aerial LIDAR Data

Authors: Vivek Verma, Rakesh Kumar, Stephen C. Hsu

2006
IEEE International Conference on Pattern Recognition (ICPR)

A Heterogeneous Feature-based Image Alignment Method

Authors: Cen Rao, Yanlin Guo, Harpreet S. Sawhney, Rakesh Kumar

2001
International Conference on Computer Vision (ICCV)

Video Georegistration: Algorithm and Quantitative Evaluation

Authors: Richard P. Wildes, David J. Hirvonen, Steven C. Hsu, Rakesh Kumar, W. Brian Lehman, Bogdan Matei, Wen-Yi Zhao

2001
IEEE International Conference on Pattern Recognition (ICPR)

Registration of Highly-Oblique and Zoomed in Aerial Video to Reference Imagery

Authors: Rakesh Kumar, Supun Samarasekera, Steven C. Hsu, Keith J. Hanna

1998
IEEE International Conference on Pattern Recognition (ICPR)

Registration of video to geo-referenced imagery

Authors: Rakesh Kumar, Harpreet S. Sawhney, Jane C. Asmuth, Art Pope, S. Hsu

2021
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval

Authors: Sijie Zhu, Taojiannan Yang, Chen Chen
