John Cadigan, Karan Sikka, Meng Ye, Martin Graciarena, “Resilient Data Augmentation Approaches to Multimodal Verification in the News Domain,” Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021.
With the advent of generative adversarial networks and misinformation in social media, there has been increased interest in multimodal verification. Image-text verification typically involves determining whether a caption and an image correspond to each other. Building on multimodal embedding techniques, we show that two distinct data augmentation approaches improve results: entity linking and cross-domain local similarity scaling. We call these approaches resilient because we show state-of-the-art results against manipulations specifically designed to thwart the very multimodal embeddings that serve as the basis for all of our features.
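As a rough illustration of the second approach, the sketch below computes cross-domain local similarity scaling (CSLS) scores between image and caption embeddings. It assumes the commonly used formulation, CSLS(i, c) = 2·cos(i, c) − r(i) − r(c), where each r term is the mean cosine similarity of an item to its k nearest neighbors in the other modality. The function names, the toy vectors, and the choice of k are hypothetical, and the abstract does not say how the paper itself applies the score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_top_k(values, k):
    """Mean of the k largest values (or of all values if there are fewer than k)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def csls_matrix(image_vecs, caption_vecs, k=2):
    """Return scores[i][j] = 2*cos(image_i, caption_j) - r(image_i) - r(caption_j).

    r(image_i) is the mean similarity of image i to its k nearest captions;
    r(caption_j) is the mean similarity of caption j to its k nearest images.
    """
    n_img, n_cap = len(image_vecs), len(caption_vecs)
    cos = [[cosine(i, c) for c in caption_vecs] for i in image_vecs]
    r_image = [mean_top_k(row, k) for row in cos]
    r_caption = [mean_top_k([cos[i][j] for i in range(n_img)], k)
                 for j in range(n_cap)]
    return [[2 * cos[i][j] - r_image[i] - r_caption[j] for j in range(n_cap)]
            for i in range(n_img)]


# Toy, made-up embeddings: two images and two captions in a shared 2-D space.
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[1.0, 0.0], [0.0, 1.0]]
scores = csls_matrix(images, captions, k=1)
```

Subtracting the neighborhood terms penalizes "hub" items that are close to many items in the other modality, so a matching image-caption pair must stand out relative to its local neighborhood rather than only in raw cosine similarity.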