Multi-modal data analytics

SRI develops multi-modal data analytics solutions using machine learning for a variety of applications. These include image and video search, activity and fine-grained recognition, social media analytics, and RF signal exploitation.

Image and video search

SRI has developed the Computer Vision AI Search Tool (CVAST) for rapidly building searchable-image and user-annotation AI training databases. CVAST supports ingestion, rapid object clustering, and annotation of common scene object features. A flexible image/attribute database allows users to search for related features within a vast collection of image sets.

Under the DARPA Visual Media Reasoning (VMR) program, SRI created visual exploitation and indexing tools to rapidly extract mission-relevant visual intelligence from large quantities of diverse, ill-defined, unstructured imagery captured from multiple adversary sources.

Activity recognition

Under the IARPA Automated Low-Level Analysis and Description of Diverse Intelligence Video (ALADDIN) and Deep Intermodal Video Analytics (DIVA) (TARDIS-V) programs, SRI’s Center for Vision Technologies (CVT) has developed large-scale video content-retrieval systems. These systems index content with machine learning by automatically detecting concepts such as objects, events, actions, and locations.
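The concept-based indexing described above can be illustrated with a minimal sketch. It is not SRI's implementation: the detection tuples, the score threshold, and the function names `build_index` and `search` are all hypothetical, and a real system would take its detections from trained concept detectors rather than a hand-written list.

```python
from collections import defaultdict

def build_index(detections, min_score=0.5):
    """Build an inverted index: concept -> {video_id: best detection score}.

    `detections` is an iterable of (video_id, concept, score) tuples.
    The tuple layout and the threshold are assumptions for illustration.
    """
    index = defaultdict(dict)
    for video_id, concept, score in detections:
        if score < min_score:
            continue  # drop low-confidence detections
        best = index[concept].get(video_id, 0.0)
        index[concept][video_id] = max(best, score)
    return index

def search(index, concepts):
    """Return videos containing every queried concept, best match first."""
    postings = [index.get(c, {}) for c in concepts]
    if not postings:
        return []
    common = set(postings[0])
    for p in postings[1:]:
        common &= set(p)
    # Rank by summed detection score; break ties by video id for determinism.
    return sorted(common, key=lambda v: (-sum(p[v] for p in postings), v))

# Hypothetical detector output for three videos.
detections = [
    ("v1", "person", 0.90), ("v1", "pushup", 0.80),
    ("v2", "person", 0.70), ("v2", "car", 0.95),
    ("v3", "person", 0.85), ("v3", "pushup", 0.40),
]
index = build_index(detections)
print(search(index, ["person", "pushup"]))
print(search(index, ["person"]))
```

An inverted index keeps query time proportional to the number of videos matching each concept rather than to the size of the whole collection, which is one common reason to index detections ahead of time.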

Video recognition software for activities (pushup by subject)

Fine-grained recognition

SRI has developed state-of-the-art algorithms and systems for fine-grained classification of objects such as vehicles, natural objects, and other classes.

Fine-grained recognition is a challenging task because small, localized differences between similar-looking objects determine the specific fine-grained label. At the same time, accurate recognition needs to discount spurious changes in appearance caused by occlusions, partial views, and nearby clutter objects in scenes.

SRI has developed a novel multi-task deep network architecture that jointly optimizes both localization of parts and fine-grained recognition by learning from training data. We have developed memory- and computation-efficient algorithms for fine-grained recognition that can be easily embedded in mobile applications.
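As a rough illustration of what jointly optimizing part localization and recognition can mean, the sketch below combines a classification loss and a part-localization loss into one objective. The specific losses (softmax cross-entropy and mean squared distance between part centers), the weight `loc_weight`, and all function names are assumptions chosen for clarity; the source does not describe the actual architecture or loss.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def localization_loss(pred_parts, true_parts):
    """Mean squared distance between predicted and annotated part centers."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred_parts, true_parts):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(true_parts)

def joint_loss(logits, label, pred_parts, true_parts, loc_weight=0.5):
    """Weighted sum of both task losses (loc_weight is a hypothetical knob)."""
    return (cross_entropy(logits, label)
            + loc_weight * localization_loss(pred_parts, true_parts))

# Hypothetical outputs for one image: three fine-grained classes, two parts.
logits = [2.0, 0.5, -1.0]
true_parts = [(0.30, 0.40), (0.70, 0.60)]
good_parts = [(0.30, 0.40), (0.70, 0.60)]
bad_parts = [(0.10, 0.40), (0.70, 0.90)]

print(joint_loss(logits, 0, good_parts, true_parts))
print(joint_loss(logits, 0, bad_parts, true_parts))
```

Because both terms feed a single scalar, gradients from the localization task and the recognition task would update shared network layers together, which is the usual motivation for a multi-task objective.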

Social media analytics

Under the DARPA Social Media in Strategic Communication (SMISC), Computational Simulation of Online Social Behavior (SocSim) (SBIR M3I system), ONR CEROSS, and AFRL Multimedia-Enhanced Social Media Analytics (MESA) programs, CVT has developed social media content analytics for seamless multi-way, cross-platform retrieval across images, videos, text, and users. The approach uses multimodal embedding of users and content in the same geometric space. CVT has also developed a system that can detect the intent behind social media postings. This work provides a framework for tracking the propagation of influence in social media.
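To make the idea of a shared geometric space concrete, the sketch below ranks items of different types against a query by cosine similarity. The three-dimensional vectors, item identifiers, and the `retrieve` function are hypothetical; in practice the embeddings would come from learned encoders, which the source does not detail.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_vec, index, top_k=3, modality=None):
    """Return the top_k item ids nearest to the query, optionally by modality.

    `index` maps item_id -> (modality, embedding); this layout is an
    assumption made for the example.
    """
    scored = [
        (cosine(query_vec, vec), item_id)
        for item_id, (kind, vec) in index.items()
        if modality is None or kind == modality
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:top_k]]

# Hypothetical embeddings: images, videos, text, and users share one space.
index = {
    "img_001": ("image", [0.9, 0.1, 0.0]),
    "vid_007": ("video", [0.8, 0.2, 0.1]),
    "txt_042": ("text", [0.1, 0.9, 0.2]),
    "user_a": ("user", [0.7, 0.3, 0.0]),
}
query = [1.0, 0.0, 0.0]  # e.g. the embedding of a text query

print(retrieve(query, index, top_k=2))
print(retrieve(query, index, modality="user"))
```

Since every item type lives in the same space, one similarity function serves text-to-image, image-to-user, and any other direction of retrieval without a separate matcher per pair of modalities.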

Infographic: data, technologies, and use cases

Recent work

August 14, 2017

Hierarchical Urban and Natural Terrain Exploitation and Reasoning

SRI is developing a system for the semi-automated geolocalization of metadata-free images and videos to find a location of interest.

August 14, 2017

SRI’s MatchStax Cross Platform Social media content retrieval platform

August 14, 2017

AURORA: Content-Guided Search of Diverse Videos

SRI is developing a novel search technology to quickly find events of interest in very large video collections.

Recent publications

Published May 27, 2022

By Yi Yao

Time-Space Processing for Small Ship Detection in SAR

This paper presents a new 3D time-space detector for small ships in single look complex (SLC) synthetic aperture radar (SAR) imagery, optimized for small targets around 5-15 m long that are unfocused due to target motion induced by ocean surface waves.

Published March 16, 2020

By Andrew Silberfarb, John Byrnes, Ajay Divakaran

Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge into Deep Neural Networks

We introduce Deep Adaptive Semantic Logic (DASL), a novel framework for automating the generation of deep neural networks that incorporates user-provided formal knowledge to improve learning from data.

Published June 15, 2019

By Ajay Divakaran, Yi Yao

Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation

We propose novel Stacked Spatio-Temporal Graph Convolutional Networks (Stacked-STGCN) for action segmentation, i.e., predicting and localizing a sequence of actions over long videos.
