Ashish Gehani

Senior Principal Computer Scientist, Computer Science Laboratory

Ashish Gehani has transformed data provenance from an academic concept into a practical foundation for cyber threat detection, digital forensics, and trustworthy computing. Gehani created SPADE, an open-source infrastructure that has become widely used in research on data provenance collection and management. With NSF support from 2007 to 2021, SPADE-v2 introduced a novel provenance kernel that decouples the collection, storage, and querying of lineage metadata. The system runs on multiple operating systems, collects provenance from diverse sources, and supports storage in a range of formats and database types. It won the ACM/IFIP Middleware 2022 Test of Time Award.

As provenance graphs grew to billions of vertices and edges, Gehani developed novel techniques for handling “big provenance” in domains such as operating systems and blockchains. These innovations included provenance sketches for accelerating distributed queries, compression techniques that reduce storage cost and time by an order of magnitude, and efficient querying of terabyte-scale provenance repositories during graph ingestion.

Building on SPADE’s foundation, Gehani architected the TRACE system in DARPA’s 2015–2020 Transparent Computing program, which tackled Advanced Persistent Threat detection. This work increased the precision of data provenance graphs to enable tracking of sophisticated multi-stage attacks that evaded earlier security tools. Gehani then led the adaptation of SPADE to microservice environments, where containerization can lead to false and missing dependencies. These extensions ensured sound and complete provenance tracking. A variant of SPADE was licensed to AccuKnox, a cloud-native security venture.

Gehani has conducted a decade-long research program in software specialization and data debloating, creating tools to reduce the attack surface and storage footprint of deployed applications. Noting that the modern software stack offers much functionality that is never used in a specific deployment, his 2013–2021 ONR-sponsored OCCAM-v2 and Trimmer projects reduced the space of possible runtime behavior in targeted code. Additionally, Gehani’s 2022–2026 NASA-supported research exploited application I/O access patterns to eliminate unused content. By leveraging the semantics of self-describing data formats, this work carves out data subsets that provide significant storage reduction and can be robustly reused in the face of varied program input.

Gehani designed the security architecture and cryptographic framework of the ENCODERS peer-to-peer publish-subscribe service. Developed in DARPA’s 2011–2014 Content-Based Mobile Edge Networking program, the system ensures resilient group communication even when devices are disconnected from the Internet. Gehani designed protocols to enable decentralized secure content routing with dynamic provisioning of cryptographic attributes. This is essential for operations where centralized services may be compromised or unavailable. Gehani also created a framework that lets publishers scope access to content tags and subscribers scope access to interests, using cryptographic access policies to balance privacy and performance concerns. This allows the set of devices that can serve as relays to be adjusted dynamically to better support network resilience.

Prior to joining SRI, Gehani studied super-resolution video, DNA computing, and risk-based intrusion prevention at Duke University, and decentralized authentication and authorization at the University of Notre Dame. He holds a PhD in computer science from Duke University and a BS in mathematics from the University of Chicago. In 2025, Gehani became an SRI Fellow.

A list of peer-reviewed publications is accessible here.