Open Language Interface for Voice Exploitation (OLIVE)


The Open Language Interface for Voice Exploitation (OLIVE) speech processing system provides robust speech information extraction amid high levels of noise and distortion in real-world data.

AI algorithms underlying OLIVE enable the technology to:

  • Detect the presence of speech, not just an open channel (speech activity detection);
  • Find and/or track speakers of interest (speaker identification);
  • Detect languages and dialects from a set of languages of interest (language and dialect identification); and
  • Detect specific keywords and phrases (keyword spotting).

Graphical user interfaces in OLIVE enable close editing of audio files, enrollment of new speakers, scoring of segments, speech activity segmentation, and semi-supervised speaker diarization (separating and labeling the individual speakers in a multi-speaker recording).

Initially developed under the DARPA Robust Automatic Transcription of Speech (RATS) program, OLIVE is designed for easy integration into end-user applications. The technology is under continuous development and refinement based on user feedback.


  • Automatic detection of speech, speaker, keywords, and languages of interest from live streaming input or file-based audio
  • Automatic speaker segmentation of audio, labeling where each person speaks
  • Functions with high accuracy in tactical communications with high noise and across multiple channels

Key technologies

Speech Activity Detection (SAD)

  • Accurate on noisy operational audio
  • Detect speech, not just an open channel
  • Process hundreds of channels on low-powered hardware

Language Identification (LID)

  • Detect languages/dialects
  • Add new languages using collected audio

Speaker Identification (SID)

  • Find/track speakers of interest across time and channels
  • Add new speakers offline or live, with as little as 8 seconds of speech

Query by Example (QBE)

  • Language-agnostic keyword spotting
  • Enroll keywords/phrases offline or live with as little as a single example

Other technologies

Keyword Spotting (KWS)

Word detection in Spanish, Mandarin, Iraqi Arabic

Acoustic Event Detection (AED)

Detection of non-speech acoustic events including whistling, barking, vehicles, gunshots

Speaker Diarization (DIA)

Separation and labeling of unknown speakers in multi-speaker conversations

Forensic Speaker Identification 

Close analysis speaker identification for forensic use


Batch processing/data triage

  • Automatically discard files with no speech
  • Find files most likely to contain target language, speaker or keywords
  • Processing speed scales with available CPUs
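The triage steps above can be illustrated with a minimal Python sketch. It is not OLIVE's API: the FileScores record, its field names, the triage function, and the min_speech_seconds threshold are all hypothetical, and the numbers are made-up placeholders standing in for whatever speech-duration and target scores a detection system returns per file.

```python
from dataclasses import dataclass


@dataclass
class FileScores:
    """Hypothetical per-file results; field names are illustrative only."""
    path: str
    speech_seconds: float  # total duration of detected speech in the file
    target_score: float    # higher means more likely to contain the target


def triage(results, min_speech_seconds=1.0):
    """Discard files with too little speech, then rank the rest by target score."""
    with_speech = [r for r in results if r.speech_seconds >= min_speech_seconds]
    return sorted(with_speech, key=lambda r: r.target_score, reverse=True)


# Placeholder values for illustration, not real system output.
results = [
    FileScores("a.wav", 0.0, 0.0),
    FileScores("b.wav", 42.5, 1.3),
    FileScores("c.wav", 12.0, 4.7),
]

ranked = triage(results)
print([r.path for r in ranked])  # prints ['c.wav', 'b.wav']
```

Here a.wav is dropped because it contains no speech, and the remaining files are ordered so that the most likely match is reviewed first.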

Live streaming

  • Live monitoring of incoming audio streams
  • Save, search and review past audio
  • On-the-fly enrollment of new speakers
  • Up to 16 channels running SAD, SID, LID

Close waveform analysis

  • Forensic analysis of speech
  • Simple but powerful GUI for selecting and reviewing audio segments
  • Run any plugin on any selected segments
