December 25, 2019

SRI Language Modeling toolkit

The SRI Language Modeling (SRILM) toolkit offers tools for building and applying statistical language models for use in speech recognition, statistical tagging and segmentation, and machine translation. The toolkit can be downloaded and used free of charge.

Components

A set of C++ class libraries implementing language models, supporting data structures, and miscellaneous utility functions
A set of executable programs built on top of the libraries to perform standard tasks such as training LMs and testing them on data, tagging, or segmenting text
A collection of miscellaneous scripts facilitating minor related tasks

The SRILM toolkit runs on UNIX and Windows platforms. It has been used in a great variety of statistical modeling applications, and others have published extensions that add new functionality. The toolkit greatly benefited from being used and enhanced during workshops sponsored by Johns Hopkins University and Oregon Health and Science University’s Center for Spoken Language Understanding in 1995, 1996, 1997, and 2002.
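The executable programs train LMs on text and then test them on held-out data, typically by measuring perplexity. The Python sketch below illustrates that train-then-evaluate cycle on a toy scale. It is not SRILM code and does not mirror its API: the `BigramLM` class, the mapping of unseen words to `<unk>`, and the add-one smoothing are assumptions chosen for brevity. The toolkit's own programs support considerably more refined discounting and backoff methods.

```python
import math
from collections import Counter
from typing import Iterable, List

BOS, EOS, UNK = "<s>", "</s>", "<unk>"  # sentence-boundary and unknown-word tokens


class BigramLM:
    """Toy bigram LM with add-one (Laplace) smoothing; a hypothetical helper, not SRILM's API."""

    def __init__(self, sentences: Iterable[List[str]]) -> None:
        self.history_counts: Counter = Counter()  # c(prev): times a token occurs as a history
        self.bigram_counts: Counter = Counter()   # c(prev, cur)
        self.vocab = {EOS, UNK}                   # tokens the model can predict
        for words in sentences:
            self.vocab.update(words)
            tokens = [BOS] + list(words) + [EOS]
            for prev, cur in zip(tokens, tokens[1:]):
                self.history_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev: str, cur: str) -> float:
        """P(cur | prev) = (c(prev, cur) + 1) / (c(prev) + |V|)."""
        return (self.bigram_counts[(prev, cur)] + 1) / (
            self.history_counts[prev] + len(self.vocab)
        )

    def perplexity(self, sentences: Iterable[List[str]]) -> float:
        """exp of the mean negative log probability per predicted token (words plus </s>)."""
        total_log_prob = 0.0
        num_tokens = 0
        for words in sentences:
            mapped = [w if w in self.vocab else UNK for w in words]  # unseen words -> <unk>
            tokens = [BOS] + mapped + [EOS]
            for prev, cur in zip(tokens, tokens[1:]):
                total_log_prob += math.log(self.prob(prev, cur))
                num_tokens += 1
        return math.exp(-total_log_prob / num_tokens)


if __name__ == "__main__":
    lm = BigramLM([["the", "cat", "sat"], ["the", "dog", "sat"]])
    print(lm.perplexity([["the", "cat", "sat"]]))   # a training sentence
    print(lm.perplexity([["sat", "the", "bird"]]))  # unusual word order plus an unseen word
```

Lower perplexity means the model predicts the test text better; the second call scores higher than the first. Smoothing keeps every probability nonzero, so an unseen bigram cannot drive the perplexity to infinity.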

Applications

The SRILM toolkit is widely used in the research community for tasks requiring statistical language modeling. Examples include:

Speech recognition

SRI’s automatic speech recognition engine, DynaSpeak®
Freely available recognition systems such as RWTH-ASR, Julius, Sphinx, and LIUM
SailAlign, a tool for robust long speech-text alignment

Machine translation

Language modeling support for the Moses SMT system and the Joshua hierarchical phrase-based SMT system
Cunei machine translation platform
Integration with SYSTRAN’s Enterprise Server 7 machine translation product

Tagging and segmentation

MorphTagger, an HMM-based part-of-speech tagger for Semitic languages
Chinese word segmentation

Document processing

Handwriting recognition
GiDoc (Gimp-based Interactive transcription of old text DOCuments)

Outside computational linguistics

RANKPEP: prediction of MHC-restricted ligands
Visualization of uncertainty

Teaching

Carnegie Mellon University’s (CMU’s) Advanced Lab Speech Recognition and Understanding
CMU’s Machine Translation
Center for Spoken Language Understanding’s Automatic Speech Recognition and Hidden Markov Models
Mississippi State University’s Fundamentals of Speech Recognition
Stanford University’s Speech Recognition and Synthesis
University of Washington’s Introduction to Computational Linguistics

IDEs and language bindings

Visual Studio
Python
OCaml
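Chinese word segmentation, one of the applications listed above, is commonly framed as finding the most probable word sequence under a language model. The Python sketch below shows that idea with a unigram word model and dynamic programming (Viterbi search). It is a hypothetical illustration, not SRILM's own segmentation tooling: the function name `segment`, the add-one smoothing, the single-character fallback for unseen text, and the 4-character word-length cap (`max_word_len`) are all assumptions.

```python
import math
from typing import Dict, List


def segment(text: str, unigram_counts: Dict[str, int], max_word_len: int = 4) -> List[str]:
    """Viterbi segmentation of unspaced text under a smoothed unigram word model.

    `segment`, `unigram_counts`, and `max_word_len` are hypothetical names for this sketch.
    """
    denom = sum(unigram_counts.values()) + len(unigram_counts) + 1

    def log_prob(word: str) -> float:
        if word in unigram_counts:
            return math.log((unigram_counts[word] + 1) / denom)  # add-one smoothed
        if len(word) == 1:
            return math.log(1 / denom)  # crude fallback so any input can be segmented
        return -math.inf  # unseen multi-character strings are not allowed as words

    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log probability of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word on that path
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            score = best[start] + log_prob(text[start:end])
            if score > best[end]:
                best[end], back[end] = score, start

    # Follow back-pointers from the end of the string to recover the words.
    words: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    counts = {"北京": 5, "大学": 5, "北京大学": 3, "生": 2, "大学生": 4}
    print(segment("北京大学生", counts))  # ['北京', '大学生']
```

Replacing the unigram scores with bigram or trigram scores would let the search prefer words that fit their neighbors, at the cost of a larger search state.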

Documentation
The SRILM toolkit is still under development. The documentation is also a work in progress. Best documented are the executable programs, scripts, and file formats, in the form of UNIX-style manual pages. The libraries are documented mostly in the source code.

An overview of what the software can do and its design philosophy can be found in the paper “SRILM – An Extensible Language Modeling Toolkit”, in Proceedings of the International Conference on Spoken Language Processing, Denver, Colorado, September 2002. A 2011 paper, “SRILM at Sixteen: Update and Outlook”, summarizes updates to SRILM since the 2002 paper.

Get links to other papers and tutorials, and answers to frequently asked questions.

Terms of use
SRI has released the SRILM toolkit mainly because we have received requests for the software from several researchers around the world. There is no guarantee of support. It can be downloaded free of charge under an “open source community license”, meaning that you can use it freely for nonprofit purposes, as long as you share any changes you make with the rest of the user community. For other uses, inquire about commercial licensing opportunities.

Mailing list
Check the mailing list archive for past contributions.
