Victor Abrash

Senior Computer Scientist, Speech Technology and Research Laboratory

Publications

Biomedical sciences publications May 1, 2015 Article

Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems

Harry Bratt, Colleen Richey, Horacio Franco, Victor Abrash, Kristin Precoda

We present a system for detection of lexical stress in English words spoken by English learners. This system was designed to be part of the EduSpeak® computer-assisted language learning (CALL) software. The system uses both prosodic and spectral features to detect the level of stress (unstressed, primary or secondary) for each syllable in a word. Features are computed on the vowels and include normalized energy, pitch, spectral tilt, and duration measurements, as well as log-posterior probabilities obtained from the frame-level mel-frequency cepstral coefficients (MFCCs). Gaussian mixture models (GMMs) are used to represent the distribution of these features for each stress class. The system is trained on utterances by L1-English children and tested on English speech from L1-English children and L1-Japanese children with variable levels of English proficiency. Since it is trained on data from L1-English speakers, the system can be used on English utterances spoken by speakers of any L1 without retraining. Furthermore, automatically determined stress patterns are used as the intended target; therefore, hand-labeling of training data is not required. This allows us to use a large amount of data for training the system. Our algorithm results in an error rate of approximately 11% on English utterances from L1-English speakers and 20% on English utterances from L1-Japanese speakers. We show that all features, both spectral and prosodic, are necessary to achieve optimal performance on the data from L1-English speakers; MFCC log-posterior probability features are the single best set of features, followed by duration, energy, pitch, and finally spectral tilt features. For English utterances from L1-Japanese speakers, energy, MFCC log-posterior probabilities and duration are the most important features.
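The classification step described above reduces to fitting one feature-distribution model per stress class and assigning each syllable to the class that best explains its vowel features. Below is a minimal sketch of that decision rule, assuming scikit-learn Gaussian mixture models and illustrative feature and label arrays; the feature names and function signatures are assumptions, not the EduSpeak implementation.

```python
# Minimal sketch of GMM-based lexical stress classification (assumed setup,
# not the EduSpeak code): one Gaussian mixture model per stress class, and
# each syllable is assigned to the class with the highest log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

STRESS_CLASSES = ["unstressed", "primary", "secondary"]

def train_stress_models(features, labels, n_components=8):
    """Fit one GMM per stress class.

    features: (n_syllables, n_features) array of per-vowel measurements
              (e.g. normalized energy, pitch, spectral tilt, duration,
              MFCC log-posteriors); labels: array of class names.
    """
    labels = np.asarray(labels)
    models = {}
    for cls in STRESS_CLASSES:
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(features[labels == cls])
        models[cls] = gmm
    return models

def classify_syllable(models, feature_vector):
    """Return the stress class whose GMM gives the highest log-likelihood."""
    x = np.asarray(feature_vector).reshape(1, -1)
    return max(STRESS_CLASSES, key=lambda cls: models[cls].score(x))
```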

Speech & natural language publications May 1, 2014 Conference Paper

Lexical Stress Classification for Language Learning Using Spectral and Segmental Features

Victor Abrash, Kristin Precoda, Horacio Franco, Harry Bratt, Colleen Richey

We present a system for detecting lexical stress in English words spoken by English learners. The system uses both spectral and segmental features to detect three levels of stress for each syllable in a word.

Speech & natural language publications December 1, 2011 Conference Paper

SRILM at sixteen: Update and outlook

Victor Abrash

We review developments in the SRI Language Modeling Toolkit (SRILM) since 2002, when a previous paper on SRILM was published. 

Information & computer science publications July 1, 2010 Article

EduSpeak®: A Speech Recognition and Pronunciation Scoring Toolkit for Computer-Aided Language Learning Applications

Horacio Franco, Harry Bratt, Victor Abrash, Kristin Precoda

SRI International’s EduSpeak® system is an SDK that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology.

Speech & natural language publications September 1, 2005 Conference Paper

Robust Feature Compensation in Nonstationary and Multiple Noise Environments

Martin Graciarena, Horacio Franco, Victor Abrash

The probabilistic optimum filtering (POF) algorithm is a piecewise linear transformation of the noisy speech feature space into the clean speech feature space. In this work we extend the POF algorithm to select noisy-to-clean feature mappings more accurately, by allowing different combinations of speech and noise to have combination-specific mappings chosen depending on the observation. This is especially important in nonstationary environments, where different noise segments will result in different observations in the noisy feature space. Experimental results using stationary and nonstationary noises show the effectiveness of the proposed technique compared to the original approach. We also explored the use of the extended POF method to train a mapping with multiple noises in order to generalize over different noise types and handle unknown noise environments.
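At its core, POF blends region-specific linear transforms of the noisy feature vector, weighted by how strongly the vector belongs to each region. The sketch below illustrates that mapping step under assumed components (a GaussianMixture over noisy features standing in for the region model, and invented transform matrices); it is not SRI's implementation.

```python
# Sketch of the probabilistic optimum filtering (POF) mapping step (assumed
# components): each region of the noisy feature space has its own linear
# transform, and the clean-feature estimate is the posterior-weighted sum
# of the per-region transform outputs.
import numpy as np

def pof_clean_estimate(y, region_gmm, transforms, biases):
    """Map a noisy feature vector y to an estimate of the clean features.

    region_gmm: fitted sklearn GaussianMixture over noisy features, used
                only to compute region posteriors p(r | y).
    transforms: list of (d, d) matrices W_r, one per region.
    biases:     list of (d,) vectors b_r, one per region.
    """
    y = np.asarray(y).reshape(1, -1)
    posteriors = region_gmm.predict_proba(y)[0]          # p(r | y) for each region
    estimates = [W @ y[0] + b for W, b in zip(transforms, biases)]
    return sum(p * x_hat for p, x_hat in zip(posteriors, estimates))
```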

Speech & natural language publications September 1, 2003 Conference Paper

Development of Phrase Translation Systems for Handheld Computers: from Concept to Field

Horacio Franco, Kristin Precoda, Victor Abrash, Dimitra Vergyri

We describe the development and conceptual evolution of handheld spoken phrase translation systems, beginning with an initial unidirectional system for translation of English phrases, and later extending to a limited bidirectional phrase translation system between English and Pashto, a major language of Afghanistan. We review the challenges posed by such projects, from the constraints imposed by the computational platform to the limitations of the phrase translation approach when dealing with naive respondents. We discuss our proposed solutions, in terms of architecture, algorithms, and software features, as well as some field experience from users of initial prototypes.

Speech & natural language publications March 1, 2002 Conference Paper

DynaSpeak: SRI’s Scalable Speech Recognizer for Embedded and Mobile Systems

Horacio Franco, Victor Abrash

We introduce SRI’s new speech recognition engine, DynaSpeak(TM), which is characterized by its scalability and flexibility, high recognition accuracy, memory and speed efficiency, adaptation capability, efficient grammar optimization, support for natural language parsing functionality, and operation based on integer arithmetic. These features are designed to address the needs of the fast-developing and changing domain of embedded and mobile computing platforms.

Speech & natural language publications August 1, 2000 Conference Paper

The SRI EduSpeak(TM) System: Recognition and Pronunciation Scoring for Language Learning

Horacio Franco, Harry Bratt, Kristin Precoda, Victor Abrash

The EduSpeak(TM) system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology.

Speech & natural language publications September 1, 1997 Conference Paper

Mixture Input Transformations for Adaptation of Hybrid Connectionist Speech Recognizers

Victor Abrash

We extend the input transformation approach for adapting hybrid connectionist speech recognizers to allow multiple transformations to be trained. Previous work has shown the efficacy of the linear input transformation approach for speaker adaptation [1] [2] [3], but has focused only on training global transformations. This approach is clearly suboptimal since it assumes that a single transformation is appropriate for every region in the acoustic feature input space, that is, for every phonetic class, microphone, and noise level. In this paper, we propose a new algorithm to train mixtures of transformation networks (MTNs) in the hybrid connectionist recognition framework. This approach is based on the idea of partitioning the acoustic feature space into R regions and training an input transformation for each region. The transformations are combined probabilistically according to the degree to which the acoustic features belong to each region, where the combination weights are derived from a separate acoustic gating network (AGN). We apply the new algorithm to nonnative speaker adaptation, and present recognition results for the 1994 WSJ Spoke 3 development set. The MTN technique can also be used for noise- or microphone-robust recognition or for other nonspeech neural network pattern recognition problems.
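The combination described above can be pictured as R input transforms blended by gating-network weights before the adapted features reach the connectionist acoustic model. Below is a minimal sketch of that forward pass, assuming linear transformation networks and a softmax gating network; the shapes and names are illustrative, not taken from the paper.

```python
# Sketch of a mixture of transformation networks (MTN) forward pass (assumed
# form): R linear input transforms are blended by weights from an acoustic
# gating network (AGN), and the adapted features would then be fed to the
# hybrid connectionist acoustic model.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mtn_transform(x, transforms, agn_weights, agn_biases):
    """Adapt an input feature vector x with a probabilistic mixture of transforms.

    transforms:  list of (W_r, b_r) pairs, one linear transform per region.
    agn_weights: (R, d) weight matrix of the acoustic gating network.
    agn_biases:  (R,) bias vector of the acoustic gating network.
    """
    x = np.asarray(x)
    gate = softmax(agn_weights @ x + agn_biases)   # degree of membership in each region
    adapted = [W @ x + b for W, b in transforms]
    return sum(g * x_r for g, x_r in zip(gate, adapted))
```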
