Acoustic data sharing for Afghan and Persian languages

Citation

A. Mandal, D. Vergyri, M. Akbacak, C. Richey and A. Kathol, “Acoustic data sharing for Afghan and Persian languages,” in Proc. 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2011), pp. 4996–4999.

Abstract

In this work, we compare several known approaches to multilingual acoustic modeling for three languages of recent geo-political interest: Dari, Farsi, and Pashto. We demonstrate that we can train a single multilingual acoustic model for these languages and achieve recognition accuracy close to that of monolingual (or language-dependent) models. When only a small amount of training data is available for each of these languages, the multilingual model may even outperform the monolingual ones. We also explore adapting the multilingual model to target-language data, which improves automatic speech recognition (ASR) performance over the monolingual models by 3% relative word error rate (WER) for both large and small amounts of training data.
Index Terms— multilingual acoustic modeling, language-independent acoustic modeling, languages of Afghanistan
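The abstract does not spell out the adaptation procedure, but a standard way to adapt a Gaussian-mixture acoustic model trained on pooled multilingual data to a target language is MAP (maximum a posteriori) re-estimation of the Gaussian means. The Python/NumPy sketch below is a minimal illustration of that general technique, not the authors' pipeline; the function name, the relevance factor tau, and the toy statistics are assumptions for illustration only.

import numpy as np

def map_adapt_means(prior_means, stats_gamma, stats_x, tau=16.0):
    """MAP re-estimation of GMM means.

    prior_means: (K, D) means of the multilingual (prior) model
    stats_gamma: (K,)   per-component soft counts from target-language data
    stats_x:     (K, D) per-component gamma-weighted feature sums
    tau:         relevance factor; larger values trust the prior more
    """
    gamma = stats_gamma[:, None]
    # Components with many target-language frames move toward the
    # target-data mean; sparsely observed components stay near the prior.
    return (tau * prior_means + stats_x) / (tau + gamma)

# Toy usage: 2 Gaussians, 3-dim features, made-up sufficient statistics.
prior = np.zeros((2, 3))
gamma = np.array([100.0, 2.0])            # soft counts per component
xsum = np.array([[50.0, 50.0, 50.0],      # sums of gamma-weighted frames
                 [1.0, 1.0, 1.0]])
print(map_adapt_means(prior, gamma, xsum))
# Component 0 (well observed) moves substantially; component 1 barely moves.

For scale, the reported gain is relative: a 3% relative WER reduction would move, for example, a 40.0% monolingual WER to 38.8%.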

