Abstract
This paper describes the development of a concept annotation method for evaluating a narrow-domain speech-to-speech translation system and discusses how the scores produced by that method relate to naïve human judgements of translation quality. The method is being developed as both a diagnostic tool and a performance metric for final system evaluation in the DARPA CAST program. The goal of this program is to create two-way speech-to-speech translation systems for narrow domains, including ‘first encounter’ medical care in field environments, for Pashto, Farsi, Thai, and Mandarin.