Strategies for building a Farsi-English SMT system from limited resources


A. Kathol and J. Zheng, “Strategies for building a Farsi-English SMT system from limited resources,” in Proc. 9th Annual Conference of the International Speech Communication Association 2008 (INTERSPEECH 2008), pp. 2731–2731.


A recent task in machine translation research has been the development of translation capabilities within a time frame as short as 100 days. Such a task requires developers to consider what can be done with relatively small amounts of data in a short period, which inherently limits the type and complexity of the effort that can be devoted to it. In this paper we focus on the improvements to a Farsi-to-English translation system achieved through algorithmic changes, the addition of raw, domain-unspecific resources, and unsupervised morphological segmentation. The cumulative effect of these measures has been a relative improvement in BLEU scores of about 25% on an internal test set.

