Hwang, M. Y., Lei, X., Wang, W., & Shinozaki, T. (2006). Investigation on Mandarin broadcast news speech recognition. University of Washington, Seattle, Department of Electrical Engineering.
This paper describes our efforts to build a competitive Mandarin broadcast news speech recognizer. We incorporated the most popular speech technologies into our system. More importantly, we present two novel algorithms: one for smoothing pitch features and one for segmenting Chinese characters into word units. In addition, we propose adapting the mutual-information greedy-merge algorithm to create a Chinese word lexicon automatically. Our final system achieved a character error rate (CER) of 6.0% on dev04 and 16.0% on eval04. It was competitive with other state-of-the-art systems while using simpler acoustic models, less data, and a simpler decoding architecture.
Index terms: Mandarin speech recognition, character error rate, pitch smoothing, word segmentation.
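For readers unfamiliar with the metric reported above, CER is conventionally computed as the character-level edit distance (substitutions, deletions, and insertions) between the reference and the recognizer output, divided by the number of reference characters. The following is a minimal sketch of that conventional definition, not the scoring tool used in the paper; the function names and the example strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted character in a six-character reference.
cer = character_error_rate("今天天气很好", "今天天汽很好")
print(f"CER = {cer:.1%}")
```

Because the metric is computed over characters, it does not depend on how the text is segmented into words, which is why it is the usual choice for Mandarin.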