W. Wang, “Weakly supervised training for parsing Mandarin broadcast transcripts,” in Proc. 9th Annual Conference of the International Speech Communication Association 2008 (INTERSPEECH 2008), pp. 2446–2449.
We present a systematic investigation of weakly supervised co-training for parsing Mandarin broadcast news (BN) and broadcast conversation (BC) transcripts. The approach iteratively retrains two competitive Chinese parsers from a small set of treebanked data and a large set of unlabeled data. We compare co-training to self-training. Our results show that co-training performs significantly better than self-training, and that both co-training and self-training with a small seed labeled corpus significantly improve parsing accuracy over training on the mismatched newswire treebank.
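The iterative retraining loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how automatically parsed sentences are selected, so the confidence-ranked `top_k` selection here is an assumption, and `ToyLabeler` is a hypothetical stand-in for a real parser (it assigns one label per token instead of building a tree). The names `co_train`, `self_train`, `rounds`, and `top_k` are likewise illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = Tuple[str, ...]
Labeled = Tuple[Sentence, Tuple[str, ...]]


class ToyLabeler:
    """Hypothetical stand-in for a parser: memorises the most frequent label per token."""

    def __init__(self, default: str) -> None:
        self.default = default  # label guessed for tokens never seen in training
        self.table: Dict[str, str] = {}

    def train(self, data: List[Labeled]) -> None:
        counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, labels in data:
            for tok, lab in zip(tokens, labels):
                counts[tok][lab] += 1
        self.table = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

    def predict(self, tokens: Sentence) -> Tuple[Tuple[str, ...], float]:
        labels = tuple(self.table.get(t, self.default) for t in tokens)
        known = sum(t in self.table for t in tokens)
        # Assumed confidence score: share of tokens seen during training.
        return labels, (known / len(tokens) if tokens else 0.0)


def _most_confident(model: ToyLabeler, pool: List[Sentence], top_k: int) -> List[Sentence]:
    return sorted(pool, key=lambda s: model.predict(s)[1], reverse=True)[:top_k]


def co_train(a, b, seed, unlabeled, rounds=3, top_k=2):
    """Two models start from the same seed; each one labels data for the other."""
    train_a, train_b = list(seed), list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        a.train(train_a)
        b.train(train_b)
        if not pool:
            break
        best_a = _most_confident(a, pool, top_k)
        best_b = _most_confident(b, pool, top_k)
        # Cross-teaching: A's confident output extends B's training set, and vice versa.
        train_b += [(s, a.predict(s)[0]) for s in best_a]
        train_a += [(s, b.predict(s)[0]) for s in best_b]
        chosen = set(best_a) | set(best_b)
        pool = [s for s in pool if s not in chosen]
    a.train(train_a)
    b.train(train_b)
    return train_a, train_b


def self_train(model, seed, unlabeled, rounds=3, top_k=2):
    """A single model is retrained on its own most confident output."""
    train = list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        model.train(train)
        if not pool:
            break
        best = _most_confident(model, pool, top_k)
        train += [(s, model.predict(s)[0]) for s in best]
        pool = [s for s in pool if s not in best]
    model.train(train)
    return train
```

The only structural difference between the two functions is where the automatically labeled sentences go: in `co_train` they are added to the other model's training set, while in `self_train` they are fed back to the model that produced them.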