U. Guz, S. Cuendet, D. Hakkani-Tür, and G. Tur, “Multi-view semi-supervised learning for dialog act segmentation of speech,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, pp. 320–329, Feb. 2010.
Sentence segmentation of speech aims to determine sentence boundaries in the stream of words output by a speech recognizer. Statistical methods are typically used for this task, but they require significant amounts of labeled data, which is time-consuming, labor-intensive, and expensive to prepare. This work investigates the application of multi-view semi-supervised learning algorithms to the sentence boundary classification problem, using lexical and prosodic information as the two views. The aim is to find an effective semi-supervised machine learning strategy when only small sets of sentence boundary-labeled data are available. We focus on two semi-supervised learning approaches, namely self-training and co-training. We also compare two example selection strategies for co-training, namely agreement and disagreement.[…]
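As a rough illustration of the co-training idea named in the abstract, the following minimal sketch trains one classifier per view and lets the two views label unlabeled examples for each other. Everything in it is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier, the toy one-dimensional "lexical" and "prosodic" features, the function names, and in particular the way "agreement" (both views predict the same label, ranked by the weaker confidence) and "disagreement" (ranked by the confidence gap, trusting the more confident view) are scored.

```python
def centroid_fit(X, y):
    """Fit a nearest-centroid model: map each class to its mean feature vector."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, x):
    """Return (label, confidence); confidence is the distance margin to the runner-up class."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, mu)), c) for c, mu in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(lex_L, pro_L, y_L, lex_U, pro_U, rounds=3, k=4, strategy="agreement"):
    """Hypothetical co-training loop over two views (lexical, prosodic)."""
    lex_L, pro_L, y_L = list(lex_L), list(pro_L), list(y_L)
    pool = list(range(len(lex_U)))  # indices of still-unlabeled examples
    for _ in range(rounds):
        m_lex = centroid_fit(lex_L, y_L)
        m_pro = centroid_fit(pro_L, y_L)
        scored = []
        for i in pool:
            l1, c1 = centroid_predict(m_lex, lex_U[i])
            l2, c2 = centroid_predict(m_pro, pro_U[i])
            if strategy == "agreement":
                # Assumed reading: keep only examples both views label identically.
                if l1 != l2:
                    continue
                scored.append((min(c1, c2), i, l1))
            else:
                # Assumed reading: prefer examples where one view is far more
                # confident than the other, and trust the confident view.
                scored.append((abs(c1 - c2), i, l1 if c1 >= c2 else l2))
        chosen = sorted(scored, reverse=True)[:k]
        if not chosen:
            break
        for _, i, label in chosen:
            lex_L.append(lex_U[i])
            pro_L.append(pro_U[i])
            y_L.append(label)
        taken = {i for _, i, _ in chosen}
        pool = [i for i in pool if i not in taken]
    return centroid_fit(lex_L, y_L), centroid_fit(pro_L, y_L), len(y_L)


def make_toy_data(n=20, n_labeled=4):
    """Deterministic toy data: label 1 = boundary, 0 = no boundary."""
    labels = [i % 2 for i in range(n)]
    lex = [[2.0 * c + 0.05 * (i % 5)] for i, c in enumerate(labels)]
    pro = [[0.5 * c + 0.01 * (i % 3)] for i, c in enumerate(labels)]
    return (lex[:n_labeled], pro[:n_labeled], labels[:n_labeled],
            lex[n_labeled:], pro[n_labeled:])


lex_L, pro_L, y_L, lex_U, pro_U = make_toy_data()
lex_model, pro_model, n_agree = co_train(lex_L, pro_L, y_L, lex_U, pro_U)
```

Self-training would be the single-view analogue of this loop, in which one classifier adds its own most confident predictions to its training set. The toy data here is trivially separable, so the sketch only shows the control flow and says nothing about which selection strategy works better on real speech data.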