Venkataraman, A., Stolcke, A., & Shriberg, E. (2002). Automatic dialog act labeling with minimal supervision.
For many natural language applications it is desirable to be able to automatically tag utterances according to their discourse function (dialog act), such as statement, question or acknowledgment. We investigate the problem of automatically tagging dialog acts when hand-labeled training data is scarce. The tagging paradigm employed is a hidden Markov model in which dialog acts are states and utterances are observations, with N-gram language models as observation models. We show that bootstrapping from a small hand-labeled training set, combined with iterative relabeling of a larger unlabeled data set, is an effective approach for preserving accuracy under conditions of limited hand-labeled training data. The dialog act grammar that models the sequencing of dialog acts is found to be of paramount importance in this approach. We analyze the effect that lack of training data has on different dialog act types, and discuss implications for efficient data annotation.
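The tagging and bootstrapping scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes word-bigram observation models with add-one smoothing, a bigram dialog act grammar (also add-one smoothed), hard Viterbi relabeling of the unlabeled pool, and a fixed number of iterations. All names (`BigramLM`, `train`, `viterbi`, `bootstrap`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Word-bigram observation model for one dialog act (add-one smoothing assumed)."""

    def __init__(self, sentences, vocab):
        self.vocab_size = len(vocab) + 1  # +1 for the end-of-utterance token
        self.bigrams, self.contexts = Counter(), Counter()
        for words in sentences:
            seq = [BOS] + words + [EOS]
            for a, b in zip(seq, seq[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def logprob(self, words):
        seq = [BOS] + words + [EOS]
        return sum(math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + self.vocab_size))
                   for a, b in zip(seq, seq[1:]))


def train(convs, acts, vocab):
    """convs: conversations as lists of (act, words). Returns per-act LMs and the DA grammar."""
    by_act, trans, ctx = defaultdict(list), Counter(), Counter()
    for conv in convs:
        prev = BOS
        for act, words in conv:
            by_act[act].append(words)
            trans[(prev, act)] += 1
            ctx[prev] += 1
            prev = act
    lms = {a: BigramLM(by_act[a], vocab) for a in acts}

    def log_trans(prev, act):
        # Hypothetical bigram dialog act grammar P(act | prev), add-one smoothed.
        return math.log((trans[(prev, act)] + 1) / (ctx[prev] + len(acts)))

    return lms, log_trans


def viterbi(utts, acts, lms, log_trans):
    """Most likely dialog act sequence for one conversation (a list of word lists)."""
    best = {a: (log_trans(BOS, a) + lms[a].logprob(utts[0]), [a]) for a in acts}
    for words in utts[1:]:
        # For each act, keep the best-scoring path that ends in it.
        best = {a: max(((score + log_trans(p, a) + lms[a].logprob(words), path + [a])
                        for p, (score, path) in best.items()), key=lambda t: t[0])
                for a in acts}
    return max(best.values(), key=lambda t: t[0])[1]


def bootstrap(labeled, unlabeled, acts, iterations=3):
    """Train on hand-labeled data, then repeatedly relabel unlabeled data and retrain."""
    vocab = ({w for conv in labeled for _, ws in conv for w in ws}
             | {w for conv in unlabeled for ws in conv for w in ws})
    lms, log_trans = train(labeled, acts, vocab)
    for _ in range(iterations):
        # Hard (Viterbi) relabeling of the unlabeled pool with the current model (assumed).
        relabeled = [list(zip(viterbi(conv, acts, lms, log_trans), conv)) for conv in unlabeled]
        lms, log_trans = train(labeled + relabeled, acts, vocab)
    return lms, log_trans


if __name__ == "__main__":
    acts = ["statement", "question", "backchannel"]
    labeled = [[("question", "do you like it".split()),
                ("statement", "i like it a lot".split()),
                ("backchannel", ["uh-huh"])]]
    unlabeled = [["are you there".split(), "yes i am here".split(), ["uh-huh"]]]
    lms, log_trans = bootstrap(labeled, unlabeled, acts, iterations=2)
    print(viterbi(["do you like it".split(), ["uh-huh"]], acts, lms, log_trans))
```

In this sketch each retraining pass re-estimates both the per-act language models and `log_trans` from the relabeled conversations, so the dialog act grammar influences which labels the next iteration trains on. Soft (expected-count) relabeling would be an alternative design; the hard Viterbi variant is used here only for brevity.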