D. Hakkani-Tur and G. Tur, “Statistical Sentence Extraction for Information Distillation,” 2007 IEEE International Conference on Acoustics, Speech and Signal Processing – ICASSP ’07, 2007, pp. IV-1-IV-4, doi: 10.1109/ICASSP.2007.367148.
Information distillation aims to extract the most useful pieces of information related to a given query from massive, possibly multilingual, audio and textual document sources. One critical component of a distillation engine detects the sentences to be extracted from each relevant document. In this paper, we present a statistical sentence extraction approach for distillation. We frame this task as a classification problem, in which each candidate sentence in a document is classified as relevant to the query or not. These documents may be in textual or audio format and in a number of languages. For audio documents, we use both manual and automatic transcriptions; for non-English documents, we use automatic translations. In this work, we use AdaBoost, a discriminative classification method, with both lexical and semantic features. […]
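To make the classification framing concrete, here is a minimal sketch of discrete AdaBoost with single-feature decision stumps deciding whether a sentence is relevant to a query. It is not the authors' implementation. The features (`extract_features`: query-word overlap and sentence length), the toy query and sentences, and all function names are hypothetical illustrations. The paper's semantic features are not modeled.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical lexical features for a (query, sentence) pair; the paper's
# actual lexical and semantic features are not reproduced here.
def extract_features(query: str, sentence: str) -> List[int]:
    q = set(re.findall(r"[a-z]+", query.lower()))
    s = re.findall(r"[a-z]+", sentence.lower())
    overlap = len(q & set(s))
    return [
        int(overlap >= 1),  # shares at least one query word
        int(overlap >= 2),  # shares two or more query words
        int(len(s) > 5),    # sentence has more than five tokens
    ]

# A weak learner: (feature index, polarity, vote weight alpha).
Stump = Tuple[int, int, float]

def stump_predict(j: int, polarity: int, x: Sequence[int]) -> int:
    # Predict +polarity when feature j fires, -polarity otherwise.
    return polarity if x[j] else -polarity

def train_adaboost(X: Sequence[Sequence[int]], y: Sequence[int], rounds: int) -> List[Stump]:
    """Discrete AdaBoost; labels in y are +1 (relevant) or -1 (not relevant)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    model: List[Stump] = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                err = sum(w[i] for i in range(n)
                          if stump_predict(j, polarity, X[i]) != y[i])
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, polarity, alpha))
        # Misclassified examples gain weight; correct ones lose weight.
        for i in range(n):
            w[i] *= math.exp(-alpha * y[i] * stump_predict(j, polarity, X[i]))
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def is_relevant(model: List[Stump], x: Sequence[int]) -> bool:
    # Sign of the alpha-weighted vote of all stumps.
    score = sum(alpha * stump_predict(j, p, x) for j, p, alpha in model)
    return score > 0

# Toy usage with hypothetical data.
query = "earthquake damage in Turkey"
train = [
    ("The earthquake caused heavy damage across Turkey on Monday.", 1),
    ("Rescue teams reported damage to roads after the earthquake.", 1),
    ("The stock market closed higher today.", -1),
    ("Officials said the weather would improve.", -1),
]
X = [extract_features(query, s) for s, _ in train]
y = [label for _, label in train]
model = train_adaboost(X, y, rounds=3)

print(is_relevant(model, extract_features(query, "Damage from the earthquake in Turkey was severe.")))  # True
print(is_relevant(model, extract_features(query, "The weather was mild.")))  # False
```

Stumps over binary features keep each weak learner to a single yes/no test, which is the usual pairing with AdaBoost; a real system would add many more features (for example, n-grams or semantic similarity to the query) and tune the number of boosting rounds.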