Abstract
Information distillation aims to extract the most useful pieces of information related to a given query from massive, possibly multilingual, audio and textual document sources. One critical component of a distillation engine is detecting the sentences to be extracted from each relevant document. In this paper, we present a statistical sentence extraction approach for distillation. We frame this task as a classification problem, in which each candidate sentence in the documents is classified as relevant to the query or not. These documents may be in textual or audio format and in a number of languages. For audio documents, we use both manual and automatic transcriptions; for non-English documents, we use automatic translations. In this work, we use AdaBoost, a discriminative classification method, with both lexical and semantic features. […]
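The classification framing described above can be illustrated with a minimal sketch. It is not the authors' system: the query, the sentences, the labels, and the three overlap features are hypothetical toy data, the booster is a plain discrete AdaBoost over single-feature decision stumps, and the semantic features mentioned in the abstract are left out entirely.

```python
import math

def lexical_features(query, sentence):
    """Toy binary lexical features (hypothetical, not the paper's feature set)."""
    q = set(query.lower().split())
    s = set(sentence.lower().split())  # naive whitespace tokenization
    overlap = len(q & s)
    return [
        1 if overlap >= 1 else 0,  # shares at least one query term
        1 if overlap >= 2 else 0,  # shares at least two query terms
        1 if len(s) >= 6 else 0,   # sentence has six or more distinct tokens
    ]

def stump(x, j, polarity):
    """Predict `polarity` when feature j fires, otherwise the opposite label."""
    return polarity if x[j] else -polarity

def train_adaboost(X, y, rounds=3):
    """Discrete AdaBoost; labels in y are +1 (relevant) or -1 (not relevant)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if stump(xi, j, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, polarity))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    vote = sum(alpha * stump(x, j, polarity) for alpha, j, polarity in model)
    return 1 if vote > 0 else -1

# Hypothetical query and hand-labeled candidate sentences.
QUERY = "earthquake damage japan"
LABELED = [
    ("the earthquake caused severe damage in northern japan", 1),
    ("officials reported earthquake damage to several bridges", 1),
    ("the stock market closed higher on tuesday", -1),
    ("japan won the match", -1),
    ("weather was mild", -1),
]
X = [lexical_features(QUERY, s) for s, _ in LABELED]
y = [label for _, label in LABELED]
model = train_adaboost(X, y)
```

With this toy data, `predict(model, lexical_features(QUERY, sentence))` labels a new candidate sentence as relevant (+1) or not (-1). In the multilingual and audio settings described above, `sentence` would be the output of automatic translation or transcription rather than the original text.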