An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities


Stolcke, A. (1994). An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. arXiv preprint cmp-lg/9411029.


We describe an extension of Earley’s parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar; b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar; c) most likely (Viterbi) parse of the string; d) posterior expected number of applications of each grammar production, as required for reestimating rule probabilities. Probabilities (a) and (b) are computed incrementally in a single left-to-right pass over the input. Our algorithm compares favorably to standard bottom-up parsing methods for SCFGs in that it works efficiently on sparse grammars by making use of Earley’s top-down control structure. It can process any context-free rule format without conversion to some normal form, and combine computations for (a) through(d) in a single algorithm. Finally, the algorithm has simple extensions for processing partially bracked inputs, and for finding partial parses and their likelihoods on ungrammatical inputs. 

Read more from SRI