Oregon Health & Science University
This thesis addresses the problem of efficient search within the exponential space of possible parse trees generated by a weighted context-free grammar. Four novel probabilistic methods are presented to prioritize and prune the search space, all of which are agnostic to the structure and linguistic annotations of the underlying grammar. Each of the four models is conditioned on lexical information from the input sentence, and we argue that these cues are vital to effectively guide the search through the large solution space. We present empirical results for each of the four efficient search methods applied to multiple parsing architectures, including exact search with the CKY dynamic programming algorithm, globally prioritized search with a best-first graph-based algorithm, and inexact pruned beam-search. The combination of all four efficient search methods results in the fastest reported parsing time for high-accuracy English, Chinese, and German constituent parsing, at over 1,500 words per second with no F1 labeled accuracy loss relative to the maximum likelihood solution. Observed run-time complexity for a sentence of length N is reduced from O(N^3) to O(N^1.5), and our efficient search methods are over an order of magnitude faster than multi-level coarse-to-fine pruning.
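To make the baseline concrete: the exact CKY search mentioned above fills a chart over all spans of the sentence in O(N^3) time, which is the cost the thesis's prioritization and pruning methods reduce. The following is a minimal Viterbi-CKY sketch, not the thesis's implementation; the toy grammar, rule tables, and function name are illustrative assumptions, with rules assumed to be in Chomsky normal form and scored in log space.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (log-probabilities).
# binary:  (parent, left_child, right_child) -> log-prob
# lexical: (preterminal, word) -> log-prob
binary = {
    ("S", "NP", "VP"): math.log(1.0),
    ("NP", "Det", "N"): math.log(1.0),
    ("VP", "V", "NP"): math.log(1.0),
}
lexical = {
    ("Det", "the"): math.log(1.0),
    ("N", "dog"): math.log(0.5),
    ("N", "cat"): math.log(0.5),
    ("V", "saw"): math.log(1.0),
}

def cky(words):
    """Return the best (Viterbi) log-probability of each nonterminal
    spanning the whole sentence. Runs in O(N^3 * |grammar|) time."""
    n = len(words)
    # chart[(i, j)] maps nonterminal -> best log-prob over words[i:j]
    chart = defaultdict(lambda: defaultdict(lambda: float("-inf")))
    # Seed length-1 spans from the lexical rules.
    for i, w in enumerate(words):
        for (nt, word), lp in lexical.items():
            if word == w:
                chart[(i, i + 1)][nt] = max(chart[(i, i + 1)][nt], lp)
    # Combine smaller spans bottom-up; the triple loop over
    # (span length, start, midpoint) is the source of the O(N^3) cost.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # midpoint splitting the span
                for (p, l, r), lp in binary.items():
                    score = chart[(i, k)][l] + chart[(k, j)][r] + lp
                    if score > chart[(i, j)][p]:
                        chart[(i, j)][p] = score
    return dict(chart[(0, n)])

scores = cky("the dog saw the cat".split())
```

Pruning methods like those studied in the thesis speed this up by skipping chart cells or grammar rules that are unlikely to participate in the best parse, rather than exhaustively scoring every (span, midpoint, rule) combination.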
Center for Spoken Language Understanding
School of Medicine
Bodenstab, Nathan Matthew, "Prioritization and pruning : efficient inference with weighted context-free grammars : a dissertation" (2012). Scholar Archive. 859.