July 2012

Document Type


Degree Name



Oregon Health & Science University


This thesis addresses the problem of efficient search within the exponential space of possible parse trees generated by a weighted context-free grammar. Four novel probabilistic methods are presented to prioritize and prune the search space, all of which are agnostic to the structure and linguistic annotations of the underlying grammar. Each of the four models is conditioned on lexical information from the input sentence, and we argue that these cues are vital to effectively guide the search through the large solution space. We present empirical results for each of the four efficient search methods applied to multiple parsing architectures, including exact search with the CKY dynamic programming algorithm, globally prioritized search with a best-first graph-based algorithm, and inexact pruned beam search. The combination of all four efficient search methods results in the fastest reported parsing time for high-accuracy English, Chinese, and German constituent parsing, at over 1,500 words per second with no loss in labeled F1 accuracy relative to the maximum likelihood solution. Observed run-time complexity for a sentence of length N is reduced from O(N^3) to O(N^1.5), and our efficient search methods are over an order of magnitude faster than multi-level coarse-to-fine pruning.




Center for Spoken Language Understanding


School of Medicine


