Date

July 2012

Document Type

Dissertation

Degree Name

Ph.D.

Institution

Oregon Health & Science University

Abstract

This thesis addresses the problem of efficient search within the exponential space of possible parse trees generated by a weighted context-free grammar. Four novel probabilistic methods are presented to prioritize and prune the search space, all of which are agnostic to the structure and linguistic annotations of the underlying grammar. Each of the four models is conditioned on lexical information from the input sentence, and we argue that these cues are vital to effectively guide the search through the large solution space. We present empirical results for each of the four efficient search methods applied to multiple parsing architectures, including exact search with the CKY dynamic programming algorithm, globally prioritized search with a best-first graph-based algorithm, and inexact pruned beam search. The combination of all four efficient search methods results in the fastest reported parsing time for high-accuracy English, Chinese, and German constituent parsing, at over 1,500 words per second with no F1 labeled accuracy loss relative to the maximum likelihood solution. Observed run-time complexity for a sentence of length N is reduced from O(N^3) to O(N^1.5), and our efficient search methods are over an order of magnitude faster than multi-level coarse-to-fine pruning.

Identifier

doi:10.6083/M40P0X2S

Division

Center for Spoken Language Understanding

School

School of Medicine

