Date

August 2010

Document Type

Dissertation

Degree Name

Ph.D.

Department

Dept. of Science & Engineering

Institution

Oregon Health & Science University

Abstract

Pipeline systems, in which data is processed in stages with the output of one stage providing input to the next, are ubiquitous in the field of natural language processing (NLP) as well as many other research areas. The popularity of the pipeline system architecture is due to the utility of pipelines in reducing search complexity, increasing efficiency, and re-using system components. Despite the widespread use of pipelines, there has been little effort toward understanding their functionality and establishing a set of best practices for defining and improving these systems. Improvement techniques are usually discovered in an ad-hoc manner, are difficult to generalize for implementation in other systems, and lack thorough systematic evaluation of their effects on the pipeline system. This dissertation identifies and generalizes shared aspects of pipeline systems across several different application areas, including parsing, speech recognition, machine translation, and image classification. A formal framework of pipeline systems is defined, based on these shared aspects, and the argument is made that pipeline improvement techniques derived from this framework will be more easily generalized for application to other pipeline systems. A systematic and thorough examination of the characteristics of constraints used within several different pipelines is conducted to determine the effects of these characteristics on pipeline performance. This dissertation will define quantitative metrics of constraint characteristics including diversity, regularity, density, and peakedness. Results will demonstrate that 1) current metrics of constraint quality (typically intrinsic one-best and oracle evaluations) are insufficient predictors of pipeline performance, and 2) several quantitative characteristics of the constraints can be systematically altered to affect pipeline performance.
The framework, general improvement techniques, and quantitative measures of the search space provided by this dissertation are important steps toward improving the comparison, analysis, and understanding of pipeline systems.

Identifier

doi:10.6083/M49K4860

Division

Div. of Biomedical Computer Science

School

School of Medicine
