November 2007

Document Type


Degree Name



Dept. of Computer Science and Engineering


Oregon Health & Science University


Genome-wide studies are sensitive to the quality of annotation data included for analyses and they often involve overlaying both computationally derived and experimentally generated data onto a genomic scaffold. A framework for successful integration of data from diverse sources needs to address, at a minimum, the conceptualization of the biological identity in the data sources, the relationship between the sources in terms of the data present, the independence of the sources and, any discrepancies in the data. The outcome of the process should either resolve or incorporate these discrepancies into downstream analyses. In this thesis we identify factors that are important in detecting errors within and between sources and present a generalized framework to detect discrepancies. An implementation of our workflow is used to demonstrate the utility of the approach in the construction of a genome-wide mouse transcription factor binding map and in the classification of Single nucleotide polymorphisms. We also present the impact of these discrepancies on downstream analyses. The framework is extensible and we discuss future directions including summarization of the discrepancies in a biological relevant manner.




OGI School of Science and Engineering



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.