September 2010

Document Type


Degree Name



Dept. of Medical Informatics and Clinical Epidemiology


Oregon Health & Science University


Identification of biologically relevant high-occupancy transcription factor binding sites (TFBS) in silico has historically been a difficult problem with a high error rate. Methods which utilize information in addition to the sequence of binding sites (e.g. chromatin information) have been shown to improve performance over strictly sequence-based methods; however, a number of questions about such methods remain unanswered: whether such models are suitable for multiple transcription factors, whether a general model or generalizable approach to the problem is possible, and what the effect of such prediction on biological inference is. In this work, we construct and evaluate a number of classifiers of position weight matrix-predicted TFBS (“occupancy classifiers”) based on four distinct transcription factors and demonstrate that such classifiers identify biochemically confirmed high-occupancy sites at a high rate. I contrast and compare the algorithms and predictors used by these classifi




School of Medicine



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.