Ted Laderas


June 2004






Dept. of Medical Informatics and Clinical Epidemiology


Oregon Health & Science University


Microarray experiments offer the potential to monitor the expression of thousands of genes at once. However, researchers are often left with a dimensionality problem: too few technical and biological replicates, and thousands of genes to monitor for differential expression [1]. Finding interesting and novel genes among the thousands on a microarray can seem akin to looking for a needle in a haystack of needles. One highly popular approach to identifying interesting genes for further study is to look for similar patterns of expression within the data. Clustering, a common hypothesis-generating approach, has shown considerable potential for finding genes with similar function. However, there are a variety of clustering methods, and each has different strengths and weaknesses in finding patterns within microarray data. In this thesis, I will first discuss some issues with acquiring and normalizing microarray data, which will be useful in discussing clustering methods. I will then discuss three types of clustering methods, namely hierarchical, partitional, and model-based, highlighting the strengths and weaknesses of each approach. For microarray data with subtle changes in expression across samples, different methods may give different answers, a point that is often overlooked. The underlying research question is how to compare results across methods effectively. This leads to my project goal, which is to develop and validate an evaluation framework for comparing clustering methods. The clustering methods used to develop this framework will be discussed, along with the program design and a review of the metrics used to evaluate the clusterings. Validation of the framework, using both simulated and real data, will also be discussed. Interpretation of the results indicates that the tool has potential for finding consensus between clustering methods.
The tool allows users to exercise appropriate caution when interpreting a cluster if a gene of interest clusters with other known genes in only one method. Finally, some possible future directions outside the scope of the present study will be discussed.




School of Medicine


