April 2011

Document Type


Degree Name



Dept. of Medical Informatics and Clinical Epidemiology


Oregon Health & Science University


The Portland Alcohol Research Center (PARC) was established to investigate the genetic basis of alcohol dependence. One line of inquiry utilizes mouse strains that are widely divergent in alcohol-related behaviors. Decades of genetics research comparing mouse strains have identified many regions of the genome associated with such quantitative traits. These regions are called Quantitative Trait Loci (QTLs). Microarrays have been used to identify which genes within the QTLs are differentially expressed and are therefore potentially causal; however, genetic variants that affect probe hybridization lead to many false conclusions. Here, we used quantitative proteomics to compare brain striata between two mouse strains for which abundant QTL and transcriptomic data are available. The primary aims of this research were to (1) identify differentially expressed proteins that lie within QTLs and are therefore candidate causal proteins, (2) determine if genetic variants also lead to spurious results in quantitative proteomics, and (3) compare transcriptomic and proteomic datasets to determine their agreement. Of the 4,563 identified proteins (2.1% FDR), there were 1,807 quantifiable protein families that exceeded minimum count cutoffs (Chapter 2). With four pooled biological replicates per strain, we used quantile normalization, ComBat (a package that adjusts for batch effects), and edgeR (a package for differential expression analysis of count data) to identify 101 differentially expressed families (Chapter 5), 84 of which had a coding region within one of the genomic regions of interest identified by the Portland Alcohol Research Center (Chapter 7). Using strain-specific protein databases, we conclude that proteomics is more robust to sequence differences than microarrays; however, some proteins are significantly affected (Chapter 6).
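As an illustration of the quantile normalization step named above, the following is a minimal sketch in plain Python. It is not the code used in this work: the function name `quantile_normalize`, the column-per-sample input layout, and the tie handling (ties are ranked by order of appearance rather than averaged) are all assumptions made for the example.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length sample columns (hypothetical helper).

    columns: list of lists, one inner list of counts per sample.
    Returns new columns that all share the same sorted distribution.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all sample columns must have the same length")

    # Reference distribution: mean across samples at each sorted position.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)
    ]

    normalized = []
    for col in columns:
        # Indices of this column ordered by value; sorted() is stable,
        # so tied values keep their original order (an assumption here).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Made-up counts for three samples and four protein families.
samples = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
normalized = quantile_normalize(samples)
```

After normalization every sample has the same set of values, so remaining differences between samples reflect rank order rather than overall intensity.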
To generate strain-specific databases, we used genome sequence data combined with a complete protein database that contained all the putative genetic isoforms for each protein. While the increased proteome coverage in the databases led to a 6.8% gain in peptide assignments compared to a non-redundant database (Chapter 4), it also necessitated the development of a strategy for grouping similar proteins due to a large number of shared peptides. Choosing an appropriate method for managing shared peptides was necessary before normalization and differential expression analysis could proceed (Chapter 3). In the final chapter (Chapter 8), we compared the proteomic data to transcriptomic data from three platforms: RNA-seq, Affymetrix microarray, and Illumina microarray, and found that absolute expression, fold changes, and significance levels observed in the protein data had low but significant correlations with those found in the transcript data. More than half of the differentially expressed proteins were also found to be differentially expressed at the transcript level.
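The abstract does not specify how similar proteins were grouped. One simple possibility, sketched here purely as an assumption, is to treat any two database entries that share at least one identified peptide as members of the same family and take connected components with a union-find structure. The function name, the input mapping, and the protein and peptide identifiers are all hypothetical.

```python
def group_proteins_by_shared_peptides(protein_peptides):
    """Group proteins into families linked by shared peptides (a sketch).

    protein_peptides: dict mapping protein ID -> set of peptide sequences.
    Returns a sorted list of families, each a sorted list of protein IDs.
    """
    parent = {protein: protein for protein in protein_peptides}

    def find(p):
        # Follow parent links to the root, compressing the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link each protein to the first protein seen with the same peptide.
    first_owner = {}
    for protein, peptides in protein_peptides.items():
        for peptide in peptides:
            if peptide in first_owner:
                union(first_owner[peptide], protein)
            else:
                first_owner[peptide] = protein

    families = {}
    for protein in protein_peptides:
        families.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in families.values())


# Hypothetical identifiers: two isoforms share a peptide, one protein is unique.
example = {
    "ProtA": {"PEPTIDEK", "SHAREDR"},
    "ProtA_iso2": {"SHAREDR"},
    "ProtB": {"UNIQUEK"},
}
families = group_proteins_by_shared_peptides(example)
```

Because a single shared peptide is enough to merge entries, this approach can chain loosely related proteins into one large family; stricter criteria (for example, requiring a minimum fraction of shared peptides) would be an alternative design.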




School of Medicine


