Date

March 2013

Document Type

Capstone

Degree Name

M.B.I.

Department

Dept. of Medical Informatics and Clinical Epidemiology

Institution

Oregon Health & Science University

Abstract

Today’s genomics data requires a great deal of preprocessing before it can be used to analyze biological questions. This work details the steps and requirements for processing genome-wide association study (GWAS) data in preparation for analysis. The scripting language ‘Python’ is employed to open and read genomic dataset files, including phenotypic, genotypic, and demographic data, from a GWAS performed by the Harvard Brain Tissue Resource Center (HBTRC) as well as the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The subjects in the data files remain deidentified. These raw files are then processed in ‘Python’ to create hypothesis-dependent edited versions suitable for use in an investigative bioinformatics genomics study. Exploratory data analysis (EDA), including simple graphs, is performed using ‘R’ to describe the datasets and explore their suitability for the investigative study. Reasons for dataset rejection as well as accept

Identifier

doi:10.6083/M46W983Q

School

School of Medicine
