Dept. of Medical Informatics and Clinical Epidemiology
Oregon Health & Science University
Today's genomics data requires a great deal of preprocessing before it can be used to analyze biological questions. This work details the steps and requirements for processing genome-wide association study (GWAS) data in preparation for analysis. The scripting language "Python" is used to open and read genomic dataset files, including phenotypic, genotypic, and demographic data, from GWAS performed by the Harvard Brain Tissue Resource Center (HBTRC) as well as the Alzheimer's Disease Neuroimaging Initiative (ADNI). The subjects in the data files remain deidentified. These raw files are then processed with "Python" to create hypothesis-dependent edited versions suitable for use in an investigative bioinformatics genomics study. Exploratory data analysis (EDA), including simple graphs, is performed using "R" to describe the datasets and explore their suitability for the investigative study. Reasons for dataset rejection as well as accept
School of Medicine
Williamson, Rex M., "Preparation of a genomic dataset for an investigative project: 'discovery of sporatic Alzheimer's disease implicated gene variant through analysis of epistasis within pathways'" (2013). Scholar Archive. 927.