Dept. of Public Health & Preventive Medicine
Oregon Health & Science University
The U.S. government's Behavioral Risk Factor Surveillance System (BRFSS) survey is an important source of demographic and health data. As with many surveys, BRFSS has missing data resulting from non-response. Because the true value of a missing response cannot be known, the accuracy of imputation methods on real missing data cannot be measured directly. To work around this problem, I created artificially missing data for two demographic variables for which the originally missing amounts were relatively small: age and race/ethnicity. Proportion estimates produced by each imputation method at 5%, 10%, and 20% artificially missing were compared against proportion estimates for the same variables from other governmental surveys and against the baseline imputation estimates made at the originally missing amounts, which were between 1% and 3%. I compared four approaches: no imputation, the BRFSS imputation methods, multiply imputed hot-deck imputation, and multiply imputed model-based imputation. At each level, missing data were artificially created under three conditions: where the missingness depended on the missing value itself, where it depended on the values of covariates, and where it did not depend on anything measured by the survey. I found that no imputation was by some measures no worse, and even marginally better, than any of the imputation methods compared. This thesis has limited scope, however, and caution is recommended before researchers using BRFSS or other survey data forgo any attempt at using an imputation method.
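The three conditions under which missing data were created correspond to what the missing-data literature usually calls missing not at random (MNAR), missing at random (MAR), and missing completely at random (MCAR). The following is a minimal Python sketch of how such artificial missingness might be generated. It is not the procedure used in the thesis. The function name `make_missing`, the record fields `age` and `sex`, the age cutoff of 35, and the 3:1 weighting are all hypothetical choices made for illustration, and the records are synthetic rather than BRFSS data.

```python
import random


def make_missing(records, rate, mechanism, seed=0):
    """Return copies of `records` with `age` set to None in round(rate * n) rows.

    mechanism (all weights are illustrative assumptions):
      "mcar": every row is equally likely to lose its age
      "mar":  rows with sex == "M" are 3x as likely (depends on a covariate)
      "mnar": rows with age < 35 are 3x as likely (depends on the value itself)
    """
    rng = random.Random(seed)
    k = round(rate * len(records))

    def weight(rec):
        if mechanism == "mcar":
            return 1.0
        if mechanism == "mar":
            return 3.0 if rec["sex"] == "M" else 1.0
        if mechanism == "mnar":
            return 3.0 if rec["age"] < 35 else 1.0
        raise ValueError(f"unknown mechanism: {mechanism}")

    # Weighted sampling without replacement: draw key u ** (1 / w) for each
    # row and keep the k rows with the largest keys.
    keys = [(rng.random() ** (1.0 / weight(r)), i) for i, r in enumerate(records)]
    chosen = {i for _, i in sorted(keys, reverse=True)[:k]}

    # Build new dicts so the input records are left untouched.
    return [dict(r, age=None) if i in chosen else dict(r)
            for i, r in enumerate(records)]


if __name__ == "__main__":
    # Synthetic respondents, for illustration only.
    data_rng = random.Random(1)
    people = [{"age": data_rng.randint(18, 90), "sex": data_rng.choice("MF")}
              for _ in range(1000)]
    for rate in (0.05, 0.10, 0.20):
        for mechanism in ("mcar", "mar", "mnar"):
            out = make_missing(people, rate, mechanism)
            n_missing = sum(r["age"] is None for r in out)
            print(mechanism, rate, n_missing)
```

Fixing the number of blanked rows at each level keeps the 5%, 10%, and 20% conditions exact, so differences between mechanisms are not confounded with differences in how much data is missing.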
School of Medicine
Moll, Philip Andrew, "A Comparison of Imputation Methods in the 2012 Behavioral Risk Factor Surveillance Survey" (2014). Scholar Archive. 3503.