Dept. of Medical Informatics and Clinical Epidemiology
Oregon Health & Science University
Objective: To compare the deduplication rate and accuracy of the demographic record deduplication processes implemented by Oregon Immunization Alert's current and new deduplication systems. Methods: We evaluated the capabilities of the two demographic deduplication systems using a test set created by the Centers for Disease Control and Prevention (CDC) with known duplicate and non-duplicate records. We measured the duplicate record detection rate and accuracy, along with the amount of time required for human intervention. Results: In the evaluation of the current system, we were able to deduplicate 84% of the duplicates in the test set. The accuracy of the current system was 99.7%. This process took a total time of 3.5 hours. In the evaluation of the new system, we were able to deduplicate 93% of the duplicates in the test set. The accuracy of the new system was 97.25%. The deduplication process took a total time of 12 seconds. Conclusion: Data quality is extremely important for the Oregon Immunization Alert Registry. The current system is a legacy system that requires a great deal of human intervention. It is an amalgamation of several processes, performed by different staff members, all working with in-house tools that have become less and less effective as the workload has increased. The new system is automated and requires far less human intervention to maintain data quality and handle the deduplication process. Comparing the test results for both systems, the new system's superior performance should meet the registry's current and future data quality requirements while significantly lowering the need for human intervention.
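The abstract above does not define how the detection rate and accuracy were calculated. As a minimal sketch, assuming the detection rate is the share of known duplicate pairs that a system flags, and accuracy is the share of flagged pairs that are truly duplicates, the two measures could be computed from a labeled test set as follows. The function name, record identifiers, and both definitions are illustrative assumptions, not taken from the study.

```python
def dedup_metrics(known_pairs, flagged_pairs):
    """Return (detection_rate, accuracy) for a labeled test set.

    known_pairs:   record-id pairs known to be duplicates (ground truth).
    flagged_pairs: record-id pairs the system under test marked as duplicates.

    Assumed definitions (not from the study):
      detection_rate = correctly flagged pairs / known duplicate pairs
      accuracy       = correctly flagged pairs / all flagged pairs
    """
    # Treat (a, b) and (b, a) as the same pair.
    known = {frozenset(p) for p in known_pairs}
    flagged = {frozenset(p) for p in flagged_pairs}
    correct = known & flagged

    detection_rate = len(correct) / len(known) if known else 0.0
    accuracy = len(correct) / len(flagged) if flagged else 0.0
    return detection_rate, accuracy


# Hypothetical record ids, for illustration only.
known = [("A1", "A2"), ("B1", "B2"), ("C1", "C2"), ("D1", "D2")]
flagged = [("A2", "A1"), ("B1", "B2"), ("C1", "C2"), ("E1", "E2")]

rate, acc = dedup_metrics(known, flagged)
print(f"detection rate: {rate:.2%}, accuracy: {acc:.2%}")
```

Under these assumed definitions, a system can raise its detection rate while its accuracy falls, which is the kind of trade-off the reported figures for the two systems suggest.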
School of Medicine
Goodman, Mark, "Comparision of deduplication methods between two immunization information systems" (2010). Scholar Archive. 359.