May 2012

Document Type


Degree Name



Dept. of Medical Informatics and Clinical Epidemiology


Oregon Health & Science University


Entity identification is the process of finding semantically related records in disparate databases. In the absence of a global unique identifier, determining which records pertain to the same entity can be difficult. Disparate databases within an organization are a significant barrier to the use of that organization’s data. Central City Concern is a multifaceted service organization that assists the homeless population of Portland, Oregon. It provides multiple services at different facilities. Over time, each facility independently developed its own mechanisms and procedures for collecting and storing client data. As a result, no cohesive method exists either to aggregate the organization’s data or to identify multiple records for an individual across facilities. An algorithm was developed that uses deterministic matching techniques to solve the problem of entity identification across the organization’s databases. This algorithm will be used to construct a master index that links each facility’s internal identifier for an individual client. The algorithm was used to classify a typical dataset against the organization’s electronic health record data, and manual review demonstrated that it correctly categorized more than 99% of the records.
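The general idea of deterministic matching into a master index can be illustrated with a minimal sketch. This is not the algorithm developed in the thesis, whose matching rules are not given in this abstract. The field names (`first_name`, `last_name`, `dob`, `facility`, `local_id`), the exact-match key, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def match_key(record):
    """Build a deterministic match key from hypothetical demographic fields.

    Assumption: names are compared case-insensitively with surrounding
    whitespace removed, and dates of birth are already ISO-formatted strings.
    """
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["dob"],
    )


def build_master_index(records):
    """Link each facility's local identifier to a shared master identifier.

    Records whose match keys are exactly equal are treated as one client.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append((rec["facility"], rec["local_id"]))
    # Assign sequential master ids in the order keys were first seen.
    return {mid: links for mid, links in enumerate(groups.values(), start=1)}


# Hypothetical records from two facilities.
records = [
    {"facility": "clinic", "local_id": "C-17",
     "first_name": "Ana ", "last_name": "Lopez", "dob": "1970-01-02"},
    {"facility": "housing", "local_id": "H-203",
     "first_name": "ana", "last_name": "LOPEZ", "dob": "1970-01-02"},
    {"facility": "clinic", "local_id": "C-18",
     "first_name": "Ben", "last_name": "Ng", "dob": "1985-06-30"},
]

master_index = build_master_index(records)
```

In this sketch the first two records share a key after normalization, so they fall under one master identifier, while the third receives its own. A real deployment would likely need additional rules for misspellings, missing fields, and changed names, which exact-key matching alone does not handle.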




School of Medicine


