Assessing race and ethnicity data quality across cancer registries and EMRs in two hospitals

Simon J Craddock Lee, James E. Grobe, Jasmin A. Tiro

Research output: Contribution to journal › Article › peer-review

34 Scopus citations


Background: Measurement of patient race/ethnicity in electronic health records is mandated and important for tracking health disparities. Objective: Characterize the quality of race/ethnicity data collection efforts. Methods: For all cancer patients diagnosed (2007-2010) at two hospitals, we extracted demographic data from five sources: 1) a university hospital cancer registry, 2) a university electronic medical record (EMR), 3) a community hospital cancer registry, 4) a community EMR, and 5) a joint clinical research registry. The patients whose data we examined (N = 17 834) contributed 41 025 entries (range: 2-5 per patient across sources), and the source comparisons generated 1-10 unique pairs per patient. We used generalized estimating equations, chi-square tests, and kappa estimates to assess data availability and agreement. Results: Compared to sex and insurance status, race/ethnicity information was significantly less likely to be available (χ² > 8043, P < .001), with variation across sources (χ² > 10 589, P < .001). The university EMR had a high prevalence of "Unknown" values. The aggregate kappa estimate across the sources was 0.45 (95% confidence interval, 0.45-0.45; N = 31 276 unique pairs), but it improved in sensitivity analyses that excluded the university EMR source (κ = 0.89). Race/ethnicity data were in complete agreement for only 6988 patients (39.2%). Pairs with a "Black" data value in one of the sources had the highest agreement (95.3%), whereas pairs with an "Other" value exhibited the lowest agreement across sources (11.1%). Discussion: Our findings suggest that high-quality race/ethnicity data are attainable. Many of the "errors" in race/ethnicity data are caused by missing or "Unknown" data values. Conclusions: To facilitate transparent reporting of healthcare delivery outcomes by race/ethnicity, healthcare systems need to monitor and enforce race/ethnicity data collection standards.
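The pair counts and kappa statistic mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the source labels, and the toy values are hypothetical, and treating "Unknown" as an ordinary category is an assumption made only for illustration. The sketch shows why 2-5 sources per patient yield 1-10 unordered source pairs, and how an unweighted Cohen's kappa is computed for one pair of sources.

```python
from collections import Counter
from itertools import combinations


def source_pairs(record):
    """Return all unordered source pairs for one patient's {source: value} record."""
    return list(combinations(sorted(record), 2))


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of patients with identical values in both sources.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the marginal category frequencies, summed.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy values, not data from the study.
registry = ["Black", "White", "White", "Unknown"]
emr = ["Black", "White", "Other", "White"]
kappa = cohen_kappa(registry, emr)

# A patient present in all five sources contributes C(5, 2) = 10 pairs;
# a patient present in only two sources contributes C(2, 2) = 1 pair.
five_sources = {"univ_registry": "Black", "univ_emr": "Unknown",
                "comm_registry": "Black", "comm_emr": "Black",
                "research_registry": "Black"}
two_sources = {"univ_registry": "White", "univ_emr": "White"}
```

In this toy case the two sources agree on 2 of 4 patients, chance agreement is 5/16, and kappa works out to 3/11 (about 0.27). How the study pooled pairs into an aggregate estimate is not described in the abstract, so the sketch stops at a single pair.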

Original language: English (US)
Pages (from-to): 627-634
Number of pages: 8
Journal: Journal of the American Medical Informatics Association
Issue number: 3
State: Published - May 2016


Keywords

  • Cancer registry
  • Data quality
  • Electronic medical record
  • Race and ethnicity

ASJC Scopus subject areas

  • Health Informatics

