Using machine learning to impute legal status of immigrants in the National Health Interview Survey

Simon A. Ruhnke, Fernando A. Wilson, Jim P. Stimpson

Research output: Contribution to journal › Article › peer-review

Abstract

We describe a novel machine learning method for imputing the legal status of immigrants using nationally representative survey data from the Survey of Income and Program Participation (SIPP) and the National Health Interview Survey (NHIS). A K-nearest Neighbor (KNN) classifier and the Random Forest (RF) algorithm are described as novel imputation methods and compared with established regression-based imputation. When the imputation methods were validated using sensitivity, specificity, positive predictive value (PPV), and accuracy statistics, the Random Forest algorithm was more accurate in identifying undocumented immigrants and minimized bias, relative to regression-based imputation and KNN, in both the socio-demographic variables included in the imputation and unobserved health variables.

  • We developed a new machine learning method of imputing legal status for immigrants that can be used with nationally representative, publicly available data.
  • Our findings indicate that using machine learning to impute the legal status of immigrants, specifically with the Random Forest algorithm, was more accurate in identifying undocumented immigrants and minimized bias relative to other imputation methods.
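The four validation statistics named in the abstract can all be derived from a single confusion matrix of known versus imputed legal status. The following is a minimal sketch of that calculation, not the authors' code: the function name `validation_stats`, the label strings, and the toy data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ValidationStats:
    sensitivity: float  # share of truly undocumented respondents imputed as such
    specificity: float  # share of truly documented respondents imputed as such
    ppv: float          # share of imputed-undocumented respondents who truly are
    accuracy: float     # share of all respondents classified correctly


def _ratio(numerator, denominator):
    # Return NaN rather than raising when a category is empty.
    return numerator / denominator if denominator else float("nan")


def validation_stats(true_status, imputed_status, positive="undocumented"):
    """Compare imputed labels with known labels (hypothetical helper)."""
    if len(true_status) != len(imputed_status):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(true_status, imputed_status))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return ValidationStats(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        accuracy=_ratio(tp + tn, len(pairs)),
    )


# Toy labels, invented for illustration only (not data from the study).
true_status = ["undocumented"] * 4 + ["documented"] * 6
imputed_status = (
    ["undocumented"] * 3 + ["documented"]                            # 3 TP, 1 FN
    + ["documented"] * 4 + ["undocumented"] + ["documented"]         # 5 TN, 1 FP
)
stats = validation_stats(true_status, imputed_status)
print(stats)
```

Computing all four statistics per method, rather than accuracy alone, matters when one class is rare: a method can score high accuracy while still missing most respondents in the smaller group, which sensitivity and PPV would expose.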

Original language: English (US)
Article number: 101848
Journal: MethodsX
Volume: 9
State: Published - Jan 2022
Externally published: Yes

Keywords

  • Demography
  • Immigrant
  • Machine Learning
  • Population Health
  • Random Forest machine learning
  • Undocumented Immigrants
  • United States

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Medical Laboratory Technology
