Natural language processing for cohort discovery in a discharge prediction model for the neonatal ICU

Michael W. Temple, Christoph U. Lehmann, Daniel Fabbri

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


Objectives: Discharging patients from the Neonatal Intensive Care Unit (NICU) can be delayed for non-medical reasons, including the procurement of home medical equipment, parental education, and the need for children’s services. We previously created a model to identify patients who will be medically ready for discharge in the subsequent 2–10 days. In this study, we use natural language processing (NLP) to improve upon that model and to discern why the model performed poorly on certain patients. Methods: We retrospectively examined the text of the Assessment and Plan section from daily progress notes of 4,693 patients (103,206 patient-days) from the NICU of a large, academic children’s hospital. A matrix was constructed using words from NICU notes (single words and bigrams) to train a supervised machine learning algorithm to determine the most important words differentiating patients on whom our original discharge prediction model performed poorly from those on whom it performed well. Results: NLP using a bag of words (BOW) analysis revealed several cohorts that performed poorly in our original model. These included patients with surgical diagnoses, pulmonary hypertension, retinopathy of prematurity, and psychosocial issues. Discussion: The BOW approach aided in cohort discovery and will allow further refinement of our original discharge prediction model. Adequately identifying patients discharged home on g-tube feeds alone could improve the AUC of our original model by 0.02. Additionally, this approach identified social issues as a major cause of delayed discharge. Conclusion: A BOW analysis provides a method to improve and refine our NICU discharge prediction model and could potentially avoid over 900 (0.9%) hospital days.
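The Methods step above (a matrix of single words and bigrams, used to find the terms that separate poorly predicted from well predicted patients) can be illustrated with a minimal sketch. The abstract does not name the supervised algorithm or the software used, so this example swaps in a simple smoothed log-odds ranking as a stand-in. The function names (`tokenize`, `ngrams`, `build_matrix`, `rank_terms`), the toy notes, and the labels are all hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping hyphenated terms such as 'g-tube'."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def ngrams(text):
    """Return the single words followed by the bigrams (adjacent word pairs) of a note."""
    tokens = tokenize(text)
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def build_matrix(notes):
    """Build a bag-of-words count matrix: one row per note, one column per term."""
    counts = [Counter(ngrams(note)) for note in notes]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def rank_terms(vocab, matrix, labels, alpha=1.0):
    """Rank terms by smoothed log-odds of appearing in the poorly predicted group.

    labels: 1 = original model performed poorly, 0 = performed well.
    This is an illustrative stand-in, not the algorithm used in the study.
    """
    poor = [0] * len(vocab)
    well = [0] * len(vocab)
    for row, label in zip(matrix, labels):
        target = poor if label == 1 else well
        for j, count in enumerate(row):
            target[j] += count
    poor_total = sum(poor) + alpha * len(vocab)
    well_total = sum(well) + alpha * len(vocab)
    scores = {
        term: math.log((poor[j] + alpha) / poor_total)
        - math.log((well[j] + alpha) / well_total)
        for j, term in enumerate(vocab)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy notes for illustration only; not real patient data.
notes = [
    "continue g-tube feeds, social work consult pending",
    "pulmonary hypertension on sildenafil, social work following",
    "tolerating full oral feeds, gaining weight",
    "room air, full oral feeds, car seat test passed",
]
labels = [1, 1, 0, 0]

vocab, matrix = build_matrix(notes)
ranking = rank_terms(vocab, matrix, labels)
for term, score in ranking[:3]:
    print(f"{term}: {score:.3f}")
```

Terms with the highest scores are those most associated with the poorly predicted group, which is the kind of signal that could point to a cohort such as patients with psychosocial issues. A real analysis would need far more data and a validated classifier.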

Original language: English (US)
Pages (from-to): 101-115
Number of pages: 15
Journal: Applied Clinical Informatics
Issue number: 1
State: Published - Feb 24 2016
Externally published: Yes


Keywords

  • Area under curve
  • Neonatal intensive care units
  • Patient discharge
  • ROC curve

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications
  • Health Information Management

