Abstract
Background: Peripheral artery disease (PAD) is underdiagnosed due to poor patient and clinician awareness. Despite this, no PAD screening strategy is widely recommended.

Objectives: We used machine learning to develop an automated risk stratification tool for identifying patients with a high likelihood of PAD.

Methods: Using data from the electronic health record (EHR), ankle-brachial indices (ABIs) were extracted for 3,298 patients. In addition to ABI, we extracted 60 other patient characteristics and used a random forest model to rank the features by association with ABI. The model identified several features independently correlated with PAD. We then built a logistic regression model to predict PAD status on a validation set of patients (n = 1,089), an external cohort of patients (n = 2,922), and a national database (n = 2,488). The model was compared with an age-based model and a random forest model.

Results: The model had an area under the curve (AUC) of 0.68 in the validation set. When evaluated on an external population using EHR data, it performed similarly, with an AUC of 0.68. When evaluated on a national database, it had an AUC of 0.72. The model outperformed an age-based model (AUC: 0.62; P < 0.001). A random forest model that included all 60 features did not perform significantly better (AUC: 0.71; P = 0.31).

Conclusions: Statistical techniques can be used to build models that identify individuals at high risk for PAD using information accessible from the EHR. Models such as this may allow large health care systems to efficiently identify patients who would benefit from aggressive preventive strategies or targeted ABI screening.
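To make the reported AUC values easier to interpret: an AUC can be read as the probability that a randomly chosen patient with PAD receives a higher risk score than a randomly chosen patient without PAD, with ties counted as half. The following is a minimal sketch of that pairwise definition. The function name `auc` and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed from its pairwise definition.

    labels: 1 for a positive case (e.g. PAD), 0 for a negative case.
    scores: model risk scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 3 positives, 4 negatives.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
toy_auc = auc(labels, scores)
print(toy_auc)  # 9 of 12 positive/negative pairs are ordered correctly -> 0.75
```

On this scale, 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported range of 0.62 to 0.72. The quadratic loop is chosen for readability; a rank-based computation would be preferable for cohorts of thousands of patients.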
Original language | English (US) |
---|---|
Article number | 100566 |
Journal | JACC: Advances |
Volume | 2 |
Issue number | 7 |
State | Published - Sep 2023 |
Keywords
- ABI
- linear models
- machine learning
- prediction
- risk assessment
ASJC Scopus subject areas
- Cardiology and Cardiovascular Medicine
- Dentistry (miscellaneous)