TY - JOUR
T1 - Machine Learning Improves Prediction Over Logistic Regression on Resected Colon Cancer Patients
AU - Leonard, Grey
AU - South, Charles
AU - Balentine, Courtney J
AU - Porembka, Matthew
AU - Mansour, John
AU - Wang, Sam
AU - Yopp, Adam
AU - Polanco, Patricio
AU - Zeh, Herbert
AU - Augustine, Mathew
N1 - Funding Information:
We would like to thank the Department of Statistical Science at Southern Methodist University for their constant feedback, expert training, and mentoring in this project.
Publisher Copyright:
© 2022 Elsevier Inc.
PY - 2022/7
Y1 - 2022/7
N2 - Introduction: Despite advances, readmission and mortality rates for surgical patients with colon cancer remain high. Prediction models using regression techniques allow for risk stratification to aid periprocedural care. Technological advances have enabled large datasets to be analyzed using machine learning (ML) algorithms. A national database of colon cancer patients was selected to determine whether ML methods better predict outcomes following surgery compared to conventional methods. Methods: Surgical colon cancer patients were identified using the 2013 National Cancer Database (NCDB). The negative outcome was defined as a composite of 30-d unplanned readmission and 30- and 90-d mortality. ML models, including Random Forest and XGBoost, were built and compared with conventional logistic regression. To account for unbalanced outcomes, a synthetic minority oversampling technique (SMOTE) was implemented and applied using XGBoost. Results: Analysis included 528,060 patients. The negative outcome occurred in 11.6% of patients. Model building utilized 30 variables. The primary metric for model comparison was area under the curve (AUC). In comparison to logistic regression (AUC 0.730, 95% CI: 0.725-0.735), AUCs for ML algorithms ranged between 0.748 and 0.757, with the Random Forest model (AUC 0.757, 95% CI: 0.752-0.762) outperforming XGBoost (AUC 0.756, 95% CI: 0.751-0.761) and XGBoost using SMOTE data (AUC 0.748, 95% CI: 0.743-0.753). Conclusions: We show that a large registry of surgical colon cancer patients can be utilized to build ML models to improve outcome prediction with differential discriminative ability. These results reveal the potential of these methods to enhance risk prediction, leading to improved strategies to mitigate those risks.
AB - Introduction: Despite advances, readmission and mortality rates for surgical patients with colon cancer remain high. Prediction models using regression techniques allow for risk stratification to aid periprocedural care. Technological advances have enabled large datasets to be analyzed using machine learning (ML) algorithms. A national database of colon cancer patients was selected to determine whether ML methods better predict outcomes following surgery compared to conventional methods. Methods: Surgical colon cancer patients were identified using the 2013 National Cancer Database (NCDB). The negative outcome was defined as a composite of 30-d unplanned readmission and 30- and 90-d mortality. ML models, including Random Forest and XGBoost, were built and compared with conventional logistic regression. To account for unbalanced outcomes, a synthetic minority oversampling technique (SMOTE) was implemented and applied using XGBoost. Results: Analysis included 528,060 patients. The negative outcome occurred in 11.6% of patients. Model building utilized 30 variables. The primary metric for model comparison was area under the curve (AUC). In comparison to logistic regression (AUC 0.730, 95% CI: 0.725-0.735), AUCs for ML algorithms ranged between 0.748 and 0.757, with the Random Forest model (AUC 0.757, 95% CI: 0.752-0.762) outperforming XGBoost (AUC 0.756, 95% CI: 0.751-0.761) and XGBoost using SMOTE data (AUC 0.748, 95% CI: 0.743-0.753). Conclusions: We show that a large registry of surgical colon cancer patients can be utilized to build ML models to improve outcome prediction with differential discriminative ability. These results reveal the potential of these methods to enhance risk prediction, leading to improved strategies to mitigate those risks.
KW - Colon cancer
KW - Machine learning
KW - Outcomes
KW - Prediction
KW - Risk
UR - http://www.scopus.com/inward/record.url?scp=85126127848&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126127848&partnerID=8YFLogxK
U2 - 10.1016/j.jss.2022.01.012
DO - 10.1016/j.jss.2022.01.012
M3 - Article
C2 - 35287027
AN - SCOPUS:85126127848
SN - 0022-4804
VL - 275
SP - 181
EP - 193
JO - Journal of Surgical Research
JF - Journal of Surgical Research
ER -