TY - JOUR
T1 - Performance deterioration of deep learning models after clinical deployment
T2 - a case study with auto-segmentation for definitive prostate cancer radiotherapy
AU - Wang, Biling
AU - Dohopolski, Michael
AU - Bai, Ti
AU - Wu, Junjie
AU - Hannan, Raquibul
AU - Desai, Neil
AU - Garant, Aurelie
AU - Yang, Daniel X
AU - Nguyen, Dan
AU - Lin, Mu Han
AU - Timmerman, Robert
AU - Wang, Xinlei
AU - Jiang, Steve B.
N1 - Publisher Copyright: © 2024 The Author(s). Published by IOP Publishing Ltd.
PY - 2024/6/1
Y1 - 2024/6/1
N2 - Our study aims to explore the long-term performance patterns for deep learning (DL) models deployed in clinic and to investigate their efficacy in relation to evolving clinical practices. We conducted a retrospective study simulating the clinical implementation of our DL model involving 1328 prostate cancer patients treated between January 2006 and August 2022. We trained and validated a U-Net-based auto-segmentation model on data obtained from 2006 to 2011 and tested on data from 2012 to 2022, simulating the model’s clinical deployment starting in 2012. We visualized the trends of the model performance using exponentially weighted moving average (EMA) curves. Additionally, we performed Wilcoxon Rank Sum Test and multiple linear regression to investigate Dice similarity coefficient (DSC) variations across distinct periods and the impact of clinical factors, respectively. Initially, from 2012 to 2014, the model showed high performance in segmenting the prostate, rectum, and bladder. Post-2015, a notable decline in EMA DSC was observed for the prostate and rectum, while bladder contours remained stable. Key factors impacting the prostate contour quality included physician contouring styles, using various hydrogel spacers, CT scan slice thickness, MRI-guided contouring, and intravenous (IV) contrast (p < 0.0001, p < 0.0001, p = 0.0085, p = 0.0012, p < 0.0001, respectively). Rectum contour quality was notably influenced by factors such as slice thickness, physician contouring styles, and the use of various hydrogel spacers. The quality of the bladder contour was primarily affected by IV contrast. The deployed DL model exhibited a substantial decline in performance over time, aligning with the evolving clinical settings.
AB - Our study aims to explore the long-term performance patterns for deep learning (DL) models deployed in clinic and to investigate their efficacy in relation to evolving clinical practices. We conducted a retrospective study simulating the clinical implementation of our DL model involving 1328 prostate cancer patients treated between January 2006 and August 2022. We trained and validated a U-Net-based auto-segmentation model on data obtained from 2006 to 2011 and tested on data from 2012 to 2022, simulating the model’s clinical deployment starting in 2012. We visualized the trends of the model performance using exponentially weighted moving average (EMA) curves. Additionally, we performed Wilcoxon Rank Sum Test and multiple linear regression to investigate Dice similarity coefficient (DSC) variations across distinct periods and the impact of clinical factors, respectively. Initially, from 2012 to 2014, the model showed high performance in segmenting the prostate, rectum, and bladder. Post-2015, a notable decline in EMA DSC was observed for the prostate and rectum, while bladder contours remained stable. Key factors impacting the prostate contour quality included physician contouring styles, using various hydrogel spacers, CT scan slice thickness, MRI-guided contouring, and intravenous (IV) contrast (p < 0.0001, p < 0.0001, p = 0.0085, p = 0.0012, p < 0.0001, respectively). Rectum contour quality was notably influenced by factors such as slice thickness, physician contouring styles, and the use of various hydrogel spacers. The quality of the bladder contour was primarily affected by IV contrast. The deployed DL model exhibited a substantial decline in performance over time, aligning with the evolving clinical settings.
KW - deep learning
KW - model performance deterioration
KW - radiotherapy
KW - segmentation
UR - http://www.scopus.com/inward/record.url?scp=85197458039&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197458039&partnerID=8YFLogxK
U2 - 10.1088/2632-2153/ad580f
DO - 10.1088/2632-2153/ad580f
M3 - Article
AN - SCOPUS:85197458039
SN - 2632-2153
VL - 5
JO - Machine Learning: Science and Technology
JF - Machine Learning: Science and Technology
IS - 2
M1 - 025077
ER -