Vision Transformer-Based Multilabel Survival Prediction for Oropharynx Cancer After Radiation Therapy

Meixu Chen; Kai Wang; Jing Wang

doi:10.1016/j.ijrobp.2023.10.022

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose: A reliable and comprehensive cancer prognosis model for oropharyngeal cancer (OPC) could better assist in personalizing treatment. In this work, we developed a vision transformer-based (ViT-based) multilabel model with multimodal input to learn complementary information from available pretreatment data and predict multiple associated endpoints for radiation therapy for patients with OPC.

Methods and Materials: A publicly available data set of 512 patients with OPC was used for both model training and evaluation. Planning computed tomography images, primary gross tumor volume masks, and 16 clinical variables representing patient demographics, diagnosis, and treatment were used as inputs. To extract deep image features with global attention, we used a ViT module. Clinical variables were concatenated with the learned image features and fed into fully connected layers to incorporate cross-modality features. To learn the mapping between the features and correlated survival outcomes, including overall survival, local failure-free survival, regional failure-free survival, and distant failure-free survival, we employed 4 multitask logistic regression layers. The proposed model was optimized by combining the multitask logistic regression negative-log likelihood losses of different prediction targets.

Results: We employed the C-index and area under the curve metrics to assess the performance of our model for time-to-event prediction and time-specific binary prediction, respectively. Our proposed model outperformed corresponding single-modality and single-label models on all prediction labels, achieving C-indices of 0.773, 0.765, 0.776, and 0.773 for overall survival, local failure-free survival, regional failure-free survival, and distant failure-free survival, respectively. The area under the curve values ranged between 0.799 and 0.844 for different tasks at different time points. Using the medians of predicted risks as the thresholds to identify high-risk and low-risk patient groups, we performed the log-rank test, the results of which showed significantly larger separations in different event-free survivals.

Conclusion: We developed the first model capable of predicting multiple labels for OPC simultaneously. Our model demonstrated better prognostic ability for all the prediction targets compared with corresponding single-modality models and single-label models.
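
The abstract reports C-indices and a median-risk split into high-risk and low-risk groups. As a rough illustration of what those two steps compute, here is a minimal pure-Python sketch. It is not the authors' code: the function names `concordance_index` and `split_by_median_risk`, the simplified handling of tied times (such pairs are skipped), and the four-patient toy data are all assumptions made for this example.

```python
from itertools import combinations
from statistics import median


def concordance_index(times, events, risks):
    """Harrell-style C-index: share of comparable pairs ranked correctly.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the two times differ and the
        # earlier one is an observed event (assumption: ties are skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def split_by_median_risk(risks):
    """Flag each patient as high-risk (True) if above the median risk."""
    threshold = median(risks)
    return [r > threshold for r in risks]


# Hypothetical toy cohort: four patients, the second one censored.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.6]

c_index = concordance_index(times, events, risks)
high_risk = split_by_median_risk(risks)
print(c_index, high_risk)
```

In this toy cohort, four pairs are comparable and three are ordered correctly, so the C-index is 0.75. A log-rank test between the two groups would need a survival-analysis library and is left out of the sketch.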

Original language	English (US)
Pages (from-to)	1123-1134
Number of pages	12
Journal	International Journal of Radiation Oncology Biology Physics
Volume	118
Issue number	4
DOIs	https://doi.org/10.1016/j.ijrobp.2023.10.022
State	Published - Mar 15 2024

ASJC Scopus subject areas

Radiation
Oncology
Radiology, Nuclear Medicine and Imaging
Cancer Research

Cite this

@article{e9ed829d3c8d42679cf9a84799d24bac,

title = "Vision Transformer-Based Multilabel Survival Prediction for Oropharynx Cancer After Radiation Therapy",

abstract = "Purpose: A reliable and comprehensive cancer prognosis model for oropharyngeal cancer (OPC) could better assist in personalizing treatment. In this work, we developed a vision transformer-based (ViT-based) multilabel model with multimodal input to learn complementary information from available pretreatment data and predict multiple associated endpoints for radiation therapy for patients with OPC. Methods and Materials: A publicly available data set of 512 patients with OPC was used for both model training and evaluation. Planning computed tomography images, primary gross tumor volume masks, and 16 clinical variables representing patient demographics, diagnosis, and treatment were used as inputs. To extract deep image features with global attention, we used a ViT module. Clinical variables were concatenated with the learned image features and fed into fully connected layers to incorporate cross-modality features. To learn the mapping between the features and correlated survival outcomes, including overall survival, local failure-free survival, regional failure-free survival, and distant failure-free survival, we employed 4 multitask logistic regression layers. The proposed model was optimized by combining the multitask logistic regression negative-log likelihood losses of different prediction targets. Results: We employed the C-index and area under the curve metrics to assess the performance of our model for time-to-event prediction and time-specific binary prediction, respectively. Our proposed model outperformed corresponding single-modality and single-label models on all prediction labels, achieving C-indices of 0.773, 0.765, 0.776, and 0.773 for overall survival, local failure-free survival, regional failure-free survival, and distant failure-free survival, respectively. The area under the curve values ranged between 0.799 and 0.844 for different tasks at different time points. 
Using the medians of predicted risks as the thresholds to identify high-risk and low-risk patient groups, we performed the log-rank test, the results of which showed significantly larger separations in different event-free survivals. Conclusion: We developed the first model capable of predicting multiple labels for OPC simultaneously. Our model demonstrated better prognostic ability for all the prediction targets compared with corresponding single-modality models and single-label models.",

author = "Meixu Chen and Kai Wang and Jing Wang",

note = "Publisher Copyright: {\textcopyright} 2023 Elsevier Inc.",

year = "2024",

month = mar,

day = "15",

doi = "10.1016/j.ijrobp.2023.10.022",

language = "English (US)",

volume = "118",

pages = "1123--1134",

journal = "International Journal of Radiation Oncology Biology Physics",

issn = "0360-3016",

publisher = "Elsevier Inc.",

number = "4",

}

TY - JOUR

T1 - Vision Transformer-Based Multilabel Survival Prediction for Oropharynx Cancer After Radiation Therapy

AU - Chen, Meixu

AU - Wang, Kai

AU - Wang, Jing

PY - 2024/3/15

Y1 - 2024/3/15

N2 - Purpose: A reliable and comprehensive cancer prognosis model for oropharyngeal cancer (OPC) could better assist in personalizing treatment. In this work, we developed a vision transformer-based (ViT-based) multilabel model with multimodal input to learn complementary information from available pretreatment data and predict multiple associated endpoints for radiation therapy for patients with OPC. Methods and Materials: A publicly available data set of 512 patients with OPC was used for both model training and evaluation. Planning computed tomography images, primary gross tumor volume masks, and 16 clinical variables representing patient demographics, diagnosis, and treatment were used as inputs. To extract deep image features with global attention, we used a ViT module. Clinical variables were concatenated with the learned image features and fed into fully connected layers to incorporate cross-modality features. To learn the mapping between the features and correlated survival outcomes, including overall survival, local failure-free survival, regional failure-free survival, and distant failure-free survival, we employed 4 multitask logistic regression layers. The proposed model was optimized by combining the multitask logistic regression negative-log likelihood losses of different prediction targets. Results: We employed the C-index and area under the curve metrics to assess the performance of our model for time-to-event prediction and time-specific binary prediction, respectively. Our proposed model outperformed corresponding single-modality and single-label models on all prediction labels, achieving C-indices of 0.773, 0.765, 0.776, and 0.773 for overall survival, local failure-free survival, regional failure-free survival, and distant failure-free survival, respectively. The area under the curve values ranged between 0.799 and 0.844 for different tasks at different time points. 
Using the medians of predicted risks as the thresholds to identify high-risk and low-risk patient groups, we performed the log-rank test, the results of which showed significantly larger separations in different event-free survivals. Conclusion: We developed the first model capable of predicting multiple labels for OPC simultaneously. Our model demonstrated better prognostic ability for all the prediction targets compared with corresponding single-modality models and single-label models.

UR - http://www.scopus.com/inward/record.url?scp=85180336481&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85180336481&partnerID=8YFLogxK

U2 - 10.1016/j.ijrobp.2023.10.022

DO - 10.1016/j.ijrobp.2023.10.022

M3 - Article

C2 - 37939732

AN - SCOPUS:85180336481

SN - 0360-3016

VL - 118

SP - 1123

EP - 1134

JO - International Journal of Radiation Oncology Biology Physics

JF - International Journal of Radiation Oncology Biology Physics

IS - 4

ER -
