Improving efficiency of training a virtual treatment planner network via knowledge-guided deep reinforcement learning for intelligent automatic treatment planning of radiotherapy

Chenyang Shen; Liyuan Chen; Yesenia Gonzalez; Xun Jia

doi:10.1002/mp.14712

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Purpose: We previously proposed an intelligent automatic treatment planning framework for radiotherapy, in which a virtual treatment planner network (VTPN) is built using deep reinforcement learning (DRL) to operate a treatment planning system (TPS) by adjusting its treatment planning parameters to generate high-quality plans. We demonstrated the potential feasibility of this idea in prostate cancer intensity-modulated radiation therapy (IMRT). Despite this success, training a VTPN via the standard DRL approach with an ϵ-greedy algorithm was time-consuming. The required training time was expected to grow with the complexity of the treatment planning problem, preventing the development of VTPN for more complicated but clinically relevant scenarios. In this study, we proposed a novel knowledge-guided DRL (KgDRL) approach that incorporates knowledge from human planners to guide the training process and thereby improve the efficiency of training a VTPN.

Method: Using prostate cancer IMRT as a test bed, we first summarized a number of rules for adjusting the treatment planning parameters of our in-house TPS. During the training of the VTPN, in addition to randomly navigating the large state-action space, as in the standard DRL approach using the ϵ-greedy algorithm, we also sampled actions defined by the rules. The priority of sampling actions from the rules decreased over the training process to encourage the VTPN to explore new parameter-adjustment policies that were not covered by the rules. To test this idea, we trained a VTPN using KgDRL and compared its performance with that of another VTPN trained using the standard DRL approach. Both networks were trained using 10 patient cases, with five additional cases for validation, while another 59 cases were used for evaluation.

Results: Both VTPNs, trained via KgDRL and standard DRL, spontaneously learned how to operate the in-house TPS to generate high-quality plans, achieving plan quality scores of 8.82 (±0.29) and 8.43 (±0.48), respectively. Both VTPNs outperformed treatment planning based purely on the rules, which had a plan score of 7.81 (±1.59). A VTPN trained for eight episodes using KgDRL performed similarly to one trained using DRL for 100 epochs. The training time was reduced from more than a week to approximately 13 hours.

Conclusion: The proposed KgDRL framework was effective in accelerating the training process of a VTPN by incorporating human knowledge, which will facilitate the development of VTPN for more complicated treatment planning scenarios.
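
The sampling scheme described in the Method paragraph can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the linear decay schedule, the starting priority of 0.8, and the representation of rules as a list of suggested action indices are all assumptions made for illustration. The abstract states only that rule-defined actions are sampled alongside random ϵ-greedy exploration and that their priority decreases during training.

```python
import random


def rule_priority(episode, total_episodes, start=0.8, end=0.0):
    """Share of exploratory steps that follow the rules (assumed linear decay)."""
    frac = min(max(episode / max(total_episodes - 1, 1), 0.0), 1.0)
    return start + (end - start) * frac


def select_action(q_values, rule_actions, epsilon, p_rule, rng=random):
    """Pick an action index for one parameter-adjustment step.

    q_values:     Q-value estimates, one per action (hypothetical VTPN output).
    rule_actions: action indices suggested by planner rules for the current
                  state (hypothetical encoding; may be empty).
    epsilon:      probability of taking an exploratory step.
    p_rule:       probability that an exploratory step follows the rules.
    """
    if rng.random() < epsilon:
        # Exploratory step: prefer a rule-defined action with probability p_rule.
        if rule_actions and rng.random() < p_rule:
            return rng.choice(rule_actions)
        # Otherwise sample uniformly, as in plain epsilon-greedy exploration.
        return rng.randrange(len(q_values))
    # Greedy step: take the action with the highest estimated Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

In a training loop, `p_rule` would be recomputed with `rule_priority` at the start of each episode, so that early episodes lean on the rules and later ones rely increasingly on random exploration and the learned policy.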

Original language	English (US)
Pages (from-to)	1909-1920
Number of pages	12
Journal	Medical Physics
Volume	48
Issue number	4
DOIs	https://doi.org/10.1002/mp.14712
State	Published - Apr 2021

Keywords

Intelligent automatic treatment planning
deep reinforcement learning
human-knowledge guided deep learning

ASJC Scopus subject areas

Biophysics
Radiology, Nuclear Medicine and Imaging

Cite this

@article{d5427afdb9214ff8a211e2fb99666755,

title = "Improving efficiency of training a virtual treatment planner network via knowledge-guided deep reinforcement learning for intelligent automatic treatment planning of radiotherapy",

abstract = "Purpose: We previously proposed an intelligent automatic treatment planning framework for radiotherapy, in which a virtual treatment planner network (VTPN) is built using deep reinforcement learning (DRL) to operate a treatment planning system (TPS) by adjusting its treatment planning parameters to generate high-quality plans. We demonstrated the potential feasibility of this idea in prostate cancer intensity-modulated radiation therapy (IMRT). Despite this success, training a VTPN via the standard DRL approach with an ϵ-greedy algorithm was time-consuming. The required training time was expected to grow with the complexity of the treatment planning problem, preventing the development of VTPN for more complicated but clinically relevant scenarios. In this study, we proposed a novel knowledge-guided DRL (KgDRL) approach that incorporates knowledge from human planners to guide the training process and thereby improve the efficiency of training a VTPN. Method: Using prostate cancer IMRT as a test bed, we first summarized a number of rules for adjusting the treatment planning parameters of our in-house TPS. During the training of the VTPN, in addition to randomly navigating the large state-action space, as in the standard DRL approach using the ϵ-greedy algorithm, we also sampled actions defined by the rules. The priority of sampling actions from the rules decreased over the training process to encourage the VTPN to explore new parameter-adjustment policies that were not covered by the rules. To test this idea, we trained a VTPN using KgDRL and compared its performance with that of another VTPN trained using the standard DRL approach. Both networks were trained using 10 patient cases, with five additional cases for validation, while another 59 cases were used for evaluation. Results: Both VTPNs, trained via KgDRL and standard DRL, spontaneously learned how to operate the in-house TPS to generate high-quality plans, achieving plan quality scores of 8.82 (±0.29) and 8.43 (±0.48), respectively. Both VTPNs outperformed treatment planning based purely on the rules, which had a plan score of 7.81 (±1.59). A VTPN trained for eight episodes using KgDRL performed similarly to one trained using DRL for 100 epochs. The training time was reduced from more than a week to approximately 13 hours. Conclusion: The proposed KgDRL framework was effective in accelerating the training process of a VTPN by incorporating human knowledge, which will facilitate the development of VTPN for more complicated treatment planning scenarios.",

keywords = "Intelligent automatic treatment planning, deep reinforcement learning, human-knowledge guided deep learning",

author = "Chenyang Shen and Liyuan Chen and Yesenia Gonzalez and Xun Jia",

note = "Publisher Copyright: {\textcopyright} 2021 American Association of Physicists in Medicine",

year = "2021",

month = apr,

doi = "10.1002/mp.14712",

language = "English (US)",

volume = "48",

pages = "1909--1920",

journal = "Medical Physics",

issn = "0094-2405",

publisher = "AAPM - American Association of Physicists in Medicine",

number = "4",

}

TY - JOUR

T1 - Improving efficiency of training a virtual treatment planner network via knowledge-guided deep reinforcement learning for intelligent automatic treatment planning of radiotherapy

AU - Shen, Chenyang

AU - Chen, Liyuan

AU - Gonzalez, Yesenia

AU - Jia, Xun

PY - 2021/4

Y1 - 2021/4

AB - Purpose: We previously proposed an intelligent automatic treatment planning framework for radiotherapy, in which a virtual treatment planner network (VTPN) is built using deep reinforcement learning (DRL) to operate a treatment planning system (TPS) by adjusting its treatment planning parameters to generate high-quality plans. We demonstrated the potential feasibility of this idea in prostate cancer intensity-modulated radiation therapy (IMRT). Despite this success, training a VTPN via the standard DRL approach with an ϵ-greedy algorithm was time-consuming. The required training time was expected to grow with the complexity of the treatment planning problem, preventing the development of VTPN for more complicated but clinically relevant scenarios. In this study, we proposed a novel knowledge-guided DRL (KgDRL) approach that incorporates knowledge from human planners to guide the training process and thereby improve the efficiency of training a VTPN. Method: Using prostate cancer IMRT as a test bed, we first summarized a number of rules for adjusting the treatment planning parameters of our in-house TPS. During the training of the VTPN, in addition to randomly navigating the large state-action space, as in the standard DRL approach using the ϵ-greedy algorithm, we also sampled actions defined by the rules. The priority of sampling actions from the rules decreased over the training process to encourage the VTPN to explore new parameter-adjustment policies that were not covered by the rules. To test this idea, we trained a VTPN using KgDRL and compared its performance with that of another VTPN trained using the standard DRL approach. Both networks were trained using 10 patient cases, with five additional cases for validation, while another 59 cases were used for evaluation. Results: Both VTPNs, trained via KgDRL and standard DRL, spontaneously learned how to operate the in-house TPS to generate high-quality plans, achieving plan quality scores of 8.82 (±0.29) and 8.43 (±0.48), respectively. Both VTPNs outperformed treatment planning based purely on the rules, which had a plan score of 7.81 (±1.59). A VTPN trained for eight episodes using KgDRL performed similarly to one trained using DRL for 100 epochs. The training time was reduced from more than a week to approximately 13 hours. Conclusion: The proposed KgDRL framework was effective in accelerating the training process of a VTPN by incorporating human knowledge, which will facilitate the development of VTPN for more complicated treatment planning scenarios.

KW - Intelligent automatic treatment planning

KW - deep reinforcement learning

KW - human-knowledge guided deep learning

UR - http://www.scopus.com/inward/record.url?scp=85101444904&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85101444904&partnerID=8YFLogxK

U2 - 10.1002/mp.14712

DO - 10.1002/mp.14712

M3 - Article

C2 - 33432646

AN - SCOPUS:85101444904

SN - 0094-2405

VL - 48

SP - 1909

EP - 1920

JO - Medical Physics

JF - Medical Physics

IS - 4

ER -
