TY - JOUR
T1 - A hierarchical deep reinforcement learning framework for intelligent automatic treatment planning of prostate cancer intensity modulated radiation therapy
AU - Shen, Chenyang
AU - Chen, Liyuan
AU - Jia, Xun
N1 - Funding Information:
This work was supported by the National Institutes of Health grant number R01CA237269 and Cancer Prevention and Research Institute of Texas grant number RP160661.
Publisher Copyright:
© 2021 Institute of Physics and Engineering in Medicine.
PY - 2021/7/7
Y1 - 2021/7/7
N2 - Purpose. We have previously proposed an intelligent automatic treatment planning (IATP) framework that builds a virtual treatment planner network (VTPN) to operate a treatment planning system (TPS) to generate high-quality radiation therapy (RT) treatment plans. While the potential of IATP in automating RT treatment planning has been demonstrated, its poor scalability, caused by an almost linear growth of network size with the number of treatment planning parameters (TPPs), is a bottleneck that prevents its application in complicated but clinically relevant treatment planning problems. Moreover, the decision-making behavior of the trained network is hard to understand. Motivated by the decision-making process of a human planner, this study proposes a hierarchical IATP framework. Methods and materials. The hierarchical VTPN (HieVTPN) consists of three networks, i.e. Structure-Net, Parameter-Net, and Action-Net. When interacting with a TPS, the networks are employed in sequential order in each step to decide the structure to adjust, the TPP to adjust for the selected structure, and the specific adjustment manner for the parameter, respectively. We developed an end-to-end hierarchical deep reinforcement learning scheme to simultaneously train the three networks. We then evaluated the effectiveness of the proposed framework in the treatment planning problems for prostate cancer intensity modulated RT (IMRT) and stereotactic body RT (SBRT). We benchmarked the performance of our approach by comparing it with plans made by a VTPN of a parallel architecture and with the human plans submitted for competition in the 2016 American Association of Medical Dosimetrists (AAMD)/Radiosurgery Society (RSS) Plan Study. We analyzed the scalability of the network size with respect to the number of TPPs. Numerical experiments were also performed to understand the rationale of the decision-making behaviors of the trained HieVTPN. Results. Both HieVTPNs for prostate IMRT and SBRT were trained successfully using 10 training patient cases and 5 validation cases. For IMRT, HieVTPN was able to generate high-quality plans for 59 testing patient cases that were not included in the training process, achieving an average plan score of 8.62 (±0.83), with 9 being the maximal score. The score was comparable to that of the VTPN, 8.45 (±0.48). For SBRT planning, HieVTPN achieved an average plan score of 139.07 on five testing patient cases, compared to the score of 132.21 averaged over the human plans submitted for competition in the AAMD/RSS Plan Study. Unlike VTPN, whose network size scales linearly with the number of TPPs, the network size of HieVTPN is almost independent of the number of TPPs. It was also observed that the decision-making behaviors of HieVTPN were understandable and generally agreed with human experience. Conclusions. With its scalability and explainability, the hierarchical IATP framework is more favorable than the previous framework in terms of handling treatment planning problems involving a large number of TPPs.
AB - Purpose. We have previously proposed an intelligent automatic treatment planning (IATP) framework that builds a virtual treatment planner network (VTPN) to operate a treatment planning system (TPS) to generate high-quality radiation therapy (RT) treatment plans. While the potential of IATP in automating RT treatment planning has been demonstrated, its poor scalability, caused by an almost linear growth of network size with the number of treatment planning parameters (TPPs), is a bottleneck that prevents its application in complicated but clinically relevant treatment planning problems. Moreover, the decision-making behavior of the trained network is hard to understand. Motivated by the decision-making process of a human planner, this study proposes a hierarchical IATP framework. Methods and materials. The hierarchical VTPN (HieVTPN) consists of three networks, i.e. Structure-Net, Parameter-Net, and Action-Net. When interacting with a TPS, the networks are employed in sequential order in each step to decide the structure to adjust, the TPP to adjust for the selected structure, and the specific adjustment manner for the parameter, respectively. We developed an end-to-end hierarchical deep reinforcement learning scheme to simultaneously train the three networks. We then evaluated the effectiveness of the proposed framework in the treatment planning problems for prostate cancer intensity modulated RT (IMRT) and stereotactic body RT (SBRT). We benchmarked the performance of our approach by comparing it with plans made by a VTPN of a parallel architecture and with the human plans submitted for competition in the 2016 American Association of Medical Dosimetrists (AAMD)/Radiosurgery Society (RSS) Plan Study. We analyzed the scalability of the network size with respect to the number of TPPs. Numerical experiments were also performed to understand the rationale of the decision-making behaviors of the trained HieVTPN. Results. Both HieVTPNs for prostate IMRT and SBRT were trained successfully using 10 training patient cases and 5 validation cases. For IMRT, HieVTPN was able to generate high-quality plans for 59 testing patient cases that were not included in the training process, achieving an average plan score of 8.62 (±0.83), with 9 being the maximal score. The score was comparable to that of the VTPN, 8.45 (±0.48). For SBRT planning, HieVTPN achieved an average plan score of 139.07 on five testing patient cases, compared to the score of 132.21 averaged over the human plans submitted for competition in the AAMD/RSS Plan Study. Unlike VTPN, whose network size scales linearly with the number of TPPs, the network size of HieVTPN is almost independent of the number of TPPs. It was also observed that the decision-making behaviors of HieVTPN were understandable and generally agreed with human experience. Conclusions. With its scalability and explainability, the hierarchical IATP framework is more favorable than the previous framework in terms of handling treatment planning problems involving a large number of TPPs.
KW - deep reinforcement learning
KW - hierarchical learning
KW - intelligent automatic treatment planning
UR - http://www.scopus.com/inward/record.url?scp=85109079185&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85109079185&partnerID=8YFLogxK
U2 - 10.1088/1361-6560/ac09a2
DO - 10.1088/1361-6560/ac09a2
M3 - Article
C2 - 34107460
AN - SCOPUS:85109079185
SN - 0031-9155
VL - 66
JO - Physics in medicine and biology
JF - Physics in medicine and biology
IS - 13
M1 - 134002
ER -