A sensitivity analysis of probability maps in deep-learning-based anatomical segmentation

Noah Bice; Neil Kirby; Ruiqi Li; Dan Nguyen; Tyler Bahr; Christopher Kabat; Pamela Myers; Niko Papanikolaou; Mohamad Fakhreddine

doi:10.1002/acm2.13331

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Purpose: Deep-learning-based segmentation models implicitly learn to predict the presence of a structure based on its overall prominence in the training dataset. This phenomenon is observed and accounted for in deep-learning applications such as natural language processing but is often neglected in segmentation literature. The purpose of this work is to demonstrate the significance of class imbalance in deep-learning-based segmentation and recommend tuning of the neural network optimization objective.

Methods: An architecture and training procedure were chosen to represent common models in anatomical segmentation. A family of 5-block 2D U-Nets were independently trained to segment 10 structures from the Cancer Imaging Archive's Head-Neck-Radiomics-HN1 dataset. We identify the optimal threshold for our models according to their Dice score on the validation datasets and consider perturbations about the optimum. A measure of structure prominence in segmentation datasets is defined, and its impact on the optimal threshold is analyzed. Finally, we consider the use of a 2D Dice objective in addition to binary cross entropy.

Results: We observe significant decreases in perceived model performance with conventional 0.5-thresholding. Perturbations of as little as ±0.05 about the optimum threshold induce a median reduction in Dice score of 11.8% for our models. There is statistical evidence to suggest a weak correlation between training dataset prominence and optimal threshold (Pearson (Formula presented.) and (Formula presented.)). We find that network optimization with respect to the 2D Dice score itself significantly reduces variability due to thresholding but does not unequivocally create the best segmentation models when assessed with distance-based segmentation metrics.

Conclusion: Our results suggest that those practicing deep-learning-based contouring should consider their postprocessing procedures as a potential avenue for improved performance. For intensity-based postprocessing, we recommend a mixed objective function consisting of the traditional binary cross entropy along with the 2D Dice score.
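
The two ideas in the abstract, choosing a threshold by validation Dice score and combining binary cross entropy with a Dice term, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the threshold grid, the `dice_weight` parameter, and the soft-Dice formulation are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def dice_score(pred_mask, true_mask, eps=1e-7):
    """Dice = 2|A & B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

def best_threshold(prob_map, true_mask, thresholds=None):
    """Sweep candidate thresholds; return the (threshold, dice) pair with the highest Dice.

    The default grid (0.05 to 0.95 in 19 steps) is an assumption, not taken from the paper.
    """
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.95, 19)
    scores = [dice_score(prob_map >= t, true_mask) for t in thresholds]
    best = int(np.argmax(scores))  # first maximum if several thresholds tie
    return float(thresholds[best]), float(scores[best])

def mixed_loss(prob_map, true_mask, dice_weight=0.5, eps=1e-7):
    """Weighted sum of binary cross entropy and a soft Dice loss (1 - soft Dice).

    The weighting scheme is hypothetical; the abstract only states that the
    two terms are combined.
    """
    p = np.clip(np.asarray(prob_map, dtype=float), eps, 1.0 - eps)
    y = np.asarray(true_mask, dtype=float)
    bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    soft_dice = (2.0 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)
    return (1.0 - dice_weight) * bce + dice_weight * (1.0 - soft_dice)

if __name__ == "__main__":
    # Toy example: a structure whose predicted probabilities never reach 0.5.
    true_mask = np.array([1, 1, 1, 0, 0, 0, 0, 0])
    prob_map = np.array([0.32, 0.36, 0.41, 0.11, 0.06, 0.12, 0.22, 0.07])
    print("Dice at 0.5:", dice_score(prob_map >= 0.5, true_mask))
    print("Best (threshold, Dice):", best_threshold(prob_map, true_mask))
    print("Mixed loss:", mixed_loss(prob_map, true_mask))
```

In the toy data the model is under-confident for the structure, so a fixed 0.5 cutoff discards every true voxel while a lower threshold recovers them. In practice the sweep would be run on a held-out validation set, as the abstract describes.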

Original language	English (US)
Pages (from-to)	105-119
Number of pages	15
Journal	Journal of Applied Clinical Medical Physics
Volume	22
Issue number	8
DOIs	https://doi.org/10.1002/acm2.13331
State	Published - Aug 2021
Externally published	Yes

Keywords

deep learning
machine learning
segmentation

ASJC Scopus subject areas

Radiation
Instrumentation
Radiology Nuclear Medicine and imaging

Access to Document

10.1002/acm2.13331

Cite this

@article{4495fd2a2ff941dc95788807c19d5f03,

title = "A sensitivity analysis of probability maps in deep-learning-based anatomical segmentation",

abstract = "Purpose: Deep-learning-based segmentation models implicitly learn to predict the presence of a structure based on its overall prominence in the training dataset. This phenomenon is observed and accounted for in deep-learning applications such as natural language processing but is often neglected in segmentation literature. The purpose of this work is to demonstrate the significance of class imbalance in deep-learning-based segmentation and recommend tuning of the neural network optimization objective. Methods: An architecture and training procedure were chosen to represent common models in anatomical segmentation. A family of 5-block 2D U-Nets were independently trained to segment 10 structures from the Cancer Imaging Archive's Head-Neck-Radiomics-HN1 dataset. We identify the optimal threshold for our models according to their Dice score on the validation datasets and consider perturbations about the optimum. A measure of structure prominence in segmentation datasets is defined, and its impact on the optimal threshold is analyzed. Finally, we consider the use of a 2D Dice objective in addition to binary cross entropy. Results: We observe significant decreases in perceived model performance with conventional 0.5-thresholding. Perturbations of as little as ±0.05 about the optimum threshold induce a median reduction in Dice score of 11.8% for our models. There is statistical evidence to suggest a weak correlation between training dataset prominence and optimal threshold (Pearson (Formula presented.) and (Formula presented.)). We find that network optimization with respect to the 2D Dice score itself significantly reduces variability due to thresholding but does not unequivocally create the best segmentation models when assessed with distance-based segmentation metrics. Conclusion: Our results suggest that those practicing deep-learning-based contouring should consider their postprocessing procedures as a potential avenue for improved performance. 
For intensity-based postprocessing, we recommend a mixed objective function consisting of the traditional binary cross entropy along with the 2D Dice score.",

keywords = "deep learning, machine learning, segmentation",

author = "Noah Bice and Neil Kirby and Ruiqi Li and Dan Nguyen and Tyler Bahr and Christopher Kabat and Pamela Myers and Niko Papanikolaou and Mohamad Fakhreddine",

note = "Publisher Copyright: {\textcopyright} 2021 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals LLC on behalf of American Association of Physicists in Medicine",

year = "2021",

month = aug,

doi = "10.1002/acm2.13331",

language = "English (US)",

volume = "22",

pages = "105--119",

journal = "Journal of Applied Clinical Medical Physics",

issn = "1526-9914",

publisher = "American Institute of Physics Publishing LLC",

number = "8",

}

TY - JOUR

T1 - A sensitivity analysis of probability maps in deep-learning-based anatomical segmentation

AU - Bice, Noah

AU - Kirby, Neil

AU - Li, Ruiqi

AU - Nguyen, Dan

AU - Bahr, Tyler

AU - Kabat, Christopher

AU - Myers, Pamela

AU - Papanikolaou, Niko

AU - Fakhreddine, Mohamad

PY - 2021/8

Y1 - 2021/8

AB - Purpose: Deep-learning-based segmentation models implicitly learn to predict the presence of a structure based on its overall prominence in the training dataset. This phenomenon is observed and accounted for in deep-learning applications such as natural language processing but is often neglected in segmentation literature. The purpose of this work is to demonstrate the significance of class imbalance in deep-learning-based segmentation and recommend tuning of the neural network optimization objective. Methods: An architecture and training procedure were chosen to represent common models in anatomical segmentation. A family of 5-block 2D U-Nets were independently trained to segment 10 structures from the Cancer Imaging Archive's Head-Neck-Radiomics-HN1 dataset. We identify the optimal threshold for our models according to their Dice score on the validation datasets and consider perturbations about the optimum. A measure of structure prominence in segmentation datasets is defined, and its impact on the optimal threshold is analyzed. Finally, we consider the use of a 2D Dice objective in addition to binary cross entropy. Results: We observe significant decreases in perceived model performance with conventional 0.5-thresholding. Perturbations of as little as ±0.05 about the optimum threshold induce a median reduction in Dice score of 11.8% for our models. There is statistical evidence to suggest a weak correlation between training dataset prominence and optimal threshold (Pearson (Formula presented.) and (Formula presented.)). We find that network optimization with respect to the 2D Dice score itself significantly reduces variability due to thresholding but does not unequivocally create the best segmentation models when assessed with distance-based segmentation metrics. Conclusion: Our results suggest that those practicing deep-learning-based contouring should consider their postprocessing procedures as a potential avenue for improved performance. 
For intensity-based postprocessing, we recommend a mixed objective function consisting of the traditional binary cross entropy along with the 2D Dice score.

KW - deep learning

KW - machine learning

KW - segmentation

UR - http://www.scopus.com/inward/record.url?scp=85109710277&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85109710277&partnerID=8YFLogxK

U2 - 10.1002/acm2.13331

DO - 10.1002/acm2.13331

M3 - Article

C2 - 34231950

AN - SCOPUS:85109710277

SN - 1526-9914

VL - 22

SP - 105

EP - 119

JO - Journal of Applied Clinical Medical Physics

JF - Journal of Applied Clinical Medical Physics

IS - 8

ER -
