Is this good enough? On expert perception of brain tumor segmentation quality

Katharina Hoebel, Christopher P. Bridge, Sara Ahmed, Oluwatosin Akintola, Caroline Chung, Raymond Huang, Jason Johnson, Albert Kim, K. Ina Ly, Ken Chang, Jay Patel, Marco Pinho, Tracy T. Batchelor, Bruce Rosen, Elizabeth Gerstner, Jayashree Kalpathy-Cramer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citations


The performance of Deep Learning (DL) segmentation algorithms is routinely determined using quantitative metrics like the Dice score and Hausdorff distance. However, these metrics show a low concordance with human perception of segmentation quality. The successful collaboration of health care professionals with DL segmentation algorithms will require a detailed understanding of experts' assessment of segmentation quality. Here, we present the results of a study on expert quality perception of brain tumor segmentations of brain MR images generated by a DL segmentation algorithm. Eight expert medical professionals were asked to grade the quality of segmentations on a scale from 1 (worst) to 4 (best). To this end, we collected four ratings for a dataset of 60 cases. We observed a low inter-rater agreement among all raters (Krippendorff's alpha: 0.34), which is potentially a result of different internal cutoffs for the quality ratings. Several factors, including the volume of the segmentation and model uncertainty, were associated with high disagreement between raters. Furthermore, the correlations between the ratings and commonly used quantitative segmentation quality metrics ranged from no to moderate correlation. We conclude that, similar to the inter-rater variability observed for manual brain tumor segmentation, segmentation quality ratings are prone to variability due to the ambiguity of tumor boundaries and individual perceptual differences. Clearer guidelines for quality evaluation could help to mitigate these differences. Importantly, existing technical metrics do not capture clinical perception of segmentation quality. A better understanding of expert quality perception is expected to support the design of more human-centered DL algorithms for integration into the clinical workflow.
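For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of how a Dice score and Krippendorff's alpha can be computed. It is not the authors' code. The function names `dice_score` and `krippendorff_alpha` are hypothetical, the masks are assumed to be flattened binary lists, and the squared-difference (interval) distance used by default is an assumption, since the abstract does not state which distance metric was used for the 1 to 4 ratings.

```python
from itertools import permutations


def dice_score(pred, truth):
    """Dice overlap 2|A∩B| / (|A| + |B|) of two flat binary masks."""
    pred = [bool(p) for p in pred]
    truth = [bool(t) for t in truth]
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention chosen here (an assumption): two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def krippendorff_alpha(units, delta=lambda a, b: (a - b) ** 2):
    """Krippendorff's alpha = 1 - D_o / D_e.

    units: one list of ratings per case; None marks a missing rating.
    delta: distance between two rating values. The default is the
           interval metric (an assumption, not taken from the paper).
    """
    # Keep only cases with at least two ratings (the "pairable" values).
    units = [[v for v in u if v is not None] for u in units]
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")
    # Observed disagreement: ordered pairs within each case.
    d_o = sum(
        sum(delta(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement: ordered pairs across all pooled ratings.
    d_e = sum(delta(a, b) for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - d_o / d_e


if __name__ == "__main__":
    # Toy data, not from the study.
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
    print(krippendorff_alpha([[4, 4, 3, 4], [2, 1, 2, None], [3, 3, 3, 3]]))
```

Passing `delta=lambda a, b: float(a != b)` switches the sketch to the nominal metric; for ordinal ratings an established implementation is likely the safer choice.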

Original language: English (US)
Title of host publication: Medical Imaging 2022
Subtitle of host publication: Image Perception, Observer Performance, and Technology Assessment
Editors: Claudia R. Mello-Thoms, Sian Taylor-Phillips
ISBN (Electronic): 9781510649453
State: Published - 2022
Event: Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment - Virtual, Online
Duration: Mar 21 2022 → Mar 27 2022

Publication series

Name: Progress in Biomedical Optics and Imaging - Proceedings of SPIE
ISSN (Print): 1605-7422


Conference: Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment
City: Virtual, Online


  • deep learning
  • inter-rater variability
  • quality assessment
  • segmentation

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Atomic and Molecular Physics, and Optics
  • Biomaterials
  • Radiology, Nuclear Medicine and Imaging

