TY - JOUR
T1 - Robustness study of noisy annotation in deep learning based medical image segmentation
AU - Yu, Shaode
AU - Chen, Mingli
AU - Zhang, Erlei
AU - Wu, Junjie
AU - Yu, Hang
AU - Yang, Zi
AU - Ma, Lin
AU - Gu, Xuejun
AU - Lu, Weiguo
N1 - Publisher Copyright: © 2020 Institute of Physics and Engineering in Medicine.
PY - 2020/9/7
Y1 - 2020/9/7
AB - Partly due to the use of exhaustively annotated data, deep networks have achieved impressive performance on medical image segmentation. Medical imaging data paired with noisy annotation are ubiquitous, however, and little is known about the effect of noisy annotation on deep-learning-based medical image segmentation. We studied the effect of noisy annotation in the context of mandible segmentation from CT images. First, 202 images of head and neck cancer patients were collected from our clinical database, in which the organs-at-risk had each been annotated by one of twelve planning dosimetrists. The mandibles were roughly annotated as a planning avoidance structure. Then, the mandible labels were checked and corrected by a head and neck specialist to obtain the reference standard. Finally, deep networks were trained and tested for mandible segmentation while the ratio of noisy labels in the training set was varied. The trained models were further tested on two other public datasets. Experimental results indicated that the network trained with noisy labels produced worse segmentation than the network trained with the reference standard and that, in general, fewer noisy labels led to better performance. When 20% or fewer of the training cases were noisy, no significant difference was found between the segmentation results of models trained with noisy annotation and those trained with reference annotation. Cross-dataset validation results verified that the models trained with noisy data achieved performance competitive with that of the models trained with the reference standard. This study suggests that the involved network is robust to noisy annotation to some extent in mandible segmentation from CT images. It also highlights the importance of labeling quality in deep learning. In future work, extra attention should be paid to how a small number of reference standard samples can be used to improve the performance of deep learning with noisy annotation.
KW - deep learning
KW - medical image segmentation
KW - noisy annotation
KW - radiation oncology
UR - http://www.scopus.com/inward/record.url?scp=85091112412&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091112412&partnerID=8YFLogxK
U2 - 10.1088/1361-6560/ab99e5
DO - 10.1088/1361-6560/ab99e5
M3 - Article
C2 - 32503027
AN - SCOPUS:85091112412
SN - 0031-9155
VL - 65
JO - Physics in Medicine and Biology
JF - Physics in Medicine and Biology
IS - 17
M1 - 175007
ER -