Robustness study of noisy annotation in deep learning based medical image segmentation

Shaode Yu; Mingli Chen; Erlei Zhang; Junjie Wu; Hang Yu; Zi Yang; Lin Ma; Xuejun Gu; Weiguo Lu

doi:10.1088/1361-6560/ab99e5

Research output: Contribution to journal › Article › peer-review

31 Scopus citations

Abstract

Partly due to the use of exhaustively annotated data, deep networks have achieved impressive performance on medical image segmentation. Medical imaging data paired with noisy annotation are ubiquitous, however, and little is known about the effect of noisy annotation on deep learning based medical image segmentation. We studied the effect of noisy annotation in the context of mandible segmentation from CT images. First, 202 images of head and neck cancer patients were collected from our clinical database, where the organs-at-risk were annotated by one of twelve planning dosimetrists. The mandibles were roughly annotated as a planning avoidance structure. Then, the mandible labels were checked and corrected by a head and neck specialist to obtain the reference standard. Finally, by varying the ratio of noisy labels in the training set, deep networks were trained and tested for mandible segmentation. The trained models were further tested on two other public datasets. Experimental results indicated that the network trained with noisy labels produced worse segmentation than the network trained with the reference standard and that, in general, fewer noisy labels led to better performance. When 20% or fewer of the training cases were noisy, no significant difference was found between the segmentation results of models trained with noisy annotation and those trained with reference annotation. Cross-dataset validation showed that the models trained with noisy data achieved performance competitive with that of models trained with the reference standard. This study suggests that the network involved is robust to noisy annotation to some extent in mandible segmentation from CT images. It also highlights the importance of labeling quality in deep learning. Future work should focus on how to use a small number of reference standard samples to improve the performance of deep learning with noisy annotation.
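
The idea of varying the ratio of noisy labels in the training set can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say how cases were chosen for each ratio, so the function name `assign_label_sources`, the seeded uniform random selection, and the case identifiers are all assumptions made for illustration.

```python
import random


def assign_label_sources(case_ids, noisy_ratio, seed=0):
    """Map each training case to "noisy" or "reference".

    Hypothetical helper. Assumes cases receiving the noisy (uncorrected)
    label are drawn uniformly at random; the abstract does not specify this.
    """
    if not 0.0 <= noisy_ratio <= 1.0:
        raise ValueError("noisy_ratio must be in [0, 1]")
    ids = list(case_ids)
    # Number of cases that keep their noisy annotation at this ratio.
    n_noisy = round(noisy_ratio * len(ids))
    rng = random.Random(seed)  # seeded so a given split is reproducible
    noisy_ids = set(rng.sample(ids, n_noisy))
    return {cid: ("noisy" if cid in noisy_ids else "reference") for cid in ids}


# Illustrative usage with made-up case identifiers.
example_cases = [f"case_{i:03d}" for i in range(10)]
for ratio in (0.0, 0.2, 0.5, 1.0):
    sources = assign_label_sources(example_cases, ratio)
    n_noisy = sum(1 for source in sources.values() if source == "noisy")
    print(f"ratio={ratio}: {n_noisy} noisy, {len(sources) - n_noisy} reference")
```

Each mapping would then decide which annotation file is loaded for a case during training, so that one model can be trained per ratio and compared against the model trained only on the reference standard.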

Original language	English (US)
Article number	175007
Journal	Physics in Medicine and Biology
Volume	65
Issue number	17
DOIs	https://doi.org/10.1088/1361-6560/ab99e5
State	Published - Sep 7 2020

Keywords

deep learning
medical image segmentation
noisy annotation
radiation oncology

ASJC Scopus subject areas

Radiological and Ultrasound Technology
Radiology, Nuclear Medicine and Imaging

Cite this

@article{574feed768f746df81982b8b2be9da7b,

title = "Robustness study of noisy annotation in deep learning based medical image segmentation",

abstract = "Partly due to the use of exhaustively annotated data, deep networks have achieved impressive performance on medical image segmentation. Medical imaging data paired with noisy annotation are ubiquitous, however, and little is known about the effect of noisy annotation on deep learning based medical image segmentation. We studied the effect of noisy annotation in the context of mandible segmentation from CT images. First, 202 images of head and neck cancer patients were collected from our clinical database, where the organs-at-risk were annotated by one of twelve planning dosimetrists. The mandibles were roughly annotated as a planning avoidance structure. Then, the mandible labels were checked and corrected by a head and neck specialist to obtain the reference standard. Finally, by varying the ratio of noisy labels in the training set, deep networks were trained and tested for mandible segmentation. The trained models were further tested on two other public datasets. Experimental results indicated that the network trained with noisy labels produced worse segmentation than the network trained with the reference standard and that, in general, fewer noisy labels led to better performance. When 20% or fewer of the training cases were noisy, no significant difference was found between the segmentation results of models trained with noisy annotation and those trained with reference annotation. Cross-dataset validation showed that the models trained with noisy data achieved performance competitive with that of models trained with the reference standard. This study suggests that the network involved is robust to noisy annotation to some extent in mandible segmentation from CT images. It also highlights the importance of labeling quality in deep learning. Future work should focus on how to use a small number of reference standard samples to improve the performance of deep learning with noisy annotation.",

keywords = "deep learning, medical image segmentation, noisy annotation, radiation oncology",

author = "Shaode Yu and Mingli Chen and Erlei Zhang and Junjie Wu and Hang Yu and Zi Yang and Lin Ma and Xuejun Gu and Weiguo Lu",

note = "Publisher Copyright: {\textcopyright} 2020 Institute of Physics and Engineering in Medicine.",

year = "2020",

month = sep,

day = "7",

doi = "10.1088/1361-6560/ab99e5",

language = "English (US)",

volume = "65",

journal = "Physics in Medicine and Biology",

issn = "0031-9155",

publisher = "IOP Publishing Ltd.",

number = "17",

}

TY - JOUR

T1 - Robustness study of noisy annotation in deep learning based medical image segmentation

AU - Yu, Shaode

AU - Chen, Mingli

AU - Zhang, Erlei

AU - Wu, Junjie

AU - Yu, Hang

AU - Yang, Zi

AU - Ma, Lin

AU - Gu, Xuejun

AU - Lu, Weiguo

PY - 2020/9/7

Y1 - 2020/9/7

AB - Partly due to the use of exhaustively annotated data, deep networks have achieved impressive performance on medical image segmentation. Medical imaging data paired with noisy annotation are ubiquitous, however, and little is known about the effect of noisy annotation on deep learning based medical image segmentation. We studied the effect of noisy annotation in the context of mandible segmentation from CT images. First, 202 images of head and neck cancer patients were collected from our clinical database, where the organs-at-risk were annotated by one of twelve planning dosimetrists. The mandibles were roughly annotated as a planning avoidance structure. Then, the mandible labels were checked and corrected by a head and neck specialist to obtain the reference standard. Finally, by varying the ratio of noisy labels in the training set, deep networks were trained and tested for mandible segmentation. The trained models were further tested on two other public datasets. Experimental results indicated that the network trained with noisy labels produced worse segmentation than the network trained with the reference standard and that, in general, fewer noisy labels led to better performance. When 20% or fewer of the training cases were noisy, no significant difference was found between the segmentation results of models trained with noisy annotation and those trained with reference annotation. Cross-dataset validation showed that the models trained with noisy data achieved performance competitive with that of models trained with the reference standard. This study suggests that the network involved is robust to noisy annotation to some extent in mandible segmentation from CT images. It also highlights the importance of labeling quality in deep learning. Future work should focus on how to use a small number of reference standard samples to improve the performance of deep learning with noisy annotation.

KW - deep learning

KW - medical image segmentation

KW - noisy annotation

KW - radiation oncology

UR - http://www.scopus.com/inward/record.url?scp=85091112412&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85091112412&partnerID=8YFLogxK

U2 - 10.1088/1361-6560/ab99e5

DO - 10.1088/1361-6560/ab99e5

M3 - Article

C2 - 32503027

AN - SCOPUS:85091112412

SN - 0031-9155

VL - 65

JO - Physics in Medicine and Biology

JF - Physics in Medicine and Biology

IS - 17

M1 - 175007

ER -
