Deep Learning–Based COVID-19 Pneumonia Classification Using Chest CT Images: Model Generalizability

Dan Nguyen; Fernando Kay; Jun Tan; Yulong Yan; Yee Seng Ng; Puneeth Iyengar; Ron Peshock; Steve Jiang

doi:10.3389/frai.2021.694875

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

Since the outbreak of the COVID-19 pandemic, worldwide research efforts have focused on using artificial intelligence (AI) technologies on various medical data of COVID-19–positive patients in order to identify or classify various aspects of the disease, with promising reported results. However, concerns have been raised over their generalizability, given the heterogeneous factors in training datasets. This study aims to examine the severity of this problem by evaluating deep learning (DL) classification models trained to identify COVID-19–positive patients on 3D computed tomography (CT) datasets from different countries. We collected one dataset at UT Southwestern (UTSW) and three external datasets from different countries: CC-CCII Dataset (China), COVID-CTset (Iran), and MosMedData (Russia). We divided the data into two classes: COVID-19–positive and COVID-19–negative patients. We trained nine identical DL-based classification models by using combinations of datasets with a 72% train, 8% validation, and 20% test data split. The models trained on a single dataset achieved accuracy/area under the receiver operating characteristic curve (AUC) values of 0.87/0.826 (UTSW), 0.97/0.988 (CC-CCII), and 0.86/0.873 (COVID-CTset) when evaluated on their own dataset. The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better. However, the performance dropped close to an AUC of 0.5 (random guess) for all models when evaluated on a different dataset outside of its training datasets. Including MosMedData, which only contained positive labels, into the training datasets did not necessarily help the performance of other datasets. Multiple factors likely contributed to these results, such as patient demographics and differences in image acquisition or reconstruction, causing a data shift among different study cohorts.
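The 72% / 8% / 20% split mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the split was drawn (for example per patient, per scan, or stratified by label), so the uniform random shuffle, the function name `split_indices`, and the `seed` parameter below are all assumptions made for illustration, not the authors' procedure.

```python
import random


def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 72% / 8% / 20%.

    Hypothetical helper: the source only states the proportions,
    not how samples were assigned to each subset.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_test = n_samples * 20 // 100
    n_val = n_samples * 8 // 100

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]  # remainder, about 72%
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # prints: 720 80 200
```

In practice a medical-imaging split would usually be made at the patient level so that scans from one patient never appear in more than one subset; whether the study did so is not stated in the abstract.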

Original language	English (US)
Article number	694875
Journal	Frontiers in Artificial Intelligence
Volume	4
DOIs	https://doi.org/10.3389/frai.2021.694875
State	Published - Jun 29 2021

Keywords

COVID-19
SARS-CoV-2
classification
computed tomography
convolutional neural network
deep learning
generalizability

ASJC Scopus subject areas

Artificial Intelligence

Access to Document

10.3389/frai.2021.694875

Cite this

@article{7681388be89e4bb7a35669cadc430807,

title = "Deep Learning–Based COVID-19 Pneumonia Classification Using Chest CT Images: Model Generalizability",

abstract = "Since the outbreak of the COVID-19 pandemic, worldwide research efforts have focused on using artificial intelligence (AI) technologies on various medical data of COVID-19–positive patients in order to identify or classify various aspects of the disease, with promising reported results. However, concerns have been raised over their generalizability, given the heterogeneous factors in training datasets. This study aims to examine the severity of this problem by evaluating deep learning (DL) classification models trained to identify COVID-19–positive patients on 3D computed tomography (CT) datasets from different countries. We collected one dataset at UT Southwestern (UTSW) and three external datasets from different countries: CC-CCII Dataset (China), COVID-CTset (Iran), and MosMedData (Russia). We divided the data into two classes: COVID-19–positive and COVID-19–negative patients. We trained nine identical DL-based classification models by using combinations of datasets with a 72% train, 8% validation, and 20% test data split. The models trained on a single dataset achieved accuracy/area under the receiver operating characteristic curve (AUC) values of 0.87/0.826 (UTSW), 0.97/0.988 (CC-CCII), and 0.86/0.873 (COVID-CTset) when evaluated on their own dataset. The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better. However, the performance dropped close to an AUC of 0.5 (random guess) for all models when evaluated on a different dataset outside of its training datasets. Including MosMedData, which only contained positive labels, into the training datasets did not necessarily help the performance of other datasets. Multiple factors likely contributed to these results, such as patient demographics and differences in image acquisition or reconstruction, causing a data shift among different study cohorts.",

keywords = "COVID-19, SARS-CoV-2, classification, computed tomography, convolutional neural network, deep learning, generalizability",

author = "Dan Nguyen and Fernando Kay and Jun Tan and Yulong Yan and Ng, {Yee Seng} and Puneeth Iyengar and Ron Peshock and Steve Jiang",

note = "Publisher Copyright: {\textcopyright} 2021 Nguyen, Kay, Tan, Yan, Ng, Iyengar, Peshock and Jiang.",

year = "2021",

month = jun,

day = "29",

doi = "10.3389/frai.2021.694875",

language = "English (US)",

volume = "4",

journal = "Frontiers in Artificial Intelligence",

issn = "2624-8212",

publisher = "Frontiers Media S.A.",

}

TY - JOUR

T1 - Deep Learning–Based COVID-19 Pneumonia Classification Using Chest CT Images

T2 - Model Generalizability

AU - Nguyen, Dan

AU - Kay, Fernando

AU - Tan, Jun

AU - Yan, Yulong

AU - Ng, Yee Seng

AU - Iyengar, Puneeth

AU - Peshock, Ron

AU - Jiang, Steve

PY - 2021/6/29

Y1 - 2021/6/29

AB - Since the outbreak of the COVID-19 pandemic, worldwide research efforts have focused on using artificial intelligence (AI) technologies on various medical data of COVID-19–positive patients in order to identify or classify various aspects of the disease, with promising reported results. However, concerns have been raised over their generalizability, given the heterogeneous factors in training datasets. This study aims to examine the severity of this problem by evaluating deep learning (DL) classification models trained to identify COVID-19–positive patients on 3D computed tomography (CT) datasets from different countries. We collected one dataset at UT Southwestern (UTSW) and three external datasets from different countries: CC-CCII Dataset (China), COVID-CTset (Iran), and MosMedData (Russia). We divided the data into two classes: COVID-19–positive and COVID-19–negative patients. We trained nine identical DL-based classification models by using combinations of datasets with a 72% train, 8% validation, and 20% test data split. The models trained on a single dataset achieved accuracy/area under the receiver operating characteristic curve (AUC) values of 0.87/0.826 (UTSW), 0.97/0.988 (CC-CCII), and 0.86/0.873 (COVID-CTset) when evaluated on their own dataset. The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better. However, the performance dropped close to an AUC of 0.5 (random guess) for all models when evaluated on a different dataset outside of its training datasets. Including MosMedData, which only contained positive labels, into the training datasets did not necessarily help the performance of other datasets. Multiple factors likely contributed to these results, such as patient demographics and differences in image acquisition or reconstruction, causing a data shift among different study cohorts.

KW - COVID-19

KW - SARS-CoV-2

KW - classification

KW - computed tomography

KW - convolutional neural network

KW - deep learning

KW - generalizability

UR - http://www.scopus.com/inward/record.url?scp=85117060707&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85117060707&partnerID=8YFLogxK

U2 - 10.3389/frai.2021.694875

DO - 10.3389/frai.2021.694875

M3 - Article

C2 - 34268489

AN - SCOPUS:85117060707

SN - 2624-8212

VL - 4

JO - Frontiers in Artificial Intelligence

JF - Frontiers in Artificial Intelligence

M1 - 694875

ER -
