TY - JOUR
T1 - MB-GAN
T2 - Microbiome Simulation via Generative Adversarial Network
AU - Rong, Ruichen
AU - Jiang, Shuang
AU - Xu, Lin
AU - Xiao, Guanghua
AU - Xie, Yang
AU - Liu, Dajiang J.
AU - Li, Qiwei
AU - Zhan, Xiaowei
N1 - Publisher Copyright: © 2021 The Author(s). Published by Oxford University Press GigaScience.
PY - 2021/2/1
Y1 - 2021/2/1
N2 - Background: Trillions of microbes inhabit the human body and have a profound effect on human health. The recent development of metagenome-wide association studies and other quantitative analysis methods has accelerated the discovery of associations between the human microbiome and diseases. To assess the strengths and limitations of these analytical tools, simulating realistic microbiome datasets is critically important. However, simulating realistic microbiome data is challenging because its correlation structure is difficult to model with explicit statistical models. Results: To address the challenge of simulating realistic microbiome data, we designed a novel simulation framework, termed MB-GAN, that uses a generative adversarial network (GAN) and draws on methodological advances from the deep learning community. MB-GAN can automatically learn from given microbial abundances and generate simulated abundances that are indistinguishable from them. In practice, MB-GAN showed the following advantages. First, MB-GAN avoids explicit statistical modeling assumptions and requires only real datasets as inputs. Second, unlike traditional GANs, MB-GAN is easily applicable and converges efficiently. Conclusions: By applying MB-GAN to a case-control gut microbiome study of 396 samples, we demonstrated that the simulated data and the original data had similar first-order and second-order properties, including sparsity, diversities, and taxon-taxon correlations. These advantages make MB-GAN suitable for further microbiome methodology development where high-fidelity microbiome data are needed.
AB - Background: Trillions of microbes inhabit the human body and have a profound effect on human health. The recent development of metagenome-wide association studies and other quantitative analysis methods has accelerated the discovery of associations between the human microbiome and diseases. To assess the strengths and limitations of these analytical tools, simulating realistic microbiome datasets is critically important. However, simulating realistic microbiome data is challenging because its correlation structure is difficult to model with explicit statistical models. Results: To address the challenge of simulating realistic microbiome data, we designed a novel simulation framework, termed MB-GAN, that uses a generative adversarial network (GAN) and draws on methodological advances from the deep learning community. MB-GAN can automatically learn from given microbial abundances and generate simulated abundances that are indistinguishable from them. In practice, MB-GAN showed the following advantages. First, MB-GAN avoids explicit statistical modeling assumptions and requires only real datasets as inputs. Second, unlike traditional GANs, MB-GAN is easily applicable and converges efficiently. Conclusions: By applying MB-GAN to a case-control gut microbiome study of 396 samples, we demonstrated that the simulated data and the original data had similar first-order and second-order properties, including sparsity, diversities, and taxon-taxon correlations. These advantages make MB-GAN suitable for further microbiome methodology development where high-fidelity microbiome data are needed.
KW - deep learning
KW - generative adversarial network
KW - microbiome simulation
UR - http://www.scopus.com/inward/record.url?scp=85101202312&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101202312&partnerID=8YFLogxK
U2 - 10.1093/gigascience/giab005
DO - 10.1093/gigascience/giab005
M3 - Article
C2 - 33543271
AN - SCOPUS:85101202312
SN - 2047-217X
VL - 10
JO - GigaScience
JF - GigaScience
IS - 2
M1 - giab005
ER -