TY - JOUR
T1 - A sequence-based global map of regulatory activity for deciphering human genetics
AU - Chen, Kathleen M.
AU - Wong, Aaron K.
AU - Troyanskaya, Olga G.
AU - Zhou, Jian
N1 - Funding Information:
We thank all members of the Troyanskaya lab for helpful discussions. This work was performed using the high-performance computing resources, supported by the Scientific Computing Core, at the Flatiron Institute and the Terascale Infrastructure for Groundbreaking Research in Science and Engineering high-performance computer center at Princeton University. K.M.C. is supported by the National Science Foundation Graduate Research Fellowship Program (no. NSF-GRFP). O.G.T. is supported by National Institutes of Health (NIH) grant nos. R01HG005998, U54HL117798 and R01GM071966, U.S. Department of Health and Human Services grant no. HHSN272201000054C and Simons Foundation grant no. 395506. O.G.T. is a senior fellow of the Genetic Networks program of the Canadian Institute for Advanced Research. J.Z. is supported by a Cancer Prevention and Research Institute of Texas grant no. RR190071, NIH grant no. DP2GM146336 and the UT Southwestern Endowed Scholars Program.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/7
Y1 - 2022/7
AB - Epigenomic profiling has enabled large-scale identification of regulatory elements, yet we still lack a systematic mapping from any sequence or variant to regulatory activities. We address this challenge with Sei, a framework for integrating human genetics data with sequence information to discover the regulatory basis of traits and diseases. Sei learns a vocabulary of regulatory activities, called sequence classes, using a deep learning model that predicts 21,907 chromatin profiles across >1,300 cell lines and tissues. Sequence classes provide a global classification and quantification of sequence and variant effects based on diverse regulatory activities, such as cell type-specific enhancer functions. These predictions are supported by tissue-specific expression, expression quantitative trait loci and evolutionary constraint data. Furthermore, sequence classes enable characterization of the tissue-specific, regulatory architecture of complex traits and generate mechanistic hypotheses for individual regulatory pathogenic mutations. We provide Sei as a resource to elucidate the regulatory basis of human health and disease.
UR - http://www.scopus.com/inward/record.url?scp=85133848044&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133848044&partnerID=8YFLogxK
U2 - 10.1038/s41588-022-01102-2
DO - 10.1038/s41588-022-01102-2
M3 - Article
C2 - 35817977
AN - SCOPUS:85133848044
SN - 1061-4036
VL - 54
SP - 940
EP - 949
JO - Nature Genetics
JF - Nature Genetics
IS - 7
ER -