TY - JOUR
T1 - GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation
AU - Zhang, Hongyi
AU - Zhan, Xiaowei
AU - Li, Bo
N1 - Funding Information:
This work is supported by the following funding sources: Cancer Prevention and Research Institute of Texas (CPRIT) RR170079 (B.L.), NCI 1R01CA245318 (B.L.), NIGMS 5R01GM126479 (X.Z.), and CPRIT RP190107 (X.Z.).
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12/1
Y1 - 2021/12/1
N2 - Similarity in T-cell receptor (TCR) sequences implies shared antigen specificity between receptors, and could be used to discover novel therapeutic targets. However, existing methods that cluster T-cell receptor sequences by similarity are computationally inefficient, making them impractical to use on the ever-expanding datasets of the immune repertoire. Here, we developed GIANA (Geometric Isometry-based TCR AligNment Algorithm), a computationally efficient tool for this task that provides the same level of clustering specificity as TCRdist at 600 times its speed, and without sacrificing accuracy. GIANA also allows the rapid query of large reference cohorts within minutes. Using GIANA to cluster large-scale TCR datasets provides candidate disease-specific receptors, and provides a new solution to repertoire classification. Querying unseen TCR-seq samples against an existing reference differentiates samples from patients across various cohorts associated with cancer, infectious and autoimmune disease. Our results demonstrate how GIANA could be used as the basis for a TCR-based non-invasive multi-disease diagnostic platform.
AB - Similarity in T-cell receptor (TCR) sequences implies shared antigen specificity between receptors, and could be used to discover novel therapeutic targets. However, existing methods that cluster T-cell receptor sequences by similarity are computationally inefficient, making them impractical to use on the ever-expanding datasets of the immune repertoire. Here, we developed GIANA (Geometric Isometry-based TCR AligNment Algorithm), a computationally efficient tool for this task that provides the same level of clustering specificity as TCRdist at 600 times its speed, and without sacrificing accuracy. GIANA also allows the rapid query of large reference cohorts within minutes. Using GIANA to cluster large-scale TCR datasets provides candidate disease-specific receptors, and provides a new solution to repertoire classification. Querying unseen TCR-seq samples against an existing reference differentiates samples from patients across various cohorts associated with cancer, infectious and autoimmune disease. Our results demonstrate how GIANA could be used as the basis for a TCR-based non-invasive multi-disease diagnostic platform.
UR - http://www.scopus.com/inward/record.url?scp=85111964824&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111964824&partnerID=8YFLogxK
U2 - 10.1038/s41467-021-25006-7
DO - 10.1038/s41467-021-25006-7
M3 - Article
C2 - 34349111
AN - SCOPUS:85111964824
SN - 2041-1723
VL - 12
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 4699
ER -