TY - JOUR
T1 - The DBSAV Database
T2 - Predicting Deleteriousness of Single Amino Acid Variations in the Human Proteome
AU - Pei, Jimin
AU - Grishin, Nick V.
N1 - Funding Information:
We thank Dr. Lisa Kinch for helpful discussions and Ming Tang for technical support. The study is supported in part by the grants (to NVG) from the National Institutes of Health (GM127390) and the Welch Foundation (I-1505).
Publisher Copyright:
© 2021 Elsevier Ltd
PY - 2021/5/28
Y1 - 2021/5/28
N2 - Deleterious single amino acid variation (SAV) is one of the leading causes of human diseases. Evaluating the functional impact of SAVs is crucial for diagnosis of genetic disorders. We previously developed a deep convolutional neural network predictor, DeepSAV, to evaluate the deleterious effects of SAVs on protein function based on various sequence, structural, and functional properties. DeepSAV scores of rare SAVs observed in the human population are aggregated into a gene-level score called GTS (Gene Tolerance of rare SAVs) that reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. In this study, we aim to enhance the performance of DeepSAV by using expanded datasets of pathogenic and benign variants, more features, and neural network optimization. We found that multiple sequence alignments built from vertebrate-level orthologs yield better prediction results compared to those built from mammalian-level orthologs. For multiple sequence alignments built from BLAST searches, optimal performance was achieved with a sequence identity cutoff of 50% to remove distant homologs. The new version of DeepSAV exhibits the best performance among standalone predictors of deleterious effects of SAVs. We developed the DBSAV database (http://prodata.swmed.edu/DBSAV) that reports GTS scores of human genes and DeepSAV scores of SAVs in the human proteome, including pathogenic and benign SAVs, population-level SAVs, and all possible SAVs by single nucleotide variations. This database serves as a useful resource for research of human SAVs and their relationships with protein functions and human diseases.
AB - Deleterious single amino acid variation (SAV) is one of the leading causes of human diseases. Evaluating the functional impact of SAVs is crucial for diagnosis of genetic disorders. We previously developed a deep convolutional neural network predictor, DeepSAV, to evaluate the deleterious effects of SAVs on protein function based on various sequence, structural, and functional properties. DeepSAV scores of rare SAVs observed in the human population are aggregated into a gene-level score called GTS (Gene Tolerance of rare SAVs) that reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. In this study, we aim to enhance the performance of DeepSAV by using expanded datasets of pathogenic and benign variants, more features, and neural network optimization. We found that multiple sequence alignments built from vertebrate-level orthologs yield better prediction results compared to those built from mammalian-level orthologs. For multiple sequence alignments built from BLAST searches, optimal performance was achieved with a sequence identity cutoff of 50% to remove distant homologs. The new version of DeepSAV exhibits the best performance among standalone predictors of deleterious effects of SAVs. We developed the DBSAV database (http://prodata.swmed.edu/DBSAV) that reports GTS scores of human genes and DeepSAV scores of SAVs in the human proteome, including pathogenic and benign SAVs, population-level SAVs, and all possible SAVs by single nucleotide variations. This database serves as a useful resource for research of human SAVs and their relationships with protein functions and human diseases.
KW - benign variants
KW - genetic variations
KW - neural network predictor
KW - pathogenic variants
KW - variant deleteriousness prediction
UR - http://www.scopus.com/inward/record.url?scp=85103011760&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103011760&partnerID=8YFLogxK
U2 - 10.1016/j.jmb.2021.166915
DO - 10.1016/j.jmb.2021.166915
M3 - Article
C2 - 33676930
AN - SCOPUS:85103011760
SN - 0022-2836
VL - 433
JO - Journal of Molecular Biology
JF - Journal of Molecular Biology
IS - 11
M1 - 166915
ER -