TY - JOUR
T1 - Differentiation of Hispanic biogeographic ancestry with 80 ancestry informative markers
AU - Setser, Casandra H.
AU - Planz, John V.
AU - Barber, Robert C.
AU - Phillips, Nicole R.
AU - Chakraborty, Ranajit
AU - Cross, Deanna S.
N1 - Funding Information:
Funding support for the Genomic Origins and Admixture in Latinos (GOAL) Study was provided by the National Institute of General Medical Sciences (1R01GM090087). Additional support for sample collection was provided by a grant from the Stanley J. Glaser Foundation and the Dr. John T. Macdonald Foundation Department of Human Genetics. The dataset used for the analyses described in this manuscript was obtained from dbGaP through accession number phs000750.v1.p1. The authors would like to thank the late Dr. Arthur Eisenberg for the inspiration behind this project based on the needs of the Center for Human Identification and DNA ProKids. We would also like to thank the late Dr. Ranajit Chakraborty who was instrumental in the design of this research. We thank Dr. Carlos Bustamante for making the GOAL dataset available via dbGaP. Dr. Xiangpei Zeng helped with STRUCTURE and Dr. Frank Wendt helped with access and use of data from 1000 Genomes. Dr. Gita Pathak helped in many small but significant ways in discussing concepts and troubleshooting software.
Funding Information:
The Genomic Origins and Admixture in Latinos (GOAL) dataset analyzed during the current study is available in the dbGaP repository, accession number phs000750.v1.p1, found at: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000750.v1.p1&phv=202273&phd=4443&pha=&pht=3936&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1. Funding support for the GOAL Study was provided by the National Institute of General Medical Sciences (1R01GM090087). Additional support for sample collection was provided by a grant from the Stanley J. Glaser Foundation and the Dr. John T. Macdonald Foundation Department of Human Genetics.
Publisher Copyright:
© 2020, The Author(s).
PY - 2020/12/1
Y1 - 2020/12/1
N2 - Ancestry informative single nucleotide polymorphisms (SNPs) can identify biogeographic ancestry (BGA); however, population substructure and relatively recent admixture can make differentiation difficult in heterogeneous Hispanic populations. Utilizing unrelated individuals from the Genomic Origins and Admixture in Latinos dataset (GOAL, n = 160), we designed an 80-SNP panel (Setser80) that accurately depicts BGA through STRUCTURE and PCA. We compared our Setser80 panel to the Seldin and Kidd panels via resampling simulations, which model data based on allele frequencies. We incorporated Admixed American 1000 Genomes populations (1000 G, n = 347) into a combined populations dataset to determine robustness. Using multinomial logistic regression (MLR), we compared the three panels on the combined dataset and found overall MLR classification accuracies of 93.2% for the Setser80, 87.9% for the Seldin panel, and 71.4% for the Kidd panel. Naïve Bayesian classification gave similar results on the combined dataset: 91.5% for the Setser80, 84.7% for the Seldin panel, and 71.1% for the Kidd panel. Although Peru and Mexico were absent from panel design, we achieved high classification accuracy on the combined populations for Peru (MLR = 100%, naïve Bayes = 98%) and Mexico (MLR = 90%, naïve Bayes = 83.4%), as evidence of the portability of the Setser80. Our results indicate the Setser80 SNP panel can reliably classify BGA for individuals of presumed Hispanic origin.
AB - Ancestry informative single nucleotide polymorphisms (SNPs) can identify biogeographic ancestry (BGA); however, population substructure and relatively recent admixture can make differentiation difficult in heterogeneous Hispanic populations. Utilizing unrelated individuals from the Genomic Origins and Admixture in Latinos dataset (GOAL, n = 160), we designed an 80-SNP panel (Setser80) that accurately depicts BGA through STRUCTURE and PCA. We compared our Setser80 panel to the Seldin and Kidd panels via resampling simulations, which model data based on allele frequencies. We incorporated Admixed American 1000 Genomes populations (1000 G, n = 347) into a combined populations dataset to determine robustness. Using multinomial logistic regression (MLR), we compared the three panels on the combined dataset and found overall MLR classification accuracies of 93.2% for the Setser80, 87.9% for the Seldin panel, and 71.4% for the Kidd panel. Naïve Bayesian classification gave similar results on the combined dataset: 91.5% for the Setser80, 84.7% for the Seldin panel, and 71.1% for the Kidd panel. Although Peru and Mexico were absent from panel design, we achieved high classification accuracy on the combined populations for Peru (MLR = 100%, naïve Bayes = 98%) and Mexico (MLR = 90%, naïve Bayes = 83.4%), as evidence of the portability of the Setser80. Our results indicate the Setser80 SNP panel can reliably classify BGA for individuals of presumed Hispanic origin.
UR - http://www.scopus.com/inward/record.url?scp=85084294577&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084294577&partnerID=8YFLogxK
U2 - 10.1038/s41598-020-64245-4
DO - 10.1038/s41598-020-64245-4
M3 - Article
C2 - 32385290
AN - SCOPUS:85084294577
SN - 2045-2322
VL - 10
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 7745
ER -