Seqminer2: An efficient tool to query and retrieve genotypes for statistical genetics analyses from biobank scale sequence dataset

Lina Yang; Shuang Jiang; Bibo Jiang; Dajiang J. Liu; Xiaowei Zhan

doi:10.1093/bioinformatics/btaa628


Research output: Contribution to journal › Article › peer-review

Abstract

Here, we present seqminer2, a highly efficient R package for querying and retrieving sequence variants from biobank-scale datasets comprising millions of individuals and hundreds of millions of genetic variants. Seqminer2 implements a novel variant-based index for querying VCF/BCF files, which speeds up query and retrieval by several orders of magnitude compared with state-of-the-art tools based on tabix. It also reimplements support for the BGEN and PLINK formats, with improved speed over alternative implementations. The improved efficiency and comprehensive support for popular file formats will facilitate method development, software prototyping and data analysis of biobank-scale sequence datasets in R.

Availability and implementation: The seqminer2 R package is available from https://github.com/zhanxw/seqminer. Scripts used for the benchmarks are available at https://github.com/yang-lina/seqminer/blob/master/seqminer2%20benchmark%20script.txt.
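The abstract does not describe how the variant-based index is laid out. As a rough illustration only, the Python sketch below assumes the simplest form such an index could take: one entry per variant recording its chromosome, position and byte offset in an uncompressed, position-sorted VCF. A region query then reduces to a binary search plus a single seek, and the i-th variant can be fetched directly, whereas a tabix-style index points to the start of a genomic window and the reader scans forward from there. The class `VariantOffsetIndex`, its methods and the example records are hypothetical. This is not seqminer2's on-disk format, and compressed VCF.gz/BCF files would need BGZF virtual offsets instead of plain byte offsets.

```python
import bisect
import io
from typing import BinaryIO, Dict, List, Tuple

# Hypothetical per-variant index (illustration only, not seqminer2's format).
# Assumes an uncompressed VCF whose records are sorted by chromosome and position.


class VariantOffsetIndex:
    def __init__(self, entries: List[Tuple[str, int, int]]):
        # One (chrom, pos, byte_offset) tuple per variant, in file order.
        self.entries = entries
        # Per-chromosome sorted positions and matching offsets for binary search.
        self.by_chrom: Dict[str, Tuple[List[int], List[int]]] = {}
        for chrom, pos, offset in entries:
            positions, offsets = self.by_chrom.setdefault(chrom, ([], []))
            positions.append(pos)
            offsets.append(offset)

    @classmethod
    def build(cls, handle: BinaryIO) -> "VariantOffsetIndex":
        """Scan the file once, recording the byte offset of every data line."""
        entries = []
        offset = 0
        for line in handle:
            if not line.startswith(b"#"):  # skip meta-information and header lines
                fields = line.split(b"\t", 2)
                entries.append((fields[0].decode(), int(fields[1]), offset))
            offset += len(line)
        return cls(entries)

    def query(self, handle: BinaryIO, chrom: str, start: int, end: int) -> List[bytes]:
        """Return records with start <= POS <= end using one seek to the first hit."""
        if chrom not in self.by_chrom:
            return []
        positions, offsets = self.by_chrom[chrom]
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        if lo >= hi:
            return []
        handle.seek(offsets[lo])
        # Records for one chromosome are contiguous in a sorted file.
        return [handle.readline().rstrip(b"\n") for _ in range(hi - lo)]

    def record_at(self, handle: BinaryIO, i: int) -> bytes:
        """Fetch the i-th variant (0-based, file order) with a single seek."""
        handle.seek(self.entries[i][2])
        return handle.readline().rstrip(b"\n")


# Hypothetical toy VCF used for the usage example.
EXAMPLE_VCF = (
    b"##fileformat=VCFv4.2\n"
    b"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    b"1\t100\trs1\tA\tG\t.\tPASS\t.\n"
    b"1\t200\trs2\tC\tT\t.\tPASS\t.\n"
    b"1\t300\trs3\tG\tA\t.\tPASS\t.\n"
    b"2\t150\trs4\tT\tC\t.\tPASS\t.\n"
)

if __name__ == "__main__":
    fh = io.BytesIO(EXAMPLE_VCF)
    idx = VariantOffsetIndex.build(fh)
    print(idx.query(fh, "1", 150, 300))  # records rs2 and rs3
    print(idx.record_at(fh, 3))          # record rs4
```

The design trades a larger index (one entry per variant rather than one per genomic window) for retrieval that touches only the requested records, which is the kind of trade-off a variant-based index makes attractive when the same file is queried many times.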

Original language	English (US)
Pages (from-to)	4951-4954
Number of pages	4
Journal	Bioinformatics
Volume	36
Issue number	19
DOIs	https://doi.org/10.1093/bioinformatics/btaa628
State	Published - Oct 1 2020

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics


Cite this

@article{b21b76d3f09f47f1876b2a8bf73ba97b,
  title = "Seqminer2: An efficient tool to query and retrieve genotypes for statistical genetics analyses from biobank scale sequence dataset",
  abstract = "Here, we present a highly efficient R-package seqminer2 for querying and retrieving sequence variants from biobank scale datasets of millions of individuals and hundreds of millions of genetic variants. Seqminer2 implements a novel variant-based index for querying VCF/BCF files. It improves the speed of query and retrieval by several magnitudes compared to the state-of-the-art tools based upon tabix. It also reimplements support for BGEN and PLINK format, which improves speed over alternative implementations. The improved efficiency and comprehensive support for popular file formats will facilitate method development, software prototyping and data analysis of biobank scale sequence datasets in R. Availability and implementation: The seqminer2 R package is available from https://github.com/zhanxw/seqminer. Scripts used for the benchmarks are available in https://github.com/yang-lina/seqminer/blob/master/seqminer2%20benchmark%20script.txt.",
  author = "Yang, Lina and Jiang, Shuang and Jiang, Bibo and Liu, Dajiang J. and Zhan, Xiaowei",
  year = "2020",
  month = oct,
  day = "1",
  doi = "10.1093/bioinformatics/btaa628",
  language = "English (US)",
  volume = "36",
  pages = "4951--4954",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "19",
}

TY  - JOUR
T1  - Seqminer2: An efficient tool to query and retrieve genotypes for statistical genetics analyses from biobank scale sequence dataset
AU  - Yang, Lina
AU  - Jiang, Shuang
AU  - Jiang, Bibo
AU  - Liu, Dajiang J.
AU  - Zhan, Xiaowei
PY  - 2020/10/1
Y1  - 2020/10/1
AB  - Here, we present a highly efficient R-package seqminer2 for querying and retrieving sequence variants from biobank scale datasets of millions of individuals and hundreds of millions of genetic variants. Seqminer2 implements a novel variant-based index for querying VCF/BCF files. It improves the speed of query and retrieval by several magnitudes compared to the state-of-the-art tools based upon tabix. It also reimplements support for BGEN and PLINK format, which improves speed over alternative implementations. The improved efficiency and comprehensive support for popular file formats will facilitate method development, software prototyping and data analysis of biobank scale sequence datasets in R. Availability and implementation: The seqminer2 R package is available from https://github.com/zhanxw/seqminer. Scripts used for the benchmarks are available in https://github.com/yang-lina/seqminer/blob/master/seqminer2%20benchmark%20script.txt.
UR  - http://www.scopus.com/inward/record.url?scp=85097576593&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85097576593&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btaa628
DO  - 10.1093/bioinformatics/btaa628
M3  - Article
C2  - 32756942
AN  - SCOPUS:85097576593
SN  - 1367-4803
VL  - 36
SP  - 4951
EP  - 4954
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 19
ER  - 
