Statistics of random protein superpositions: p-Values for pairwise structure alignment

James O. Wrabl; Nick V. Grishin

doi:10.1089/cmb.2007.0161

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Quantification of statistical significance is essential for the interpretation of protein structural similarity. To address this, a random model for protein structure comparison was developed. The novelty of the model is threefold. First, a sample of random structure comparisons is restricted to molecules of the same size and shape as the superposition of interest. Second, careful selection of the sample and accurate modeling of shape allow approximation of the root mean square deviation (RMSD) distribution of random comparisons with a Nakagami probability density function. Third, through convolution, a second probability density function is obtained that describes the coordinate difference vector projections underlying the random distribution of RMSD. This last feature allows sampling of random distributions not only of RMSD but also of any similarity score that depends on difference vector projections, such as the GDT_TS score, TM score, and LiveBench 3D score. Probabilities estimated from the method correlate well with common measures of structural similarity, such as the Dali Z-score and the GDT_TS score. As a result, the p-value for a given superposition can be calculated using simple formulae depending on RMSD, radius of gyration, and thinnest molecular dimension. In addition to scoring structural similarity, p-values computed by this method can be applied to the evaluation of homology modeling techniques, providing a statistically sound alternative to scores used in reference-independent evaluation of alignment quality.
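As a rough illustration of how a Nakagami model can turn an observed RMSD into a p-value, the sketch below evaluates the standard Nakagami cumulative distribution function, P(m, m x^2 / omega), where P is the regularized lower incomplete gamma function. It assumes the p-value is the lower-tail probability that a random comparison yields an RMSD at or below the observed one. The shape parameter `m` and spread parameter `omega` are placeholders: the abstract states that they follow from radius of gyration and thinnest molecular dimension but does not give the formulae, so the numbers used here are illustrative only and are not taken from the paper.

```python
import math


def regularized_lower_gamma(a: float, x: float,
                            tol: float = 1e-15, max_terms: int = 10_000) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series.

    P(a, x) = x^a e^(-x) / Gamma(a + 1) * sum_{n>=0} x^n / ((a+1)...(a+n)).
    Adequate for a sketch; very large x would need a continued fraction.
    """
    if a <= 0.0:
        raise ValueError("a must be positive")
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    log_prefactor = -x + a * math.log(x) - math.lgamma(a + 1.0)
    return min(1.0, total * math.exp(log_prefactor))


def nakagami_cdf(x: float, m: float, omega: float) -> float:
    """CDF of a Nakagami(m, omega) variable evaluated at x >= 0."""
    if m <= 0.0 or omega <= 0.0:
        raise ValueError("m and omega must be positive")
    if x <= 0.0:
        return 0.0
    return regularized_lower_gamma(m, m * x * x / omega)


def rmsd_p_value(rmsd: float, m: float, omega: float) -> float:
    """Lower-tail p-value: chance that a random superposition has RMSD <= rmsd.

    m and omega are hypothetical inputs here, not the paper's fitted values.
    """
    if rmsd < 0.0:
        raise ValueError("rmsd must be non-negative")
    return nakagami_cdf(rmsd, m, omega)


if __name__ == "__main__":
    # Illustrative values only: a 3.0 Angstrom RMSD under an assumed random model.
    print(rmsd_p_value(3.0, m=2.0, omega=100.0))
```

For m = 1 the Nakagami distribution reduces to a Rayleigh distribution and for m = 0.5 to a half-normal distribution, which gives two closed-form checks on the series implementation.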

Original language	English (US)
Pages (from-to)	317-355
Number of pages	39
Journal	Journal of Computational Biology
Volume	15
Issue number	3
DOIs	https://doi.org/10.1089/cmb.2007.0161
State	Published - Apr 1 2008

Keywords

Protein structure alignment
RMSD
Random model
Statistical significance
Superposition

ASJC Scopus subject areas

Modeling and Simulation
Molecular Biology
Genetics
Computational Mathematics
Computational Theory and Mathematics

Access to Document

10.1089/cmb.2007.0161

Cite this

@article{493cba885a3a4e82a0c9861de3d7f695,
title = "Statistics of random protein superpositions: p-Values for pairwise structure alignment",
abstract = "Quantification of statistical significance is essential for the interpretation of protein structural similarity. To address this, a random model for protein structure comparison was developed. Novelty of the model is threefold. First, a sample of random structure comparisons is restricted to molecules of the same size and shape as the superposition of interest. Second, careful selection of the sample and accurate modeling of shape allows approximation of the root mean square deviation (RMSD) distribution of random comparisons with a Nakagami probability density function. Third, through convolution, a second probability density function is obtained that describes the coordinate difference vector projections underlying the random distribution of RMSD. This last feature allows sampling random distributions of not only RMSD, but also any similarity score that depends on difference vector projections, such as GDT_TS score, TM score, and LiveBench 3D score. Probabilities estimated from the method correlate well with common measures of structural similarity, such as the Dali Z-score and the GDT_TS score. As a result, the p-value for a given superposition can be calculated using simple formulae depending on RMSD, radius of gyration, and thinnest molecular dimension. In addition to scoring structural similarity, p-values computed by this method can be applied to evaluation of homology modeling techniques, providing a statistically sound alternative to scores used in reference-independent evaluation of alignment quality.",
keywords = "Protein structure alignment, RMSD, Random model, Statistical significance, Superposition",
author = "Wrabl, {James O.} and Grishin, {Nick V.}",
year = "2008",
month = apr,
day = "1",
doi = "10.1089/cmb.2007.0161",
language = "English (US)",
volume = "15",
pages = "317--355",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "3",
}

TY - JOUR
T1 - Statistics of random protein superpositions
T2 - p-Values for pairwise structure alignment
AU - Wrabl, James O.
AU - Grishin, Nick V.
PY - 2008/4/1
Y1 - 2008/4/1
N2 - Quantification of statistical significance is essential for the interpretation of protein structural similarity. To address this, a random model for protein structure comparison was developed. Novelty of the model is threefold. First, a sample of random structure comparisons is restricted to molecules of the same size and shape as the superposition of interest. Second, careful selection of the sample and accurate modeling of shape allows approximation of the root mean square deviation (RMSD) distribution of random comparisons with a Nakagami probability density function. Third, through convolution, a second probability density function is obtained that describes the coordinate difference vector projections underlying the random distribution of RMSD. This last feature allows sampling random distributions of not only RMSD, but also any similarity score that depends on difference vector projections, such as GDT_TS score, TM score, and LiveBench 3D score. Probabilities estimated from the method correlate well with common measures of structural similarity, such as the Dali Z-score and the GDT_TS score. As a result, the p-value for a given superposition can be calculated using simple formulae depending on RMSD, radius of gyration, and thinnest molecular dimension. In addition to scoring structural similarity, p-values computed by this method can be applied to evaluation of homology modeling techniques, providing a statistically sound alternative to scores used in reference-independent evaluation of alignment quality.

AB - Quantification of statistical significance is essential for the interpretation of protein structural similarity. To address this, a random model for protein structure comparison was developed. Novelty of the model is threefold. First, a sample of random structure comparisons is restricted to molecules of the same size and shape as the superposition of interest. Second, careful selection of the sample and accurate modeling of shape allows approximation of the root mean square deviation (RMSD) distribution of random comparisons with a Nakagami probability density function. Third, through convolution, a second probability density function is obtained that describes the coordinate difference vector projections underlying the random distribution of RMSD. This last feature allows sampling random distributions of not only RMSD, but also any similarity score that depends on difference vector projections, such as GDT_TS score, TM score, and LiveBench 3D score. Probabilities estimated from the method correlate well with common measures of structural similarity, such as the Dali Z-score and the GDT_TS score. As a result, the p-value for a given superposition can be calculated using simple formulae depending on RMSD, radius of gyration, and thinnest molecular dimension. In addition to scoring structural similarity, p-values computed by this method can be applied to evaluation of homology modeling techniques, providing a statistically sound alternative to scores used in reference-independent evaluation of alignment quality.
KW - Protein structure alignment
KW - RMSD
KW - Random model
KW - Statistical significance
KW - Superposition
UR - http://www.scopus.com/inward/record.url?scp=41349094868&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=41349094868&partnerID=8YFLogxK
U2 - 10.1089/cmb.2007.0161
DO - 10.1089/cmb.2007.0161
M3 - Article
C2 - 18333756
AN - SCOPUS:41349094868
SN - 1066-5277
VL - 15
SP - 317
EP - 355
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 3
ER -
