Statistics of random protein superpositions: p-Values for pairwise structure alignment

James O. Wrabl, Nick V. Grishin

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Quantification of statistical significance is essential for the interpretation of protein structural similarity. To address this, a random model for protein structure comparison was developed. The novelty of the model is threefold. First, the sample of random structure comparisons is restricted to molecules of the same size and shape as the superposition of interest. Second, careful selection of the sample and accurate modeling of shape allow the root mean square deviation (RMSD) distribution of random comparisons to be approximated with a Nakagami probability density function. Third, through convolution, a second probability density function is obtained that describes the coordinate difference vector projections underlying the random distribution of RMSD. This last feature allows sampling of random distributions not only of RMSD, but also of any similarity score that depends on difference vector projections, such as the GDT_TS score, TM score, and LiveBench 3D score. Probabilities estimated by the method correlate well with common measures of structural similarity, such as the Dali Z-score and the GDT_TS score. As a result, the p-value for a given superposition can be calculated using simple formulae that depend on RMSD, radius of gyration, and thinnest molecular dimension. In addition to scoring structural similarity, p-values computed by this method can be applied to the evaluation of homology modeling techniques, providing a statistically sound alternative to scores used in reference-independent evaluation of alignment quality.
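To illustrate how a p-value could follow from a Nakagami model of random RMSD, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not give the formulae that relate the distribution parameters to radius of gyration and thinnest molecular dimension, so the shape `m` and spread `omega` are plain inputs here, and all function names are hypothetical. The sketch also assumes that the p-value is the lower tail of the distribution, that is, the probability that a random superposition has an RMSD at or below the observed one.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-12, max_terms=10_000):
    """Regularized lower incomplete gamma P(a, x), by power series.

    Suitable for moderate x; all series terms are positive, so there is
    no cancellation, only slower convergence as x grows.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a          # n = 0 term of sum x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    # Multiply by x^a * exp(-x) / Gamma(a), computed in log space.
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def nakagami_cdf(r, m, omega):
    """CDF of a Nakagami(m, omega) variable at r: P(m, m * r^2 / omega)."""
    if r <= 0.0:
        return 0.0
    return regularized_lower_gamma(m, m * r * r / omega)


def rmsd_p_value(rmsd, m, omega):
    """ASSUMPTION: p-value = P(random RMSD <= observed RMSD).

    m and omega are placeholders; the paper derives its parameters from
    molecular size and shape, which this sketch does not attempt.
    """
    return nakagami_cdf(rmsd, m, omega)


if __name__ == "__main__":
    # Purely illustrative parameter values, not taken from the paper.
    for observed in (2.0, 6.0, 12.0):
        print(observed, rmsd_p_value(observed, m=3.0, omega=150.0))
```

With `m = 1` the Nakagami distribution reduces to a Rayleigh distribution, which gives a quick sanity check on the CDF routine.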

Original language: English (US)
Pages (from-to): 317-355
Number of pages: 39
Journal: Journal of Computational Biology
Volume: 15
Issue number: 3
State: Published - Apr 1 2008

Keywords

  • Protein structure alignment
  • RMSD
  • Random model
  • Statistical significance
  • Superposition

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics
