Grouping of amino acid types and extraction of amino acid properties from multiple sequence alignments using variance maximization

James O. Wrabl, Nick V. Grishin

Research output: Contribution to journal, Article, peer-review

12 Scopus citations

Abstract

Understanding of amino acid type co-occurrence in trusted multiple sequence alignments is a prerequisite for improved sequence alignment and remote homology detection algorithms. Two objective approaches were used to investigate co-occurrence, both based on variance maximization of the weighted residue frequencies in columns taken from a large alignment database. The first approach discretely grouped amino acid types, and the second approach extracted orthogonal properties of amino acids using principal components analysis. The grouping results corresponded to amino acid physical properties such as side chain hydrophobicity, size, or backbone flexibility, and an optimal arrangement of approximately eight groups was observed. However, interpretation of the orthogonal properties was more complex. Although the principal components accounting for the largest variances exhibited modest correlations with hydrophobicity and conservation of glycine, in general principal components did not correspond to physical properties of amino acids. Although not intuitive, these amino acid mathematical properties were demonstrated to be robust and to improve local pairwise alignment accuracy, relative to 20 amino acid frequencies alone, for a simple test case.
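The second approach described above, principal components analysis of residue frequencies in alignment columns, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy alignment, the function names `column_frequencies` and `principal_components`, and the use of unweighted frequencies are all assumptions made for illustration. The study itself used weighted frequencies from a large alignment database.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment):
    """Return an (n_columns, 20) matrix of residue frequencies per column.

    Assumption: sequences are unweighted; the paper used weighted frequencies.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    n_cols = len(alignment[0])
    freqs = np.zeros((n_cols, len(AMINO_ACIDS)))
    for seq in alignment:
        for col, aa in enumerate(seq):
            if aa in index:  # skip gaps and unknown symbols
                freqs[col, index[aa]] += 1.0
    totals = freqs.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for all-gap columns
    return freqs / totals


def principal_components(freqs):
    """Eigendecompose the 20x20 covariance of the column frequencies.

    Returns variances in descending order and the matching orthogonal
    axes (one 20-dimensional eigenvector per matrix column).
    """
    cov = np.cov(freqs, rowvar=False)  # np.cov centers each amino acid variable
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, ascending output
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


# Hypothetical toy alignment: four sequences, five columns.
toy_alignment = ["ACDLG", "AVELG", "ACDIG", "SVEVG"]
freqs = column_frequencies(toy_alignment)
variances, components = principal_components(freqs)
```

Each column of `components` assigns one coefficient to each of the 20 amino acid types, which is the sense in which a principal component acts as an amino acid "property". With a realistic alignment database, the leading components would be the ones to compare against physical scales such as hydrophobicity.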

Original language: English (US)
Pages (from-to): 523-534
Number of pages: 12
Journal: Proteins: Structure, Function and Genetics
Volume: 61
Issue number: 3
State: Published - Nov 15 2005

Keywords

  • Amino acid physical properties
  • Amino acid similarity
  • Principal components analysis

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
