Phylogeny based on whole genome as inferred from complete information set analysis

W. Li, W. Fang, L. Ling, J. Wang, Z. Xuan, R. Chen

Research output: Contribution to journal › Article › peer-review

1 Scopus citations


Previous molecular phylogeny algorithms mainly rely on multiple sequence alignments of carefully selected characteristic sequences and are thus not directly suited to whole-genome phylogeny, where events such as rearrangements make full-length alignments impossible. We introduce the concept of the Complete Information Set (CIS) and its implementation as a measure of evolutionary distance without reference to sequence size. As a proof of concept, the 16S rRNA sequences of 22 completely sequenced Bacteria and Archaea species are used to reconstruct a phylogenetic tree, which is generally consistent with the commonly accepted one. Applied to whole genomes, the method yields a highly robust phylogenetic tree that supports separate monophyletic clusters of species with similar phenotypes, as well as the early evolution of thermophilic Bacteria and the late divergence of Eukarya. The purpose of this work is not to contradict or confirm previous phylogeny standards but rather to offer a new algorithm and tool to the phylogeny research community. The software for estimating sequence distance and the materials used in this study are available from the corresponding author upon request.
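
The abstract does not spell out how the CIS distance is computed, so the following is only a minimal sketch of the general idea of an alignment-free distance that compares sequences of different lengths. It assumes k-mer frequency profiles and the Jensen-Shannon divergence as a stand-in measure; this is not the authors' CIS implementation, and the names `kmer_profile`, `js_divergence`, and `distance_matrix` are hypothetical.

```python
from collections import Counter
from math import log2


def kmer_profile(seq, k):
    """Relative frequencies of all overlapping k-mers in seq."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # Normalising by the total removes the direct dependence on sequence length.
    return {kmer: n / total for kmer, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency dicts.

    Assumed stand-in for an information-discrepancy measure; ranges from 0 to 1.
    """
    div = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = (pi + qi) / 2
        if pi > 0:
            div += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * log2(qi / mi)
    return div


def distance_matrix(seqs, k=3):
    """Pairwise distances for a dict mapping name -> sequence."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    names = sorted(profiles)
    matrix = [[js_divergence(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix
```

A matrix produced this way could be passed to any standard distance-based tree-building method such as neighbor joining; the choice of `k` and of the divergence are illustrative assumptions rather than values taken from the paper.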

Original language: English (US)
Pages (from-to): 439-447
Number of pages: 9
Journal: Journal of Biological Physics
Issue number: 3
State: Published - 2002
Externally published: Yes

Keywords


  • Comparative genomics
  • Information discrepancy
  • Molecular evolution
  • Sequence analysis

ASJC Scopus subject areas

  • Biophysics
  • Atomic and Molecular Physics, and Optics
  • Molecular Biology
  • Cell Biology

