Self consistency grouping: a stringent clustering method.

Bong Hyun Kim; Bhadrachalam Chitturi; Nick V. Grishin

doi:10.1186/1471-2105-13-S13-S3

Research output: Contribution to journal › Article › peer-review

Abstract

Numerous clustering methods, such as single linkage and K-means, have been widely studied and applied to a variety of scientific problems. However, existing methods are not readily applicable to problems that demand high stringency. Our method, self consistency grouping (SCG), yields clusters whose members are closer in rank to each other than to any member outside the cluster. We do not define a distance metric; we use the best known distance metric and presume that it measures the correct distance. SCG does not impose any restriction on the size or the number of the clusters that it finds. The boundaries of clusters are determined by inconsistencies in the ranks. In addition to the direct implementation, which finds the complete structure of the (sub)clusters, we implemented two faster versions. The fastest version is guaranteed to find only the clusters that are not subclusters of any other cluster; the other version yields the same output as the direct implementation but does so more efficiently. Tests in which errors were introduced into the distance measurements demonstrated that SCG yields very few false positives. Clustering of protein domain representatives by structural similarity showed that SCG could recover homologous groups with high precision. SCG has potential for finding biological relationships under stringent conditions.
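The grouping criterion in the abstract can be illustrated with a small brute-force sketch. This is one possible reading of the criterion, not the authors' implementation: a group is taken to be self-consistent when every member is strictly closer to all other members than to any point outside the group. The function names (`is_self_consistent`, `self_consistent_groups`, `maximal_groups`), the handling of ties, and the toy data are assumptions made for illustration only.

```python
def is_self_consistent(dist, group):
    """Return True if every member of `group` is strictly closer to all
    other members than to any point outside the group (assumed criterion)."""
    outside = [x for x in range(len(dist)) if x not in group]
    for j in group:
        farthest_inside = max(dist[j][m] for m in group if m != j)
        nearest_outside = min(dist[j][x] for x in outside)
        if farthest_inside >= nearest_outside:  # ties count as inconsistent
            return False
    return True


def self_consistent_groups(dist):
    """Enumerate all self-consistent groups of size 2 .. n-1 by brute force."""
    n = len(dist)
    found = set()
    for i in range(n):
        # Neighbours of i, nearest first.
        ranked = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for k in range(1, n - 1):
            # A self-consistent group containing i must be i plus its k nearest neighbours.
            candidate = frozenset([i] + ranked[:k])
            if candidate not in found and is_self_consistent(dist, candidate):
                found.add(candidate)
    return sorted((sorted(g) for g in found), key=lambda g: (len(g), g))


def maximal_groups(groups):
    """Keep only groups that are not proper subsets of another group."""
    sets = [set(g) for g in groups]
    return [g for g, s in zip(groups, sets) if not any(s < other for other in sets)]


# Toy data: five points on a line, with absolute difference as the distance.
points = [0.0, 1.0, 1.5, 10.0, 10.4]
dist = [[abs(a - b) for b in points] for a in points]

groups = self_consistent_groups(dist)
print(groups)                  # [[1, 2], [3, 4], [0, 1, 2]]
print(maximal_groups(groups))  # [[3, 4], [0, 1, 2]]
```

In this toy case the nested result ([1, 2] inside [0, 1, 2]) corresponds to the (sub)cluster structure the abstract mentions, and `maximal_groups` mimics the output described for the fastest version. The faster algorithms themselves are not reproduced here.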

Original language	English (US)
Pages (from-to)	S3
Journal	BMC Bioinformatics
Volume	13 Suppl 13
DOIs	https://doi.org/10.1186/1471-2105-13-S13-S3
State	Published - 2012

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

Cite this

@article{9a5a14d23d9840e4b2075e3fe726824d,

title = "Self consistency grouping: a stringent clustering method.",

abstract = "Numerous clustering methods, such as single linkage and K-means, have been widely studied and applied to a variety of scientific problems. However, existing methods are not readily applicable to problems that demand high stringency. Our method, self consistency grouping (SCG), yields clusters whose members are closer in rank to each other than to any member outside the cluster. We do not define a distance metric; we use the best known distance metric and presume that it measures the correct distance. SCG does not impose any restriction on the size or the number of the clusters that it finds. The boundaries of clusters are determined by inconsistencies in the ranks. In addition to the direct implementation, which finds the complete structure of the (sub)clusters, we implemented two faster versions. The fastest version is guaranteed to find only the clusters that are not subclusters of any other cluster; the other version yields the same output as the direct implementation but does so more efficiently. Tests in which errors were introduced into the distance measurements demonstrated that SCG yields very few false positives. Clustering of protein domain representatives by structural similarity showed that SCG could recover homologous groups with high precision. SCG has potential for finding biological relationships under stringent conditions.",

author = "Kim, {Bong Hyun} and Bhadrachalam Chitturi and Grishin, {Nick V.}",

year = "2012",

doi = "10.1186/1471-2105-13-S13-S3",

language = "English (US)",

volume = "13 Suppl 13",

pages = "S3",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

}

TY - JOUR

T1 - Self consistency grouping

T2 - a stringent clustering method.

AU - Kim, Bong Hyun

AU - Chitturi, Bhadrachalam

AU - Grishin, Nick V.

PY - 2012

Y1 - 2012

N2 - Numerous clustering methods, such as single linkage and K-means, have been widely studied and applied to a variety of scientific problems. However, existing methods are not readily applicable to problems that demand high stringency. Our method, self consistency grouping (SCG), yields clusters whose members are closer in rank to each other than to any member outside the cluster. We do not define a distance metric; we use the best known distance metric and presume that it measures the correct distance. SCG does not impose any restriction on the size or the number of the clusters that it finds. The boundaries of clusters are determined by inconsistencies in the ranks. In addition to the direct implementation, which finds the complete structure of the (sub)clusters, we implemented two faster versions. The fastest version is guaranteed to find only the clusters that are not subclusters of any other cluster; the other version yields the same output as the direct implementation but does so more efficiently. Tests in which errors were introduced into the distance measurements demonstrated that SCG yields very few false positives. Clustering of protein domain representatives by structural similarity showed that SCG could recover homologous groups with high precision. SCG has potential for finding biological relationships under stringent conditions.

UR - http://www.scopus.com/inward/record.url?scp=84877592087&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877592087&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-13-S13-S3

DO - 10.1186/1471-2105-13-S13-S3

M3 - Article

C2 - 23320864

AN - SCOPUS:84877592087

SN - 1471-2105

VL - 13 Suppl 13

SP - S3

JO - BMC Bioinformatics

JF - BMC Bioinformatics

ER -
