Predicting tumor-suppressing genes in cancer via clustering the developmental stage gene expression profile

Nitin Kumar Singh; M. Vidyasagar; Michael A. White

doi:10.1109/LISSA.2011.5754170

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

In this paper we study the problem of predicting which genes are likely to have a role in tumor-suppression in lung and colorectal cancer. Mutation frequencies alone cannot serve to differentiate between drivers (mutations that cause cancer) and passengers (mutations that are caused by cancer); some other features must be added. Our hypothesis is that the developmental stage gene expression profile provides one such additional feature that can potentially serve to differentiate between drivers and passengers. The developmental stage gene expression profile refers to the seven-dimensional vector of the gene's expression, as found in the Unigene database [15]. We focus our attention on two sets of genes: (i) a master set of more than 1,700 genes found to be mutated in breast and colorectal cancer tissues in a famous study by Wood et al. [16], and (ii) a set of nearly 1,800 genes consisting of all genes that have been tested for mutations in lung cancer in the COSMIC database [4] and that have a developmental gene expression profile in the Unigene database [15]. An experimental study by a team led by the third author tested a set of 151 CAN-genes as identified in [16], [12] and identified a subset of 65 hits that resulted in cell proliferation; the rest were classified as misses. The challenge is to reproduce these results at a high level of significance using a classification approach. Using the K-means algorithm, the seven-dimensional expression profile vectors for 1,799 genes were grouped into two clusters, which were properly separated as indicated by a silhouette value of 0.37. The first cluster contained 15 hits and 8 misses out of a total of 626 genes, while the second cluster contained 13 hits and 20 misses out of a total of 1,173 genes. The null hypothesis that the known hits and misses occur in equal proportions in both clusters can be rejected at a 1.56% level, while the null hypothesis that both clusters contain an equal proportion of hits can be rejected at a 0.89% level. In short, clustering based on developmental gene expression level provides quite significant discrimination between known experimental outcomes. Going forward, further experiments need to be performed to verify that the first cluster does indeed contain more hits than the second cluster. Also, the approach needs to be extended to other forms of cancer.
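
The abstract does not state which statistical test produced the 1.56% and 0.89% levels. As a minimal sketch, assuming a one-sided hypergeometric (Fisher-type) test, the two null hypotheses could be examined from the quoted counts as follows. The function and variable names are illustrative and not taken from the paper, and the printed p-values need not match the reported levels, since the authors' procedure may differ.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when `draws` items are taken without replacement from
    `population` items, of which `successes` are marked."""
    lowest = max(0, draws - (population - successes))
    highest = min(draws, successes)
    if k < lowest or k > highest:
        return 0.0
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def upper_tail(k, population, successes, draws):
    """P(X >= k) under the hypergeometric null (one-sided test)."""
    highest = min(draws, successes)
    return sum(hypergeom_pmf(i, population, successes, draws)
               for i in range(k, highest + 1))


# Counts quoted in the abstract.
hits_c1, misses_c1, size_c1 = 15, 8, 626
hits_c2, misses_c2, size_c2 = 13, 20, 1173

# Test A (assumed form): among the tested genes only, are hits
# over-represented in cluster 1 relative to misses?
tested = hits_c1 + misses_c1 + hits_c2 + misses_c2
p_hits_vs_misses = upper_tail(
    hits_c1,
    population=tested,
    successes=hits_c1 + hits_c2,   # all known hits
    draws=hits_c1 + misses_c1,     # tested genes that fell in cluster 1
)

# Test B (assumed form): among all clustered genes, do the known hits
# fall in cluster 1 more often than cluster sizes alone would predict?
p_hits_vs_all = upper_tail(
    hits_c1,
    population=size_c1 + size_c2,
    successes=size_c1,             # genes in cluster 1
    draws=hits_c1 + hits_c2,       # all known hits
)

print(f"hits vs misses across clusters: p = {p_hits_vs_misses:.4f}")
print(f"hits vs cluster size:           p = {p_hits_vs_all:.4f}")
```

The sketch uses exact integer binomial coefficients from the standard library, so it has no third-party dependencies; a two-sided or chi-squared variant would give different values.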

Original language	English (US)
Title of host publication	Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011
Pages	116-120
Number of pages	5
DOIs	https://doi.org/10.1109/LISSA.2011.5754170
State	Published - May 23 2011
Event	2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011 - Bethesda, MD, United States
Duration	Apr 7 2011 → Apr 8 2011

Publication series

Name	Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011

Other

Other	2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011
Country/Territory	United States
City	Bethesda, MD
Period	4/7/11 → 4/8/11

ASJC Scopus subject areas

Life-span and Life-course Studies

Cite this

Singh, N. K., Vidyasagar, M., & White, M. A. (2011). Predicting tumor-suppressing genes in cancer via clustering the developmental stage gene expression profile. In Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011 (pp. 116-120). Article 5754170 (Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011). https://doi.org/10.1109/LISSA.2011.5754170

Predicting tumor-suppressing genes in cancer via clustering the developmental stage gene expression profile. / Singh, Nitin Kumar; Vidyasagar, M.; White, Michael A.
Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011. 2011. p. 116-120 5754170 (Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011).

Singh, NK, Vidyasagar, M & White, MA 2011, Predicting tumor-suppressing genes in cancer via clustering the developmental stage gene expression profile. in Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011., 5754170, Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011, pp. 116-120, 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011, Bethesda, MD, United States, 4/7/11. https://doi.org/10.1109/LISSA.2011.5754170

Singh NK, Vidyasagar M, White MA. Predicting tumor-suppressing genes in cancer via clustering the developmental stage gene expression profile. In Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011. 2011. p. 116-120. 5754170. (Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011). doi: 10.1109/LISSA.2011.5754170

@inproceedings{15921ed168064794b8438ff87e64fade,
title = "Predicting tumor-suppressing genes in cancer via clustering the developmental stage gene expression profile",
author = "Singh, {Nitin Kumar} and M. Vidyasagar and White, {Michael A.}",
year = "2011",
month = may,
day = "23",
doi = "10.1109/LISSA.2011.5754170",
language = "English (US)",
isbn = "9781457704208",
series = "Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011",
pages = "116--120",
booktitle = "Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011",
note = "2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011 ; Conference date: 07-04-2011 Through 08-04-2011",
}

TY - GEN
T1 - Predicting tumor-suppressing genes in cancer via clustering the developmental stage gene expression profile
AU - Singh, Nitin Kumar
AU - Vidyasagar, M.
AU - White, Michael A.
PY - 2011/5/23
Y1 - 2011/5/23
UR - http://www.scopus.com/inward/record.url?scp=79956095837&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79956095837&partnerID=8YFLogxK
U2 - 10.1109/LISSA.2011.5754170
DO - 10.1109/LISSA.2011.5754170
M3 - Conference contribution
AN - SCOPUS:79956095837
SN - 9781457704208
T3 - Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011
SP - 116
EP - 120
BT - Proceedings of the 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011
T2 - 2011 IEEE/NIH Life Science Systems and Applications Workshop, LiSSA 2011
Y2 - 7 April 2011 through 8 April 2011
ER -
