Abstract
As genomic and proteomic data accumulate in large amounts and from various sources, the importance of integrative analyses of multiple data sources has been increasingly recognized. A natural approach is to combine multiple models, each built on one source of data. A challenge, however, is to account for the different local information content of different data sources: the appropriate weight on each candidate model (and thus on each source of data) may depend on the input for which a prediction is to be made, suggesting that the constant weights used in most existing approaches may not be optimal. Here we propose an input-dependent weighting (IDW) scheme in which the weight on each model is the probability that the model gives a correct prediction for the given input. The weights can be estimated by regression using training data. We apply IDW to discriminating human heart failure etiology using two sources of gene expression data, and to gene function prediction by a combined analysis of gene expression and protein-protein interaction data. The results demonstrate that IDW may perform better than some standard approaches. Input-dependent weights can also be adopted as a criterion for model selection.
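The weighting idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not specify the regression model or the base classifiers. The sketch assumes binary classification, uses a plain gradient-descent logistic regression both as the base model for each source and as the regression that estimates each model's probability of being correct, and runs on synthetic data. All function names (`fit_logistic`, `fit_idw_weights`, `idw_combine`, `demo`) are hypothetical.

```python
import numpy as np


def _sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logistic(X, t, lr=0.1, n_iter=2000, l2=1e-3):
    """Gradient-descent logistic regression (an illustrative stand-in)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (_sigmoid(Xb @ beta) - t) / len(t) + l2 * beta
        beta -= lr * grad
    return beta


def predict_logistic(beta, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return _sigmoid(Xb @ beta)


def fit_idw_weights(X_list, base_probs, y):
    """For each model k, regress 1{model k is correct} on that model's input.

    Assumption: correctness is scored on the training set itself, which is
    optimistic; held-out or cross-validated predictions would be safer.
    """
    betas = []
    for X, p in zip(X_list, base_probs):
        correct = ((p >= 0.5) == (y == 1)).astype(float)
        betas.append(fit_logistic(X, correct))
    return betas


def idw_combine(probs, weights):
    """Per-input weighted average; both arrays are (n_models, n_samples)."""
    w = weights / weights.sum(axis=0, keepdims=True)
    return (w * probs).sum(axis=0)


def demo(seed=0, n=200):
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    # Two synthetic "sources", each a noisy 1-D view of the class label.
    X1 = (y + rng.normal(scale=0.8, size=n)).reshape(-1, 1)
    X2 = (y + rng.normal(scale=1.2, size=n)).reshape(-1, 1)
    X_list = [X1, X2]
    # One base classifier per source.
    base = [fit_logistic(X, y.astype(float)) for X in X_list]
    probs = np.array([predict_logistic(b, X) for b, X in zip(base, X_list)])
    # Input-dependent weights: estimated P(model k correct | input).
    wbetas = fit_idw_weights(X_list, probs, y)
    weights = np.array([predict_logistic(b, X) for b, X in zip(wbetas, X_list)])
    combined = idw_combine(probs, weights)
    return np.mean((combined >= 0.5) == (y == 1))


if __name__ == "__main__":
    print("training accuracy of IDW combination:", demo())
```

Unlike a constant-weight mixture, `idw_combine` normalizes the weights separately for every input, so a source that is informative in one region of the input space can dominate there and be down-weighted elsewhere. A linear logistic fit is a crude model of where a classifier is correct, and a more flexible regression could be substituted without changing the combination step.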
Original language | English (US) |
---|---|
Pages (from-to) | 523-540 |
Number of pages | 18 |
Journal | Statistica Sinica |
Volume | 16 |
Issue number | 2 |
State | Published - Apr 1 2006 |
Keywords
- Classification
- Microarray data
- Model mixing
- Partial least squares
- Prediction
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty