Representative splitting cross validation

Lu Xu, Ou Hu, Yuwan Guo, Mengqin Zhang, Daowang Lu, Chen Bo Cai, Shunping Xie, Mohammad Goodarzi, Hai Yan Fu, Yuan Bin She

Research output: Contribution to journal › Article › peer-review

18 Scopus citations


Cross-validation (CV) is widely used to estimate model complexity, i.e., the number of significant latent variables (LVs), for multivariate calibration methods such as partial least squares (PLS). A basic consideration when developing and validating multivariate calibration models is that both the training and validation sets should be representative and distributed as uniformly as possible in the experimental space. Motivated by this idea, we proposed a new CV method called representative splitting cross-validation (RSCV). In RSCV, the DUPLEX algorithm is first used to sequentially divide the original training set into k (in this work, k = 2, 4, 8 and 16) equal parts. A series of k-fold (k = 2, 4, 8 and 16) CVs is then performed on this data splitting, and the pooled root mean squared error of CV (RMSECV) is used to estimate model complexity. Five real multivariate calibration data sets were investigated, and RSCV was compared with leave-one-out CV (LOOCV), 10-fold CV and Monte Carlo CV (MCCV). With a maximum k of 16, RSCV proved a useful and stable method for selecting PLS LVs, and obtained simpler models at an acceptable computational cost.
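The procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `duplex_split` is a simplified DUPLEX (seed each half with the two most distant points, then assign remaining points alternately by largest minimum distance), parts are obtained by recursive halving (so k must be a power of 2, matching k = 2, 4, 8, 16), and principal component regression stands in for PLS as the calibration model, with the number of components playing the role of the number of LVs. All function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def duplex_split(X, idx=None):
    """Split sample indices into two representative halves (simplified DUPLEX).

    The two most distant points seed half A, the two most distant remaining
    points seed half B, then leftover points are assigned alternately to each
    half by the max-min distance criterion.
    """
    if idx is None:
        idx = list(range(len(X)))
    idx = list(idx)
    pts = X[idx]
    D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    remaining = list(range(len(idx)))

    def pop_farthest_pair():
        sub = D[np.ix_(remaining, remaining)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        a, b = remaining[i], remaining[j]
        remaining.remove(a)
        remaining.remove(b)
        return [a, b]

    A = pop_farthest_pair()
    B = pop_farthest_pair()
    while remaining:
        for S in (A, B):
            if not remaining:
                break
            # pick the point with the largest minimum distance to the set
            dmin = D[np.ix_(remaining, S)].min(axis=1)
            p = remaining[int(np.argmax(dmin))]
            S.append(p)
            remaining.remove(p)
    return [idx[i] for i in A], [idx[i] for i in B]

def duplex_parts(X, k):
    """Recursively halve the data with DUPLEX into k = 2^m representative parts."""
    parts = [list(range(len(X)))]
    while len(parts) < k:
        parts = [half for p in parts for half in duplex_split(X, p)]
    return parts

def rscv_rmsecv(X, y, k, max_comp):
    """Pooled RMSECV over the k DUPLEX-based folds, one value per complexity.

    PCR (regression on the first `a` principal components) is used here as a
    stand-in for PLS; `a` mimics the number of latent variables.
    """
    parts = duplex_parts(X, k)
    rmsecv = []
    for a in range(1, max_comp + 1):
        sq_err = []
        for i, val in enumerate(parts):
            train = [j for p in parts[:i] + parts[i + 1:] for j in p]
            Xt, yt = X[train], y[train]
            mx, my = Xt.mean(axis=0), yt.mean()
            _, _, Vt = np.linalg.svd(Xt - mx, full_matrices=False)
            T = (Xt - mx) @ Vt[:a].T          # training scores on first a PCs
            b = np.linalg.lstsq(T, yt - my, rcond=None)[0]
            pred = (X[val] - mx) @ Vt[:a].T @ b + my
            sq_err.extend((pred - y[val]) ** 2)
        rmsecv.append(float(np.sqrt(np.mean(sq_err))))
    return rmsecv

# Illustrative run on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=64)
parts = duplex_parts(X, 4)
errs = rscv_rmsecv(X, y, k=4, max_comp=6)
```

The model complexity would then be chosen at the minimum (or first acceptable) pooled RMSECV, and in the paper this curve is compared across k = 2, 4, 8 and 16 against LOOCV, 10-fold CV and MCCV.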

Original language: English (US)
Pages (from-to): 29-35
Number of pages: 7
Journal: Chemometrics and Intelligent Laboratory Systems
State: Published - Dec 15 2018


Keywords

  • Cross-validation (CV)
  • Model complexity
  • Multivariate calibration
  • Partial least squares (PLS)
  • Representative splitting cross-validation (RSCV)

ASJC Scopus subject areas

  • Software
  • Analytical Chemistry
  • Process Chemistry and Technology
  • Spectroscopy
  • Computer Science Applications


