TY - JOUR
T1 - Stochastic cross validation
AU - Xu, Lu
AU - Fu, Hai Yan
AU - Goodarzi, Mohammad
AU - Cai, Chen Bo
AU - Yin, Qiao Bo
AU - Wu, Ya
AU - Tang, Bang Cheng
AU - She, Yuan Bin
N1 - Funding Information:
The authors are grateful for the financial support from the National Natural Science Foundation of China (Grant nos. 21665022, 21776321, 21576297, 21706233, 21476270), Key Projects of Technological Innovation of Hubei Province (2016ACA138), and the Open Research Program (nos. 2015ZD001, 2015ZD002 and 2015ZY006) from the Modernization Engineering Technology Research Center of Ethnic Minority Medicine of Hubei Province (South-Central University for Nationalities). Lu Xu is financially supported by Tongren Culture, Science and Technology Industry Innovation Research Center (no. 171172), the China Postdoctoral Science Project (no. 2016M602719) and Guizhou Provincial Science and Technology Department (no. QKHJC[2017]1186).
Publisher Copyright:
© 2018
PY - 2018/4/15
Y1 - 2018/4/15
AB - Cross validation (CV) is by far one of the most commonly used methods to estimate model complexity for partial least squares (PLS). In this study, stochastic cross validation (SCV) was proposed as a novel CV strategy, where the percent of left-out objects (PLOO) was defined as a changeable random number. We proposed two SCV strategies, namely, SCV with uniformly distributed PLOO (SCV-U) and SCV with normally distributed PLOO (SCV-N). SCV-U is actually a hybrid of leave-one-out CV (LOOCV), k-fold CV and Monte Carlo CV (MCCV). The rationale behind SCV-N is that the probability of large perturbations of the original training set will be small. SCV is expected to provide more flexibility for data splitting to explore and learn from the data set and evaluate internally a built model. SCV-U and SCV-N were used for PLS calibrations of three real data sets as well as a simulated data set and they were compared with LOOCV, k-fold CV and MCCV. Given a training and external validation set, different CV techniques were repeatedly used to evaluate the optimal model complexity and the prediction results were compared. The results indicate that SCV-U and SCV-N could provide useful alternatives to the traditional CV methods and SCV is less sensitive to the values of PLOO.
KW - Cross validation (CV)
KW - Model complexity
KW - Multivariate calibration
KW - Partial least squares (PLS)
KW - Stochastic cross validation (SCV)
UR - http://www.scopus.com/inward/record.url?scp=85042380942&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85042380942&partnerID=8YFLogxK
U2 - 10.1016/j.chemolab.2018.02.008
DO - 10.1016/j.chemolab.2018.02.008
M3 - Article
AN - SCOPUS:85042380942
SN - 0169-7439
VL - 175
SP - 74
EP - 81
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
ER -