PLS_Toolbox Documentation: crossval
crossval
Purpose
Cross-validation for PCA, PLS, MLR, PCR, and LWR.
Synopsis
results = crossval(x,y,rm,cvi,ncomp,options)
[press,cumpress,rmsecv,rmsec,cvpred,misclassed] = crossval(x,y,rm,cvi,ncomp,options)
Description
CROSSVAL performs cross-validation for linear regression (PCR, PLS, MLR), locally weighted regression (LWR), and principal components analysis (PCA). The inputs are the predictor variable matrix x, the predicted variable y (use y = [] when rm = 'pca'), the regression method rm, the cross-validation method cvi, and the maximum number of latent variables or principal components ncomp.
rm = 'pca' performs cross-validation for PCA,
rm = 'mlr' performs cross-validation for MLR,
rm = 'pcr' performs cross-validation for PCR,
rm = 'nip' performs cross-validation for PLS using NIPALS,
rm = 'sim' performs cross-validation for PLS using SIMPLS, and
rm = 'lwr' performs cross-validation for LWR.
cvi can be either 1) a cell array containing one of the cross-validation methods below with its parameters, {cvm splits iter}, or 2) a vector defining user-specified cross-validation groups.
cvi = {'loo'}; leave-one-out cross-validation,
cvi = {'vet' splits}; venetian blinds (every n-th sample together),
cvi = {'con' splits}; contiguous blocks, and
cvi = {'rnd' splits iter}; random subsets.
All methods except leave-one-out require the number of data splits, splits. Random subsets ('rnd') also require the number of iterations, iter.
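To make the difference between the built-in methods concrete, the sketch below assigns n samples to groups in a way that matches the descriptions above (venetian blinds put every splits-th sample together; contiguous blocks keep neighboring rows together; random subsets reshuffle the groups on each iteration). It is an illustration only, not PLS_Toolbox source code. The variable names loo, vet, con, and rnd are hypothetical, and the exact grouping the toolbox uses (for example, how leftover samples are distributed when n is not a multiple of splits) may differ.

```matlab
% Illustrative split assignment (NOT PLS_Toolbox internals; names are hypothetical)
n = 10;        % number of samples (rows of x)
splits = 3;    % number of data splits
iter = 2;      % number of iterations for random subsets

loo = (1:n)';                       % leave-one-out: each sample is its own group
vet = mod((0:n-1)', splits) + 1;    % venetian blinds: 1 2 3 1 2 3 1 2 3 1
con = ceil((1:n)' * splits / n);    % contiguous blocks: 1 1 1 2 2 2 3 3 3 3

% Random subsets: reshuffle the balanced group labels on every iteration
rnd = zeros(n, iter);
for k = 1:iter
  rnd(:, k) = vet(randperm(n));
end

% Each group in turn is held out as the test set; the remaining rows calibrate
for s = 1:splits
  test = find(con == s);
  cal  = find(con ~= s);
  fprintf('con split %d: test rows %s, calibration rows %s\n', ...
          s, mat2str(test'), mat2str(cal'));
end
```

Venetian blinds spread each test group across the whole sample order, whereas contiguous blocks hold out whole stretches of it, which matters when the rows are ordered in time or by batch.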
For user-defined cross-validation, cvi is a vector of integers with one element per sample: length(cvi) = size(x,1) when x is class "double", or length(cvi) = size(x.data,1) when x is class "dataset". The elements define the test subsets, and each cvi(i) is interpreted as follows:
cvi(i) = -2 the sample is always in the test set,
cvi(i) = -1 the sample is always in the calibration set,
cvi(i) = 0 the sample is never used, and
cvi(i) = 1, 2, 3, ... assigns the sample to that test subset.
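The following minimal sketch shows how a user-defined cvi vector partitions the samples, assuming the codes behave exactly as listed above (in particular, that samples coded -2 join every test subset and samples coded 0 are dropped). The cvi values, the random x, and the names testsets and calsets are hypothetical, chosen only for illustration.

```matlab
% Hypothetical user-defined cross-validation vector for 8 samples
x = rand(8, 4);                     % placeholder predictor block (8 samples)
cvi = [-1 1 1 2 2 3 0 -2]';
assert(numel(cvi) == size(x, 1), 'cvi needs one entry per row of x');

subsets = unique(cvi(cvi > 0))';    % positive codes define the subsets: [1 2 3]
testsets = cell(1, numel(subsets));
calsets  = cell(1, numel(subsets));
for k = 1:numel(subsets)
  s = subsets(k);
  % Test set: samples in subset s plus the "always test" samples (-2)
  testsets{k} = find(cvi == s | cvi == -2)';
  % Calibration set: "always calibration" samples (-1) and all other subsets;
  % samples coded 0 are excluded from both sets
  calsets{k} = find(cvi ~= s & cvi > -2 & cvi ~= 0)';
end
```

Under these assumptions, subset 1 tests rows [2 3 8] and calibrates on rows [1 4 5 6], and row 7 never appears in either set. Such a vector is passed in place of the cell, for example crossval(x,y,'sim',cvi,5).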
Optional input options is an options structure containing one or more of the following fields:
The outputs are the predictive residual error sum of squares (PRESS) for each subset, press; the cumulative PRESS, cumpress; the root mean square error of cross-validation (RMSECV), rmsecv; the root mean square error of calibration (RMSEC), rmsec; the cross-validated predictions for the y-block (if any), cvpred; and the fractional misclassifications, misclassed. Misclassifications are reported only if the y-block is a logical vector (i.e. discrete classes). When options.plots is not 'none', the routine also plots RMSECV and RMSEC.
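The relationship between these error measures can be sketched with a from-scratch leave-one-out MLR on toy data. This is not how crossval is implemented; it assumes the common definitions PRESS = sum of squared prediction errors per held-out subset, cumulative PRESS = PRESS summed over all subsets, RMSECV = sqrt(cumulative PRESS / n), and RMSEC = sqrt(sum of squared calibration residuals / n). The toy data, the no-intercept model, and all variable names are hypothetical, and the toolbox may normalize differently (for example, per y-column or with a degrees-of-freedom correction). For PCR and PLS the same loop would be repeated for each number of components up to ncomp, so press would gain a second dimension.

```matlab
% Toy data (hypothetical): y depends linearly on two predictors plus small noise
x = [1 2; 2 1; 3 4; 4 3; 5 6; 6 5];
y = x * [0.5; 1.0] + [0.1; -0.1; 0.05; -0.05; 0.2; -0.2];
n = size(x, 1);

cvi = (1:n)';              % leave-one-out: every sample is its own subset
cvpred = zeros(n, 1);      % cross-validated predictions
press = zeros(n, 1);       % one PRESS value per subset
for s = 1:n
  test = (cvi == s);
  b = x(~test, :) \ y(~test);             % least-squares MLR fit on calibration rows
  cvpred(test) = x(test, :) * b;          % predict the held-out rows
  press(s) = sum((y(test) - cvpred(test)).^2);
end

cumpress = sum(press);                    % PRESS summed over all subsets
rmsecv = sqrt(cumpress / n);              % root mean square error of cross-validation

bfull = x \ y;                            % model fit on all samples
rmsec = sqrt(sum((y - x * bfull).^2) / n);  % root mean square error of calibration
fprintf('RMSECV = %.4f, RMSEC = %.4f\n', rmsecv, rmsec);
```

Because every held-out sample is predicted by a model that never saw it, RMSECV from leave-one-out least squares is never smaller than RMSEC. This is why comparing the two curves across components helps reveal overfitting.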
Examples
[press,cumpress] = crossval(x,y,'nip',{'loo'},10);
[press,cumpress] = crossval(x,y,'pcr',{'vet',3},10);
[press,cumpress] = crossval(x,y,'nip',{'con',5},10);
[press,cumpress] = crossval(x,y,'sim',{'rnd',3,20},10);
pre = {preprocess('autoscale') preprocess('autoscale')};
opts.preprocessing = pre;
opts.plots = 'none';
[press,cumpress] = crossval(x,y,'sim',{'rnd',3,20},10,opts);
[press,cumpress] = crossval(x,[],'pca',{'loo'},10);
[press,cumpress] = crossval(x,[],'pca',{'vet',3},10);
[press,cumpress] = crossval(x,[],'pca',{'con',5},10);
See Also
pca, pcr, pls, preprocess, ncrossval