ipls
Purpose
Interval PLS variable selection.
Synopsis
[bestuse,bestfit,bestlvs,intervals] = ipls(X,Y,int_width,maxlv,options)
Description
Performs forward or reverse selection of variable windows
based on the RMSECV obtained for each individual window ("intervals")
of variables. Multiple windows can also be selected iteratively.
The inputs are the X- and Y-block data (X, Y), the interval (window) width in
variables (int_width), and the maximum number of latent variables to use in
any model (maxlv). Note that excluding a variable in X prevents it from being
used in any model.
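All selections are ranked by RMSECV, the root-mean-square of the cross-validated prediction residuals. As a point of reference only, here is a minimal Python sketch of that quantity for a single y column. The function name `rmsecv` is hypothetical, and the cross-validated predictions are assumed to have been computed already:

```python
import math


def rmsecv(y_true, y_cv_pred):
    """Root-mean-square error of cross-validation for one y column.

    y_cv_pred[i] is assumed to be the prediction for sample i from a model
    that was calibrated without sample i.
    """
    if len(y_true) != len(y_cv_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_cv_pred))
    return math.sqrt(sse / len(y_true))


# Residuals are 0, 0 and -2, so the result is sqrt(4 / 3)
print(rmsecv([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```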
If options.plots is 'final', a plot is shown of the minimum
RMSECV versus window center. Windows that were used are indicated in blue;
windows that were excluded are indicated in red. The number of latent
variables (LVs) used to assess each interval (the model size that gives the
indicated RMSECV) is shown at the bottom of each interval's bar, inside the
axes. The best RMSECV that can be obtained using all intervals is shown as a
dashed red line (all-interval RMSECV). The number of LVs used in this model is
shown on the right of the axes. If this number of LVs (all-interval model) is
different from the number used for the best model of the selected interval(s)
(selected-interval model) then a dashed magenta line will indicate the RMSECV
obtained when using all intervals but at the selected-interval model size. The
mean sample is superimposed on the plot for reference.
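Each bar height and its LV label can be read as the minimum, and the position of that minimum, of one interval's RMSECV curve over 1 to maxlv latent variables. The following sketch assumes a precomputed table of RMSECV values. This layout is hypothetical because how ipls stores these values internally is not documented here. The sketch also assumes that ties go to the smaller model:

```python
def best_per_interval(rmsecv_table):
    """rmsecv_table[k][a] = RMSECV of interval k using a + 1 latent variables.

    Returns one (minimum RMSECV, number of LVs) pair per interval.
    Ties are resolved toward the smaller model (an assumption).
    """
    result = []
    for row in rmsecv_table:
        a_best = min(range(len(row)), key=lambda a: row[a])  # first minimum
        result.append((row[a_best], a_best + 1))
    return result


table = [
    [0.90, 0.55, 0.60],  # interval 1: best at 2 LVs
    [0.70, 0.72, 0.71],  # interval 2: best at 1 LV
]
print(best_per_interval(table))  # [(0.55, 2), (0.7, 1)]
```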
INPUTS:
X = X-block,
Y = Y-block, and
int_width = the interval width (window width in variables),
maxlv = the maximum number of latent variables to use in any
model.
NOTE that excluding a variable in X will prevent it from
being used in any model.
OUTPUTS:
model = standard model structure (see: MODELSTRUCT) with the following fields:
bestuse: the final selected indices that gave the best model,
bestfit: the RMSECV for the selected indices,
bestlvs: the number of latent variables that gives the best fit,
intervals: a matrix containing the indices used for each interval.
Options
options = options structure containing the fields:
display: [ 'off' | {'on'} ], governs level of display to
command window,
plots: [ 'none' | {'final'} ], governs level of
plotting,
mode: [{'forward'} | 'reverse' ] Defines action to be performed with
each interval.
'forward' mode: the RMSECV calculated for each interval represents how well
the Y-block can be predicted using ONLY the variables included in the interval.
'reverse' mode: the RMSECV calculated for each interval represents how well
the Y-block can be predicted when the given interval of variables is removed
from the range of included X variables.
NOTE that excluding a variable in X will prevent it from being used in any
model.
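The difference between the two modes comes down to which variables are handed to the model for a given window. The following sketch covers only the first selection pass, before any interval has been selected or removed. It uses Python's 0-based indices rather than MATLAB's 1-based ones, and the function name is hypothetical:

```python
def interval_variables(interval, included, mode="forward"):
    """Variables used to assess one interval on the first selection pass.

    interval : variable indices in the window
    included : indices of X variables that are not excluded
    mode     : 'forward' uses only the window; 'reverse' uses everything
               included except the window
    """
    window, inc = set(interval), set(included)
    if mode == "forward":
        chosen = window & inc  # an excluded variable is never used
    elif mode == "reverse":
        chosen = inc - window
    else:
        raise ValueError("mode must be 'forward' or 'reverse'")
    return sorted(chosen)


included = [0, 1, 2, 3, 4, 6, 7]  # variable 5 has been excluded
print(interval_variables([4, 5, 6], included, "forward"))  # [4, 6]
print(interval_variables([4, 5, 6], included, "reverse"))  # [0, 1, 2, 3, 7]
```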
numintervals: {[1]} Number of intervals to select or remove. If numintervals
is Inf, intervals are iteratively selected and added/removed until no
improvement in RMSECV is observed.
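The following is a minimal greedy sketch of iterative forward selection. It is not the toolbox's code. The `score` callable is a hypothetical stand-in for building and cross-validating a PLS model on a variable set, and the toy score in the usage lines is made up for illustration. Reverse mode would instead start from all included variables and remove, at each step, the interval whose removal gives the lowest RMSECV.

```python
import math


def forward_select(intervals, score, num_intervals=1):
    """Greedy forward selection of whole intervals (illustrative sketch).

    intervals     : list of lists of variable indices
    score         : callable(sorted list of indices) -> RMSECV, lower is better
    num_intervals : number of intervals to select, or math.inf to keep
                    adding intervals until RMSECV stops improving
    Returns (selected interval numbers, variables used, last RMSECV).
    """
    selected, used = [], set()
    best_fit = math.inf
    while len(selected) < min(num_intervals, len(intervals)):
        # Score every remaining interval together with those already chosen.
        fits = {
            k: score(sorted(used | set(iv)))
            for k, iv in enumerate(intervals)
            if k not in selected
        }
        k_best = min(fits, key=fits.get)  # ties go to the lowest interval number
        if num_intervals == math.inf and fits[k_best] >= best_fit:
            break  # adding another interval no longer lowers RMSECV
        selected.append(k_best)
        used |= set(intervals[k_best])
        best_fit = fits[k_best]
    return selected, sorted(used), best_fit


# Toy score (made up): penalize missing "informative" variables 2 and 3,
# plus a small cost per variable used.
def toy_score(variables):
    return len({2, 3} - set(variables)) + 0.01 * len(variables)


ivs = [[0, 1], [2, 3], [4, 5]]
print(forward_select(ivs, toy_score, math.inf))  # ([1], [2, 3], 0.02)
```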
mustuse: [ ] A vector of variable indices which MUST be used in
all models. These variables will always be included in any model, whether or
not they are included in the current interval.
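In forward mode, the effect of mustuse can be pictured as a set union applied to every interval's variable set. This sketch shows only that union. The function name is hypothetical, and indices are 0-based:

```python
def add_mustuse(interval_sets, mustuse=()):
    """For each interval's variable set, add every mustuse variable.

    Sets are merged, so a mustuse variable already inside an interval
    is not duplicated.
    """
    must = set(mustuse)
    return [sorted(set(vs) | must) for vs in interval_sets]


print(add_mustuse([[0, 1, 2], [3, 4, 5]], mustuse=[4, 9]))
# [[0, 1, 2, 4, 9], [3, 4, 5, 9]]
```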
stepsize: [ ] Distance between interval centers. An empty
matrix gives the default spacing in which intervals do not overlap (stepsize =
int_width).
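To make the relationship between int_width and stepsize concrete, the following sketch builds windows over 0-based variable indices. Three details are assumptions of this sketch, not documented ipls behavior: a shorter final window is kept when the variables do not divide evenly, the window center is taken as the midpoint of the first and last index, and the function names are hypothetical.

```python
def make_intervals(n_vars, int_width, stepsize=None):
    """Windows of int_width consecutive variables over 0..n_vars-1.

    stepsize=None follows the documented default (stepsize = int_width,
    non-overlapping windows). Keeping a shorter final window when the
    variables do not divide evenly is an assumption of this sketch.
    """
    step = int_width if stepsize is None else stepsize
    if int_width < 1 or step < 1:
        raise ValueError("int_width and stepsize must be positive")
    intervals, start = [], 0
    while start < n_vars:
        intervals.append(list(range(start, min(start + int_width, n_vars))))
        if start + int_width >= n_vars:
            break  # this window already reaches the last variable
        start += step
    return intervals


def centers(intervals):
    """Window center = midpoint of the first and last index in the window."""
    return [(iv[0] + iv[-1]) / 2 for iv in intervals]


print(make_intervals(10, 4))           # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(make_intervals(8, 4, 2))         # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
print(centers(make_intervals(10, 4)))  # [1.5, 5.5, 8.5]
```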
preprocessing: defines preprocessing and can be one of the following:
(a) One of the following strings:
'none' : no preprocessing {default}
'meancenter' : mean centering
'autoscale' : autoscaling
(b) A single preprocessing structure defined using the function preprocess.
The same preprocessing structure will be used on both the X and Y blocks.
(c) A cell containing two preprocessing structures {pre pre}, one for the
X block and one for the Y block.
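For readers unfamiliar with the two string options, the following sketch shows column-wise mean centering and autoscaling on a small list-of-rows matrix. It is not the preprocess function. It estimates means and standard deviations from the matrix it is given, and dividing by the sample (n - 1) standard deviation is an assumption here:

```python
import statistics


def preprocess_columns(X, method="none"):
    """Column-wise 'none', 'meancenter' or 'autoscale' for a list-of-rows matrix.

    Means and standard deviations are estimated from the X passed in.
    Autoscaling divides by the sample standard deviation (n - 1), an
    assumption here; a constant column would raise ZeroDivisionError.
    """
    if method == "none":
        return [list(map(float, row)) for row in X]
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    if method == "meancenter":
        scales = [1.0] * len(cols)
    elif method == "autoscale":
        scales = [statistics.stdev(c) for c in cols]
    else:
        raise ValueError("method must be 'none', 'meancenter' or 'autoscale'")
    return [[(v - m) / s for v, m, s in zip(row, means, scales)] for row in X]


X = [[1, 10], [2, 20], [3, 30]]
print(preprocess_columns(X, "meancenter"))  # [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
print(preprocess_columns(X, "autoscale"))   # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```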
cvi: {'vet' [ ] 5} Three-element cell indicating
the cross-validation leave-out settings to use {method splits iterations}. For
valid modes, see the "cvi" input to crossval. If splits (the second
element in the cell) is empty, the square root of the number of samples will be
used. cvi can also be a vector (non-cell) of indices indicating leave-out
groupings (see crossval for more info).
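To illustrate the default 'vet' (venetian blinds) setting with an empty splits element, the following sketch assigns samples to splits using the square root of the number of samples. This is not crossval itself. Several details are simplifications or assumptions: the sketch rounds the square root to the nearest integer, it uses a blind one sample thick, it ignores the third cell element, and it uses 0-based indices.

```python
import math


def venetian_blinds(n_samples, splits=None):
    """Split number for each sample, venetian-blinds style.

    Sample i is left out in split (i mod splits), i.e. a blind one sample
    thick. splits=None uses the square root of n_samples, rounded to the
    nearest integer (the rounding rule is an assumption).
    """
    if splits is None:
        splits = round(math.sqrt(n_samples))
    if splits < 2 or splits > n_samples:
        raise ValueError("need 2 <= splits <= n_samples")
    return [i % splits for i in range(n_samples)]


groups = venetian_blinds(10)  # sqrt(10) is about 3.16, so 3 splits
print(groups)                 # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
left_out = [[i for i, g in enumerate(groups) if g == s] for s in range(3)]
print(left_out)               # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```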
See Also
gaselctr, genalg