ipls
Purpose
IPLS: Interval PLS and forward/reverse MLR variable selection.
Synopsis
results = ipls(X,Y,int_width,maxlv,options)
results = ipls(X,Y,int_width,maxlv,numintervals,options)
[use,fit,lvs,intervals,intcv,intlv] = ipls(X,Y,int_width,maxlv,options)
Description
Performs forward or reverse selection of variable windows ("intervals")
based on the RMSECV obtained for each individual window of variables.
Multiple windows can be selected iteratively by setting the
options.numintervals option. The "algorithm" option allows this
function to behave as an IPLS or IPCR algorithm, or as a forward/reverse MLR
variable selection algorithm. The default is PLS; setting options.algorithm = 'mlr'
switches to MLR mode. See the other options below.
Inputs are (X,Y), the X and Y data; (int_width), the interval
(window) width in variables; and (maxlv), the maximum number of latent
variables to use in any model (maxlv has no effect if options.algorithm =
'mlr'). Note that excluding a variable in X will prevent it from being used in
any model.
If options.plots is 'final', the minimum RMSECV is plotted against
window center. Windows that were used are shown in blue;
windows that were excluded are shown in red. The number of latent
variables (LVs) used to assess each interval (the model size that gives the
indicated RMSECV) is shown at the bottom of each interval's bar, inside the
axes. The best RMSECV that can be obtained using all intervals is shown as a
dashed red line (all-interval RMSECV). The number of LVs used in this model is
shown on the right of the axes. If this number of LVs (all-interval model) is
different from the number used for the best model of the selected interval(s)
(selected-interval model) then a dashed magenta line will indicate the RMSECV
obtained when using all intervals but at the selected-interval model size. The
mean sample is superimposed on the plot for reference.
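A minimal call following the synopsis above might look like the sketch below. The data are synthetic and made up purely for illustration. The sketch assumes that plain numeric matrices are accepted for X and Y and that option fields left unset fall back to their defaults; neither point is stated on this page.

```matlab
% Synthetic data for illustration only (sizes and values are made up).
rng(0);                               % reproducible random numbers
X = randn(40, 100);                   % 40 samples, 100 variables
Y = X(:, 31:40) * ones(10, 1) ...     % y depends on variables 31 to 40
    + 0.1 * randn(40, 1);             % plus a little noise

int_width = 10;                       % 10 variables per window
maxlv     = 5;                        % at most 5 latent variables per model

options = struct();                   % assumption: unset fields use defaults
options.display = 'off';              % no command-window output
options.plots   = 'final';            % RMSECV versus window center plot

results = ipls(X, Y, int_width, maxlv, options);
disp(results.use)                     % indices of the selected variables
```

With the default forward mode and numintervals = 1, this selects the single window that gives the lowest RMSECV on its own.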
INPUTS:
X = the X-block,
Y = the Y-block,
int_width = the interval (window) width in variables, and
maxlv = the maximum number of latent variables to use in any model.
NOTE that excluding a variable in X will prevent it from being used in any model.
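To make the meaning of int_width concrete, the sketch below lays out non-overlapping windows over a hypothetical set of 20 variables. It is plain MATLAB and does not call ipls. It simply drops any trailing partial window, which is an assumption made for the illustration and may differ from how ipls treats leftover variables.

```matlab
nvars     = 20;          % hypothetical number of X variables
int_width = 5;           % window width in variables
stepsize  = int_width;   % non-overlapping windows

% First variable index of each full-width window.
starts = 1:stepsize:(nvars - int_width + 1);

% One row per window, one column per variable in the window.
intervals = zeros(numel(starts), int_width);
for k = 1:numel(starts)
    intervals(k, :) = starts(k):(starts(k) + int_width - 1);
end

disp(intervals)          % rows are 1:5, 6:10, 11:15 and 16:20
```

Setting stepsize smaller than int_width in this sketch produces overlapping windows.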
OUTPUTS:
When a single output is requested, the output is a structure
with the following fields:
use: the final selected indices which gave the best model,
fit: the RMSECV for the selected indices,
lvs: the number of latent variables which gives the best fit,
intervals: a matrix containing the indices used for each interval,
intcv: the RMSECV in the last selection cycle for all intervals (these values
were used to select the last interval), and
intlv: the number of latent variables used in the model which gave the RMSECV
values returned in intcv.
Optionally, with multiple outputs, these variables will be
returned as individual outputs (not in structure format) in the order shown above.
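The two output forms can be compared as in the sketch below. It assumes that X, Y, int_width, maxlv and options already exist in the workspace and that X is a plain numeric matrix, so that it can be indexed by column.

```matlab
% Structure form: all results in one variable.
results = ipls(X, Y, int_width, maxlv, options);
fprintf('RMSECV %g with %d LVs\n', results.fit, results.lvs);

% Multiple-output form: the same quantities, in the documented order.
% Note that this runs the selection a second time.
[use, fit, lvs, intervals, intcv, intlv] = ...
    ipls(X, Y, int_width, maxlv, options);

% Keep only the selected variables for later modelling
% (assumes X is numeric and use is a vector of column indices).
Xsel = X(:, use);
```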
Options
options = options structure containing the fields:
display: [ 'off' | {'on'} ], governs level of display to command window,
plots: [ 'none' | {'final'} ], governs level of plotting,
mode: [ {'forward'} | 'reverse' ] Defines action to be performed with
each interval.
  'forward' mode: the RMSECV calculated for each interval represents how well
  the y-block can be predicted using ONLY the variables included in the interval.
  'reverse' mode: the RMSECV calculated for each interval represents how well
  the y-block can be predicted when the given interval of variables is removed
  from the range of included X variables.
  NOTE that excluding a variable in X will prevent it from being used in any model.
algorithm: [ {'pls'} | 'pcr' | 'mlr' ] Defines regression algorithm to use.
Selection is done for the specific algorithm. Note that when MLR is used,
input (int_width) is most often = 1 (single variable per window).
numintervals: { [1] } Number of intervals to select or remove. If numintervals is
Inf, intervals are iteratively selected and added/removed until no improvement
in RMSECV is observed. NOTE: this can also be set by passing a scalar value
before, or in place of, the options structure. When passed this way, any
numintervals value in the options structure is ignored.
mustuse: [ ] A vector of variable indices which MUST be used in
all models. These variables will always be included in any model, whether or
not they are included in the current interval.
stepsize: [ ] Distance between interval centers. An empty
matrix gives the default spacing in which intervals do not overlap (stepsize =
int_width).
preprocessing: defines preprocessing and can be one of the following:
  (a) One of the following strings:
      'none' : no preprocessing {default}
      'meancenter' : mean centering
      'autoscale' : autoscaling
  (b) A single preprocessing structure defined using the function preprocess.
      The same preprocessing structure will be used on both the X and Y blocks.
  (c) A cell containing two preprocessing structures {pre pre}, one for the
      X block and one for the Y block.
cvi: {'vet' [ ] 1} Three element cell indicating
the cross-validation leave-out settings to use {method splits iterations}. For
valid modes, see the "cvi" input to crossval. If splits (the second
element in the cell) is empty, the square root of the number of samples will be
used. cvi can also be a vector (non-cell) of indices indicating leave-out
groupings (see crossval for more info).
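The options above can be combined as in the sketch below. It assumes that X and Y already exist and that fields not set in the structure fall back to their defaults; the specific values (window width 10, step 5, and so on) are arbitrary choices for illustration.

```matlab
% Reverse selection with PCR, overlapping windows and autoscaling.
options = struct();
options.mode          = 'reverse';       % score each window by removing it
options.algorithm     = 'pcr';           % use PCR instead of the default PLS
options.numintervals  = Inf;             % continue until RMSECV stops improving
options.stepsize      = 5;               % centers 5 apart; width 10 gives overlap
options.preprocessing = 'autoscale';     % string form, applied as documented
options.cvi           = {'vet' [] 1};    % empty splits: sqrt(number of samples)
options.plots         = 'none';
results = ipls(X, Y, 10, 5, options);

% numintervals passed as a scalar before the options structure;
% the value in options.numintervals is then ignored.
results2 = ipls(X, Y, 10, 5, 2, options);

% Forward MLR variable selection: one variable per window.
% maxlv is still required by the synopsis but has no impact in MLR mode.
mlropts = struct();
mlropts.algorithm    = 'mlr';
mlropts.numintervals = 3;                % select three single variables
mlropts.plots        = 'none';
results3 = ipls(X, Y, 1, 1, mlropts);
```

Forms (b) and (c) of the preprocessing option are left out here because they depend on structures built with preprocess, whose calling syntax is not given on this page.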
See Also
gaselctr, genalg