PLS_Toolbox Documentation: ipls

ipls

Purpose

Interval PLS variable selection.

Synopsis

 

[bestuse,bestfit,bestlvs,intervals] = ipls(X,Y,int_width,maxlv,options)

Description

Performs forward or reverse selection of variable windows ("intervals") based on the root-mean-square error of cross-validation (RMSECV) obtained for each individual window. Multiple windows can also be selected iteratively.
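The forward-mode idea described above can be illustrated with a small stand-alone sketch. It is not the toolbox's implementation: ordinary least squares stands in for PLS, the data are synthetic, the cross-validation uses five contiguous blocks, and every variable name is hypothetical. It only shows how one cross-validated error per window leads to a choice of window.

```matlab
% Illustrative sketch only: ordinary least squares stands in for PLS,
% and all names below are hypothetical (not PLS_Toolbox internals).
rng(0);                                   % reproducible synthetic data
nsamp = 40;  nvars = 12;
X = randn(nsamp,nvars);
y = X(:,5:8)*[1; 2; -1; 0.5] + 0.1*randn(nsamp,1);   % only variables 5-8 matter

int_width = 4;                            % window width in variables
starts    = 1:int_width:nvars;            % non-overlapping windows
nfolds    = 5;
fold      = ceil((1:nsamp)'/(nsamp/nfolds));   % contiguous-block fold labels 1..5

rmsecv = zeros(1,numel(starts));
for i = 1:numel(starts)
  cols  = starts(i):min(starts(i)+int_width-1,nvars);  % forward mode: ONLY this window
  press = 0;
  for f = 1:nfolds
    test  = (fold == f);
    b     = X(~test,cols) \ y(~test);     % fit on the retained samples
    press = press + sum((y(test) - X(test,cols)*b).^2);
  end
  rmsecv(i) = sqrt(press/nsamp);          % cross-validated RMS error for window i
end

[bestfit,best] = min(rmsecv);             % window with the lowest error
bestuse = starts(best):min(starts(best)+int_width-1,nvars);
```

A reverse-mode variant of the same sketch would fit on every column except the window, for example setdiff(1:nvars,cols), and look for the removal that lowers the error most.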

Inputs are (X,Y) the X and Y data, (int_width) the interval (window) width in variables, and (maxlv) the maximum number of latent variables to use in any model. Note that excluding a variable in X will prevent it from being used in any model.

If options.plots is 'final', a plot is given of the minimum RMSECV versus window center. Windows that were used are shown in blue; windows that were excluded are shown in red. The number of latent variables (LVs) used to assess each interval (the model size that gives the indicated RMSECV) is shown at the bottom of each interval's bar, inside the axes. The best RMSECV that can be obtained using all intervals (the all-interval RMSECV) is shown as a dashed red line, and the number of LVs used in that model is shown on the right of the axes. If the number of LVs in the all-interval model differs from the number used for the best model of the selected interval(s) (the selected-interval model), a dashed magenta line indicates the RMSECV obtained when using all intervals at the selected-interval model size. The mean sample is superimposed on the plot for reference.

INPUTS:

                         X =   X-block,

                         Y =   Y-block, and

         int_width =   the interval (window) width in variables, and

                 maxlv =   the maximum number of latent variables to use in any model.

                                

NOTE that excluding a variable in X will prevent it from being used in any model.
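A minimal call might look like the sketch below. The data are synthetic and purely illustrative, and building the options structure with only two fields assumes that ipls fills in defaults for the remaining fields, which this page does not state. The exist guard is only there so the sketch still runs where PLS_Toolbox is not on the path.

```matlab
% Synthetic data for illustration only (not supplied with the toolbox)
rng(1);
X = randn(50,200);                                % 50 samples, 200 variables
Y = X(:,81:100)*ones(20,1) + 0.1*randn(50,1);     % Y depends on variables 81-100

int_width = 20;     % window width in variables
maxlv     = 5;      % maximum number of latent variables in any model

% Assumption: fields not set here take their default values
options = struct('display','off','plots','none');

if exist('ipls','file')                           % run only if PLS_Toolbox is available
  [bestuse,bestfit,bestlvs,intervals] = ipls(X,Y,int_width,maxlv,options);
  disp(bestuse)                                   % selected variable indices
  disp(bestfit)                                   % RMSECV for the selected indices
end
```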


OUTPUTS:

                 model =   standard model structure (see: MODELSTRUCT) with the following fields:

       bestuse:  the final selected indices which gave the best model,

       bestfit:  the RMSECV for the selected indices,

       bestlvs:   the number of latent variables which gives the best fit,

     intervals:   a matrix containing the indices used for each interval.
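To picture what a matrix of interval indices could look like, the following stand-alone sketch lays out windows from a width and a distance between window starts (compare the stepsize option under Options). The one-row-per-interval layout, the dropping of a trailing partial window, and all variable names are assumptions made for illustration; the toolbox's actual layout is not specified on this page.

```matlab
% Hypothetical layout sketch; not the toolbox's own code.
nvars     = 10;     % number of X variables
int_width = 4;      % window width in variables
stepsize  = 3;      % distance between window starts (smaller than int_width => overlap)

starts    = 1:stepsize:(nvars - int_width + 1);   % keep only full-width windows
intervals = zeros(numel(starts), int_width);
for i = 1:numel(starts)
  intervals(i,:) = starts(i):(starts(i) + int_width - 1);  % one row per window
end

disp(intervals)
```

With stepsize equal to int_width the rows would not share any indices, which matches the default non-overlapping spacing described for stepsize.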

Options

             options =   options structure containing the fields:

              display:  [ 'off' | {'on'} ], governs level of display to command window,

                  plots:  [ 'none' | {'final'} ], governs level of plotting,

                    mode:   [{'forward'} | 'reverse' ] Defines action to be performed with each interval.

                                 'forward' mode: the RMSECV calculated for each interval represents how well the Y-block can be predicted using ONLY the variables included in the interval.

                                 'reverse' mode: the RMSECV calculated for each interval represents how well the Y-block can be predicted when the given interval of variables is removed from the range of included X variables.

                                 NOTE that excluding a variable in X will prevent it from being used in any model.

    numintervals:  { [1] } Number of intervals to select or remove. If numintervals is Inf, intervals are iteratively selected and added/removed until no improvement in RMSECV is observed.

              mustuse:  [ ] A vector of variable indices which MUST be used in all models. These variables will always be included in any model, whether or not they are included in the current interval.

            stepsize:  [ ] Distance between interval centers. An empty matrix gives the default spacing in which intervals do not overlap (stepsize = int_width).

  preprocessing:  defines preprocessing and can be one of the following:

                                 (a) One of the following strings:

                                 'none'  : no preprocessing  {default}

                                 'meancenter' : mean centering

                                 'autoscale'  : autoscaling

                                 (b) A single preprocessing structure defined using the function preprocess. The same preprocessing structure will be used on both the X and Y blocks.

                                 (c) A cell containing two preprocessing structures {pre pre}, one for the X block and one for the Y block.

                      cvi:  {'vet' [ ] 5} Three element cell indicating the cross-validation leave-out settings to use {method splits iterations}. For valid methods, see the "cvi" input to crossval. If splits (the second element in the cell) is empty, the square root of the number of samples will be used. cvi can also be a vector (non-cell) of indices indicating leave-out groupings (see crossval for more info).
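The options above might be combined as in the following sketch, which sets up a reverse-mode run that keeps removing intervals until the RMSECV stops improving. The data and the particular values are illustrative only, and assigning fields to an empty structure assumes that ipls supplies defaults for anything left unset. The exist guard only keeps the sketch runnable where PLS_Toolbox is not installed.

```matlab
% Synthetic data for illustration only
rng(2);
X = randn(60,100);
Y = X(:,41:60)*ones(20,1) + 0.1*randn(60,1);

% Assumption: unset fields fall back to their documented defaults
options = struct();
options.display       = 'off';
options.plots         = 'final';        % RMSECV-versus-window-center plot
options.mode          = 'reverse';      % score each interval by removing it
options.numintervals  = Inf;            % iterate until no RMSECV improvement
options.mustuse       = [1 2 3];        % variables kept in every model
options.stepsize      = 5;              % interval centers 5 variables apart
options.preprocessing = 'autoscale';    % string form (a) of the preprocessing option
options.cvi           = {'vet' [] 5};   % {method splits iterations}; empty splits
                                        % means sqrt(number of samples) is used

if exist('ipls','file')                 % run only if PLS_Toolbox is available
  [bestuse,bestfit,bestlvs,intervals] = ipls(X,Y,10,5,options);
end
```

Because stepsize (5) is smaller than the window width passed to ipls (10), adjacent intervals overlap in this configuration.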

See Also

gaselctr, genalg

