gaselctr
Purpose
Genetic algorithm for variable selection with PLS.
Synopsis
model = gaselctr(x,y,options)
[fit,pop,avefit,bstfit] = gaselctr(x,y,options)
options = gaselctr('options')
Description
GASELCTR uses a genetic algorithm optimization to minimize cross-validation error for variable selection.
INPUTS:
x = the predictor block (x-block), and
y = the predicted block (y-block). Note that any scaling not supplied through options.preprocessing should be applied before running GASELCTR.
Options
options = a structure array with the following fields:
name: 'options', name indicating that this is an options structure,
popsize: {64} the population size (16<=np<=256, and np must be divisible by 4),
maxgenerations: {100} the maximum number of generations (25<=mg<=500),
mutationrate: {0.005} the mutation rate (typically 0.001<=mt<=0.01),
windowwidth: {1} the number of variables in a window (integer window width),
convergence: {50} percent of population the same at convergence (typically cn=80),
initialterms: {30} percent terms included at initiation (10<=bf<=50),
crossover: {2} breeding cross-over rule (cr = 1: single cross-over; cr = 2: double cross-over),
algorithm: [ 'mlr' | {'pls'} ] regression algorithm,
ncomp: {10} maximum number of latent variables for PLS models,
cv: [ 'rnd' | {'con'} ] cross-validation option ('rnd': random subset cross-validation; 'con': contiguous block subset cross-validation),
split: {5} number of subsets to divide data into for cross-validation,
iter: {1} number of iterations for cross-validation at each generation,
preprocessing: {[] []} a cell containing standard preprocessing structures for the X- and Y-blocks respectively (see PREPROCESS),
reps: {1} the number of replicate runs to perform,
target: a two-element vector [target_min target_max] describing the target range for the number of variables/terms n included in a model. Field target is used to bias models towards a given range of included variables. Outside of this range, the penaltyslope option (see below) is applied by multiplying the fitness for each member of the population by:
penaltyslope*(target_min-n) when n<target_min, or
penaltyslope*(n-target_max) when n>target_max,
targetpct: {1} flag indicating if values in field target are given in percent of variables (1) or in absolute number of variables (0), and
penaltyslope: {0} the slope of the penalty function (see target above).
The default options can be retrieved using: options = gaselctr('options');
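The target and penaltyslope fields described above can be illustrated with a small stand-alone calculation. This is only a sketch of the two documented expressions, not the toolbox's internal code: the opts structure is a hand-built, hypothetical stand-in for the relevant fields of gaselctr('options'), the values of n are made up, and targetpct is set to 0 so that target is read as absolute variable counts. How the value is combined with the fitness for members inside the target range is not specified above, so the sketch simply reports zero there.

```matlab
% Hypothetical stand-in for the relevant fields of gaselctr('options').
opts = struct('target', [5 15], 'targetpct', 0, 'penaltyslope', 0.01);

% Number of included variables for a few made-up population members.
n = [3 5 10 15 20];

% Distance outside the target range (zero inside the range).
below = max(opts.target(1) - n, 0);   % target_min - n when n < target_min
above = max(n - opts.target(2), 0);   % n - target_max when n > target_max

% Documented expression: penaltyslope times the distance outside the range.
penalty = opts.penaltyslope * (below + above);

% First row: n, second row: penalty value for that n.
disp([n; penalty]);
```

With the default penaltyslope of 0 the value is zero for every n, so a nonzero slope is needed for target to have any influence.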
OUTPUT:
model = a standard GENALG model structure with the following fields:
modeltype: 'GENALG', this field will always have this value,
datasource: {[1x1 struct] [1x1 struct]}, structures defining where the X- and Y-blocks came from,
date: date stamp for when GASELCTR was run,
time: time stamp for when GASELCTR was run,
info: 'Fit results in "rmsecv", population included variables in "icol"', information field describing where the fitness results for each member of the population are contained,
rmsecv: fitness results for each member of the population; for an MxN X-block and Mp unique population members at convergence, rmsecv is 1xMp,
icol: each row of icol corresponds to the variables used for that member of the population (a 1 [one] means that variable was used and a 0 [zero] means that it was not); for an MxN X-block and Mp unique population members at convergence, icol is MpxN, and
detail: [1x1 struct], a structure array containing model details including the following fields:
avefit: the average fitness at each generation,
bestfit: the best fitness at each generation, and
options: a structure corresponding to the options discussed above.
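As an illustration of how the rmsecv and icol fields fit together, the following sketch picks the population member with the lowest cross-validation error and lists the variables it uses. The model structure here is a hand-built, hypothetical stand-in with Mp = 3 members and N = 6 variables; a real model would come from a call to gaselctr. It assumes that a lower rmsecv is better, consistent with the description of GASELCTR as minimizing cross-validation error.

```matlab
% Hypothetical stand-in for the rmsecv and icol fields of a GENALG model
% (Mp = 3 unique population members, N = 6 variables).
model.rmsecv = [0.42 0.35 0.51];
model.icol   = [1 0 1 1 0 0;
                0 1 1 0 0 1;
                1 1 0 0 1 0];

% Member with the lowest cross-validation error.
[bestrmsecv, ibest] = min(model.rmsecv);

% Column indices of the variables that member uses.
selected = find(model.icol(ibest, :));

% Fraction of the final population that includes each variable.
usage = mean(model.icol, 1);

disp(selected);
disp(usage);
```

The usage vector gives a rough view of which variables are selected consistently across the final population rather than by a single member.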
Examples
To use mean centering outside the genetic algorithm (no additional centering will be performed within the algorithm) do the following:
x2 = mncn(x);
y2 = mncn(y);
[fit,pop] = gaselctr(x2,y2);
To use mean centering inside the genetic algorithm (centering will be performed for each cross-validation subset) pass the uncentered blocks and do the following:
options = gaselctr('options');
options.preprocessing{1} = preprocess('default', 'mean center');
options.preprocessing{2} = preprocess('default', 'mean center');
[fit,pop] = gaselctr(x,y,options);
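A fuller run that combines several of the options described earlier might look like the sketch below. The data are synthetic and purely illustrative (y is built from variables 3 and 7 plus noise), the option values are arbitrary choices within the documented ranges, and the call is guarded so the script does nothing when gaselctr is not on the path. Only fields and calls named on this page are used; treating the lowest rmsecv as the best member is an assumption.

```matlab
% Synthetic data (illustrative only): 40 samples, 20 variables,
% with y depending on variables 3 and 7 plus a little noise.
x = randn(40, 20);
y = x(:, [3 7]) * [1; -2] + 0.1 * randn(40, 1);

% Run only when gaselctr is available on the path.
if ~isempty(which('gaselctr'))
    options = gaselctr('options');
    options.popsize = 32;          % within 16..256 and divisible by 4
    options.maxgenerations = 50;   % within 25..500
    options.cv = 'rnd';            % random subset cross-validation
    options.split = 4;             % number of cross-validation subsets
    options.iter = 3;              % cross-validation iterations per generation
    options.ncomp = 3;             % maximum number of latent variables
    options.preprocessing{1} = preprocess('default', 'mean center');
    options.preprocessing{2} = preprocess('default', 'mean center');

    model = gaselctr(x, y, options);

    % Variables used by the member with the lowest cross-validation error.
    [~, ibest] = min(model.rmsecv);
    disp(find(model.icol(ibest, :)));
end
```

Because the genetic algorithm and the random cross-validation subsets are stochastic, repeated runs on the same data can select different variables.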
See Also
calibsel, genalg, genalgplot