PLS_Toolbox Documentation: preprouser
preprouser
Purpose
User-defined items for the preprocess catalog.
Synopsis
preprouser(fig)
Description
Each method available in the preprocess function has an associated 'methodname' such as those listed in the help for preprocess. Each method is defined by a preprocessing structure that contains all the information needed to perform the calculations for that method. The standard methods are defined in the preprocatalog file, which should not be edited by the user. Additional user-defined methods can be defined in the preprouser file; the following text describes how the user can add custom preprocessing methods. A few example methods already exist in the preprouser file to guide the user.
To add a custom user-defined preprocessing method, the user must 1) open the PREPROUSER.M file, 2) edit the file to create a structure with the fields described below, 3) after defining the structure add the line preprocess('addtocatalog',fig,usermethod), and 4) save and close the PREPROUSER.M file.
The line added in Step 3
preprocess('addtocatalog',fig,usermethod)
makes the new custom method available to PREPROCESS. The input usermethod is the preprocessing structure containing the user-defined method, and fig is a figure handle passed to preprouser by preprocess.
The methods defined in the preprocatalog and preprouser files are available to all functions making use of the preprocess function.
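As a sketch of steps 2 and 3, the block below shows how one method might be defined inside PREPROUSER.M. It is a hypothetical example: the method 'Median Center', its keyword 'medcenter', and the empty settingsgui are illustrative assumptions, and the command strings use base MATLAB functions (median, bsxfun) rather than PLS_Toolbox routines. The registration line is left as a comment because fig only exists when PREPROCESS calls preprouser.

```matlab
% Hypothetical user-defined method: column median centering.
% Base MATLAB only; assumes numeric (non-dataset) input, so usesdataset = 0.
usermethod = struct();
usermethod.description   = 'Median Center';
usermethod.calibrate     = { 'out{1} = median(data); data = bsxfun(@minus,data,out{1});' };
usermethod.apply         = { 'data = bsxfun(@minus,data,out{1});' };
usermethod.undo          = { 'data = bsxfun(@plus,data,out{1});' };
usermethod.out           = {};
usermethod.settingsgui   = '';      % assumption: no options GUI for this method
usermethod.settingsonadd = 0;
usermethod.usesdataset   = 0;
usermethod.caloutputs    = 1;       % out{1} holds the column medians
usermethod.keyword       = 'medcenter';
usermethod.userdata      = [];

% Inside PREPROUSER.M the figure handle fig is supplied by PREPROCESS,
% so the registration line from step 3 would follow here:
% preprocess('addtocatalog',fig,usermethod);
```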
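The fields in a preprocessing structure are:
description: short text description of the method, displayed in the GUI
calibrate: cell containing the command string executed during calibration
apply: cell containing the command string executed when applying the method to new data
undo: cell containing the command string that removes the preprocessing (an empty cell if undo is undefined)
out: cell array holding the output parameters stored by the calibration operation
settingsgui: name of the GUI function used to set options for the method
settingsonadd: boolean; if 1, settingsgui is invoked when the method is added in the GUI
usesdataset: boolean; if 1, the method receives a dataset object rather than a standard MATLAB array
caloutputs: number of values expected in out after calibration
keyword: string used to retrieve the default structure for the method
userdata: additional user-defined data, often the method's options
Detailed descriptions and examples for each field follow: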
DESCRIPTION:
The description is a short (1-2 word) text string containing a description for the preprocessing method. The string will be displayed in the GUI and can also be used as a string keyword (see also keyword) to refer to this method.
Example:
pp.description = 'Mean Center';
CALIBRATE, APPLY, UNDO:
Each of these “command” fields contains a single cell consisting of a command string to be executed by PREPROCESS when performing calibration, apply, or undo operations (see command-line forms 2, 3, and 4 of PREPROCESS). Calibrate actions operate on original calibration data with the output parameters stored in the out field, whereas apply actions operate on new data using parameters stored in the out field as input(s). For methods which act on a single sample at a time, the calibrate and apply operations are often identical (for example, see the normalize example below). The undo action uses parameters stored in the out field as input(s) to remove preprocessing from previously preprocessed data. However, the undo action may be undefined for certain methods. If this is the case, the undo field should be an empty cell.
To ensure that all samples (rows) in the data have been appropriately preprocessed, an apply command is automatically performed following a calibrate call. Note that excluded variables are replaced with NaN.
The command strings should be one or more valid MATLAB commands, each separated by a semicolon ';' (see EVAL). Each command is executed inside the PREPROCESS environment, in which the variables used by the commands (e.g. data, out, and userdata, as in the examples below) are available.
Several additional variables are available during command operations (calibrate, apply, and undo). However, these variables should not be changed by the commands and are considered “read-only”.
Examples:
The following calibrate field performs mean-centering on data, returning both the mean-centered data as well as the mean values which are stored in out{1}:
pp.calibrate = { '[data,out{1}] = mncn(data);' };
The following apply and undo fields use the scale and rescale functions with the mean values previously determined by the calibrate operation (stored in out{1}) to center new data and to remove that centering:
pp.apply = { 'data = scale(data,out{1});' };
pp.undo = { 'data = rescale(data,out{1});' };
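The way these command strings interact with data and out can be imitated outside PREPROCESS. The following is a hypothetical stand-in, not PREPROCESS itself: it evaluates the strings with EVAL in a workspace holding data, out, and userdata, and it replaces mncn, scale, and rescale with equivalent base MATLAB column-mean operations so that it runs without the toolbox.

```matlab
% Base MATLAB stand-ins for the mncn/scale/rescale commands shown above
pp = struct();
pp.calibrate = { 'out{1} = mean(data); data = bsxfun(@minus,data,out{1});' };
pp.apply     = { 'data = bsxfun(@minus,data,out{1});' };
pp.undo      = { 'data = bsxfun(@plus,data,out{1});' };

% Workspace seen by the command strings: data, out and userdata
userdata = [];
out      = {};

% Calibrate on calibration data: data is overwritten and out{1} is filled
data = [1 2; 3 4; 5 6];
eval(pp.calibrate{1});
centered = data;

% Apply the stored means to new data, then undo the centering
data = [7 8; 9 10];
eval(pp.apply{1});
newcentered = data;
eval(pp.undo{1});
restored = data;
```

Only out carries information from calibrate to apply and undo, which is why the undo command can restore the new data exactly.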
OUT:
The out field is a cell array that contains the output parameters returned during the calibration operation. For example, if the following commands are run
load wine
s = preprocess('default','autoscale');
[dp,sp] = preprocess('calibrate',s,wine);
then the out field of sp is a 1 by 2 cell array: the first cell, out{1}, contains the means of the variables in the dataset wine, and the second cell, out{2}, contains their standard deviations. These parameters are used in subsequent apply and undo commands. See the related field caloutputs. Prior to the calibration operation the out field is empty.
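The shape of this two-element out field can be mimicked with base MATLAB. The matrix below is made-up illustration data and the commands are stand-ins for the toolbox's autoscale method (using the sample standard deviation returned by std); they only show what is stored in out{1} and out{2} and how undo reuses both.

```matlab
% Made-up calibration data (3 samples x 2 variables)
data = [1 10; 2 20; 3 60];
out  = {};

% Calibrate: store means and standard deviations, then autoscale
out{1} = mean(data);
out{2} = std(data);
scaled = bsxfun(@rdivide, bsxfun(@minus, data, out{1}), out{2});

% Undo: reverse the scaling with the stored parameters
restored = bsxfun(@plus, bsxfun(@times, scaled, out{2}), out{1});
```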
SETTINGSGUI:
The name of a graphical user interface (GUI) function that allows the user to set options for this method. The function is expected to take as its only input a standard preprocessing structure from which it should take the current settings. The function should output the same preprocessing structure modified to meet the user's specification. Typically, these changes are made to the userdata field and the commands in the calibrate, apply and undo fields use that field’s contents as input options.
The design of GUIs for selection of options is beyond the scope of this document and the user is directed to the following example files, both of which use GUIs to modify the userdata field of a preprocessing structure: autoset.m and savgolset.m.
Example:
pp.settingsgui = 'autoset';
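The contract of a settingsgui function (a preprocessing structure in, the same structure with updated userdata out) can be illustrated without any GUI. The anonymous function setwindow below is hypothetical and only stands in for a function such as savgolset; it changes the window width held in the first element of userdata, following the SAVGOL userdata layout described under USERDATA.

```matlab
% Starting structure (fields abbreviated for the illustration)
pp = struct('description', 'SG Smooth/Derivative', 'userdata', [15 2 0]);

% Hypothetical non-GUI "settings" function: same structure in and out,
% only userdata(1) (the window width) is changed
setwindow = @(pp, width) setfield(pp, 'userdata', [width pp.userdata(2:end)]);

pp = setwindow(pp, 21);
```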
SETTINGSONADD:
The settingsonadd field contains a boolean (1=true, 0=false) value. If it is 1 (true), the method's settingsgui is invoked automatically when the user adds the method in the PREPROCESS GUI. If a method requires the user to select options, settingsonadd=1 guarantees that the user has an opportunity to modify the options or at least accept the default settings.
Example:
pp.settingsonadd = 1;
USESDATASET:
The usesdataset field contains a boolean (1=true, 0=false) value.
If it is 1 (true), the preprocessing method can handle dataset objects and PREPROCESS will pass the data as a dataset. It is the responsibility of the function(s) called by the method to handle the dataset’s includ field appropriately.
If it is 0 (false), the preprocessing method expects standard MATLAB classes (double, uint8, etc.). PREPROCESS, which uses a dataset object internally to hold the data, will extract the data from the dataset object before calling this method and will reinsert the preprocessed data into the dataset object afterwards.
Excluded columns are never extracted. Excluded rows are not extracted for calibration operations, but they are passed for apply and undo operations.
Example:
pp.usesdataset = 0;
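For a method with usesdataset = 0, the extract-and-reinsert behaviour can be pictured with index vectors. This is a simplified, hypothetical model that uses a plain matrix x with made-up include lists incrows and inccols in place of a dataset object, and a mean-centering step as the method; it is not PREPROCESS code. Excluded variables are set to NaN in the result, as noted under CALIBRATE, APPLY, UNDO.

```matlab
% Made-up data: 4 samples x 3 variables; sample 4 and variable 2 are excluded
x       = [1 5 2; 3 5 4; 5 5 6; 9 5 8];
incrows = [1 2 3];
inccols = [1 3];

% Calibrate: only included rows and included columns are extracted
cal   = x(incrows, inccols);
means = mean(cal);

% Apply: all rows, but still only the included columns, are extracted
applied = bsxfun(@minus, x(:, inccols), means);

% Reinsert into a full-size result; excluded variables become NaN
result = NaN(size(x));
result(:, inccols) = applied;
```

Note that the excluded sample (row 4) does not influence the means but is still preprocessed by the apply step.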
CALOUTPUTS:
For functions which require a calibrate operation prior to an apply or undo (see the fields: calibrate and out), this field indicates how many values are expected in the out field. For example, in the case of mean centering the mean values stored in the field out are required to apply or undo the operation. Initially, out is an empty cell ({}). Following the calibration operation for mean centering, it becomes a single-item cell (length of one). For other calibration operations out may be a cell of length greater than one.
By examining the length of out, PREPROCESS can determine whether a preprocessing structure has already been calibrated and contains the necessary information. The caloutputs field, when greater than zero, indicates to PREPROCESS that it should test the out field prior to attempting an apply or undo.
Example: in the case of mean-centering, the length of out should be 1 (one) after calibration.
pp.caloutputs = 1;
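A check of the kind described above might look like the following. The helper name iscalibrated and the exact comparison (at least caloutputs entries in out) are assumptions made for illustration; PREPROCESS's internal test is not documented here.

```matlab
% Hypothetical test: has this structure been calibrated?
% Methods with caloutputs == 0 never need a prior calibration.
iscalibrated = @(pp) pp.caloutputs == 0 || numel(pp.out) >= pp.caloutputs;

pp = struct('caloutputs', 1, 'out', {{}});   % mean centering before calibration
before = iscalibrated(pp);

pp.out = {[3 4]};                            % after calibration: one stored item
after = iscalibrated(pp);
```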
KEYWORD:
The field keyword is a string that can be used to retrieve the default preprocessing structure for this method. When retrieving a structure by keyword, PREPROCESS ignores any spaces and is case-insensitive. The keyword field (or the description string, discussed above) can be used in place of any preprocessing structure in calibrate and default calls to preprocess:
pp = preprocess('default','meancenter');
Example:
pp.keyword = 'mncn';
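The matching rule (spaces ignored, case-insensitive) can be expressed in a few lines. The catalog below is a made-up list of description and keyword pairs taken from the examples in this page, and the lookup is a sketch of the rule, not PREPROCESS's implementation.

```matlab
% Made-up catalog: descriptions and keywords of a few methods
descriptions = {'Mean Center', 'Normalize', 'SG Smooth/Derivative'};
keywords     = {'mncn',        'Normalize', 'sg'};

% Canonical form of a string: drop whitespace and lower-case it
canon  = @(s) lower(s(~isspace(s)));
ckeys  = cellfun(canon, keywords,     'UniformOutput', false);
cdescs = cellfun(canon, descriptions, 'UniformOutput', false);

% Find the method whose keyword or description matches the request
request = 'meancenter';
hit = find(strcmp(canon(request), ckeys) | strcmp(canon(request), cdescs));
```

Here 'meancenter' matches the description 'Mean Center' of the first entry, which is why it can be used in the preprocess('default',...) call above.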
USERDATA:
The field userdata contains additional user-defined data that can be changed during a calibration operation and retrieved for use in apply and undo operations. This field is often used to hold options for the preprocessing method which are then used by the commands in the calibrate, apply, and undo fields.
Example: in SAVGOL several input variables are defined with various method options, then they are assembled into a vector in userdata:
pp.userdata = [windowsize order derivative];
Examples
The following is the preprocessing structure used for sample normalization (see NORMALIZ). The calibrate and apply commands are identical and there is no information that is stored during the calibration phase, thus caloutputs is zero. There is no undo defined for this operation (this is because the normalization information required to undo the action is not being stored anywhere). The norm type (e.g. a 2-norm) of the normalization is set in userdata and is used in both calibrate and apply steps.
pp.description = 'Normalize';
pp.calibrate = {'data = normaliz(data,0,userdata(1));'};
pp.apply = {'data = normaliz(data,0,userdata(1));'};
pp.undo = {};
pp.out = {};
pp.settingsgui = 'normset';
pp.settingsonadd = 0;
pp.usesdataset = 0;
pp.caloutputs = 0;
pp.keyword = 'Normalize';
pp.userdata = 2;
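To see what the normalize commands do to the data, the following base MATLAB stand-in divides each sample (row) by its p-norm, with p taken from userdata(1) as in the structure above. It illustrates sample normalization under that assumption and is not the NORMALIZ source; it also does not guard against rows with zero norm.

```matlab
userdata = 2;                         % norm type, as in pp.userdata above
data     = [3 4; 6 8; 1 0];           % made-up samples in rows

% Stand-in for data = normaliz(data,0,userdata(1)): divide each row by its p-norm
p     = userdata(1);
norms = sum(abs(data).^p, 2).^(1/p);
data  = bsxfun(@rdivide, data, norms);
```

Because the row norms are discarded rather than stored in out, the original scale cannot be recovered afterwards, which matches the empty undo field.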
The following is the preprocessing structure used for Savitzky-Golay smoothing and derivatives (see SAVGOL). In many ways this structure is similar to the normalize structure, except that SAVGOL takes a dataset object as input and, thus, usesdataset is set to 1. Also note that because of the various settings required by savgol, this method uses the settingsonadd feature to bring up the settings GUI as soon as the method is added.
pp.description = 'SG Smooth/Derivative';
pp.calibrate = {'data=savgol(data,userdata(1),userdata(2),userdata(3));'};
pp.apply = {'data=savgol(data,userdata(1),userdata(2),userdata(3));'};
pp.undo = {};
pp.out = {};
pp.settingsgui = 'savgolset';
pp.settingsonadd = 1;
pp.usesdataset = 1;
pp.caloutputs = 0;
pp.keyword = 'sg';
pp.userdata = [ 15 2 0 ];
The following example creates a preprocessing structure to invoke multiplicative scatter correction (MSC, see MSCORR) using the mean of the calibration data as the target spectrum. The calibrate cell here contains two separate operations. The first calculates the mean spectrum and the second performs the MSC. The third input to the MSCORR function is a flag indicating whether an offset should also be removed. This flag is stored in the userdata field so that the settingsgui (mscorrset) can change the value easily. Note that there is no undo defined for this function.
pp.description = 'MSC (mean)';
pp.calibrate = { 'out{1}=mean(data); data=mscorr(data,out{1},userdata);' };
pp.apply = { 'data = mscorr(data,out{1},userdata);' };
pp.undo = {};
pp.out = {};
pp.settingsgui = 'mscorrset';
pp.settingsonadd = 0;
pp.usesdataset = 0;
pp.caloutputs = 1;
pp.keyword = 'MSC (mean)';
pp.userdata = 1;
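The two calibrate operations can be mimicked in base MATLAB to show what ends up in out{1}. The sketch below uses the textbook MSC regression (each spectrum fitted as a + b times the reference and corrected as (x - a)/b, or as x/b when no offset is removed); treating this as MSCORR's behaviour, and the meaning given to the userdata flag, are assumptions, and the spectra are made up.

```matlab
userdata = 1;                          % 1 = also remove an offset (as in pp.userdata)
data = [1 2 3 4; 3 5 7 9; 2 3 4 5];    % made-up spectra in rows
out  = {};

% Calibrate, first operation: the mean spectrum becomes the target
out{1} = mean(data);
ref = out{1}(:);

% Calibrate, second operation: regress each spectrum on the target and correct it
corrected = zeros(size(data));
for i = 1:size(data, 1)
    x = data(i, :)';
    if userdata
        X = [ones(size(ref)) ref];     % fit x = a + b*ref by least squares
        c = X \ x;
        corrected(i, :) = ((x - c(1)) / c(2))';
    else
        b = ref \ x;                   % fit x = b*ref (no offset)
        corrected(i, :) = (x / b)';
    end
end
```

Each made-up spectrum is an exact offset-and-scale copy of the mean, so every corrected row equals out{1}; the apply step reuses the same stored target for new samples.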
See Also
preprocatalog, preprocess