Advanced Preprocessing: Variable Centering
From Eigenvector Documentation Wiki
Introduction
Many preprocessing methods are based on the variance in the data. Such techniques should generally be provided with data which are centered relative to some reference point. Centering is generically defined as
$$\mathbf{X}_c = \mathbf{X} - \mathbf{1}\bar{\mathbf{x}}$$
where $\bar{\mathbf{x}}$ is a vector representing the reference point for each variable, $\mathbf{1}$ is a column-vector of ones, and $\mathbf{X}_c$ represents the centered data. Often the reference point is the mean of the data. Interpretation of loadings and samples from models built on centered data is done relative to this reference point. For example, when centering is used before calculating a PCA model, the resultant eigenvalues can be interpreted as variance captured by each principal component. Without centering, the eigenvalues include both variance and the sum-squared mean of each variable.
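The generic centering step described above, subtracting a per-variable reference point from every row, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `center` and `column_sum_squares` and the data values are hypothetical and are not part of any toolbox. The sketch also checks, per variable, that the uncentered sum of squares splits into the variation about the mean plus a term from the mean itself, which is the reason uncentered eigenvalues mix the two.

```python
def center(X, ref):
    """Subtract the reference point ref[j] from every entry of column j."""
    return [[x - r for x, r in zip(row, ref)] for row in X]


def column_sum_squares(X):
    """Sum of squared entries for each column."""
    return [sum(v * v for v in col) for col in zip(*X)]


# Hypothetical data: 4 samples (rows) by 2 variables (columns).
X = [[1.0, 10.0],
     [2.0, 12.0],
     [3.0, 14.0],
     [6.0, 16.0]]
m = len(X)

# Use the column means as the reference point.
means = [sum(col) / m for col in zip(*X)]
Xc = center(X, means)

raw_ss = column_sum_squares(X)           # uncentered sum of squares
centered_ss = column_sum_squares(Xc)     # variation about the mean
mean_ss = [m * mu * mu for mu in means]  # contribution of the mean itself

# For each variable: raw_ss == centered_ss + mean_ss (up to rounding).
for r, c, k in zip(raw_ss, centered_ss, mean_ss):
    assert abs(r - (c + k)) < 1e-9
```

Any other reference point (for example, a median or a known baseline) can be passed as `ref` in the same way; only the mean gives the exact split checked in the final loop.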
In most cases, centering and/or scaling (see next section) will be the last method in a series of preprocessing methods; any other preprocessing methods being used are usually performed first.
Mean-Center
One of the most common preprocessing methods, mean-centering calculates the mean of each column and subtracts it from that column. Put another way, after mean-centering, each row contains only how that sample differs from the average sample in the original data matrix.
In the Preprocessing GUI, this method has no adjustable settings. From the command line, this method is achieved using the mncn function.
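As a rough illustration of what mean-centering does numerically, the following plain-Python sketch computes the column means, subtracts them, and then reuses the stored means on a new sample. It is not the mncn function and makes no claim about its interface; the names `mean_center` and `apply_center` and all data values are hypothetical. Reusing the calibration means for new data is shown as an assumption about common practice rather than something stated in this page.

```python
def mean_center(X):
    """Return (centered rows, column means) for a list-of-rows matrix."""
    m = len(X)
    means = [sum(col) / m for col in zip(*X)]
    centered = [[x - mu for x, mu in zip(row, means)] for row in X]
    return centered, means


def apply_center(X_new, ref):
    """Center new rows with a previously computed reference point."""
    return [[x - r for x, r in zip(row, ref)] for row in X_new]


# Hypothetical calibration data: 3 samples by 2 variables.
X_cal = [[2.0, 30.0],
         [4.0, 50.0],
         [6.0, 70.0]]
X_cal_c, means = mean_center(X_cal)

# Each centered row is the difference from the average calibration sample,
# so every column of the centered data averages to zero.
col_means_after = [sum(col) / len(X_cal_c) for col in zip(*X_cal_c)]

# Assumed usage: express a new sample relative to the same average sample.
x_new_c = apply_center([[5.0, 40.0]], means)
```

Returning the means alongside the centered data keeps the reference point available, so the same shift can be applied later or undone by adding the means back.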
For more information on the use of mean-centering, see the discussion on Principal Components Analysis in Chapter 5 of the Chemometrics Tutorial.
Median-Center
The median-centering preprocessing method is very similar to mean-centering except that the reference point is the median of each column rather than the mean. It is considered one of the "robust" preprocessing methods because the median is less influenced than the mean by outliers (unusual samples) in the data.
In the Preprocessing GUI, this method has no adjustable settings. From the command line, this method is performed using the medcn function.
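The robustness argument can be seen with a small plain-Python sketch that centers the same column once with the mean and once with the median. It is not the medcn function; the helper `center_columns` and the data values are hypothetical, chosen so that a single outlier is present.

```python
from statistics import mean, median


def center_columns(X, stat):
    """Subtract stat(column) from each column; return (centered rows, reference point)."""
    ref = [stat(col) for col in zip(*X)]
    centered = [[x - r for x, r in zip(row, ref)] for row in X]
    return centered, ref


# Hypothetical single-variable data in which the last sample is an outlier.
X = [[1.0], [2.0], [3.0], [4.0], [100.0]]

_, mean_ref = center_columns(X, mean)            # reference pulled toward the outlier
X_med_c, median_ref = center_columns(X, median)  # reference stays with the bulk of the data
```

In this example the mean reference point is 22.0, far from the four typical samples, while the median reference point is 3.0, so the typical samples remain close to zero after median-centering and the outlier stands out clearly.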