Exteriorpts
Purpose
Finds points on the exterior of a normalized data space.
Synopsis
- [isel,loads] = exteriorpts(x,ncomp,options)
Description
Given a two-way or higher-order data set (X), the most exterior samples or variables are identified and their indices returned.
For a two-way data set, the data (X) are assumed to be modelable as: X = CS' + E
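As a small illustration of this bilinear model, the sketch below builds a noise-free X (E = 0) from made-up values. The function name `bilinear` and the matrices `C` (one row per sample) and `S` (one row per variable) are hypothetical and are not part of the toolbox.

```python
def bilinear(C, S):
    """Return X = C S' for C (M x K) and S (N x K), as nested lists."""
    return [[sum(c * s for c, s in zip(c_row, s_row)) for s_row in S]
            for c_row in C]

# Hypothetical example: 3 samples, 2 components, 4 variables.
C = [[1.0, 0.0],   # sample containing only component 1
     [0.0, 1.0],   # sample containing only component 2
     [0.5, 0.5]]   # equal mixture of both components
S = [[1.0, 0.0],
     [2.0, 0.0],
     [0.0, 3.0],
     [0.0, 1.0]]

X = bilinear(C, S)  # M x N matrix; the mixture row lies between the two pure rows
```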
The method works as follows. Non-negative data all lie in the multivariate analog of the upper right-hand quadrant and, given sufficient selectivity in the data, the pure-component spectra (a.k.a. end-members) must lie on the exterior of the data cloud.
- A) First, normalize all the data to unit 1-norm, which constrains the responses to a hyper-plane, and
- B) remove data points with low norm (these are the most likely to be affected by noise) [see options.minnorm]. (An alternative is to add a small offset to all the data to 'push them' towards the center of the data cloud.)
At this point the data are transformed from looking like a "snow-cone" with its point at the origin to looking like a "hyper-pyramid" with the end-members corresponding to the corners.
- C) Next, the 1-normed data are mean-centered so that the hyper-plane has its center at [0,0,...]. This procedure transforms the problem from finding points on the exterior of a data cloud to finding points at the vertices of a hyper-polygon, which is done using the DISTSLCT function (called from EXTERIORPTS).
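Steps A to C can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the toolbox code: the names `normalize_center`, `pick_vertices` and `min_frac` are hypothetical, the low-norm cutoff is taken relative to the largest row norm (the exact rule behind options.minnorm may differ), and the vertex search uses a simple greedy farthest-point rule as a stand-in for DISTSLCT, whose actual algorithm is not described here.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize_center(X, min_frac=0.03):
    """Steps A-C: drop low-norm rows, scale rows to unit 1-norm, mean-center."""
    norms = [sum(abs(v) for v in row) for row in X]
    cutoff = min_frac * max(norms)  # assumption: relative to the largest row norm
    kept = [i for i, n in enumerate(norms) if n > cutoff]
    unit = [[v / norms[i] for v in X[i]] for i in kept]
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    centered = [[v - m for v, m in zip(row, mean)] for row in unit]
    return kept, centered

def pick_vertices(points, ncomp):
    """Greedy farthest-point selection (a stand-in, not the DISTSLCT algorithm)."""
    origin = [0.0] * len(points[0])
    # Start with the point farthest from the center of the cloud.
    chosen = [max(range(len(points)), key=lambda i: euclid(points[i], origin))]
    while len(chosen) < ncomp:
        rest = [i for i in range(len(points)) if i not in chosen]
        # Add the point whose nearest already-chosen point is farthest away.
        chosen.append(max(rest, key=lambda i: min(euclid(points[i], points[j])
                                                  for j in chosen)))
    return chosen

# Made-up data: three pure samples, two mixtures, one near-zero (noisy) sample.
X = [[2.0, 0.0, 0.0],
     [0.0, 3.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0],
     [1.0, 1.0, 0.0],
     [0.01, 0.0, 0.01]]

kept, centered = normalize_center(X)
isel = [kept[i] for i in pick_vertices(centered, 3)]  # indices into the original X
```

In this toy case the near-zero row is discarded before selection and the three selected rows are the pure samples, which sit at the corners of the normalized, centered data.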
Inputs
- x = MxN matrix.
- ncomp = number of components to extract.
Optional Inputs
- options = a standard options structure containing one or more of the fields discussed in the Options section below.
Outputs
- isel = if the selectdim option is non-empty, isel is a vector of the selected indices. Otherwise, isel is a cell array with the indices selected on each mode of the data.
- loads = cell array with the extracted points/factors. Modes other than selectdim are determined via projection.
Options
options = a structure array with the following fields:
- selectdim: [1] mode of the data from which items should be selected (i.e. 1=rows, 2=columns, ...) If empty [], all modes are analyzed and the mode with the largest sum-squared captured value is used.
- waitbar: [ 'off' | 'on' | {'auto'} ] governs use of a waitbar while processing. 'auto' uses a waitbar only if multiple modes are being analyzed with n-way data.
- minnorm: [ 0.03 ] approximate noise level; points with unit area smaller than this (as a fraction of the maximum value in x) are ignored during selection.
- usepca: [{'no'}| 'yes' ] governs use of PCA as a pre-filtering step on the data prior to selection.
- usennls: [{'no'}| 'yes' ] governs use of non-negative least squares when calculating loadings for other-than-sample modes. Only used when (loads) output is requested.
- distmeasure: [ {'Euclidian'} | 'Mahalanobis' ] governs the type of distance measurement to use. 'Mahalanobis' requires the usepca option to be 'yes'.
- samplemode: [ 1 ] mode that contains variance (factors for other modes are normalized to unit 2-norm). Only used when loads output is requested.