DataSet Standard Data Object for use with MATLAB
Version 5.00 - Released
July 11, 2007
The DataSet object is a MATLAB object designed for
any data that requires storing auxiliary information
along with the data itself. A typical set of data contains many
parts. In spectroscopy, for instance, a data set can contain the
matrix of spectra, the wavelength axis, sample labels, sample numbers,
class variables, reference variables, etc. MATLAB supports a variety
of data types such as double arrays, character arrays, structures
and cell arrays that can accommodate these pieces. Until now, however,
there has been no standard way to associate all the parts of a data
set that go together, including the sample and variable labels,
class variables, time and wavelength axes, etc. In order to facilitate
data set handling, Eigenvector has created a standard object, the
DATASET Object (DSO). When added to a MATLAB installation,
DataSet creates a new object in MATLAB that integrates all of the
separate components associated with a data set into a single variable
in the MATLAB workspace.
An example of the MATLAB command
window after the DataSet files have been installed is shown to the
right. Data consisting of three different variables have been loaded
into the workspace: the data matrix dat, the sample
labels names, and the variable labels vars. These can
be combined into a single data object h using the commands
shown. When the object is displayed, the separate parts of the data are listed.
(The data object should look familiar to those who have been using
MATLAB structures.)
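Because the command-window screenshot is not reproduced in this text, the following is a minimal sketch of commands that would build such an object. It assumes the DataSet files are installed and on the MATLAB path. The contents of dat, names and vars are made up for illustration, and the use of the label field indexed by mode (1 for samples, 2 for variables) should be checked against the technical manual.

```matlab
% Illustrative data (made up for this sketch): 3 samples x 2 variables
dat   = [1.0 2.0; 3.0 4.0; 5.0 6.0];
names = char('s1', 's2', 's3');   % sample labels, one row per sample
vars  = char('k', 'ca');          % variable labels, one row per variable

h = dataset(dat);       % wrap the data matrix in a DataSet object
h.label{1} = names;     % mode 1 (rows): sample labels
h.label{2} = vars;      % mode 2 (columns): variable labels

h                       % display lists the separate parts of the object
size(h.data)            % the raw matrix stays available in the .data field
```

Keeping the labels inside h means that the matrix and its row and column names travel together as one workspace variable.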
Eigenvector Research is making the DataSet object freely
available and hopes that MATLAB users everywhere
will use it when writing routines that are data intensive. Existence
of the object will greatly enhance the exchange of data sets, the
translation of data sets between file types (such as JCAMP) and
the handling of data sets within MATLAB. Of course, the PLS_Toolbox
takes full advantage of the DataSet object and also provides several
data import routines.
Changes in Version 5.0
- This major revision of the DataSet Object includes the addition
of several new fields and functions (listed below). These additions
both expand the functionality of the DataSet Object and fix incompatibility
issues associated with The MathWorks' recent addition of its own
dataset object to the Statistics Toolbox. Read more about the history
of the DataSet Object here.
* Classid: Classes can now be assigned and returned
as strings using the "classid" synonym field (used in place of "class"
field). Internally, classes are still stored numerically, but you
can now associate string descriptors with classes. These strings
can be used to assign classes and can be retrieved as strings using
the .classid field of the DataSet object.
In addition, you can access a "lookup table" of class numbers and
their respective strings using the x.classlookup{mode,set}
command. These strings will be increasingly used by PLS_Toolbox
and Solo and are already used by the main plotting interfaces.
* Allow subscripting into DSO using labels or
class names. At the MATLAB command line, you can index into a DataSet
object using the column or row (or n-dim) labels. For example, you
can extract a single sample from a DataSet by just giving the name
of the DataSet followed by .samplename where samplename is the label
for the desired sample. An example using the "Arch" DataSet follows:
    load arch
    arch.s2    % extract SAMPLE "s2" from the arch DataSet object
    arch.k     % extract the VARIABLE "k" from the arch DataSet object
Note that this indexing will only
work when labels do not contain spaces and when they do not conflict
with other generic DataSet field names.
* Axistype: not used within PLS_Toolbox or Solo
yet, but this field allows you to define an axis "type" to be one
of {'none' 'continuous' 'stick' 'discrete'}. Future use includes
automatic selection of plot style based on this setting.
* Added overload support for: disp, double,
isempty, numel, single, sortrows, unique
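The class-string and axis-type fields described in the list above can be exercised with a short sketch like the one below. It assumes the DataSet files are on the MATLAB path. The data and the class names are hypothetical. Indexing classid and axistype with a single mode index (set defaulting to 1) is an assumption modeled on the x.classlookup{mode,set} form given above, so confirm the exact syntax in the technical manual.

```matlab
% Hypothetical DataSet: 4 samples x 3 variables of random data
x = dataset(rand(4, 3));

% Assign classes to mode 1 (samples) using strings (names are made up)
x.classid{1} = {'control' 'treated' 'control' 'treated'};

x.class{1}            % numeric class values stored internally
x.classid{1}          % the same assignments returned as strings
x.classlookup{1,1}    % lookup table pairing class numbers with strings

% Assumed syntax: mark the variable axis (mode 2) as continuous
x.axistype{2} = 'continuous';
x.axistype{2}
```

Samples given the same string receive the same numeric class, so routines that expect numeric classes keep working while displays can show the readable names.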
To get the DataSet object, download the compressed
ZIP file. You can also access the DataSet
technical manual (requires the free Adobe
Acrobat Reader).
The manual (which is also included in the ZIP file)
gives the installation instructions. In short, the @dataset
directory must be a sub-directory of a directory which is on the
MATLAB path. The demo datasetdemo.m file can be placed anywhere
on the path.
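As a rough sketch of the path setup just described, the commands below add the parent of the @dataset directory to the MATLAB path and then check which dataset definition MATLAB finds. The install location C:\toolboxes\dso is purely hypothetical; substitute the folder where the ZIP file was actually unpacked.

```matlab
% Hypothetical location: the folder that CONTAINS the @dataset directory
parentDir = 'C:\toolboxes\dso';   % assumption, adjust to your own install

addpath(parentDir);    % add the parent directory, not @dataset itself
which dataset          % should report a file inside parentDir\@dataset
```

If which reports a different dataset (for example one from another toolbox), the order of directories on the path may need adjusting.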
For more information on DataSet, please contact
our helpdesk. Comments and ideas would be especially appreciated.