DataSet Standard Data Object for use with MATLAB
Version 5.00
- Released July 11, 2007

The DataSet object is a MATLAB object designed for any data that must be stored together with auxiliary information. A typical data set has many parts. In spectroscopy, for instance, a data set can contain the matrix of spectra, the wavelength axis, sample labels, sample numbers, class variables, reference variables, etc. MATLAB supports a variety of data types, such as double arrays, character arrays, structures and cell arrays, that can accommodate these pieces. Until now, however, there has been no standard way to associate all the parts of a data set that go together, including the sample and variable labels, class variables, time and wavelength axes, etc. To facilitate data set handling, Eigenvector has created a standard object, the DataSet Object (DSO). When added to a MATLAB installation, DataSet creates a new object in MATLAB that integrates all of the separate components associated with a data set into a single variable in the MATLAB workspace.

An example of the MATLAB command window after the DataSet files have been installed is shown to the right. Data consisting of three different variables have been loaded into the workspace: the data matrix dat, the sample labels names and the variable labels vars. These can be combined into a single data object h using the commands shown. When h is displayed, the separate parts of the data are listed. (The data object should look familiar to those who have been using MATLAB structures.)
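A minimal sketch of such commands follows. It assumes that the DataSet files are installed, that the object is constructed with dataset(dat), and that labels are attached through a label field indexed by mode (1 for samples, 2 for variables). The numeric values and label strings are made up for illustration; the technical manual is the authoritative reference for the syntax.

```matlab
% Illustrative data (made up): 3 samples by 2 variables
dat   = [1.0 2.0; 3.0 4.0; 5.0 6.0];
names = char('sample1', 'sample2', 'sample3');   % one sample label per row
vars  = char('var1', 'var2');                    % one variable label per row

% Assumed syntax: build the object from the data matrix, then attach labels
h = dataset(dat);
h.label{1} = names;    % mode 1 = rows (samples)
h.label{2} = vars;     % mode 2 = columns (variables)

h                      % no semicolon, so MATLAB lists the parts of the object
```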

Eigenvector Research is making the DataSet object freely available and hopes that MATLAB users everywhere will use it when writing routines that are data intensive. Existence of the object will greatly enhance the exchange of data sets, the translation of data sets between file types (such as JCAMP) and the handling of data sets within MATLAB. Of course, the PLS_Toolbox takes full advantage of the DataSet object and also provides several data import routines.

Changes in Version 5.0 - This major revision of the DataSet Object adds several new fields and functions (listed below). These additions both expand the functionality of the DataSet Object and fix incompatibility issues associated with The MathWorks' recent addition of their own DataSet Object to the Statistics Toolbox. Read more about the history of the DataSet Object here.

* Classid: Classes can now be assigned and returned as strings using the "classid" synonym field (used in place of the "class" field). Internally, classes are still stored numerically, but you can now associate string descriptors with them. These strings can be used to assign classes and can be retrieved through the .classid field of the DataSet object. In addition, you can access a "lookup table" of class numbers and their respective strings using x.classlookup{mode,set}. These strings will be used increasingly by PLS_Toolbox and Solo and are already used by the main plotting interfaces.

* Allow subscripting into DSO using labels or class names. At the MATLAB command line, you can index into a DataSet object using the column or row (or n-dim) labels. For example, you can extract a single sample from a DataSet by just giving the name of the DataSet followed by .samplename where samplename is the label for the desired sample. An example using the "Arch" DataSet follows:

    load arch
    arch.s2    %extract SAMPLE "s2" from the arch DataSet object
    arch.k     %extract the VARIABLE "k" from the arch DataSet object

Note that this indexing will only work when labels do not contain spaces and when they do not conflict with other generic DataSet field names.

* Axistype: not used within PLS_Toolbox or Solo yet, but this field allows you to define an axis "type" to be one of {'none' 'continuous' 'stick' 'discrete'}. Future use includes automatic selection of plot style based on this setting.

* Added overload support for: disp, double, isempty, numel, single, sortrows, unique
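As a sketch of the Classid item above: the example below assumes that classid is indexed by mode in the same way as other DataSet fields (x.classid{1} for rows) and that it accepts a cell array of strings. The variable x and the class names are hypothetical, and the exact assignment syntax should be checked against the technical manual.

```matlab
x = dataset(rand(4, 2));              % 4 samples, 2 variables of random data
x.classid{1} = {'A' 'A' 'B' 'B'};     % assumed syntax: assign row classes by name

x.class{1}          % the classes as stored internally (numeric)
x.classid{1}        % the same classes returned as strings
x.classlookup{1,1}  % lookup table of numbers and strings for mode 1, class set 1
```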

To get the DataSet object: download the compressed ZIP file. You can also access the DataSet technical manual (requires Free Adobe Acrobat Reader).

The manual (which is also included in the ZIP file) gives the installation instructions. In short, the @dataset directory must be a subdirectory of a directory that is on the MATLAB path; the @dataset directory itself is not added to the path. The demo file datasetdemo.m can be placed anywhere on the path.
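The path requirement can be sketched as follows. The folder name is hypothetical; substitute the folder on your system that contains the @dataset directory.

```matlab
% Hypothetical location: the PARENT folder that contains the @dataset directory
dsoParent = 'C:\toolboxes\dso';

addpath(dsoParent);        % add the parent folder, not @dataset itself
which('dataset', '-all')   % list every "dataset" MATLAB can see, in search order
```

If another dataset object is also installed (for example, the one in the Statistics Toolbox), which may list more than one entry, and the first entry is the one MATLAB will use.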

For more information on DataSet, please contact our helpdesk. Comments and ideas would be especially appreciated.

Eigenvector Research, Inc., 3905 West Eaglerock Drive, Wenatchee, WA 98801
B.M. Wise, bmw@eigenvector.com, Phone: 509.662.9213, Fax: 509.662.9214
N.B. Gallagher, nealg@eigenvector.com, Phone: 509.687.1039