Solo Predictor Reference Manual

From Eigenvector Documentation Wiki

Introduction

Solo_Predictor, from Eigenvector Research, Inc. (EVRI), is a stand-alone model application engine which applies models created by PLS_Toolbox or Solo. Solo_Predictor features a simple and flexible scripting language, a platform- and operating-system-independent interface, and an inherent distributed-computation design.

This documentation describes the setup and use of Solo_Predictor and explains the script language used to issue commands.

System Requirements

Solo_Predictor requires the following:

Operating system:

Windows 2000, XP, 2003 server, or Vista

Mac OS X (Intel only)

Linux (Intel only)

200 MB Disk Space
100 MB RAM (recommended)

Features and Supported Methods

Solo_Predictor is a prediction engine which supports importing of data and models from an external source, application of those models to the data, and retrieval of the values from the prediction. It supports predictions for all methods which produce standard model structures in PLS_Toolbox and Solo. This includes all methods in the Analysis GUI (including, but not limited to, PCA, PARAFAC, MCR, Purity, PLS, PCR, MLR, PLSDA, SIMCA), Calibration Transfer GUI, and any other PLS_Toolbox command-line functions which produce standard model structures.

Solo_Predictor also supports:

All preprocessing methods available in the custom Preprocessing GUI.
Missing data replacement (where supported by the model type)
Variable pre-alignment to model (handles resampling, extra variables, missing variables)
Importing all data types supported by the Analysis GUI and Workspace Browser including, but not limited to:

Comma-, tab-, space-, and other delimited text files (.csv, .dat)

X,Y… delimited files (.xy)

Excel spreadsheets (.xls, .xlst)

Thermo-Galactic SPC files (single and multifile formats) (.spc)

Hamilton Sundstrand files (.asf, .aif)

Horiba JY files (various)

JCAMP (simple single-record formats) (.jcamp .jdx)

XML (Eigenvector XML data format) (.xml)

Matlab .mat files (.mat)

Note that Solo_Predictor does not support execution of custom, user-defined MATLAB® scripts or commands. Such functionality requires a full MATLAB license. Please contact Eigenvector Research for more information on using Solo_Predictor in a MATLAB environment.

Solo_Predictor can be accessed through a socket interface using TCP/IP or through an ActiveX or .NET object, or it can operate in a wait-for-file mode. It can send results to a client and/or write to an output file. Solo_Predictor also maintains a text-based log file to aid with diagnosis of problems.

Interface Specifications

In this description of the Solo_Predictor interface, the term "client" refers to a user-specified application which is requesting a prediction and the term "server" refers to Solo_Predictor. The client is often a distributed control system (DCS) or other data collection software (instrumentation software, etc) but can be any application which needs to apply a multivariate model to data. In general, the client issues one or more commands to Solo_Predictor either by passing data or by describing where data can be retrieved from. Additional commands are passed to instruct Solo_Predictor how to process that data and what results should be returned. See the Scripting Language section for details on the scripting language used for the instructions.

Introduction to Socket Interfaces

Solo_Predictor operates using standard TCP/IP (Transmission Control Protocol/Internet Protocol) communications over "socket" connections. Sockets are available on all operating system platforms (Windows, Mac, Linux) and are the same technology used in most Intranet and Internet communications including http, ftp, and other familiar inter-computer systems. They are also used for some "plug and play" hardware devices. Simply put, sockets are a general method to pass messages between two programs.

Although socket connections are most often used between computers, they can also be used when the client and server reside on the same computer (and even when that computer is not networked). When connecting two programs on the same computer, sockets are similar to other familiar inter-program communication systems (e.g. DDE or ActiveX) with these added advantages:

Sockets are completely platform independent. The same communication methods are used on all operating systems and hardware. They can also be used across mixed operating systems and platforms (e.g. Windows to Linux).
Most modern languages have some sort of provision for socket communication and require no proprietary technology to implement.
Socket technology allows the client and server to be located on the same computer or separate computers connected by a network. The identical software and setup are used in both cases. The only modification needed is to provide a remote IP address or name for the server. As a result, sockets also inherently allow for distributed computation.

The procedure of communication over sockets is well described in many places. The basic procedure is:

The client opens a socket connection between the client and server. This requires knowing the IP address of the server's computer (use "loopback" or "127.0.0.1" if the server and client are on the same computer) and the port number on which the server is "listening."
The client sends a command to the server. The end of the message is indicated when no additional characters are available.
The server receives the command and performs some operation.
The server returns a response to the client, often containing either a simple acknowledgement of the message or some additional data or results.
The socket connection is closed.
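The steps above can be sketched in Python as follows. This is a minimal illustration, not Eigenvector's reference client: the function name send_command is hypothetical, the default port 2211 and the loopback address are taken from this manual, and the choice to read until the server closes the connection or a timeout elapses is an assumption about how the end of the reply is detected.

```python
import socket


def send_command(command, host="127.0.0.1", port=2211, timeout=5.0):
    """Open a socket, send one plain-text command, and return the reply."""
    # Step 1: open the connection (loopback address when client and server
    # share a computer).
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Step 2: send the command text.
        sock.sendall(command.encode("utf-8"))
        # Steps 3-4: the server processes the command and replies.
        chunks = []
        while True:
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                break  # assumption: silence for `timeout` seconds ends the reply
            if not chunk:
                break  # the server closed the connection
            chunks.append(chunk)
    # Step 5: leaving the `with` block closes the socket.
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example call (requires a running server; the command text is illustrative):
#   reply = send_command("data='[1 2 3]';")
```

If an end-of-message string is configured on the server, the read loop would instead stop when that string arrives.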

The messages passed to Solo_Predictor are passed in plain text, but the ability to pass XML to describe some more complicated data types also exists. The response from Solo_Predictor can be in any of a number of formats including plain text, XML, or HTML. In addition, Solo_Predictor also permits some standard HTTP-format (i.e. web browser-style) input and output messages. For more information on the message format, see the "Scripting Language" section in this manual.

See Appendix C: Solo_Predictor Example Connection Code for socket-connection coding examples.

End-of-Message Indicator Option

In some cases, a system has a high load (many programs running) or the messages being transferred are large. In these cases, the message transferred by the client may be broken up into smaller pieces. This may cause Solo_Predictor to believe the message is complete before it has received the entire message. In these cases, Solo_Predictor can be told to expect an end-of-message (EOM) character or string (e.g. "[EOM]") and it will wait to process a message until it sees that string arrive. See Incoming Message Format and Timeout Settings for how to set an EOM string.
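A client can apply the same idea when reading replies, since a reply may also arrive in pieces. The sketch below shows one way to frame and reassemble messages around an EOM string. The helper names are hypothetical, "[EOM]" is simply the example string used above, and placing the EOM string directly after the command text is an assumption.

```python
def append_eom(message, eom="[EOM]"):
    """Return the outgoing message with the end-of-message string appended."""
    return message + eom


def extract_message(buffer, eom="[EOM]"):
    """Split an accumulated receive buffer at the first EOM string.

    Returns (message, remainder) when a complete message is present,
    otherwise (None, buffer) so the caller keeps accumulating.
    """
    head, sep, rest = buffer.partition(eom)
    if not sep:
        return None, buffer  # EOM not seen yet: message is still incomplete
    return head, rest
```

Because the check runs on the accumulated buffer rather than on each fragment, an EOM string that is itself split across two fragments is still detected once both halves have arrived.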

POST Protocol Option

Solo_Predictor also accepts the common HTTP POST protocol for incoming messages. This format specifies the expected length of the message and thus allows messages to be split into segments, because Solo_Predictor will not process the message until the number of characters received matches that length. Although the standard POST format allows specification of different content types, the only Content-Type header which Solo_Predictor currently supports is text/plain. The following gives an example of a valid POST message for Solo_Predictor:

  POST . HTTP/1.0
  Content-Length: 15
  Content-Type: text/plain

  data='[1 2 3]';

Also note that outgoing messages from Solo_Predictor are never "chunked" (split into several pieces) nor do they ever use the POST format.
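The Content-Length value must match the length of the message body exactly (15 characters in the example above). A small helper can compute it rather than relying on a hand count. This is a sketch: the function name is hypothetical, and the use of CRLF line endings follows general HTTP convention rather than anything stated in this manual.

```python
def build_post_message(script):
    """Wrap a script in a POST message with a computed Content-Length."""
    body = script.encode("utf-8")
    header = (
        "POST . HTTP/1.0\r\n"
        f"Content-Length: {len(body)}\r\n"  # length of the body in bytes
        "Content-Type: text/plain\r\n"      # the only supported content type
        "\r\n"                              # blank line separates header and body
    )
    return header.encode("ascii") + body
```

Calling build_post_message("data='[1 2 3]';") yields a header with Content-Length: 15, matching the example above.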

ActiveX and .NET Interfaces

For client applications which cannot or do not want to use sockets, Solo_Predictor provides both ActiveX and .NET suites of objects called EigenvectorTools which can communicate with Solo_Predictor without the client having to implement socket interface code. EigenvectorTools must be installed on the same computer as the client application, but Solo_Predictor can still be located on the same computer or on a separate computer (if the socket option is used). Please note that EigenvectorTools are only available on Windows. Other platforms must use sockets to communicate with Solo_Predictor.

For information on using EigenvectorTools, see the help page EigenvectorTools. Note that although the EigenvectorTools page makes reference to accessing graphical user interfaces (GUIs), Solo_Predictor does not allow access to the GUIs; only the creation of data objects and the application of models are supported.

Wait-For-File Interface

Solo_Predictor also offers a basic wait-for-file method of interface. This feature is designed for compatibility with legacy systems which may not offer flexible interfacing and allows a client to trigger an analysis by simply dropping a readable file into a specified folder. Solo_Predictor can be configured to write a response file for the client to read the results of the analysis. For more information on this option, see the Installation and Configuration section and the Script Construction section.

Single- and Multi-Client Servers

Upon starting up, Solo_Predictor will automatically identify itself ("imprint") with the first client computer that makes contact with it. After imprinting, only that computer will be able to send commands to the server. This is true whether the client and server are on the same computer or on separate computers. Solo_Predictor can only be reset to respond to another client by restarting the server.

Some licenses will permit more than one client computer to access the predictor simultaneously. Thus, a single predictor can be installed on a centrally-located, networked computer and serve a number of clients on different computers (or multiple clients on the same local computer). Note that although multiple clients can make connections and request predictions, the following conditions are put into place:

Each client normally has its own workspace to store data and results. That is, one client cannot normally access the workspace of other clients. This can be disabled if, for example, multiple clients are contributing to the data used to make a prediction or when a remote client will be used to interrogate the workspace of another client. See the Installation and Configuration section for more information on workspace options.
In order to assure the fastest response for a given client, Solo_Predictor will only execute one client's request at a time.

Please contact Eigenvector Research, Inc. for more information on multi-client licenses.

Installation and Configuration

The following section describes the options available for configuring Solo_Predictor.

Installation

Solo_Predictor is packaged in several different ways depending on the platform on which it is being installed. Follow the instructions provided with the downloaded software to install on the appropriate platform.

Solo_Predictor is typically run by a start-up process so that it is always available; however, it can also be started "on-demand" by simply executing the Solo_Predictor file or shortcut (again depending on the operating system). The options for stopping or restarting the server depend on the configuration of the Status Window (see below).

Normally, upon startup, Solo_Predictor will prompt the user for a license code. This prompt can be avoided by adding the license code to the configuration file, as described below.

The server's IP address depends on the local network setup. If the client is running on the same system as Solo_Predictor, then the loopback address (127.0.0.1) can be used for both client and server. The port number is configured as described below.

If, however, Solo_Predictor is on a different computer than the client, the client must make a connection into the computer running Solo_Predictor. Normally this is done by IP address but most sockets provide some means for looking up an IP address based on the computer name. If dynamic IP addresses are being used, it is recommended that the Solo_Predictor computer be set up with a preference for a given IP address. However, if the IP address does get changed, the client will need to be pointed to the new address.

For more information on programming socket connections, see Appendix C: Solo_Predictor Example Connection Code.

Configuration

All configuration of Solo_Predictor is accomplished through the defaults.xml file which is located in the program's main folder. This XML file contains a number of tags which can be edited by the user. Note that changes in this file will not be read by Solo_Predictor until the server is stopped and restarted.

The tags within the <socketserver> tag control the server settings. In each case, an option value is provided using standard XML notation:

 <optionname>value</optionname>

In addition, inside each opening tag, several attributes are set:

 <optionname class="numeric" size="[1,1]">1</optionname>

The "class" attribute should not be changed from the given value. The "size" attribute is informational only and can be omitted.
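As an illustration of this notation, the following Python sketch reads option values from a <socketserver> fragment and converts them according to the class attribute. The sample fragment is an assumption assembled from option names and defaults in this manual; the layout of a real defaults.xml may differ, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; a real defaults.xml contains more tags.
SAMPLE = """<socketserver>
  <port class="numeric" size="[1,1]">2211</port>
  <loopbackonly class="numeric">1</loopbackonly>
  <controls class="string">status</controls>
</socketserver>"""


def read_options(xml_text):
    """Return a dict of option name -> value for numeric and string options."""
    root = ET.fromstring(xml_text)
    options = {}
    for elem in root:
        text = (elem.text or "").strip()
        kind = elem.get("class")
        if kind == "numeric":
            options[elem.tag] = float(text)
        elif kind == "string":
            options[elem.tag] = text
        # Other classes (e.g. "cell") hold nested tags and are skipped here.
    return options
```

Note that the size attribute is omitted on two of the sample options, which the text above permits.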

The following are the user-modifiable options. The expected class attribute is included in parentheses.

License Code

At the bottom of the configuration file is the licensecode tag which is empty in a new installation. Entering a code into this tag allows Solo_Predictor to start up without asking the user for the code. The license code, provided by Eigenvector Research, can be added to the file by simply entering it between the <licensecode></licensecode> tags. The next time the server is restarted, the code (if valid) will be used and Solo_Predictor will not prompt for a code. If the license code is a demonstration code and expires, or if the code is invalid, Solo_Predictor will display a dialog indicating the error when it starts up.

Status Window and Controls Options

These options control functionality of the Solo_Predictor status window.

controls (class="string"): Manages the display and functionality of the status window. Valid settings include:
- none: no status window will be given and all controls are hidden.
- status: status window is shown, but all server controls are disabled.
- limited: status window is shown and only the "restart" control is enabled.
- full: status window is shown and all controls (stop/start/restart/exit) are enabled.

Except when the "full" setting is used, the only means to stop and/or restart the server is by using operating-system-specific process kill commands (the Task Manager in Windows, the Activity Monitor in OS X, and the kill command in Linux or Unix). Default is "status".

max_screen_lines (class="numeric"): Defines the total number of past message lines displayed on the (on-screen) status window. Default is 20 lines.
pulseperiod (class="numeric"): Defines the number of seconds between "pulse" messages in the status window. Default is 15 seconds.

Log File

These options control the log file and the level of detail and age of messages retained.

log_severity (class="numeric"): Defines the minimum message "severity" which will be reported in the log file (on disk). The level must be one of the following:

0 = log all messages

1 = log all startup, shutdown, rejected connection and fatal error messages

2 = log fatal error messages only

3 = log no messages (disable logging).

The default level is 1 (one).

max_log_size (class="numeric"): Defines the maximum log file size (in bytes). Solo_Predictor will discard old messages to keep the log file from exceeding this size. Default is 50000 (50 KB).
logfile (class="string"): Gives the path and filename to use for the log file. By default, this is solo_pred.log in the user's temporary directory. The exact location of the temporary folder depends on the operating system. For example, this is usually:

Windows XP: \Documents and Settings\username\Local Settings\Temp.

Windows Vista: \Users\username\AppData\Local\Temp

Server Connection Options

These options control the behavior of the socket server and the kind of connections it will accept.

port (class="numeric"): Defines the computer port on which the socket server will respond to requests. This value should be changed with great care as some ports are used by the operating system and other software. The default port value of 2211 is selected to minimize conflict with known port uses. Additional ports which might be of use include: 2210, 2212, and 2005. Contact Eigenvector Research for more information on valid ports.
loopbackonly (class="numeric"): If set to 1 (one), the server will only respond to a client which is located on the same computer as the server. All external requests will be ignored. A value of 0 (zero) will respond to any IP address (see also validip option). Default is 1 (one).
validip (class="cell"): Gives a list of valid IP addresses to which the server may respond. If empty, any IP address client is permitted to contact the server (unless the loopbackonly option is set to 1 (one)). Remember that the server is limited to a given number of clients (usually 1 (one)) and once it has been contacted by that many clients, it cannot respond to any other clients. This setting only limits the clients who can contact the server before it has imprinted on a given client.

The IP addresses must be supplied as separate items, each inside a set of <td></td> tags, with all <td> tags enclosed in a set of <tr> tags. For example:

<validip class="cell"> <tr> <td>10.0.0.1</td> <td>10.0.0.2</td> </tr> </validip>

privateworkspace (class="numeric"): If set to 1 (one), each client will have its own workspace to store objects and no client can access another client's objects. If set to 0 (zero), all clients share the same workspace, and a client may access and/or overwrite other clients' objects. This may lead to unexpected results (if a given client expects a model to stay loaded but other clients are using the same object name and overwrite the model, for example). Default is 1 (one).
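For the validip option described above, the nested <tr>/<td> structure can be generated from a list of addresses rather than typed by hand. The following Python sketch is one way to do so; the function name is hypothetical, and the output is compact (no spaces between tags), which is equivalent XML.

```python
import xml.etree.ElementTree as ET


def build_validip(addresses):
    """Return a <validip> element (as text) listing the given IP addresses."""
    validip = ET.Element("validip", {"class": "cell"})
    row = ET.SubElement(validip, "tr")      # one <tr> encloses all entries
    for address in addresses:
        ET.SubElement(row, "td").text = address  # one <td> per IP address
    return ET.tostring(validip, encoding="unicode")
```

The returned text can be pasted over the existing validip tag in defaults.xml; the server must be restarted for the change to take effect.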

Incoming Message Format and Timeout Settings

eomstring (class="string"): End-of-message character or string. If non-empty, this character or string must be passed to indicate the end of a message. The same string will be appended onto any messages returned by the server. The use of an EOM string allows Solo_Predictor to function on higher-load systems or with large messages where the entire contents of the message may not be queued and delivered all at once. See Introduction to Socket Interfaces for more information. It is best to set a distinctive string which will never show up in a common message, for example: **EndOfMessage**
tickletimeout (class="numeric"): Number of seconds of delay allowed between opening the socket and receiving the first character. At timeout, the server sends the client a space character, which is required to tickle some clients into responding.
emptytimeout (class="numeric"): Number of seconds of delay allowed before receiving the first message from the client. At timeout, the server throws an empty packet message.
eomtimeout (class="numeric"): Number of seconds without new characters after which the message is treated as complete (generally for use with POST messages and eomstring messages only).

Wait-For-File Options

Wait-for-file options control the optional Solo_Predictor wait-for-file engine. This engine will watch a given folder for a new file (optionally restricted to a specific file type). When a new file appears, the file will be automatically loaded as the object "data" and a specific script (stored in a disk file) will be executed. This script can use a :writefile command (see Script Construction section) to store results of an analysis in an output file.

waitforfile (class="string"): either "on" or "off" (the default). When "on", the wait-for-file functionality is enabled (although the waitfolder and waitscript must also be non-empty strings for wait-for-file to operate).
waitfolder (class="string"): defines the folder (local or networked) in which Solo_Predictor should look for new files.
waitfilespec (class="string"): defines the file specifications (if any) to which the wait-for-file should be limited. For example, waitfilespec = "*.dat" will only recognize .dat files appearing in the wait folder.
waitscript (class="string"): defines the filename containing the script to execute when a new file is found. This option must contain the entire path to the file. Note that the indicated script should expect to find the loaded data in the object named "data" in the current workspace.
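On the client side, triggering an analysis amounts to placing a file in the wait folder. The Python sketch below writes the file under a temporary name and then renames it, so that a file matching a waitfilespec such as "*.dat" only appears once it is complete. This manual does not state how Solo_Predictor treats partially written files, so the rename step is a precaution based on that assumption; the function name and the ".tmp" suffix are hypothetical.

```python
import os


def drop_file(waitfolder, filename, text):
    """Write `text` to `filename` in the wait folder via a temporary name."""
    final_path = os.path.join(waitfolder, filename)
    temp_path = final_path + ".tmp"   # does not match a "*.dat" file spec
    with open(temp_path, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(temp_path, final_path)  # the rename makes the file appear whole
    return final_path
```

The client can then poll for the response file written by the wait script, if one is configured.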

Output Format Options

default_format (class="string"): Defines the default response format. This is the output format used by the server if no format type is included in the request script. Valid types are: "xml", "plain" or "html". See Scripting Language for more information on these formats. Default is "xml".

writefilefolder (class="string"): Defines the top-level folder to which writefile is allowed to write. The writefile command can write only to this folder and its sub-folders. If writefilefolder is an empty string, writefile is not permitted at all.
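A client that builds scripts containing writefile targets may want to check a target path against this rule before sending the script. The following Python sketch expresses the rule as stated above; it is a client-side pre-check under that reading, not a description of how Solo_Predictor itself validates paths, and the function name is hypothetical.

```python
import os


def is_allowed_target(writefilefolder, target):
    """Return True if `target` lies in writefilefolder or one of its sub-folders."""
    if not writefilefolder:
        return False  # empty setting: writefile is not permitted at all
    root = os.path.abspath(writefilefolder)
    path = os.path.abspath(os.path.join(root, target))
    try:
        return os.path.commonpath([root, path]) == root
    except ValueError:
        return False  # e.g. paths on different drives on Windows
```

Relative targets are resolved against writefilefolder, so a target such as "../other.txt" is rejected because it resolves outside the permitted folder.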

Script Construction

Solo_Predictor provides a simple, flexible scripting language with which clients can send instructions to load data, apply a model to that data ("make a prediction"), and retrieve results. For details, see the page: Solo_Predictor Script Construction

Appendices

The following additional information is available about using Solo_Predictor:

Appendix A: DataSet XML Format - describes XML format used to create DataSet objects.
Appendix B: Solo_Predictor Script Commands Summary
Appendix C: Solo_Predictor Example Connection Code