OS Profile for GPE

Dmitry N. Petrov, Intel Corporation

Version 0.3 December 5, 2007

Version history

Version and date	By whom	Changes
0.0 October 24, 2005	Dmitry N. Petrov	Document created
0.1 December 14, 2005	Dmitry N. Petrov	"Global" fields semantics
0.2 March 14, 2005	Dmitry N. Petrov	Reflect IDB refactoring
0.3 December 5, 2007	Dmitry N. Petrov	Reflect the changes in server configuration mechanisms

0. Preface

This document describes the Operating System Profile (OS Profile) configuration file format used in GPE. It assumes basic knowledge of XML and a basic understanding of how the GPE server works.

The OS Profile allows the Target System Service (TSS) to be completely generic, relying only on the Java run-time environment and on the target system properties defined in the OS Profile to perform the required operations. An OS Profile contains information on the applications installed on the target system, the available storages, and the templates used to incarnate scripts and file paths.

OS Profiles are organised in hierarchies (processed by OS Profile Repository tools), so, for example, you can provide basic system specific information about how to create directories and list file properties on a Linux system in one file, and refer to it in another file containing information for specific target system, such as the list of installed applications.

NB!: platform specific information often contains characters that can be treated in some special way by the XML parser. For example, angle brackets that are often used to redirect stream input-output are treated as tag brackets. Use CDATA syntax to avoid such collisions.
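The effect of CDATA can be checked with any XML parser. The following is a minimal sketch in Python (used here for illustration only); the element name Body is borrowed from section 2.2 and the shell command is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical shell command containing stream redirections.
command = "sort < in.txt > out.txt"

# Wrapped in CDATA, the angle brackets reach the application untouched.
safe = ET.fromstring("<Body><![CDATA[" + command + "]]></Body>")
print(safe.text)

# Without CDATA the parser takes "<" for the start of a tag
# and rejects the document as not well-formed.
try:
    ET.fromstring("<Body>" + command + "</Body>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
print(parsed_without_cdata)
```

Escaping the characters as &lt; and &gt; would also be accepted by the parser, but CDATA keeps script bodies readable.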

For documentation purposes we use extracts from corresponding XML schemas.

Namespace prefix idb: is associated with the namespace http://gpe.intel.com/idb.

Namespace prefix unigrids: is associated with the namespace http://unigrids.org/2005/06/types.

Namespace prefix jsdl: is associated with the namespace http://schemas.ggf.org/jsdl/2005/06/jsdl.

Several sample configuration files are provided in the GPE installation package (e.g., conf/sample_profile). If you are reading this document for the first time, you may find it useful to look at these files - they provide good practical examples.

We are always looking for feedback, so if you have any comments or bug reports, or are just interested in our development, please register at our bug tracking system at SourceForge.

1. General OS Profile Structure

OS Profile structure is defined by the following XML schema:

name identifies OS Profile as a part of the profiles hierarchy. Other profiles can extend it referring to it by this name.

extends identifies the parent OS Profile. When one profile extends another profile, it can use all the entities (applications, storages, script templates) defined in the parent profile. If the child profile introduces its own entities with the same names, they override the corresponding definitions taken from the parent profile.

UspaceRoot is the absolute path to root directory for job working directories. This directory should be accessible by the users who run jobs on the given target system.

Delimiter is the file delimiter used by this target system. It is used to incarnate file paths.

Template elements represent templates used to incarnate scripts and file paths.

Storage elements represent file spaces available on a given target system.

Application elements represent applications available on a given target system.

Simplest (and almost useless) example:

<osp:Profile
    xmlns:osp="http://gpe.intel.com/osprs/profile"
    name="my-linux"
    extends="linux">

  <osp:UspaceRoot>/var/tmp/gpe/uspaces</osp:UspaceRoot>

  <osp:Delimiter>/</osp:Delimiter>

</osp:Profile>

Here we define the OS profile named my-linux. It says that the job working directories will be placed in /var/tmp/gpe/uspaces, and that the operating system uses a slash character (/) as the file name delimiter. This profile extends another profile - linux, which, most likely, defines the basic invocations required to run system-specific commands on Linux.
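The override rule for extends can be pictured as a dictionary merge along the parent chain. The sketch below is an illustration in Python, not the OS Profile Repository implementation: it handles a single kind of entity, and the profile contents (including the MKDIR template name and all template bodies) are invented.

```python
def resolve_entities(profiles, name):
    """Collect the entities visible in profile `name`, parents first."""
    profile = profiles[name]
    parent = profile.get("extends")
    # Start from everything the parent chain defines (empty for a root profile).
    merged = resolve_entities(profiles, parent) if parent else {}
    # Entities defined locally override inherited ones with the same name.
    merged.update(profile.get("entities", {}))
    return merged

# Hypothetical profiles: template names and bodies are invented for illustration.
profiles = {
    "linux": {"entities": {"DELETE": "rm -rf <TARGET>", "MKDIR": "mkdir -p <TARGET>"}},
    "my-linux": {"extends": "linux", "entities": {"DELETE": "rm -r <TARGET>"}},
}

visible = resolve_entities(profiles, "my-linux")
print(visible["DELETE"])  # the definition from "my-linux"
print(visible["MKDIR"])   # inherited from "linux"
```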

2. Incarnation Templates

Template definitions are based on the concept of script template, which is a structured representation of string template with parameter substitutions. These templates can be used to represent not only parameterized shell scripts, but also other entities that are generated using some string pattern, such as path names for storages.

2.1. Replacements

To make templates capable of using different sorts of parameters, different textual replacements are used. A basic replacement is represented in the template body as <replacement_name>, where replacement_name is the name of the field (see section 2.3) to be replaced with its value.

Two-level replacements are also possible. They are represented in template body as: <replacement_name/replace_from/replace_to>

where replacement_name is the name of entity to be replaced, and replace_from is a string to be replaced by replace_to in the value of corresponding entity before placing it into resulting script. replace_from and replace_to are, in fact, regular expressions. Refer to the Java documentation on regular expressions for more details.

Consider as an example the following script template:

cp <SOURCE> <DESTINATION/\.log/>

If the value of SOURCE is "x.log" and the value of DESTINATION is "y.log", then after the replacements the script reads: cp x.log y (<SOURCE> is replaced with "x.log", and <DESTINATION/\.log/> is replaced with "y", the result of replacing ".log" in "y.log" with the empty string).

NB: since the dot (".") is a special character in Java regular expressions, it has to be escaped with a backslash.
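The two replacement forms can be sketched in a few lines. This is an illustration in Python rather than the server's Java code; it assumes that replace_from and replace_to contain neither "/" nor ">", and it relies on Python's re module, which treats simple patterns such as \.log the same way Java does.

```python
import re

# Matches <NAME> or <NAME/replace_from/replace_to>.
# Assumption: neither replace_from nor replace_to contains "/" or ">".
PLACEHOLDER = re.compile(r"<([A-Za-z_][\w:]*)(?:/([^/>]*)/([^/>]*))?>")

def incarnate(template, fields):
    """Substitute field values into a template body."""
    def substitute(match):
        name, replace_from, replace_to = match.groups()
        value = fields[name]
        if replace_from is not None:
            # Two-level replacement: rewrite the field value first.
            value = re.sub(replace_from, replace_to, value)
        return value
    return PLACEHOLDER.sub(substitute, template)

# The example from the text; prints: cp x.log y
print(incarnate(r"cp <SOURCE> <DESTINATION/\.log/>",
                {"SOURCE": "x.log", "DESTINATION": "y.log"}))
```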

2.2. Invocation variations

A single template can contain several named string templates. For example, the same action may require different scripts under different conditions, such as launching an application in standard or debug mode. These different invocation versions are called "invocation variations".

Invocation variations are defined as:

Body element is used when you want template to produce some string - for example, a directory path, or a simple shell command.

StaticScript element is used when you want to generate (and run) a specific script on a target system. Static scripts are stored in a specific place (usually a subdirectory of the user's HOME), defined by the BIN template.

The default variation ("") of this template defines the directory where scripts will be stored.

Variation STATIC_SCRIPT defines the script files paths (using <FILE_NAME> field as a script file name).

Variation RUN_STATIC_SCRIPT defines the shell command used to run the scripts. It refers to the following fields:
TARGET - full script file path (incarnated using the STATIC_SCRIPT variation);
ARGUMENTS - script arguments.

Static scripts are generated once and stored in a given directory (with appropriate file permissions). Later these scripts are invoked by the server to perform system-specific operations. In general, it is better to use static scripts for anything but running other static scripts.

Static script elements have the following structure:

Element body is the body of a given static script. It is written as is to the corresponding script file (with name provided by fileName attribute). Later, when the command is used, this script is invoked with arguments, where you can use template substitutions to pass actual parameters to this script.
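How the BIN variations chain together can be sketched as follows. The three variation names come from the text above; the variation bodies, the script file name and the argument are invented, and the helper supports basic <NAME> replacements only. Python is used for illustration.

```python
import re

def incarnate(body, fields):
    # Basic <NAME> replacement only; enough for this illustration.
    return re.sub(r"<(\w+)>", lambda m: fields[m.group(1)], body)

# Hypothetical BIN template: variation names are from the text, bodies are invented.
BIN = {
    "": "/home/<USER_NAME>/.gpe/bin",
    "STATIC_SCRIPT": "/home/<USER_NAME>/.gpe/bin/<FILE_NAME>",
    "RUN_STATIC_SCRIPT": "/bin/sh <TARGET> <ARGUMENTS>",
}

context = {"USER_NAME": "alice"}

# 1. Path of the script file, from the STATIC_SCRIPT variation.
target = incarnate(BIN["STATIC_SCRIPT"], {**context, "FILE_NAME": "list_dir.sh"})
# 2. Command that runs the script, from the RUN_STATIC_SCRIPT variation.
command = incarnate(BIN["RUN_STATIC_SCRIPT"],
                    {**context, "TARGET": target, "ARGUMENTS": "/tmp"})
print(command)
```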

2.3. Fields

Fields represent variables that have values specific for a certain instance of template. They can be settable (used as parameters for template) and non-settable (specific properties of entity that is represented by template). They also can be public (possibly seen from outside) and non-public.

Each job submitted to GPE TargetSystem is actually a set of parameter values for a given application. These values are used to substitute the occurrences of corresponding replacement in script template for a given application. For example, if field X has value "Hello", then template echo <X> will be transformed into echo Hello.

Field definitions in the OS Profile may be used to control the values of the corresponding parameters. This is achieved with the help of "tags" and "default values".

Tags are constants somewhat similar to enumeration type tags in high-level programming languages. Each tag has name and value, and is valid in the scope of corresponding field of corresponding template definition. Tags act straight and simple: if value of a given field is equal to tag name, then it is replaced with tag value. This is useful to abstract frequently used command-line options, e.g., Fortran compiler optimization levels.

A field definition can provide either a "fixed value" or a "default value". A fixed value supplies a specific parameter required by the incarnation and overrides everything else, even if the incarnation request contains the corresponding parameter (although usually it should not). Fixed values are often found in system-specific incarnations where the server requires additional information to perform its operations - for example, it has to know how to extract the job identifier from the output of the job submission command, and uses incarnation fields to provide a regular expression for this task. A default value is used if the incarnation request does not provide a value for a field that is required in the incarnation.

You can also provide limitations for the values of numeric fields. For example, you may specify that a given MPI application should run on no more than 10 nodes. If the value of the field does not fit in a given limit, target system fails to incarnate the job and the user receives corresponding error message.

The complete structure of field definition is defined by the following piece of schema:

Description - optional field description.

Tag elements represent the collection of tags associated with the corresponding field.

Value - fixed value of the field.

Default - default value of the field.

Min, Max - limits for numeric fields.

isSettable (default: true) - true if the field is settable from the incarnation request, false if it cannot occur in the incarnation request.
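The interplay of fixed values, defaults, tags and limits can be sketched as follows. This is an illustration in Python, not the server's code: the dictionary layout, the order of the checks (tags before limits), and the decision to raise an error for a non-settable field or a missing value are assumptions, and the sample tags are invented.

```python
def resolve_field(definition, requested=None):
    """Return the value to substitute for one field, or raise ValueError."""
    if "Value" in definition:
        # A fixed value overrides whatever the request contains.
        value = definition["Value"]
    elif requested is not None:
        if not definition.get("isSettable", True):
            raise ValueError("field cannot be set from the incarnation request")
        value = requested
    elif "Default" in definition:
        value = definition["Default"]
    else:
        raise ValueError("no value available for field")

    # A value equal to a tag name is replaced with the tag value.
    value = definition.get("Tags", {}).get(value, value)

    # Limits for numeric fields, if any are given.
    low, high = definition.get("Min"), definition.get("Max")
    if low is not None or high is not None:
        number = float(value)
        if (low is not None and number < low) or (high is not None and number > high):
            raise ValueError("value %s is outside the allowed range" % value)
    return value

# Hypothetical field definitions.
nodes = {"Default": "1", "Min": 1, "Max": 10}
optimization = {"Default": "NONE", "Tags": {"NONE": "-O0", "FAST": "-O3"}}

print(resolve_field(nodes, "4"))            # within the limits
print(resolve_field(optimization, "FAST"))  # tag name replaced by tag value
print(resolve_field(optimization))          # default, then tag replacement
```

A request for 64 nodes against the nodes definition above raises ValueError, which corresponds to the failed incarnation described in the text.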

2.3.1. Special fields

Templates can use several special fields provided by the context in which they are incarnated. Those special fields are:

USER_NAME - user login name

WORKING_DIRECTORY - working directory of current job (available for jobs only)

TargetSystemInfo:XXX - the Target System Resource TextInfoResource or NumericInfoResource property (provided at target system creation) with the name XXX. This is usually used in mission-specific GPE installations, where target systems are created with specific parameters.

SSH:HOST - host name used to connect to target system via SSH.

SSH:PORT - port used to connect to target system via SSH.

User profile can also contain fields, generally used to configure file paths in data centres with complex rules for HOME allocation.

2.4. Templates definition and example

Structure of template definition is represented by the following piece of XML Schema:

name is the name of the template.

Description is an optional template description.

Invocation elements represent the list of invocation variations for this template.

Field elements represent the list of fields used in this template.

The following example illustrates the definition of script template for Hello application:

<idb:Template name="Hello">
  <idb:Description>Sample Hello script template</idb:Description>
  <idb:Invocation name="">
    <idb:Body><![CDATA[echo <TEXT>]]></idb:Body>
  </idb:Invocation>
  <idb:Field name="TEXT">
    <idb:Default>Hello</idb:Default>
  </idb:Field>
</idb:Template>

In the example above, the script template writes the given text to the job's standard output, or "Hello" if no text was provided in the job submission description.

See GPE configuration files for practical examples of incarnation template definitions.

2.5. Required templates

Several templates are required by GPE server to perform basic operations on a given target system. They are:

START - start job. Script must provide started job id in its output. Uses the following fields:

TARGET - path to incarnated job script
STDIN - path to stdin file (useful for "interactive" applications which use piped input)
STDOUT - path to stdout file
STDERR - path to stderr file
JOB_ID_PATTERN - regular expression used to find the job id in script output.
JOB_ID_GROUP - number of regular expression group containing job id.
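As an illustration of how JOB_ID_PATTERN and JOB_ID_GROUP work together, here is a minimal sketch in Python (the server itself uses Java regular expressions). The submission output and the pattern are made up.

```python
import re

def extract_job_id(output, job_id_pattern, job_id_group):
    """Find the job id in the output of the START script, or return None."""
    match = re.search(job_id_pattern, output)
    return match.group(job_id_group) if match else None

# Made-up output of a batch submission command.
output = "Your job 12345 has been submitted"
print(extract_job_id(output, r"Your job (\d+)", 1))  # prints: 12345
```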

JOB_PROLOGUE - prologue added to each incarnated job script. Used to prepare general environment, e.g., change current directory to job working directory.

JOB_EPILOGUE - epilogue added to each incarnated job script. The GPE server expects the job exit status to be written to <WORKING_DIRECTORY>/.gpe_exit_status.

GET_JOB_STATUS - script used to get current job status. Uses the following fields:

JOB_ID - job id (as obtained from START)
JOB_ENTRY_PATTERN - regular expression pattern used to find the job status information in the script output
STATUS_GROUP - number of regular expression group containing job status
RUNNING_PATTERN - if job status group matches this regular expression pattern, then the job is currently running
HELD_PATTERN - if job status group matches this regular expression pattern, then the job is currently held
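The status fields can be read as a two-step match: first locate the job entry, then classify the status group. The Python sketch below illustrates this under several assumptions: the queue listing and all patterns are made up, the status group is required to match RUNNING_PATTERN or HELD_PATTERN as a whole, and the return values for a missing entry (None) and for any other status ("other") are placeholders, since the server's behaviour in those cases is not described here.

```python
import re

def job_status(output, entry_pattern, status_group, running_pattern, held_pattern):
    """Classify a job from the output of the GET_JOB_STATUS script."""
    entry = re.search(entry_pattern, output, re.MULTILINE)
    if entry is None:
        return None  # no entry for this job in the output
    status = entry.group(status_group)
    if re.fullmatch(running_pattern, status):
        return "running"
    if re.fullmatch(held_pattern, status):
        return "held"
    return "other"  # placeholder for states not covered by the two patterns

# Made-up queue listing: job id, owner, state.
listing = "12344 bob   R\n12345 alice H\n"
# JOB_ENTRY_PATTERN with the job id already substituted.
print(job_status(listing, r"^12345\s+\S+\s+(\S+)$", 1, r"R", r"H"))
```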

HOLD - script used to hold the job. Uses the following fields:

JOB_ID - job id (as obtained from START)

RESUME - script used to resume the held job. Uses the following fields:

JOB_ID - job id (as obtained from START)

ABORT - script used to abort the job. Uses the following fields:

JOB_ID - job id (as obtained from START)

DELETE - script used to remove the files and directories. Uses the following fields:

TARGET - path to file or directory to be removed

LIST - script used to list file properties or directory contents. Template defines two variations, FILE for listing file properties, and DIR for listing directories. Directory entries must be placed in separate output lines. Both variations use the following fields:

TARGET - file or directory to be listed
FILE_PATTERN - regular expression used to obtain file information (given file for listing file properties, directory entry for listing directories)
NAME_GROUP - number of regular expression group containing file name
SIZE_GROUP - number of regular expression group containing file size
ISOWNER_PATTERN - if file entry matches this pattern, then the user for whom this command is executed is the owner of this file
ISDIR_PATTERN - if file entry matches this pattern, then this file is a directory
READABLE_PATTERN - if file entry matches this pattern, then the user for whom this command is executed can read this file
WRITABLE_PATTERN - if file entry matches this pattern, then the user for whom this command is executed can write to this file
EXECUTABLE_PATTERN - if file entry matches this pattern, then the user for whom this command is executed can execute this file
DATETIME_GROUP - number of the regular expression group containing file modification date and time
DATETIME_PATTERN - regular expression pattern used to obtain specific date and time information; contents of DATETIME_GROUP will be matched against it
PM_PATTERN - if contents of DATETIME_GROUP matches this pattern, then time is treated as "P.M." time in 12-hour time format
DATE_GROUP - regular expression group in DATETIME_PATTERN containing the number of the day
MONTH_GROUP - regular expression group in DATETIME_PATTERN containing the number of the month (or month name, if MONTH_LIST is provided)
YEAR_GROUP - regular expression group in DATETIME_PATTERN containing the number of the year
HOUR_GROUP - regular expression group in DATETIME_PATTERN containing the number of the hour
MINUTE_GROUP - regular expression group in DATETIME_PATTERN containing the number of the minute
MONTH_LIST - list of month names separated by semicolon (optional)
EXCLUDED_NAMES_PATTERN - names which should not be included in command results, such as "." and ".."
ERROR_PATTERN - if the line in script output matches this pattern, then the command produces an error (e.g., attempt to list the contents for non-existing directory)
ERROR_MESSAGE_GROUP - number of the group in ERROR_PATTERN representing the error message which should be reported to the user
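To show how the LIST fields fit together, the following Python sketch parses one line of directory listing output. All field values are hypothetical and written for output resembling `ls -l --time-style=long-iso`; the permission patterns, PM_PATTERN, MONTH_LIST and the error fields are left out for brevity, and the matching rules (search for entry patterns, whole-name match for excluded names) are assumptions.

```python
import re
from datetime import datetime

# Hypothetical field values for lines such as:
# drwxr-xr-x 2 alice users 4096 2007-12-05 14:30 data
FIELDS = {
    "FILE_PATTERN": r"^(\S+)\s+\d+\s+\S+\s+\S+\s+(\d+)\s+(\S+ \S+)\s+(.+)$",
    "NAME_GROUP": 4,
    "SIZE_GROUP": 2,
    "DATETIME_GROUP": 3,
    "ISDIR_PATTERN": r"^d",
    "DATETIME_PATTERN": r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2})",
    "YEAR_GROUP": 1, "MONTH_GROUP": 2, "DATE_GROUP": 3,
    "HOUR_GROUP": 4, "MINUTE_GROUP": 5,
    "EXCLUDED_NAMES_PATTERN": r"\.\.?",
}

def parse_entry(line, f=FIELDS):
    """Turn one output line into a dictionary, or None if it is skipped."""
    entry = re.search(f["FILE_PATTERN"], line)
    if entry is None:
        return None
    name = entry.group(f["NAME_GROUP"])
    if re.fullmatch(f["EXCLUDED_NAMES_PATTERN"], name):
        return None  # "." and ".." are not reported
    # The contents of DATETIME_GROUP are matched against DATETIME_PATTERN.
    stamp = re.search(f["DATETIME_PATTERN"], entry.group(f["DATETIME_GROUP"]))
    modified = datetime(*(int(stamp.group(f[key])) for key in
                          ("YEAR_GROUP", "MONTH_GROUP", "DATE_GROUP",
                           "HOUR_GROUP", "MINUTE_GROUP")))
    return {
        "name": name,
        "size": int(entry.group(f["SIZE_GROUP"])),
        "is_dir": re.search(f["ISDIR_PATTERN"], line) is not None,
        "modified": modified,
    }

line = "drwxr-xr-x 2 alice users 4096 2007-12-05 14:30 data"
print(parse_entry(line))
```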

3. Storages

Storage information in the OS Profile is used to generate file paths on a target system. Storage definitions have the following structure:

name is the storage name, e.g., ROOT or HOME.

template is the name of template used to generate directory path.

Storage templates often refer to special fields such as USER_NAME and fields provided in the user profile.
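A file path on a storage can be pictured as the incarnated storage template joined with the path inside the storage, using the profile's Delimiter. The Python sketch below illustrates the idea; the HOME template body is invented and the joining rule is an assumption, not a description of the server's algorithm.

```python
import re

def incarnate(body, fields):
    # Basic <NAME> replacement only.
    return re.sub(r"<(\w+)>", lambda m: fields[m.group(1)], body)

def storage_path(storage_template, fields, delimiter, relative_parts):
    """Join the incarnated storage root with a path inside the storage."""
    root = incarnate(storage_template, fields).rstrip(delimiter)
    return delimiter.join([root, *relative_parts])

# Hypothetical HOME storage template body; USER_NAME is a special field.
print(storage_path("/home/<USER_NAME>", {"USER_NAME": "alice"}, "/",
                   ["results", "run1.log"]))
```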

4. Applications

Application definitions have the following structure:

ApplicationName - name of the application (used by the server to identify the application);

ApplicationVersion - version of the application (used by the server to identify the application);

Description - application description (optional).

name is the name used by OS Profile repository to identify application as a part of OS Profiles hierarchy.

template is the name of script template used to run the application.