Dmitry N. Petrov, Intel Corporation
Version 0.3 December 5, 2007
Version history
| Version and date | By whom | Changes | 
| 0.0 October 24, 2005 | Dmitry N. Petrov | Document created | 
| 0.1 December 14, 2005 | Dmitry N. Petrov | "Global" fields semantics | 
| 0.2 March 14, 2005 | Dmitry N. Petrov | Reflect IDB refactoring | 
| 0.3 December 5, 2007 | Dmitry N. Petrov | Reflect the changes in server configuration mechanisms | 
This document describes the Operating System Profile (OS Profile) configuration file format used in GPE. It assumes basic knowledge of XML and a basic understanding of how the GPE server works.
OS Profile allows Target System Service (TSS) to be completely generic, relying only on Java run-time environment and properties of the target system defined in OS Profile to perform required operations. OS Profile contains information on applications installed at the target system, available storages, and templates used to incarnate scripts and file paths.
OS Profiles are organised in hierarchies (processed by OS Profile Repository tools), so, for example, you can provide basic system specific information about how to create directories and list file properties on a Linux system in one file, and refer to it in another file containing information for specific target system, such as the list of installed applications.
NB!: platform specific information often contains characters that 
can be treated in some special way by the XML parser. For example, angle brackets that are 
often used to redirect stream input-output are treated as tag brackets. 
Use CDATA syntax to avoid such collisions.
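As a minimal sketch of this note, the fragment below writes the same template body twice: once with XML character escapes and once with CDATA. The idb:Body element is described later in this document; the command and the file name greeting.txt are illustrative assumptions, not taken from a shipped profile.

```xml
<!-- Without CDATA: "<" must be escaped as &lt; (">" is escaped here for symmetry) -->
<idb:Body>echo &lt;TEXT&gt; &gt; greeting.txt</idb:Body>

<!-- With CDATA: the same body written literally -->
<idb:Body><![CDATA[echo <TEXT> > greeting.txt]]></idb:Body>
```

Both forms hand the same character data to the XML parser; the CDATA form is usually easier to read and to compare with the shell command it represents.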
For documentation purposes we use extracts from corresponding XML schemas.
Namespace prefix idb: is associated with the namespace 
http://gpe.intel.com/idb.
Namespace prefix unigrids: is associated with the namespace
http://unigrids.org/2005/06/types.
Namespace prefix jsdl: is associated with the namespace 
http://schemas.ggf.org/jsdl/2005/06/jsdl.
Several sample configuration files are provided in the GPE installation package
(e.g., conf/sample_profile). If you are reading this document for the first time,
you may find it useful to look at these files, as they provide good practical
examples.
We are always looking for feedback so if you have any comments or bug reports or are just interested in our development please register at our bug tracking system at SourceForge.
OS Profile structure is defined by the following XML schema:
<complexType name="ProfileType">
  <sequence>
    <element name="UspaceRoot" type="string"/>
    <element name="Delimiter" type="string"/>
    <element name="Template" type="idb:TemplateType" minOccurs="0" maxOccurs="unbounded"/>
    <element name="Storage" type="tns:StorageTemplateType" minOccurs="0" maxOccurs="unbounded"/>
    <element name="Application" type="tns:ApplicationType" minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
  <attribute name="name" type="string" use="required"/>
  <attribute name="extends" type="string" use="optional"/>
</complexType>
name identifies OS Profile as a part of the profiles hierarchy. Other profiles can extend it referring to it by this name.
extends identifies the parent OS Profile. When one profile extends another profile, it can use all the entities (applications, storages, script templates) defined in the parent profile. If the child profile introduces its own entities with the same names, they override the corresponding definitions taken from the parent profile.
UspaceRoot is the absolute path to root directory for job working directories. This directory should be accessible by the users who run jobs on the given target system.
Delimiter is the file delimiter used by this target system. It is used to incarnate file paths.
Template elements represent templates used to incarnate scripts and file paths.
Storage elements represent file spaces available on a given target system.
Application elements represent applications available on a given target system.
Simplest (and almost useless) example:
<osp:Profile
    xmlns:osp="http://gpe.intel.com/osprs/profile"
    name="my-linux"
    extends="linux">
 
  <osp:UspaceRoot>/var/tmp/gpe/uspaces</osp:UspaceRoot>
 
  <osp:Delimiter>/</osp:Delimiter>
 
</osp:Profile>
Here we define the OS profile named my-linux. It says that the job working directories
will be placed in /var/tmp/gpe/uspaces, and that the operating system uses a slash
character (/) as the file name delimiter. This profile extends another profile -
linux, which, most likely, defines the basic invocations required to run 
system-specific commands on Linux.
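To illustrate the override rule for profile hierarchies, here is a hedged sketch of a parent and a child profile kept in two separate files. The template name SCRATCH_PATH, the paths, and the use of the idb: prefix for Template elements inside a profile (mirroring the template examples later in this document) are assumptions made for illustration; USER_NAME is one of the special fields described later.

```xml
<!-- File 1 (hypothetical): parent profile with generic Linux settings -->
<osp:Profile
    xmlns:osp="http://gpe.intel.com/osprs/profile"
    xmlns:idb="http://gpe.intel.com/idb"
    name="linux">
  <osp:UspaceRoot>/tmp/uspaces</osp:UspaceRoot>
  <osp:Delimiter>/</osp:Delimiter>
  <idb:Template name="SCRATCH_PATH">
    <idb:Invocation name=""><idb:Body><![CDATA[/tmp/<USER_NAME>]]></idb:Body></idb:Invocation>
  </idb:Template>
</osp:Profile>

<!-- File 2 (hypothetical): child profile; its SCRATCH_PATH replaces the inherited one -->
<osp:Profile
    xmlns:osp="http://gpe.intel.com/osprs/profile"
    xmlns:idb="http://gpe.intel.com/idb"
    name="my-linux"
    extends="linux">
  <osp:UspaceRoot>/var/tmp/gpe/uspaces</osp:UspaceRoot>
  <osp:Delimiter>/</osp:Delimiter>
  <idb:Template name="SCRATCH_PATH">
    <idb:Invocation name=""><idb:Body><![CDATA[/scratch/<USER_NAME>]]></idb:Body></idb:Invocation>
  </idb:Template>
</osp:Profile>
```

With these two files, any entity defined only in linux stays available in my-linux, while SCRATCH_PATH resolves to the child's definition.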
Template definitions are based on the concept of script template, which is a structured representation of string template with parameter substitutions. These templates can be used to represent not only parameterized shell scripts, but also other entities that are generated using some string pattern, such as path names for storages.
To make templates capable of using different sorts of parameters, different textual replacements are used. The basic replacement is represented in the template body as <replacement_name>, where replacement_name is the name of the field (see later) to be replaced with its value.
Two-level replacements are also possible.
They are represented in template body as:
<replacement_name/replace_from/replace_to>
where replacement_name is the name of entity to be replaced, and replace_from is a string to be replaced by replace_to in the value of corresponding entity before placing it into resulting script. replace_from and replace_to are, in fact, regular expressions. Refer to the Java documentation on regular expressions for more details.
Consider as an example the following script template:
cp <SOURCE> <DESTINATION/\.log/>
If the value of SOURCE is "x.log" and the value of DESTINATION is "y.log", then after performing the replacements the script will look like:
cp x.log y
(<SOURCE> is replaced with "x.log", and <DESTINATION/\.log/> is replaced with "y", the result of replacing ".log" in "y.log" with an empty string).
NB: since the dot (".") is a special character in Java regular expressions, we had to escape it with a backslash.
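Placed in a profile, the script template above might look like the following sketch. The template name CopyWithoutLogSuffix is hypothetical; the Invocation, Body and Field elements are described in the following sections.

```xml
<idb:Template name="CopyWithoutLogSuffix">
  <idb:Description>Copy a file, dropping ".log" from the destination name</idb:Description>
  <!-- CDATA keeps the replacement markers from being parsed as XML tags -->
  <idb:Invocation name=""><idb:Body><![CDATA[cp <SOURCE> <DESTINATION/\.log/>]]></idb:Body></idb:Invocation>
  <idb:Field name="SOURCE"/>
  <idb:Field name="DESTINATION"/>
</idb:Template>
```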
Single template can contain several possible named string templates. For example, script templates can be used to perform required action in different conditions that require different script templates, such as launching application in standard or debug mode. These different invocation versions are called "invocation variations".
Invocation variations are defined as:
<complexType name="InvocationType">
  <sequence>
    <element name="Description" minOccurs="0" type="string"/>
    <choice>
      <element name="Body" type="string"/>
      <element name="StaticScript" type="idb:StaticScriptType"/>
    </choice>
  </sequence>
  <attribute name="name" type="string" use="optional"/>
</complexType>
Body element is used when you want template to produce some string - for example, a directory path, or a simple shell command.
StaticScript element is used when you want to generate (and run) a specific script on a target system. Static scripts are stored in a specific place (usually a subdirectory of the user's HOME), defined by the BIN template:
The default variation ("") of this template defines the directory where scripts will be stored.
Variation STATIC_SCRIPT defines the script file paths (using the <FILE_NAME> field as a script file name).
Variation RUN_STATIC_SCRIPT defines the shell command used to run the scripts. It refers to the following fields:
TARGET - full script file path (incarnated using the STATIC_SCRIPT variation);
ARGUMENTS - script arguments.
Static scripts are generated once and stored in a given directory (with appropriate file permissions). Later these scripts are invoked by the server to perform system-specific operations. In general, it is better to use static scripts for anything but running other static scripts.
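A BIN template with the three variations described above could be sketched as follows. The directory /home/<USER_NAME>/.gpe/bin and the use of /bin/sh are assumptions chosen for illustration; an actual installation may store and launch its scripts differently.

```xml
<idb:Template name="BIN">
  <!-- Default variation: directory where static scripts are stored -->
  <idb:Invocation name=""><idb:Body><![CDATA[/home/<USER_NAME>/.gpe/bin]]></idb:Body></idb:Invocation>
  <!-- STATIC_SCRIPT: full path of one script file inside that directory -->
  <idb:Invocation name="STATIC_SCRIPT"><idb:Body><![CDATA[/home/<USER_NAME>/.gpe/bin/<FILE_NAME>]]></idb:Body></idb:Invocation>
  <!-- RUN_STATIC_SCRIPT: shell command that runs a stored script with its arguments -->
  <idb:Invocation name="RUN_STATIC_SCRIPT"><idb:Body><![CDATA[/bin/sh <TARGET> <ARGUMENTS>]]></idb:Body></idb:Invocation>
</idb:Template>
```

Note that RUN_STATIC_SCRIPT uses a plain Body rather than a StaticScript, in line with the advice not to run static scripts through other static scripts.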
Static script elements have the following structure:
<complexType name="StaticScriptType">
  <simpleContent>
    <extension base="string">
      <attribute name="fileName" type="string" use="required"/>
      <attribute name="arguments" type="string" use="required"/>
    </extension>
  </simpleContent>
</complexType>
Element body is the body of a given static script. It is written as is to the corresponding script file (with name provided by fileName attribute). Later, when the command is used, this script is invoked with arguments, where you can use template substitutions to pass actual parameters to this script.
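As a hedged illustration, an invocation based on a static script might be written like this. The file name delete.sh and the shell commands are assumptions; DELETE and its TARGET field are described in the list of required templates later in this document.

```xml
<idb:Template name="DELETE">
  <idb:Invocation name="">
    <!-- "<" must be escaped in attribute values (">" is escaped for symmetry);
         CDATA cannot be used inside an attribute -->
    <idb:StaticScript fileName="delete.sh" arguments="&lt;TARGET&gt;"><![CDATA[#!/bin/sh
# $1 receives the path passed through the arguments attribute
rm -rf -- "$1"
]]></idb:StaticScript>
  </idb:Invocation>
</idb:Template>
```

Because the element body is written to the script file as is, the sketch passes the path through arguments and reads it as a positional parameter instead of placing a replacement inside the script body.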
Fields represent variables that have values specific for a certain instance of template. They can be settable (used as parameters for template) and non-settable (specific properties of entity that is represented by template). They also can be public (possibly seen from outside) and non-public.
Each job submitted to a GPE TargetSystem is actually a set of parameter values for a given application. These values are used to substitute the occurrences of the corresponding replacements in the script template for a given application. For example, if field X has the value "Hello", then the template echo <X> will be transformed into echo Hello.
Field definitions in the OS Profile can also be used to control the values of the corresponding parameters. This is achieved with the help of "tags" and "default values".
Tags are constants somewhat similar to enumeration type tags in high-level programming languages. Each tag has name and value, and is valid in the scope of corresponding field of corresponding template definition. Tags act straight and simple: if value of a given field is equal to tag name, then it is replaced with tag value. This is useful to abstract frequently used command-line options, e.g., Fortran compiler optimization levels.
A field definition can provide either a "fixed value" or a "default value". A fixed value supplies a specific parameter required by the incarnation and overrides everything else, even if the incarnation request contains the corresponding parameter (although usually it should not). Fixed values are often found in system-specific incarnations where the server requires additional information to perform its operations. For example, the server must know how to extract the job identifier from the output of the job submission command, and it uses incarnation fields to provide the regular expression for this task. A default value is used if the incarnation request does not provide a value for a field that is required in the incarnation.
You can also provide limitations for the values of numeric fields. For example, you may specify that a given MPI application should run on no more than 10 nodes. If the value of the field does not fit in a given limit, target system fails to incarnate the job and the user receives corresponding error message.
The complete structure of field definition is defined by the following piece of schema:
<complexType name="TagType">
  <simpleContent>
    <extension base="string">
      <attribute name="name" type="string" use="required"/>
    </extension>
  </simpleContent>
</complexType>
<complexType name="FieldType">
  <sequence>
    <element name="Description" minOccurs="0" type="string"/>
    <element name="Min" minOccurs="0" type="double"/>
    <element name="Max" minOccurs="0" type="double"/>
    <element name="Tag" minOccurs="0" maxOccurs="unbounded" type="idb:TagType"/>
    <element name="Default" minOccurs="0" type="string"/>
    <element name="Value" minOccurs="0" type="string"/>
  </sequence>
  <attribute name="name" type="string" use="required"/>
  <attribute name="isSettable" type="boolean" use="optional" default="true"/>
</complexType>
Description - optional field description.
Tag - the collection of tags associated with the field.
Value - fixed value of the field.
Default - default value of the field.
Min, Max - limits for numeric fields.
isSettable (default: true) - true if the field can be set from the incarnation request, false if it cannot occur in the incarnation request.
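The sketch below combines a fixed value, numeric limits with a default, and tags in one template. The template name MpiSolver, the field names, the mpirun path and the command-line options are all hypothetical; only the element names and their order come from the schema above.

```xml
<idb:Template name="MpiSolver">
  <idb:Invocation name=""><idb:Body><![CDATA[<MPIRUN> -np <NODES> ./solver <VERBOSITY>]]></idb:Body></idb:Invocation>
  <!-- Fixed value: not settable from the incarnation request -->
  <idb:Field name="MPIRUN" isSettable="false">
    <idb:Value>/opt/mpi/bin/mpirun</idb:Value>
  </idb:Field>
  <!-- Numeric limits, plus a default used when the request gives no value -->
  <idb:Field name="NODES">
    <idb:Description>Number of nodes</idb:Description>
    <idb:Min>1</idb:Min>
    <idb:Max>10</idb:Max>
    <idb:Default>1</idb:Default>
  </idb:Field>
  <!-- Tags: a request value equal to a tag name is replaced by the tag value -->
  <idb:Field name="VERBOSITY">
    <idb:Tag name="QUIET">-q</idb:Tag>
    <idb:Tag name="VERBOSE">-v</idb:Tag>
    <idb:Default>-q</idb:Default>
  </idb:Field>
</idb:Template>
```

Under these assumptions, a request with NODES set to 4 and VERBOSITY set to VERBOSE would be expected to incarnate to /opt/mpi/bin/mpirun -np 4 ./solver -v, while a request with NODES set to 32 would fail because it exceeds Max.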
Templates can use several special fields provided by the context in which they are incarnated. Those special fields are:
USER_NAME - user login name
WORKING_DIRECTORY - working directory of the current job (available for jobs only)
TargetSystemInfo:XXX - Target System Resource TextInfoResource or NumericInfoResource property (provided at the target system creation) with the name XXX. This is usually used in mission-specific GPE installations, where target systems are created with specific parameters.
SSH:HOST - host name used to connect to target system via SSH.
SSH:PORT - port used to connect to target system via SSH.
User profile can also contain fields, generally used to configure file paths in data centres with complex rules for HOME allocation.
Structure of template definition is represented by the following piece of XML Schema:
<complexType name="TemplateType">
  <sequence>
    <element name="Description" type="string" minOccurs="0"/>
    <element name="Invocation" type="idb:InvocationType" minOccurs="0" maxOccurs="unbounded"/>
    <element name="Field" type="idb:FieldType" minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
  <attribute name="name" type="string" use="optional"/>
</complexType>
name is the name of the template.
Description is an optional template description.
Invocation elements represent the list of invocation variations for this template.
Field elements represent the list of fields used in this template.
The following example illustrates the definition of script template for Hello application:
<idb:Template name="Hello">
  <idb:Description>Sample Hello script template</idb:Description>
  <idb:Invocation name=""><idb:Body><![CDATA[echo <TEXT>]]></idb:Body></idb:Invocation>
  <idb:Field name="TEXT">
    <!-- Default rather than Value, so that a TEXT given in the job description is used -->
    <idb:Default>Hello</idb:Default>
  </idb:Field>
</idb:Template>
 
In the example above, the script template writes the given text to the job's standard output, or "Hello" if no text was provided in the job submission description.
See GPE configuration files for practical examples of incarnation template definitions.
Several templates are required by the GPE server to perform basic operations on a given target system. They are:
START - start job. The script must provide the started job id in its output. Uses the following fields:
  TARGET - path to the incarnated job script
  STDIN - path to the stdin file (useful for "interactive" applications which use piped input)
  STDOUT - path to the stdout file
  STDERR - path to the stderr file
  JOB_ID_PATTERN - regular expression used to find the job id in the script output
  JOB_ID_GROUP - number of the regular expression group containing the job id
JOB_PROLOGUE - prologue added to each incarnated job script. Used to prepare the general environment, e.g., change the current directory to the job working directory.
JOB_EPILOGUE - epilogue added to each incarnated job script. The GPE server expects that the job exit status is written to <WORKING_DIRECTORY>/.gpe_exit_status.
GET_JOB_STATUS - script used to get the current job status. Uses the following fields:
  JOB_ID - job id (as obtained from START)
  JOB_ENTRY_PATTERN - regular expression pattern used to find the job status information in the script output
  STATUS_GROUP - number of the regular expression group containing the job status
  RUNNING_PATTERN - if the job status group matches this regular expression pattern, then the job is currently running
  HELD_PATTERN - if the job status group matches this regular expression pattern, then the job is currently held
HOLD - script used to hold the job. Uses the following fields:
  JOB_ID - job id (as obtained from START)
RESUME - script used to resume the held job. Uses the following fields:
  JOB_ID - job id (as obtained from START)
ABORT - script used to abort the job. Uses the following fields:
  JOB_ID - job id (as obtained from START)
DELETE - script used to remove files and directories. Uses the following fields:
  TARGET - path to the file or directory to be removed
LIST - script used to list file properties or directory contents. The template defines two variations, FILE for listing file properties, and DIR for listing directories. Directory entries must be placed in separate output lines. Both variations use the following fields:
  TARGET - file or directory to be listed
  FILE_PATTERN - regular expression used to obtain file information (the given file for listing file properties, a directory entry for listing directories)
  NAME_GROUP - number of the regular expression group containing the file name
  SIZE_GROUP - number of the regular expression group containing the file size
  ISOWNER_PATTERN - if the file entry matches this pattern, then the user for whom this command is executed is the owner of this file
  ISDIR_PATTERN - if the file entry matches this pattern, then this file is a directory
  READABLE_PATTERN - if the file entry matches this pattern, then the user for whom this command is executed can read this file
  WRITABLE_PATTERN - if the file entry matches this pattern, then the user for whom this command is executed can write to this file
  EXECUTABLE_PATTERN - if the file entry matches this pattern, then the user for whom this command is executed can execute this file
  DATETIME_GROUP - number of the regular expression group containing the file modification date and time
  DATETIME_PATTERN - regular expression pattern used to obtain specific date and time information; the contents of DATETIME_GROUP will be matched against it
  PM_PATTERN - if the contents of DATETIME_GROUP match this pattern, then the time is treated as "P.M." time in 12-hour time format
  DATE_GROUP - regular expression group in DATETIME_PATTERN containing the number of the day
  MONTH_GROUP - regular expression group in DATETIME_PATTERN containing the number of the month (or the month name, if MONTH_LIST is provided)
  YEAR_GROUP - regular expression group in DATETIME_PATTERN containing the number of the year
  HOUR_GROUP - regular expression group in DATETIME_PATTERN containing the number of the hour
  MINUTE_GROUP - regular expression group in DATETIME_PATTERN containing the number of the minute
  MONTH_LIST - list of month names separated by semicolons (optional)
  EXCLUDED_NAMES_PATTERN - names which should not be included in command results, such as "." and ".."
  ERROR_PATTERN - if a line in the script output matches this pattern, then the command produces an error (e.g., an attempt to list the contents of a non-existing directory)
  ERROR_MESSAGE_GROUP - number of the group in ERROR_PATTERN representing the error message which should be reported to the user
Storage information in the OS Profile is used to generate file paths on a target system. Storage definitions have the following structure:
<complexType name="StorageTemplateType">
  <sequence/>
  <attribute name="name" type="string" use="required"/>
  <attribute name="template" type="string" use="required"/>
</complexType>
name is the storage name, e.g., ROOT or HOME.
template is the name of template used to generate directory path.
Storage templates often refer to special fields such as USER_NAME and fields provided in the user profile.
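A storage and the template it refers to could be sketched as follows. The template name HOME_PATH and the /home/<USER_NAME> layout are assumptions, as is the osp: prefix for the Storage element (taken to be the profile namespace that the schema extract calls tns:).

```xml
<!-- Template that incarnates the directory path for the storage -->
<idb:Template name="HOME_PATH">
  <idb:Invocation name=""><idb:Body><![CDATA[/home/<USER_NAME>]]></idb:Body></idb:Invocation>
</idb:Template>

<!-- Storage HOME resolves its path through the HOME_PATH template -->
<osp:Storage name="HOME" template="HOME_PATH"/>
```

Per the ProfileType sequence shown earlier, Template elements are listed before Storage elements inside a profile.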
Application definitions have the following structure:
<complexType name="ApplicationType">
  <sequence>
    <element name="ApplicationName" type="string"/>
    <element name="ApplicationVersion" type="string"/>
    <element name="Description" type="string" minOccurs="0"/>
  </sequence>
  <attribute name="name" type="string" use="required"/>
  <attribute name="template" type="string" use="required"/>
</complexType>
ApplicationName - name of the application (used by the server to identify the application);
ApplicationVersion - version of the application (used by the server to identify the application);
Description - application description (optional).
name is the name used by OS Profile repository to identify application as a part of OS Profiles hierarchy.
template is the name of script template used to run the application.
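For example, an application entry that runs through the Hello template shown earlier might look like the sketch below. The name hello-1.0, the version number, the description text and the osp: prefix (assumed to be the profile namespace that the schema extract calls tns:) are illustrative assumptions.

```xml
<osp:Application name="hello-1.0" template="Hello">
  <!-- Name and version are what the server uses to identify the application -->
  <osp:ApplicationName>Hello</osp:ApplicationName>
  <osp:ApplicationVersion>1.0</osp:ApplicationVersion>
  <osp:Description>Writes a greeting to the job's standard output</osp:Description>
</osp:Application>
```

The name attribute only matters within the OS Profiles hierarchy: a child profile that defines another Application with the name hello-1.0 would override this entry.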