aborted

Is a node status.

When ECF_JOB_CMD fails, or the job file sends the ecf_client --abort child command, the task is placed into the aborted state.

active

Is a node status.

If job creation was successful and the job file has started, the ecf_client --init child command is received by the ecf_server and the task is placed into the active state.

autocancel

autocancel is a way to automatically delete a node which has completed. For BNF see autocancel

child command

Child commands (or task requests) are called from within the ecf script files. They include:

ecf_client --init # Sets the task to the active status

ecf_client --event # Sets an event

ecf_client --meter # Changes a meter

ecf_client --label # Changes a label

ecf_client --msg # Sends a message to the ecFlow log file

ecf_client --wait # Waits for an expression to evaluate

ecf_client --abort # Sets the task to the aborted status

ecf_client --complete # Sets the task to the complete status
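As a rough illustration, the Python sketch below models how the three status-changing child commands above map to task statuses. It is not ecFlow code: the names `STATUS_AFTER_COMMAND` and `handle_child_command` are hypothetical and exist only for this example.

```python
# Hypothetical model: which status a task ends up in after a child command.
# Only the three status-changing commands named in this glossary are mapped.
STATUS_AFTER_COMMAND = {
    "init": "active",
    "abort": "aborted",
    "complete": "complete",
}


def handle_child_command(current_status, command):
    """Return the task status after the server receives a child command."""
    # In this sketch, event, meter, label, msg and wait leave the status unchanged.
    return STATUS_AFTER_COMMAND.get(command, current_status)


status = "submitted"
for command in ["init", "meter", "complete"]:
    status = handle_child_command(status, command)
print(status)
```

The loop mimics a job that starts, updates a meter and then finishes normally.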

clock

A clock is an attribute of a suite.

A clock always runs in phase with the system clock (UTC in UNIX) but can have any offset from the system clock.

The clock must be either hybrid or real:

Under a hybrid clock, the date never changes unless specifically altered or unless the suite restarts, either automatically or from a begin command.

Under a real clock, the date advances by one day at midnight.

Time and date dependencies work a little differently under the two clocks. The default clock type is hybrid. For BNF see clock
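The difference between the two clock types can be sketched in Python as follows. This is a simplified model under stated assumptions: it ignores clock offsets and suite restarts, and `suite_date` is a hypothetical helper, not an ecFlow function.

```python
from datetime import date, timedelta


def suite_date(clock_type, begin_date, midnights_passed):
    """Return the suite's date after a number of midnights have passed."""
    if clock_type == "real":
        # A real clock advances by one day at each midnight.
        return begin_date + timedelta(days=midnights_passed)
    if clock_type == "hybrid":
        # A hybrid clock keeps its date until it is altered or the suite restarts.
        return begin_date
    raise ValueError(f"unknown clock type: {clock_type}")


begin = date(2007, 12, 31)
print(suite_date("real", begin, 1))
print(suite_date("hybrid", begin, 1))
```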

complete

Is a node status.

The node can be set to complete:

By the complete trigger

At job end when the task receives the ecf_client –complete child command

complete trigger

Forces a node to be complete if the expression evaluates to true, without running any of the nodes.

This allows you to have tasks in the suite which are run only if others fail. In practice the node would also need to have a trigger.

cron

Like time, cron defines a time dependency for a node, but it can allow the node to be repeated indefinitely. For BNF see cron

date

This defines a date dependency for a node.

There can be multiple date dependencies. The European format is used for dates, which is dd.mm.yyyy, as in 31.12.2007. Any of the three number fields can be expressed with a wildcard * to mean any valid value. Thus, 01.*.* means the first day of every month of every year. For BNF see date
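The wildcard rule can be illustrated with a short Python sketch. The function `date_matches` is a hypothetical helper written for this example and does not reflect how the server implements the check.

```python
from datetime import date


def date_matches(pattern, day):
    """Check a dd.mm.yyyy pattern, where any field may be '*', against a date."""
    fields = pattern.split(".")
    if len(fields) != 3:
        raise ValueError(f"expected dd.mm.yyyy, got {pattern!r}")
    actual = (day.day, day.month, day.year)
    # A field matches if it is the wildcard or equals the date's value.
    return all(f == "*" or int(f) == value for f, value in zip(fields, actual))


print(date_matches("01.*.*", date(2007, 3, 1)))
print(date_matches("31.12.2007", date(2007, 12, 30)))
```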

day

This defines a day dependency for a node.

There can be multiple day dependencies. For BNF see day

defstatus

Defines the default status for a task/family to be assigned to the node when the begin command is issued.

By default a node is queued when you use begin on a suite. defstatus is useful for preventing suites from running automatically once begun, or for setting tasks complete so they can be run selectively. For BNF see defstatus

dependencies

Dependencies are attributes of a node. They include trigger, date, day, time, today, cron, complete trigger, inlimit and limit.

A node that is dependent cannot be started as long as some dependency is holding it.

directives

directives are expanded during pre-processing. Examples include:

%include <filename>

%comment : starts a comment, which is ended by the %end directive. The section enclosed by %comment - %end is removed during pre-processing.

%manual : starts a manual, which is ended by the %end directive. The section enclosed by %manual - %end is removed during pre-processing. However, the manual directive is used to create the manual page.

%nopp : stops pre-processing until a line starting with %end is found

%end : ends the pre-processing of %comment, %manual or %nopp

%VAR% : directs the server to perform variable substitution. This involves searching for a suite definition variable or generated variable named VAR and substituting the value of the variable.

ecf script

The ecFlow script refers to an ‘.ecf’ file.

This is similar to a UNIX shell script. The differences, however, include the addition of “C”-like pre-processing directives and ecFlow variables.

ecf_client

This executable is a command line program; it is used for all communication with the server.

To see the full range of commands that can be sent to the ecf_server, type the following in a UNIX shell:

ecf_client --help

This functionality is also provided by the ecFlow Python API; see class ecflow.Client

ecf_server

This executable is the server.

It is responsible for scheduling the jobs and responding to ecf_client requests.

Multiple servers can be run on the same machine/host, provided each is assigned a unique port number.

The server records all requests in the log file.

The server will periodically write out a check point file.

A check point file is the suite definition with additional state information.

ecFlow

ecFlow is the Supervisor Monitoring Scheduler software in place at ECMWF. It supports the design, submission and monitoring of computer jobs in both the research and the operations departments.

ecFlowview

The ecFlowview executable is the GUI-based client that is used to visualise and monitor:

The hierarchical structure of the suite definition

State changes in the nodes and the ecf_server, using colour coding

Attributes of the nodes and any dependencies

The ecf script file and the corresponding job file

event

The purpose of an event is to signal partial completion of a task and to be able to trigger another job which is waiting for this partial completion.

Only tasks can have events and they can be considered as an attribute of a task.

There can be many events and they are displayed as nodes.

An event has a number and possibly a name. If it is only defined as a number, its name is the text representation of the number without leading zeroes. For BNF see event

extern

This allows an external node to be used in a trigger expression.

All nodes in triggers must be known to the ecf_server by the end of the load command. No cross-suite dependencies are allowed unless the names of tasks outside the suite are declared as external. An external trigger reference is considered unknown if it is not defined when the trigger is evaluated. You are strongly advised to avoid cross-suite dependencies.

Families and suites that depend on one another should be placed in a single suite. If you think you need cross-suite dependencies, you should consider merging the suites together and have each as a top-level family in the merged suite. For BNF see extern

family

Is a node in a suite definition.

A family is a collection of tasks and families.

Typically you place tasks that are related to each other inside the same family, analogous to the way you create directories to contain related files. For BNF see family

halted

Is an ecf_server state.

The following table shows the server capabilities in the different states:

State     User Request  Task Request  Job Scheduling  Auto-Check-pointing
running   yes           yes           yes             yes
shutdown  yes           yes           no              yes
halted    yes           no            no              no

inlimit

inlimit works in conjunction with limit to provide simple load management.

inlimit is added to the node that needs to be limited.
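One way to picture how limit and inlimit cooperate is as a counting limit that tasks must acquire before they are submitted. The Python sketch below is a conceptual model only, assuming each task consumes one slot; the `Limit` class and its methods are hypothetical and are not part of ecFlow.

```python
class Limit:
    """Hypothetical counting limit: at most `maximum` tasks hold it at once."""

    def __init__(self, name, maximum):
        self.name = name
        self.maximum = maximum
        self.holders = set()

    def try_acquire(self, task_name):
        """Return True and record the task if the limit still has room."""
        if task_name in self.holders:
            return True
        if len(self.holders) >= self.maximum:
            return False  # the task stays held by its inlimit
        self.holders.add(task_name)
        return True

    def release(self, task_name):
        """Free the slot when the task is no longer running."""
        self.holders.discard(task_name)


limit = Limit("disk", maximum=2)
print([limit.try_acquire(t) for t in ["t1", "t2", "t3"]])
limit.release("t1")
print(limit.try_acquire("t3"))
```

In this picture the limit would live on a suite, and every node carrying an inlimit that refers to it would call `try_acquire` before submission.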

job creation

The process of job creation includes:

o locating the ecf script files corresponding to the tasks in the suite definition

o pre-processing

The steps above transform an ecf script into a job file that can be submitted.

The running jobs communicate back to the ecf_server by calling child commands.

This causes status changes on the nodes in the ecf_server, and flags can be set to indicate various events.

job file

The job file is created by the ecf_server during job creation.

It is derived from the ecf script after expanding the pre-processing directives.

It has the extension ".job{try number}", e.g. t1.job1

label

A label has a name and a value, and is a way of displaying information in ecFlowview. For BNF see label

late

Define a tag for a node to be late.

Suites cannot be late, but you can define a late tag for submitted in a suite, to be inherited by the families and tasks. When a node is classified as being late, the only action ecf_server takes is to set a flag. ecFlowview will display these alongside the node name as an icon (and optionally pop up a window). For BNF see late

limit

limit provides simple load management by, say, limiting the number of tasks submitted to a specific server. Typically you either define limits at suite level or define a separate suite to hold limits, so that they can be used by multiple suites. For BNF see limit and inlimit

manual page

Manual pages are part of the ecf script.

This is to ensure that the manual page is updated when the script is updated. The manual page is a very important operational tool, allowing you to view a description of a task and possibly describing solutions to common problems. The manual page is the text contained within the %manual and %end directives. Pre-processing can be used to extract it from the script file, and it can be viewed using the manual button in ecFlowview.

meter

The purpose of a meter is to signal proportional completion of a task and to be able to trigger another job which is waiting on this proportional completion. For BNF see meter

node

A node is a suite, family or task.

pre-processing

Pre-processing takes place during job creation and acts on directives specified in the ecf script file.

This involves:

o expanding any include file directives, i.e. similar to ‘C’ language pre-processing

o removing comments and manual directives

o performing variable substitution
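The removal of comment and manual sections can be sketched in Python as below. This is a minimal illustration, not the server's implementation: `strip_directives` is a hypothetical function, it assumes each directive sits alone on its line, it treats %nopp sections as text to pass through untouched, and it leaves out include expansion and variable substitution.

```python
def strip_directives(lines):
    """Drop %comment/%manual sections and pass %nopp sections through unchanged."""
    output = []
    mode = None  # None, "skip" (inside comment/manual) or "nopp"
    for line in lines:
        stripped = line.strip()
        if mode is None and stripped in ("%comment", "%manual"):
            mode = "skip"
        elif mode is None and stripped == "%nopp":
            mode = "nopp"
        elif mode is not None and stripped == "%end":
            mode = None
        elif mode != "skip":
            # Ordinary lines and lines inside %nopp are kept as they are.
            output.append(line)
    return output


script = [
    "echo start",
    "%manual",
    "Call the operator if this task fails.",
    "%end",
    "%nopp",
    "echo %literal%",
    "%end",
    "echo done",
]
print(strip_directives(script))
```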

queued

Is a node status.

After the begin command, tasks without a defstatus are placed into the queued state.

repeat

Repeats provide looping functionality. There can only be a single repeat on a node.

repeat day step [ENDDATE] # only for suites

repeat integer VARIABLE start end [step]

repeat enumerated VARIABLE first [second [third ...]]

repeat string VARIABLE str1 [str2 ...]

repeat file VARIABLE filename

repeat date VARIABLE yyyymmdd yyyymmdd [delta]

The repeat VARIABLE can be used in trigger and complete trigger expressions. For BNF see repeat
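To make the `repeat date` form concrete, the Python sketch below lists the values the VARIABLE would take. It assumes that the end date is inclusive and that delta is a number of days defaulting to 1; neither is stated in this glossary. `repeat_date_values` is a hypothetical helper, not an ecFlow function.

```python
from datetime import datetime, timedelta


def repeat_date_values(start, end, delta=1):
    """Yield yyyymmdd integers from start to end (assumed inclusive), delta days apart."""
    if delta == 0:
        raise ValueError("delta must be non-zero")
    current = datetime.strptime(str(start), "%Y%m%d").date()
    last = datetime.strptime(str(end), "%Y%m%d").date()
    step = timedelta(days=delta)
    # A negative delta counts backwards from start to end.
    while (current <= last) if delta > 0 else (current >= last):
        yield int(current.strftime("%Y%m%d"))
        current += step


print(list(repeat_date_values(20071230, 20080102)))
```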

running

Is an ecf_server state.

The following table shows the server capabilities in the different states:

State     User Request  Task Request  Job Scheduling  Auto-Check-pointing
running   yes           yes           yes             yes
shutdown  yes           yes           no              yes
halted    yes           no            no              no

scheduling

The ecf_server is responsible for task scheduling.

It will check dependencies in the suite definition every minute. If these dependencies are free, the ecf_server will submit the task. See job creation.

shutdown

Is an ecf_server state.

The following table shows the server capabilities in the different states:

State     User Request  Task Request  Job Scheduling  Auto-Check-pointing
running   yes           yes           yes             yes
shutdown  yes           yes           no              yes
halted    yes           no            no              no

status

Each node in the suite definition has a status.

Status reflects the state of the node. In ecFlowview the background colour of the text reflects the status.

Task statuses are: unknown, queued, submitted, active, complete, aborted and suspended.

ecf_server statuses are: shutdown, halted and running. The server status is shown on the root node in ecFlowview.

submitted

Is a node status.

When the task dependencies are resolved/free, the ecf_server places the task into the submitted state. However, if ECF_JOB_CMD fails, the task is placed into the aborted state.

suite

Is a node in a suite definition.

A suite is a collection of families, variables, repeat and clock definitions. For a complete list of attributes, see the BNF for suite

suite definition

The suite definition is the hierarchical node tree.

It describes how your tasks run and interact.

It can be built up using

Once the definition is built, it is loaded into the ecf_server and started. It is then monitored by ecFlowview.

suspended

Is a node state. A node can be placed into the suspended state via a defstatus or via ecFlowview.

A suspended node, including any of its children, cannot take part in scheduling until the node is resumed.

task

Is a node in a suite definition, that represents a job that needs to be carried out.

Only tasks can be submitted.

A job inside a task's ecf script should generally be re-entrant, so that no harm is done by rerunning it, since a task may be automatically submitted more than once if it aborts. For BNF see task

time

This defines a time dependency for a node.

Time is expressed in the format [h]h:mm. Only numeric values are allowed. There can be multiple time dependencies for a node, but overlapping times may cause unexpected results. To define a series of times, specify the start time, end time and a time increment. If the start time begins with ‘+’, times are relative to the beginning of the suite or, in repeated families, relative to the beginning of the repeated family. For BNF see time
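A series defined by a start time, end time and increment expands to a list of times, as the Python sketch below illustrates. It assumes the end time is included when the increment lands on it, and it does not handle relative times beginning with ‘+’. `time_series` is a hypothetical helper, not an ecFlow function.

```python
def time_series(start, end, increment):
    """List [h]h:mm times from start to end (assumed inclusive) every increment."""

    def to_minutes(text):
        hours, minutes = text.split(":")
        return int(hours) * 60 + int(minutes)

    step = to_minutes(increment)
    if step <= 0:
        raise ValueError("increment must be positive")
    return [
        f"{m // 60:02d}:{m % 60:02d}"
        for m in range(to_minutes(start), to_minutes(end) + 1, step)
    ]


print(time_series("10:00", "12:00", "00:30"))
```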

today

Like time, but “today” does not wrap to tomorrow.

If the suite's begin time is past the time given for the “today” command, the node is free to run (as far as the time dependency is concerned). For BNF see today

trigger

A trigger defines a dependency for a task or family.

There can be only one trigger dependency per node, but that can be a complex boolean expression of the status of several nodes. Triggers should be avoided on suites. A node with a trigger can only be activated when its trigger has expired. A trigger holds the node as long as the trigger’s expression evaluation returns false.
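The idea that a single trigger can combine the status of several nodes, and holds its node while the expression is false, can be sketched in Python. The trigger here is written as an ordinary Python function rather than in ecFlow's trigger syntax, and the names `trigger_holds`, `example_trigger`, `t1` and `t2` are hypothetical.

```python
def trigger_holds(expression, statuses):
    """Return True while the trigger expression is false, i.e. the node is held."""
    return not expression(statuses)


# Hypothetical trigger: free once t1 is complete and t2 is complete or aborted.
def example_trigger(statuses):
    return statuses["t1"] == "complete" and statuses["t2"] in ("complete", "aborted")


print(trigger_holds(example_trigger, {"t1": "complete", "t2": "active"}))
print(trigger_holds(example_trigger, {"t1": "complete", "t2": "aborted"}))
```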

unknown

Is a node status.

This is the default node status when a suite definition is loaded into the ecf_server.

variable

ecFlow makes heavy use of variables. There are several kinds:

Environment variables: these are set in the UNIX shell before ecFlow starts. They control ecf_server and ecf_client.

Suite definition variables: also referred to as user variables. They control ecf_server and ecf_client, and are available for use in the job file.

Generated variables: These are generated within the suite definition node tree during job creation and are available for use in the job file.

For BNF see variable

variable inheritance

When a variable is needed at job creation time, it is first sought in the task itself.

If it is not found in the task, it is sought from the task’s parent and so on, up through the node levels until found.

For any node, there are two places to look for variables.

Suite definition variables are looked for first, and then any generated variables.
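The lookup order described above can be sketched in Python as a walk from the task up through its parents. The `Node` class, the `find_variable` function and the variable names are hypothetical and serve only to illustrate the search order.

```python
class Node:
    """Hypothetical node holding user (suite definition) and generated variables."""

    def __init__(self, name, parent=None, user_vars=None, generated_vars=None):
        self.name = name
        self.parent = parent
        self.user_vars = user_vars or {}
        self.generated_vars = generated_vars or {}


def find_variable(node, name):
    """Search the node, then each ancestor; user variables win over generated ones."""
    while node is not None:
        if name in node.user_vars:
            return node.user_vars[name]
        if name in node.generated_vars:
            return node.generated_vars[name]
        node = node.parent
    return None  # not found anywhere in the tree


suite = Node("s1", user_vars={"SLEEP": "20", "OWNER": "ops"})
family = Node("f1", parent=suite, user_vars={"SLEEP": "5"})
task = Node("t1", parent=family, generated_vars={"NODE_NAME": "t1"})
print(find_variable(task, "SLEEP"), find_variable(task, "OWNER"))
```

Here the family's value of SLEEP shadows the suite's, because the family is reached first on the way up.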

variable substitution

Takes place during pre-processing.

It involves searching each line of the ecf script file for the ECF_MICRO character, typically ‘%’.

The text between two % characters defines a variable, e.g. %VAR%.

This variable is searched for in the suite definition.

First the suite definition variables (sometimes referred to as user variables) are searched, and then the generated variables.

The variable reference, including the % characters, is replaced by the value of the variable.

If the variable is not found in the suite definition during pre-processing, then job creation fails, an error message is written to the log file, and the task is placed into the aborted state.

To avoid this, variables in the ecf script can be given a default value:

%VAR:replacement% : This is similar to %VAR% but if VAR is not found in the suite definition then ‘replacement’ is used.
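Both forms of substitution can be sketched with a regular expression in Python. This is an illustrative model, not the server's code: it assumes ‘%’ is the ECF_MICRO character, takes the already-resolved variables as a plain dictionary, and raises `KeyError` where job creation would fail. `substitute` and `VARIABLE` are hypothetical names.

```python
import re

# Matches %VAR% and %VAR:replacement%, assuming '%' is the ECF_MICRO character.
VARIABLE = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)(?::([^%]*))?%")


def substitute(line, variables):
    """Replace each %VAR% in a line; raise KeyError if VAR has no value or default."""

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in variables:
            return variables[name]
        if default is not None:
            return default
        raise KeyError(f"variable {name} not found")

    return VARIABLE.sub(replace, line)


print(substitute("sleep %SLEEP:10%; echo %HOST%", {"HOST": "node1"}))
```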
