At ECMWF we use ecFlow to schedule our operational suites. We also separate critical from non-critical tasks, which makes it easier for our operators to monitor the suites. Figure 2-2 shows the coarse structure of one of our operational suites.
The suite is divided into four families: "main" handles the time-critical parts of the suite, such as the model run itself; "lag" handles archiving and other non-time-critical tasks; "pop" handles the plotting of results; and "msjobs" handles the submission of Member State jobs.
Note that each family has its own date repeat, labelled YMD. This lets us write triggers that refer to the suite date (YMD), and it also allows the less critical families to run a few days behind if necessary. This is useful when running a test suite that is not in real time.
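The following Python sketch models the idea behind the per-family YMD repeats. It does not use ecFlow itself: the function names and the trigger rule ("lag" may process a date once "main" has finished that date) are hypothetical, chosen only to show how a trigger on the suite date lets a less critical family fall behind without ever holding up the time-critical one.

```python
from datetime import date, timedelta


def to_date(ymd: int) -> date:
    """Convert a YYYYMMDD integer (the form a date repeat takes) to a date."""
    return date(ymd // 10000, ymd // 100 % 100, ymd % 100)


def next_ymd(ymd: int) -> int:
    """Advance a YYYYMMDD integer by one day, as a daily date repeat would."""
    d = to_date(ymd) + timedelta(days=1)
    return d.year * 10000 + d.month * 100 + d.day


def lag_can_run(main_ymd: int, main_complete: bool, lag_ymd: int) -> bool:
    """Hypothetical trigger: 'lag' may work on lag_ymd once 'main' has finished it.

    'main' has finished that date if it has already moved on to a later one,
    or if it is still on that date and complete. YYYYMMDD integers compare
    in chronological order, so plain integer comparison is enough.
    """
    return main_ymd > lag_ymd or (main_ymd == lag_ymd and main_complete)


def days_behind(main_ymd: int, lag_ymd: int) -> int:
    """How many days the 'lag' family is trailing the 'main' family."""
    return (to_date(main_ymd) - to_date(lag_ymd)).days


if __name__ == "__main__":
    # 'main' is busy with 3 January; 'lag' is still archiving 1 January.
    print(lag_can_run(20240103, False, 20240101), days_behind(20240103, 20240101))
```

Because each family advances its own YMD independently, "main" never waits for "lag"; the trigger only stops "lag" from overtaking "main".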

Figure 2-2: Sample suite structure

ecFlow can help to monitor suites in many ways beyond indicating task status. For instance, the late attribute in ecFlow can be used to highlight problems with time-critical scripts. It marks a node as late when certain conditions are met, such as having been submitted for too long, having been running for too long, or not having become active by a certain time. In conjunction with the GUI, this is used to launch a pop-up window once a late condition is reached. To use this option, make sure that the GUI option "show/special/late nodes" is selected.
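As we understand the ecFlow definition syntax, these three conditions correspond to the `-s` (submitted), `-a` (active) and `-c` (complete) options of the `late` keyword, for example `late -s +00:15 -a 20:00 -c +02:00`. The Python sketch below is not ecFlow code; it is a hypothetical, stand-alone model of how the three checks could be evaluated for one node, with all parameter names invented for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def late_reasons(
    now: datetime,
    submitted_at: Optional[datetime] = None,
    active_at: Optional[datetime] = None,
    completed: bool = False,
    max_submitted: Optional[timedelta] = None,
    active_by: Optional[datetime] = None,
    max_running: Optional[timedelta] = None,
) -> List[str]:
    """Return the (hypothetical) late conditions that currently apply to a node."""
    reasons = []
    # Submitted for too long: the job is still queued after max_submitted.
    if (submitted_at is not None and active_at is None
            and max_submitted is not None and now - submitted_at > max_submitted):
        reasons.append("submitted too long")
    # Not active by a certain time: the deadline passed before the job started.
    if active_by is not None:
        started_late = active_at is not None and active_at > active_by
        not_started = active_at is None and now > active_by
        if started_late or not_started:
            reasons.append("not active by deadline")
    # Running for too long: active, not yet complete, and over its allowance.
    if (active_at is not None and not completed
            and max_running is not None and now - active_at > max_running):
        reasons.append("running too long")
    return reasons


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    # A task that started at 09:00 and is allowed two hours of run time.
    print(late_reasons(now, active_at=datetime(2024, 1, 1, 9, 0),
                       max_running=timedelta(hours=2)))
```

A non-empty result is the point at which a monitoring tool would flag the node as late; in ecFlow the server does this evaluation and the GUI raises the pop-up.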
In a number of our suites we have also defined check tasks that interrogate ecFlow using the status command to find out whether tasks have, for instance, completed by a given time.
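A check task of this kind reduces to a small piece of logic: ask the server for the state of a list of nodes and fail if any is not complete. The Python sketch below keeps the server query behind a hypothetical `get_state` callable, because the exact client command used to fetch a node's state depends on the ecFlow version and is not shown here. The node paths and the convention of returning a non-zero exit code (so the check task aborts and becomes visible to operators) are illustrative assumptions.

```python
from typing import Callable, Iterable, List


def incomplete_tasks(paths: Iterable[str], get_state: Callable[[str], str]) -> List[str]:
    """Return the node paths whose state is not 'complete'."""
    return [p for p in paths if get_state(p) != "complete"]


def check(paths: Iterable[str], get_state: Callable[[str], str]) -> int:
    """Report unfinished nodes; return an exit code (0 = all complete, 1 = not)."""
    missing = incomplete_tasks(paths, get_state)
    for p in missing:
        print(f"NOT COMPLETE: {p} ({get_state(p)})")
    return 1 if missing else 0


if __name__ == "__main__":
    # Stand-in for a real server query; these paths and states are made up.
    states = {"/o/main/an": "complete", "/o/main/fc": "active"}
    raise SystemExit(check(states, states.__getitem__))
```

Scheduled with a time dependency, such a task runs at the moment the tasks should have finished, so a failure points directly at the work that is behind.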