...

Examples of task include files that enable communication between a batch job and ecFlow servers are available here from the ECMWF git repository.

ecFlow delegates job management tasks, such as submitting, killing, or monitoring the status of jobs, to external applications. For your convenience, you may use troika, a tool that takes care of those tasks. To use it, make sure the following variables are defined at the suite level:

Job management variables in your suite.def:
edit QUEUE nf
edit SCHOST aa
edit ECF_JOB_CMD troika submit -o %ECF_JOBOUT% %SCHOST% %ECF_JOB%
edit ECF_KILL_CMD troika kill %SCHOST% %ECF_JOB%
edit ECF_STATUS_CMD troika monitor %SCHOST% %ECF_JOB%

Of course, you may change QUEUE to np if you are running bigger parallel jobs, or change SCHOST to run on a complex other than aa.
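To see what these settings amount to at run time, the sketch below mimics how the %VAR% placeholders in a command such as ECF_JOB_CMD are replaced by variable values before the command is run. It is only an illustration: expand_ecf_vars is a hypothetical helper, the ECF_JOB and ECF_JOBOUT paths are made-up values (ecFlow generates the real ones), and special cases of the real substitution are not handled.

```bash
#!/usr/bin/env bash
# Illustrative values only; ecFlow generates ECF_JOB and ECF_JOBOUT itself.
SCHOST=aa
ECF_JOB=/path/to/suite/task.job1
ECF_JOBOUT=/path/to/suite/task.1

# Hypothetical helper: replace each %NAME% in the template with the value
# of the shell variable NAME (unset variables expand to an empty string).
expand_ecf_vars() {
    local rest=$1 out="" name
    local re='^([^%]*)%([A-Za-z_][A-Za-z0-9_]*)%(.*)$'
    while [[ $rest =~ $re ]]; do
        out+=${BASH_REMATCH[1]}      # literal text before the placeholder
        name=${BASH_REMATCH[2]}      # placeholder name without the % signs
        out+=${!name}                # indirect expansion: value of $name
        rest=${BASH_REMATCH[3]}      # continue with the remaining text
    done
    printf '%s\n' "$out$rest"
}

expand_ecf_vars 'troika submit -o %ECF_JOBOUT% %SCHOST% %ECF_JOB%'
# prints: troika submit -o /path/to/suite/task.1 aa /path/to/suite/task.job1
```

Changing SCHOST at the suite level therefore changes the host argument passed to every troika command, with no need to edit the commands themselves.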

By default, scancel doesn't send signals other than SIGKILL to the batch step. Consequently, one should use the "-b" or "-f" option to send a signal that the ecFlow job is designed to trap before notifying the ecFlow server that the job was killed:

scancel --signal=TERM -b ${SLURM_JOB_ID}

In this example, SIGTERM (15) was sent, but other signals can be used as well.
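The job-side half of this mechanism can be sketched as follows: the job installs a trap for SIGTERM, and the handler reports the kill before the job exits. This is a minimal illustration, not the content of the ECMWF include files. notify_server is a hypothetical stand-in (a real ecFlow job would typically call something like ecflow_client --abort at that point), and the job sends the signal to itself in place of the external scancel.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real notification; an ecFlow job would
# typically call something like `ecflow_client --abort=<reason>` here.
notify_server() {
    echo "abort: $1"
}

# Run a miniature "job" in a subshell and deliver SIGTERM to it,
# mimicking what `scancel --signal=TERM -b` sends to the batch step.
demo_trapped_kill() {
    (
        # Handler: report the kill first, then exit with an error status.
        trap 'notify_server "killed by SIGTERM"; exit 1' TERM
        kill -TERM "$BASHPID"    # stand-in for the external scancel
        echo "not reached"       # skipped: the handler exits first
    )
}

demo_trapped_kill || echo "job exited with status $?"
```

Because SIGKILL cannot be trapped, a job cancelled with scancel's default behaviour never reaches such a handler, which is why a trappable signal is requested explicitly.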

Connecting to the ecFlow server 

...