...

Note that jobs submitted through ECaccess are kept in the ECaccess spool. Jobs attached to an event of the ECMWF operational suite remain in standby mode in ECaccess until they are submitted to the batch system, e.g. Slurm on the Atos HPC. The job and its output files can be retrieved with the ECaccess command ecaccess-job-get.

We offer job rerun facilities: if a job fails, you can ask ECaccess to rerun it using the ecaccess-job-submit command.

Job monitoring

The ECMWF operators can monitor your job through a dedicated interface. For this interface to report the correct status of your job, your job must fail when an error occurs. The easiest way to achieve this is to use the "set -e" command in the Korn shell. Note that this command is radical: it will stop your job even when an unimportant error occurs. The "set -e" command is also essential for ECaccess to restart your job automatically on failure.
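A minimal sketch of a job body using "set -e" follows. It also shows the common shell idiom "|| true", which lets a command you consider unimportant fail without stopping the job. The file names and processing steps are hypothetical placeholders for your own commands.

```shell
#!/bin/ksh
# Minimal sketch of an operational job body. The file names and the
# "processing" steps are hypothetical placeholders for your own commands.
set -e   # from here on, any failing command makes the job exit with non-zero status

# Unimportant step: "|| true" keeps a failure here from aborting the job.
rm /tmp/hypothetical_old_scratch_file 2>/dev/null || true

# Important steps: if any of them fails, "set -e" stops the job at once,
# so the failure is reported to ECaccess and to the monitoring interface.
workdir=$(mktemp -d)
echo "input data" > "$workdir/input.txt"
wc -l < "$workdir/input.txt" > "$workdir/result.txt"

echo "job completed"
```

Use "|| true" sparingly: every command protected this way is a failure that neither ECaccess nor the operators will see.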

To allow the operators to see your job output files, we recommend that you do not specify the job standard output and error files. These files will then be managed by ECaccess and will be visible to the operators.
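Before submitting an existing batch job, you may want to check it for Slurm directives that redirect standard output or error. The sketch below is a hypothetical helper (not an ECaccess tool) that lists such lines; it assumes the directives are written as "#SBATCH --output=...", "#SBATCH --error=...", "#SBATCH -o ..." or "#SBATCH -e ...".

```shell
#!/bin/ksh
# Hypothetical helper (not an ECaccess tool): print, with line numbers, any
# Slurm directives in a job script that redirect standard output or error.
# Returns non-zero when no such directive is found.
find_output_directives() {
    grep -nE '^#SBATCH[[:space:]]+(--output|--error|-o|-e)([=[:space:]]|$)' "$1"
}

# Example: a job script that still redirects its output and error files.
job=$(mktemp)
cat > "$job" <<'EOF'
#!/bin/ksh
#SBATCH --job-name=my_op_job
#SBATCH --output=my_op_job.out
#SBATCH --error=my_op_job.err
set -e
echo "hello"
EOF

if find_output_directives "$job"; then
    echo "Remove the lines above so that ECaccess manages the output files."
else
    echo "No output/error redirection found."
fi
```

The pattern only checks lines starting with "#SBATCH", so ordinary comments and shell commands are ignored; options such as "--export" are not reported.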

To change the content of one of your operational jobs, delete the job in standby mode and resubmit the modified version to ECaccess. To remove an operational job, delete the job in standby mode using the ecaccess-job-delete command.
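The delete-and-resubmit sequence can be sketched as a small wrapper. This is a hypothetical helper, not part of ECaccess: by default it only prints the commands (dry run). It assumes that "-eventIds" is the ecaccess-job-submit option that attaches a job to an event; check "ecaccess-job-submit -help" on your system before using it. The job and event identifiers in the example are made up.

```shell
#!/bin/ksh
# Hypothetical helper (not an ECaccess tool): replace an operational job that is
# waiting in standby mode with a modified version of its script.
# By default it only prints the commands (dry run); set DRY_RUN=0 to execute them.

run_or_print() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"          # dry run: show the command instead of running it
    else
        "$@"               # real run: execute the command as given
    fi
}

replace_standby_job() {
    old_job_id=$1   # ECaccess id of the job currently in standby mode
    script=$2       # path to the modified job script
    event_id=$3     # id of the event the new job must be attached to

    # 1. Delete the old version; stop here if that fails, to avoid duplicates.
    run_or_print ecaccess-job-delete "$old_job_id" || return 1
    # 2. Resubmit the modified script attached to the same event.
    #    ASSUMPTION: "-eventIds" attaches the job to an event; verify with
    #    "ecaccess-job-submit -help".
    run_or_print ecaccess-job-submit -eventIds "$event_id" "$script"
}

# Example with made-up identifiers: job 12345, event 90.
replace_standby_job 12345 my_op_job.ksh 90
```

Deleting first and stopping on failure avoids having both the old and the new version attached to the same event.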

Summary

  1. Take your existing batch job.
  2. Optionally, remove the batch directives redirecting the job output and error files, to allow the operators to see these files.
  3. For your convenience, make use of the dynamic environment variables starting with MSJ_.
  4. Optionally, include the "set -e" command, so that the correct job status is reported to ECaccess and the monitoring interface.
  5. Check the available events with the command ecaccess-event-list.
  6. Submit your job to ECaccess and attach it to the appropriate event, using the ECaccess web interface or the ecaccess-job-submit command.
  7. If you have to correct your job, delete the job in standby mode (ecaccess-job-delete) and resubmit the new version.
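To see which MSJ_ variables are available to a given run (step 3), a job can simply print them. The minimal sketch below does not assume any particular variable names; it lists whatever variables starting with MSJ_ are present in the job's environment.

```shell
#!/bin/ksh
# Minimal sketch: print the MSJ_ environment variables set for the current run,
# without assuming which ones exist.
list_msj_vars() {
    msj_vars=$(env | grep '^MSJ_' || true)   # grep exits 1 when nothing matches
    if [ -n "$msj_vars" ]; then
        printf '%s\n' "$msj_vars" | sort
    else
        echo "no MSJ_ variables set"
    fi
}

list_msj_vars
```

Running this once at the top of a job records in its output which MSJ_ values that particular run received.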

...