This document describes the service that allows users to automatically submit jobs to be run when certain points in the daily ECMWF operational forecast suites have been reached. The main purpose is to ensure that the required data is available before, for example, submitting a MARS request. The service runs in the ECaccess environment and is available either through the ECaccess Web interface or through the ECaccess Web Toolkit, which is available on the Atos HPC or can be installed locally. The service is monitored by the operators at ECMWF.

Tip

Before submitting your job to run in the ECaccess environment, you need to set up SSH key-based authentication within ECMWF as explained in HPC2020: How to connect.

Enhanced ECaccess batch service

...

When logged in to the ECaccess web interface, e.g. at https://boaccess.ecmwf.int/ or on your local gateway, you can submit a new job from the left margin.

The upper part of the submission page is shown in Figure 1.

Figure 1: Job submission - Upper part

You can type your job script in the window provided, copy and paste it from another window, or upload it from a local file. One important addition to your jobs is the "set -e" command; alternatively, handle the errors in your job yourself and exit with an appropriate status - see Job status, below, for more details.
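As an illustration of the second option (managing errors yourself instead of relying on "set -e"), the following is a minimal sketch only. The helper name `run_step` and the two placeholder steps are hypothetical and not part of ECaccess; the fallback to /tmp when $SCRATCH is unset is also an assumption made so that the sketch runs anywhere.

```shell
#!/bin/bash
# Hypothetical helper: run one step of the job and stop the whole job
# with a non-zero exit status if that step fails.
run_step() {
    description=$1
    shift
    if ! "$@"; then
        echo "ERROR: ${description} failed" >&2
        exit 1
    fi
}

# Placeholder steps; replace them with the real commands of your job.
run_step "create work directory" mkdir -p "${SCRATCH:-/tmp}/data"
run_step "list work directory" ls "${SCRATCH:-/tmp}/data"
```

Compared with "set -e", this form lets you print a message that names the failing step before the job exits.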

Figure 2: Job submission - lower part

The lower part of the job submission window (see Figure 2) - called subscription - allows you to attach your job to the different events available to you. Simply tick the boxes corresponding to the event(s) for which you want your job to run.

...

  1. Keep job input/output for:  ECaccess will create a new job for each subsequent notification of an event. For example, if you have subscribed a job to the event ‘an00h000’, a new ECaccess job will automatically be created and submitted every day when the ECMWF analysis for 00 UTC is complete. The jobs from previous days are kept in the ECaccess spool and are removed after the number of days specified in this field.
  2. Man page for your job:  The ECMWF operators have utilities to monitor jobs subscribed to events of the ECMWF operational activity. On this page, you can give the operators instructions on what to do if the job fails. Operators could rerun your job (see next point) or possibly inform someone about the problem. If no instructions are given, our operators will not take any specific action on your jobs.
  3. Retry frequency and Retry count:  With these options, you can request your job to be rerun automatically (without the intervention of the ECMWF operators) a certain number of times if it fails. We recommend submitting your job with Retry count set to 3 and Retry frequency set to 15 minutes (900 seconds), so that the job is retried up to 3 times with 15 minutes between resubmissions. This will sometimes allow the job to complete successfully if the initial failure was caused by a temporary issue.
  4. One script to one notification:  If you have ticked several events for your job, by default you will have one job running for each individual event. If you want only one job to run once the notifications for all the events have been received, you can untick the box labelled ‘one script to one notification’. This option could be used if, for example, you want to extract some ENS meteogram products and raw ENS data in the same job. Your single job will appear linked to the two events and will be submitted when a notification for both events has been sent. Be careful, though, to submit such a job at the correct moment: before both events occur in the operational suite, not in between.

When you have given all the necessary information about your job, you can submit it. Your job will be taken by ECaccess and put in standby mode - status STDBY. You can monitor your jobs by selecting the link ‘Job submission’ under topic ‘Monitor’ in the left margin. The monitoring page is shown in Figure 3.


Figure 3: Job monitoring

On this page you will see all your jobs submitted through ECaccess, both those subscribed to events of the operational suite and other jobs. You will also see the jobs scheduled to run later, as well as those which have already run for previous notifications of an event. The name of the job is also shown, when available. You can delete jobs from this page. See Changes in job or suppression of jobs, below, for more details.

...

No Format
set -e
mkdir $SCRATCH/data

Of course, in this particular example, you could also use the mkdir  " -p " option:

Code Block
mkdir -p $SCRATCH/data 

Other common problems that prevent jobs from being re-runnable are:

  • Using ectrans without the "-overwrite" option. If some files have already been transferred successfully with ectrans and the job then needs to be re-run, ectrans will fail with the error "eccmd error: Target file already exist (exit=-36)". To avoid this, use the "-overwrite" option.
  • Using bzip2. By default, bzip2 will not overwrite a compressed file if it already exists. Use the bzip2 "-f" option to force the file to be overwritten.
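The points above can be combined in one re-runnable job step. This is a sketch under stated assumptions: the function name `make_products`, the variable WORKDIR, the file name product.txt and the association name my_association are all hypothetical. The ectrans line is left as a comment because it needs a site-specific association; apart from "-overwrite", its options should be checked against the ectrans documentation.

```shell
# Hypothetical work directory; falls back to /tmp if $SCRATCH is not set.
WORKDIR="${SCRATCH:-/tmp}/data"

make_products() {
    # -p: no error if the directory already exists from an earlier run
    mkdir -p "$WORKDIR"
    # '>' truncates the file, so writing it again on a re-run is safe
    echo "example product" > "$WORKDIR/product.txt"
    # Transfer step (placeholder association name), to be enabled after
    # the compression below in a real job:
    # ectrans -remote my_association -source "$WORKDIR/product.txt.bz2" -overwrite
    # -f: overwrite product.txt.bz2 left behind by an earlier run
    bzip2 -f "$WORKDIR/product.txt"
}
```

Calling `make_products` twice in a row should succeed both times. Without "-p" and "-f", the second call would stop at the existing directory or at the existing compressed file.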

...

Please note that you can submit your job to ECaccess without setting up what is suggested above. Your jobs will run normally but, without this job control, the ECMWF operators will not notice any errors with your jobs and ECaccess will fail to resubmit your jobs, even if you requested some retries.
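The effect of missing error control can be seen in a small experiment. The sketch below assumes, as the advice on "set -e" suggests, that a job is judged by the exit status of its script; the two one-line scripts are stand-ins for real jobs, with the command `false` playing the part of a failing step.

```shell
# Stand-in job without error control: the failing step is followed by a
# command that succeeds, so the script as a whole reports success.
status_without=0
sh -c 'false; echo "job finished"' >/dev/null || status_without=$?

# Same stand-in job with "set -e": the shell stops at the failing step
# and the script reports a non-zero exit status.
status_with=0
sh -c 'set -e; false; echo "job finished"' >/dev/null || status_with=$?

echo "without set -e: exit status ${status_without}"
echo "with set -e:    exit status ${status_with}"
```

In the first case the failure is hidden behind the exit status of the last command, which is why neither a retry nor an operator action would be triggered.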

...