From 2023-01-18, ECMWF will enforce killing jobs once they have reached their wall time if "#SBATCH --time=" or the command line option "--time=" was provided with the job.
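As a minimal sketch of a batch script with an explicit wall-time limit (the job name, QoS, and limit values here are illustrative assumptions, not ECMWF defaults):

```shell
#!/bin/bash
#SBATCH --job-name=mytest      # hypothetical job name
#SBATCH --qos=nf               # hypothetical QoS, as in the example output below
#SBATCH --ntasks=1
#SBATCH --time=00:10:00        # explicit wall time: the job is killed once 10 minutes are reached

# actual job commands would go here
msg="payload running"
echo "$msg"
```

With an explicit "--time=", the limit above is a hard cap; Slurm's `--time=HH:MM:SS` format also accepts `minutes`, `days-hours`, and similar forms.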


Alternatively, ECMWF accepts jobs without "#SBATCH --time=" or the command line option "--time="; in that case ECMWF will instead use the average runtime of previous "similar" jobs.

"Similar" jobs are identified by a jobtag generated from the user ID, job name, job geometry, and job output path.
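The exact jobtag recipe is not documented here; as a sketch under that assumption, the tag seen in the example output below could be composed from those four fields like this (the field values and the NxM geometry format are guesses inferred from that example):

```shell
#!/bin/bash
# Hypothetical jobtag construction: user ID, job name,
# job geometry (nodes x cores), and job output path.
user="sycw"
job_name="testit6"
nodes=1
cores_per_node=2
output_path="/home/sycw/slurm/slurm-_JOBID_.out"

jobtag="${user}-${job_name}-${nodes}x${cores_per_node}-${output_path}"
echo "$jobtag"
```

This reproduces the jobtag shown in the example output below.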

After the first successful run, the estimated runtime will be the average runtime of previous successful runs plus one standard deviation.

For jobs where no recent previous runtime could be found, a runtime of 24h is assumed, with another 24h of grace time, allowing a new job to run for up to 48h. After the first successful run, the estimate is the average plus one standard deviation of the previous 20 successful runtimes of the job, with the job still allowed 24h of grace time.
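The estimate rule above (average of the recent successful runtimes plus one standard deviation) can be sketched with awk; the walltime values are taken from the example history below and treated as plain numbers, and the use of the population standard deviation is an assumption:

```shell
#!/bin/bash
# Walltimes of previous successful runs (from the example history below)
runtimes="9 6 7 7 6 6 7 7 7 7 7 6 7 7"

# estimate = mean + one (population) standard deviation
estimate=$(echo "$runtimes" | tr ' ' '\n' | awk '
  { sum += $1; sumsq += $1*$1; n++ }
  END {
    mean = sum / n
    sd = sqrt(sumsq / n - mean * mean)
    printf "%.1f", mean + sd
  }')
echo "estimate=$estimate"
```

Any rounding applied by ECMWF's actual implementation (the example output reports whole numbers) is not documented here.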

As on the ECMWF Cray HPC, the job will be given 24h of grace time to allow for system sessions and other issues holding up job progress.


Example output, which can be found in the job output:

[ECMWF-INFO -sbatch] - ------------------------------
[ECMWF-INFO -sbatch] - jobtag: sycw-testit6-1x2-/home/sycw/slurm/slurm-_JOBID_.out
[ECMWF-INFO -sbatch] - ------------------------------
[ECMWF-INFO -sbatch] - Average Walltime 6 with a Standard Deviation 3
[ECMWF-INFO -sbatch] - Runtime history
[ECMWF-INFO -sbatch] -   Date               | Cores   Cluster   Walltime       Mem
[ECMWF-INFO -sbatch] -   03.01.2023 - 15:24 | 2          ad         9          120000M   
[ECMWF-INFO -sbatch] -   12.12.2022 - 12:50 | 2          aa         6          1000M     
[ECMWF-INFO -sbatch] -   06.12.2022 - 15:29 | 2          ad         7          1000M     
[ECMWF-INFO -sbatch] -   05.12.2022 - 08:31 | 2          ac         7          1000M     
[ECMWF-INFO -sbatch] -   02.12.2022 - 10:15 | 2          aa         6          1000M     
[ECMWF-INFO -sbatch] -   02.12.2022 - 10:15 | 2          aa         6          1000M     
[ECMWF-INFO -sbatch] -   01.12.2022 - 11:00 | 2          aa         7          1000M     
[ECMWF-INFO -sbatch] -   30.11.2022 - 14:33 | 2          ad         7          1000M     
[ECMWF-INFO -sbatch] -   23.11.2022 - 13:34 | 2          ac         7          1000M     
[ECMWF-INFO -sbatch] -   23.11.2022 - 13:30 | 2          ac         7          1000M     
[ECMWF-INFO -sbatch] -   23.11.2022 - 13:23 | 2          ac         7          1000M     
[ECMWF-INFO -sbatch] -   23.11.2022 - 13:17 | 2          ac         6          1000M     
[ECMWF-INFO -sbatch] -   23.11.2022 - 13:14 | 2          ac         7          1000M     
[ECMWF-INFO -sbatch] -   23.11.2022 - 13:10 | 2          ac         7          1000M     
[ECMWF-INFO -sbatch] - ['/usr/bin/sbatch', '--cpus-per-task=1', '--job-name=testit6', '--ntasks=1', '--no-requeue', '--nodes=1', '--qos=nf', '--mem=120000m', '--licenses=h2resw01', '--time=00:00:09', '/home/sycw/slurm/gpil.job']

...