From 2022-11-16, ECMWF will kill jobs that reach their wall time limit if "#SBATCH --time=" or "--time=" was provided with the job.


Alternatively, ECMWF accepts jobs without "#SBATCH --time=" or "--time=". For these jobs, ECMWF will instead use the average runtime of previous jobs, which it tries to group by job name and job output path.

For new jobs, we will assume a runtime of 24h and allow another 24h of grace time, so a new job can run for up to 48h. After the first successful run, the limit will be the average plus one standard deviation of the previous 20 successful runtimes of the job, and the job is again allowed 24h of grace time.
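The rule above can be sketched in a few lines of Python. This is only an illustration, not ECMWF's implementation: the function name `estimated_time_limit`, the use of seconds as the unit, the oldest-first ordering of the history, and the choice of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev

HOUR = 3600                   # assumption: runtimes are handled in seconds
DEFAULT_RUNTIME = 24 * HOUR   # assumed runtime of a job with no history
GRACE = 24 * HOUR             # grace time added on top of the estimate
HISTORY_LENGTH = 20           # number of previous successful runs considered


def estimated_time_limit(successful_runtimes):
    """Return the allowed runtime in seconds for a job submitted without --time.

    `successful_runtimes` is a hypothetical list of earlier successful
    runtimes in seconds, oldest first.
    """
    recent = successful_runtimes[-HISTORY_LENGTH:]
    if not recent:
        # New job: 24h assumed runtime plus 24h grace time, 48h in total.
        return DEFAULT_RUNTIME + GRACE
    # Known job: average plus one standard deviation, plus the grace time.
    # Population standard deviation is an assumption made for this sketch.
    return mean(recent) + pstdev(recent) + GRACE


print(estimated_time_limit([]) / HOUR)  # 48.0
```

For a job whose history is perfectly regular, the standard deviation is zero and the limit is simply the usual runtime plus 24h.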

Example:

[ECMWF-INFO -sbatch] - ------------------------------
[ECMWF-INFO -sbatch] - jobtag: sycw-test-2x512-/home/sycw/slurm/slurm-_JOBID_.out
[ECMWF-INFO -sbatch] - ------------------------------
[ECMWF-INFO -sbatch] - Average Walltime 279 with a Standard Deviation 515
[ECMWF-INFO -sbatch] - Runtime history
[ECMWF-INFO -sbatch] -   Date               | Cores   CPUTime   Walltime       Mem
[ECMWF-INFO -sbatch] -   02.11.2022 - 13:45 | 512        0          1306       200M      
[ECMWF-INFO -sbatch] -   02.11.2022 - 11:27 | 512        0          1306       200M      
[ECMWF-INFO -sbatch] -   31.10.2022 - 10:21 | 512        0          1306       200M      
[ECMWF-INFO -sbatch] -   28.10.2022 - 14:39 | 512        0          1306       200M      
[ECMWF-INFO -sbatch] -   28.10.2022 - 12:28 | 512        0          135        200M      
[ECMWF-INFO -sbatch] -   25.10.2022 - 15:19 | 512        0          5          200M      
[ECMWF-INFO -sbatch] -   25.10.2022 - 15:18 | 512        0          6          200M      
[ECMWF-INFO -sbatch] -   25.10.2022 - 15:18 | 512        0          6          200M      
[ECMWF-INFO -sbatch] -   25.10.2022 - 15:05 | 512        0          7          200M      
[ECMWF-INFO -sbatch] -   20.10.2022 - 08:48 | 512        0          136        200M      
[ECMWF-INFO -sbatch] -   19.10.2022 - 09:44 | 512        2560       5          200M      
[ECMWF-INFO -sbatch] -   19.10.2022 - 09:43 | 512        3072       6          200M      
[ECMWF-INFO -sbatch] -   19.10.2022 - 09:37 | 512        3072       6          200M      
[ECMWF-INFO -sbatch] -   18.10.2022 - 08:52 | 512        3072       6          200M      
[ECMWF-INFO -sbatch] -   18.10.2022 - 08:52 | 512        3072       6          200M      
[ECMWF-INFO -sbatch] -   18.10.2022 - 08:52 | 512        2560       5          200M      
[ECMWF-INFO -sbatch] -   18.10.2022 - 08:52 | 512        4096       8          200M      
[ECMWF-INFO -sbatch] -   18.10.2022 - 08:52 | 512        4096       8          200M      
[ECMWF-INFO -sbatch] -   18.10.2022 - 08:52 | 512        4096       8          200M      
[ECMWF-INFO -sbatch] -   18.10.2022 - 08:51 | 512        3072       6          200M
[ECMWF-INFO -sbatch] - ['/usr/bin/sbatch', '--job-name=test', '--nodes=2', '--qos=np', '--time=00:05', '--mem-per-cpu=100', '--export=EC_user_time_limit=00:05', '/home/sycw/slurm/time.job']
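The "Average Walltime" and "Standard Deviation" figures in this output can be checked against the Walltime column of the runtime history. The short sketch below copies the 20 values by hand and uses Python's `statistics` module. That the reported figure is a population standard deviation is an inference from the numbers, not something the output states.

```python
from statistics import mean, pstdev, stdev

# Walltime column of the runtime history above, newest first.
walltimes = [1306, 1306, 1306, 1306, 135, 5, 6, 6, 7, 136,
             5, 6, 6, 6, 6, 5, 8, 8, 8, 6]

average = round(mean(walltimes))
population_sd = round(pstdev(walltimes))  # divides by n
sample_sd = round(stdev(walltimes))       # divides by n - 1

print(average, population_sd, sample_sd)  # 279 515 528
```

The reported values 279 and 515 match the mean and the population standard deviation. The four long runs of 1306 dominate the spread, which is why the standard deviation is larger than the average.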



ALL THIS SHOULD GO INTO A SEPARATE PAGE EXPLAINING OUR ESTIMATED RUNTIME STUFF
