
With four identical Atos complexes (see Atos HPCF: System overview), we are now able to provide a more reliable computing service at ECMWF, including for batch work. For example, during a system session on one complex, we will submit batch jobs to a different complex. This enhanced batch service may, however, require the use of some ECMWF customised SLURM commands.

Job submission: sbatch

The default PATH includes /usr/local/bin, which contains an ECMWF local version of sbatch that can submit batch jobs to a different complex. For example, before a system session on one complex, say AA, we may decide to submit HPCF batch jobs to another complex, say AB. This will happen transparently for all our users.

sbatch

If you use the standard SLURM sbatch command in /usr/bin, you will not benefit from cross-complex job submission. For example, under cron the default PATH contains only /usr/bin, so your jobs will only be submitted to the complex your cron entry is on.
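As a minimal sketch of one way to handle the cron case, a script started from cron could put /usr/local/bin at the front of its PATH before calling sbatch. This is an illustration rather than an official ECMWF recipe; only the two directory names come from this page.

```shell
# Fragment for the top of a script started from cron (illustrative).
# Search /usr/local/bin before /usr/bin so that "sbatch" resolves to the
# ECMWF customised version instead of the plain SLURM one.
PATH="/usr/local/bin:${PATH}"
export PATH

# Show which sbatch would now be used (prints nothing if none is installed).
command -v sbatch || true
```

Calling the command by its full path, /usr/local/bin/sbatch, should have the same effect for a single invocation.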

All SLURM sbatch options are available with the ECMWF customised sbatch command.
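The sketch below illustrates a submission using a few standard SLURM sbatch directives. The file name hello_job.sh and all option values are hypothetical and chosen only for illustration; the submission step is skipped when no sbatch is found in PATH.

```shell
# Write a minimal job script; the file name and option values are illustrative.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=hello.%j.out
#SBATCH --time=00:05:00
echo "Running on $(hostname)"
EOF

# Submit with whichever sbatch comes first in PATH (skipped if none is installed).
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh
fi
```

Here %j in the output file name is the standard SLURM placeholder for the job ID.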

Job monitoring: ecsqueue

The default SLURM command 'squeue' lists only the jobs on the current complex. To list the jobs running on another complex, use the 'ecsqueue' command.

$ ecsqueue --help
usage: ecsqueue [-u USER] [-h] [-o FORMAT] [-O FORMAT] [-q QOS] [-j JOBID]
                [-M CLUSTERS]
$ ecsqueue -u $USER
# will show all the jobs running for you on the 4 Atos complexes.

ecsqueue

ecsqueue is located in /usr/local/bin. You may need to adapt your PATH.

Only a limited set of SLURM squeue options is available with ecsqueue.
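For scripts that may run where ecsqueue is not in PATH, a small wrapper along the following lines could be used. It is a sketch only: the function name list_my_jobs is hypothetical, and it relies solely on the -u option shown in the usage text above and on the standard squeue -u option.

```shell
# List a user's jobs on all complexes with ecsqueue when available,
# otherwise fall back to squeue, which covers the current complex only.
list_my_jobs() {
    if command -v ecsqueue >/dev/null 2>&1; then
        ecsqueue -u "$1"
    elif command -v squeue >/dev/null 2>&1; then
        squeue -u "$1"
    else
        echo "neither ecsqueue nor squeue found in PATH" >&2
        return 1
    fi
}
```

A typical call would be: list_my_jobs "$USER"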
