With four identical Atos complexes (also known as clusters) installed in our Data Centre in Bologna (see Atos HPCF: System overview), we are now able to provide a more reliable computing service at ECMWF, including for batch work. For example, during a system session on one complex, we will submit batch jobs to a different complex. This enhanced batch service may, however, require the use of some ECMWF customised SLURM commands.

...

Note: sbatch

If you use the SLURM sbatch command in /usr/bin, you will not benefit from cross-complex job submission. For example, under cron the default PATH only contains /usr/bin, so you will only submit jobs to the complex your cron entry is on.

All SLURM sbatch options are available with the ECMWF customised sbatch command.

Job IDs are unique across all complexes, so there is no risk of duplicates.
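
To illustrate the note above, the following is a minimal sketch of a check that a cron script could run before submitting. It only tests whether sbatch resolves to the plain SLURM binary in /usr/bin. The helper name is_plain_slurm_sbatch is hypothetical, and the sketch assumes that command -v reports the full path of the command.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given path is the plain SLURM sbatch in /usr/bin.
is_plain_slurm_sbatch() {
    [ "$1" = "/usr/bin/sbatch" ]
}

# Find out which sbatch the current PATH resolves to (empty if none).
resolved=$(command -v sbatch 2>/dev/null || true)

if [ -z "$resolved" ]; then
    echo "sbatch not found on PATH"
elif is_plain_slurm_sbatch "$resolved"; then
    echo "warning: $resolved is the plain SLURM sbatch; jobs will only go to the local complex"
else
    echo "sbatch resolves to $resolved"
fi
```

If the warning appears, PATH in the cron environment needs to be adapted so that the ECMWF customised sbatch is found first.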

Monitoring a job: ecsqueue

...

$ ecscancel --help
usage: ecscancel [-h] [-u USER] [-t STATE] [-f] [-b] [-i] [-q QOS]
                 [-n JOBNAME] [-s SIGNAL] [-M CLUSTERS]
                 [jobid [jobid ...]]

positional arguments:
  jobid                 list of jobids

optional arguments:
  -h, --help            show this help message and exit
  -u USER, --user USER  scancel for particular user
  -t STATE, --state STATE
                        scancel for particular state
  -f, --full            scancel full
  -b, --batch           scancel batch step
  -i, --interactive     scancel interactive
  -q QOS, --qos QOS     scancel qos
  -n JOBNAME, --jobname JOBNAME
                        scancel jobname
  -s SIGNAL, --signal SIGNAL
                        scancel with a signal
  -M CLUSTERS, --clusters CLUSTERS
                        scancel for particular cluster, or comma separated
                        list of clusters
$ ecscancel <jobid>
# will cancel job <jobid> on one of the four complexes.
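
The options in the help text above can be combined. As a hedged sketch, the hypothetical helper below only prints an ecscancel command line for a given user and job state, so it can be reviewed before being run. It assumes that -t accepts the same state names as SLURM scancel (for example PENDING), which the help text does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: prints an ecscancel command line without running it.
# $1 = user name, $2 = job state (assumed to follow scancel naming, e.g. PENDING)
build_ecscancel_cmd() {
    printf 'ecscancel -u %s -t %s\n' "$1" "$2"
}

# Show the command that would cancel the current user's pending jobs.
build_ecscancel_cmd "${USER:-someuser}" PENDING
```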


Note: ecscancel

ecscancel is located in /usr/local/bin. You may need to adapt your PATH.

Only a limited set of SLURM scancel options is available with ecscancel.
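
One possible way to adapt PATH, for example at the top of a script started from cron, is sketched below. The helper name prepend_to_path is hypothetical. It adds /usr/local/bin only when that directory is not already listed.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # otherwise put it first
    esac
    export PATH
}

prepend_to_path /usr/local/bin

# Report where ecscancel is found, if at all.
command -v ecscancel || echo "ecscancel not found on PATH"
```

Because the directory is left where it is when already present, this is enough for ecscancel to be found, but it does not change which directory takes precedence in an existing PATH.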