Slurm is the batch system available. Any script can be submitted as a job with no changes, although you may want to see Writing SLURM jobs to customise it.

To submit a script as a serial job with default options, enter the command:

...
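As a minimal sketch, assuming your script is called job.sh (a placeholder name), a default submission uses the standard Slurm sbatch command:

No Format
sbatch job.sh    # submit job.sh as a serial job with default options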

To cancel a job, use:

No Format
scancel <jobid>


Note

Currently, the scancel command must be executed on a login node of the same cluster where the job is running.

See the Slurm documentation for more details on the different commands available to submit, query or cancel jobs.
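For example, to query your jobs you can use the standard Slurm squeue command (a sketch; see the Slurm documentation for the full set of options):

No Format
squeue -u $USER    # list all your pending and running jobs
squeue -j <jobid>  # query a specific job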

...

These are the different QoS (or queues) available for standard users on the four complexes:

QoS name | Type | Suitable for... | Shared nodes | Maximum jobs per user | Default / Max Wall Clock Limit | Default / Max CPUs | Default / Max Memory
nf | fractional | serial and small parallel jobs. It is the default | Yes | - | average runtime + standard deviation / 2 days | 1 / 64 | 8 GB / 128 GB
ni | interactive | serial and small parallel interactive jobs | Yes | 1 | 12 hours / 7 days | 1 / 32 | 8 GB / 32 GB
np | parallel | parallel jobs requiring more than half a node | No | - | average runtime + standard deviation / 2 days | - | 240 GB / 240 GB per node (all usable memory in a node)
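A job can be directed to a specific QoS with the --qos option. A minimal sketch of a parallel job targeting np (the application name and task count are placeholders):

No Format
#!/bin/bash
#SBATCH --qos=np        # parallel QoS for jobs needing more than half a node
#SBATCH --ntasks=256    # adjust to your job's needs
srun ./my_parallel_app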



GPU special partition

On the AC complex there is also the ng queue, which gives access to the special partition with GPU-enabled nodes. See HPC2020: GPU usage for AI and Machine Learning for all the details on how to make use of those special resources.

QoS name | Type | Suitable for... | Shared nodes | Maximum jobs per user | Default / Max Wall Clock Limit | Default / Max CPUs | Default / Max Memory per node
ng | GPU | serial and small parallel jobs | Yes | - | average runtime + standard deviation / 2 days | 1 / - | 8 GB / 500 GB
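A sketch of a job script requesting a GPU on the ng QoS; --gpus is standard Slurm syntax, but check HPC2020: GPU usage for AI and Machine Learning for the exact options expected on AC (the application name is a placeholder):

No Format
#!/bin/bash
#SBATCH --qos=ng    # GPU-enabled QoS on the AC complex
#SBATCH --gpus=1    # request a single GPU
srun ./my_gpu_app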



ECS

For those using ECS, these are the different QoS (or queues) available for standard users of this service:

QoS name | Type | Suitable for... | Shared nodes | Maximum jobs per user | Default / Max Wall Clock Limit | Default / Max CPUs | Default / Max Memory
ef | fractional | serial and small parallel jobs - ECGATE service | Yes | - | average job runtime + standard deviation / 2 days | 1 / 8 | 8 GB / 16 GB
ei | interactive | serial and small parallel interactive jobs - ECGATE service | Yes | 1 | 12 hours / 7 days | 1 / 4 | 8 GB / 8 GB
el | long | serial and small parallel interactive jobs - ECGATE service | Yes | - | average job runtime + standard deviation / 7 days | 1 / 8 | 8 GB / 16 GB
et | Time-critical Option 1 | serial and small parallel Time-Critical jobs. Only usable through ECACCESS Time Critical Option-1 | Yes | - | average job runtime + standard deviation / 12 hours | 1 / 8 | 8 GB / 16 GB
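As with the HPCF QoSs, jobs can be steered to a specific ECS queue with --qos. A minimal sketch for a long-running job on el (script name and time limit are placeholders):

No Format
#!/bin/bash
#SBATCH --qos=el            # long QoS, up to 7 days
#SBATCH --time=5-00:00:00   # request 5 days of wall clock time
./my_long_running_task.sh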


Info: Time limit management

See HPC2020: Job Runtime Management for more information on how the default Wall Clock Time limit is calculated.
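If you have a good estimate of your job's runtime, it is usually worth setting the wall clock limit explicitly instead of relying on the computed default, e.g.:

No Format
#SBATCH --time=12:00:00    # explicit 12-hour wall clock limit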


Note: Limits are not set in stone

Different limits on the different QoSs may be introduced or changed as the system evolves.


Tip: Checking QoS setup

If you want to get all the details of a particular QoS on the system, you may run, for example:

No Format
sacctmgr list qos names=nf
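The output can also be restricted to the fields of interest with the format option (field names may vary slightly between Slurm versions):

No Format
sacctmgr list qos names=nf format=Name,Priority,MaxWall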


Submitting jobs remotely

If you are submitting jobs from a different platform via ssh, please use the dedicated *-batch nodes instead of their *-login equivalents:

  • For generic remote job submission on HPCF: hpc-batch or hpc2020-batch
  • For remote job submission on a specific HPCF complex: <complex_name>-batch
  • For remote job submission to the ECS virtual complex: ecs-batch

For example, to submit a job from a remote platform onto the Atos HPCF:

No Format
ssh hpc-batch "sbatch myjob.sh"


...