Parallel jobs run on the compute partition and use the np QoS for regular users.

This QoS is not the default, so make sure you define it explicitly in your job directives before submission.
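For example, you can request it with a directive in your job script:

#SBATCH --qos=np

or directly on the command line when submitting, e.g. sbatch --qos=np myjob.sh (the script name is illustrative).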

Parallel jobs are allocated exclusive nodes, so they will not share resources with other jobs.

Efficient use of resources

Make sure the job is configured to fully utilise the computing resources allocated to it. For small parallel executions, you may want to consider using fractional jobs instead.

Affinity

See HPC2020: Affinity for more information on how to set up the CPU binding properly for your parallel runs.
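As an illustration only, since the linked page describes the recommended settings for this system, binding is typically controlled with the standard OpenMP environment variables and srun options (my_app is a placeholder):

# Pin OpenMP threads using the standard OpenMP environment variables
export OMP_PLACES=threads
export OMP_PROC_BIND=close

# Bind each MPI task to the cores it has been allocated
srun --cpu-bind=cores my_app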

MPI application

To spawn an MPI application you must use srun.

MPI job example
#!/bin/bash
#SBATCH --job-name=test-mpi
#SBATCH --qos=np
#SBATCH --ntasks=512
#SBATCH --time=10:00
#SBATCH --output=test-mpi.%j.out
#SBATCH --error=test-mpi.%j.out

# srun inherits the task count (512) from the job allocation
srun my_mpi_app

The example above would run a 512-task MPI application.
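Assuming the script above is saved as test-mpi.sh (the file name is illustrative), you would submit and monitor it with:

sbatch test-mpi.sh
squeue -u $USER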

Hybrid MPI + OpenMP

To spawn a hybrid MPI + OpenMP application you must also use srun.

This example runs a hybrid application spawning 128 MPI tasks, with each task opening 4 OpenMP threads, for a total of 512 CPUs.

Hybrid job example
#!/bin/bash
#SBATCH --job-name=test-hybrid
#SBATCH --qos=np
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=4
#SBATCH --time=10:00
#SBATCH --output=test-hybrid.%j.out
#SBATCH --error=test-hybrid.%j.out

# Ensure correct OpenMP thread pinning
export OMP_PLACES=threads

# Match the number of OpenMP threads to the CPUs allocated to each task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Pass the cpus-per-task value explicitly, since newer Slurm versions
# do not propagate --cpus-per-task from sbatch to srun automatically
srun -c $SLURM_CPUS_PER_TASK my_mpi_openmp_app

See man sbatch or https://slurm.schedmd.com/sbatch.html for the complete set of options that may be used to configure a job.

Heterogeneous job and MPMD

To spawn an MPMD application you must also use srun.

This example runs a hybrid Multiple Program Multiple Data (MPMD) application, requiring different geometries for different parts of the MPI execution. The job allocates 3 nodes: the first runs executable1 with 64 tasks and 2 threads per rank, while the remaining two nodes run executable2 with 64 tasks and 4 threads per rank.

MPMD job example with heterogeneous geometry
#!/bin/bash
#SBATCH --job-name=test-het
#SBATCH --qos=np
#SBATCH --nodes=3
#SBATCH --hint=nomultithread
#SBATCH --time=10:00
#SBATCH --output=test-het.%j.out
#SBATCH --error=test-het.%j.out

# Needed to avoid occasional job hang at exit
export SLURM_MPI_TYPE=none

# Ensure correct OpenMP thread pinning
export OMP_PLACES=threads

# One node with 64 tasks x 2 CPUs each, plus two nodes with 64 tasks x 4 CPUs each
srun -N1 -n 64 -c 2 executable1 : -N2 -n 64 -c 4 executable2

The minimum allocation for each part of the heterogeneous execution is one node.
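Before running the real executables, a quick way to sanity-check such a geometry is to launch a trivial command with the same colon-separated layout; this sketch simply prints each task's global rank and host:

srun -N1 -n 64 -c 2 bash -c 'echo "rank $SLURM_PROCID on $(hostname)"' : \
     -N2 -n 64 -c 4 bash -c 'echo "rank $SLURM_PROCID on $(hostname)"'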