Download and compile the code in your Atos HPCF or ECS shell session with the following commands:

No Format
module load prgenv/gnu hpcx-openmpi
wget https://git.ecdf.ed.ac.uk/dmckain/xthi/-/raw/master/xthi.c
mpicc -o xthi -fopenmp xthi.c -lnuma

Run the program interactively to familiarise yourself with the output:

No Format
$ ./xthi
Host=ac6-200  MPI Rank=0  CPU=128  NUMA Node=0  CPU Affinity=0,128

As you can see, only 1 process and 1 thread are run, and they may run on either of the two virtual cores assigned to your session (which correspond to the same physical core). If you run with 4 OpenMP threads, they will effectively compete for those same two virtual cores. This hurts the performance of your application, but does not affect anyone else on the login node:

No Format

$ OMP_NUM_THREADS=4 ./xthi
Host=ac6-200  MPI Rank=0  OMP Thread=0  CPU=128  NUMA Node=0  CPU Affinity=0,128
Host=ac6-200  MPI Rank=0  OMP Thread=1  CPU=  0  NUMA Node=0  CPU Affinity=0,128
Host=ac6-200  MPI Rank=0  OMP Thread=2  CPU=128  NUMA Node=0  CPU Affinity=0,128
Host=ac6-200  MPI Rank=0  OMP Thread=3  CPU=  0  NUMA Node=0  CPU Affinity=0,128
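
The CPU affinity that xthi reports can also be read directly from the kernel, without compiling anything. The following is a minimal sketch, assuming a Linux system with /proc mounted and GNU coreutils installed; the function name show_affinity is made up for illustration:

```shell
#!/bin/bash
# Print the list of CPUs the current session may run on, as seen by the kernel.
show_affinity() {
    # Child processes inherit the shell's affinity mask, so awk's own
    # /proc entry shows the same list that xthi prints as "CPU Affinity".
    awk '/Cpus_allowed_list/ {print $2}' /proc/self/status
}

# nproc honours the affinity mask: it counts usable CPUs, not all CPUs on the node.
echo "Usable CPUs: $(nproc)"
echo "Affinity:    $(show_affinity)"
```

On a login session like the one above, this would be expected to report the same two virtual cores that xthi shows.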

Create a new job script fractional.sh to run xthi with 2 MPI tasks and 2 OpenMP threads per task. Submit it and check the output to ensure the right number of tasks and threads were spawned.

Here is a job template to start with:

Code Block

language	bash
title	fractional.sh
collapse	true

#!/bin/bash
#SBATCH --output=fractional.out
# Add here the missing SBATCH directives for the relevant resources

# Add here the line to run xthi
# Hint: use srun

Expand

title	Solution

Using your favourite editor, create a file called fractional.sh with the following content:

Code Block

language	bash
title	fractional.sh

#!/bin/bash
#SBATCH --output=fractional.out
# Add here the missing SBATCH directives for the relevant resources
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2

# Add here the line to run xthi
# Hint: use srun
srun -c $SLURM_CPUS_PER_TASK ./xthi

The job needs to request 2 tasks and 2 CPUs per task. srun then spawns the parallel run and inherits the requested job geometry, except for cpus-per-task, which must be passed to srun explicitly.
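
As an optional variant, the number of OpenMP threads can be tied explicitly to the Slurm request rather than left to the OpenMP runtime default. This is only a sketch of a common practice, not part of the exercise solution; the fallback value of 1 and the check for srun are assumptions added so that the snippet is harmless outside a job:

```shell
#!/bin/bash
#SBATCH --output=fractional.out
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2

# Pin the OpenMP thread count to the Slurm request; fall back to 1 outside a job
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Launch only where srun exists (assumption: this guard is for illustration only)
if command -v srun >/dev/null 2>&1; then
    srun -c "$OMP_NUM_THREADS" ./xthi
fi
```

With this in place, the thread count no longer depends on how many CPUs each task happens to see.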

You can submit it with sbatch:

No Format
sbatch fractional.sh

The job should run shortly. When it finishes, a new file called fractional.out should appear in the same directory. You can check the relevant output with:

No Format
grep -v ECMWF-INFO fractional.out

You should see an output similar to:

No Format

$ grep -v ECMWF-INFO fractional.out
Host=ad6-202  MPI Rank=0  OMP Thread=0  CPU=  5  NUMA Node=0  CPU Affinity=5,133
Host=ad6-202  MPI Rank=0  OMP Thread=1  CPU=133  NUMA Node=0  CPU Affinity=5,133
Host=ad6-202  MPI Rank=1  OMP Thread=0  CPU=137  NUMA Node=0  CPU Affinity=9,137
Host=ad6-202  MPI Rank=1  OMP Thread=1  CPU=  9  NUMA Node=0  CPU Affinity=9,137
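
Counting tasks and threads by eye becomes tedious for larger jobs. A small helper can summarise the xthi output instead. This is a sketch that assumes the line format shown above; the function name summarise_xthi is hypothetical:

```shell
#!/bin/bash
# Summarise xthi output: number of MPI ranks, and the smallest and largest
# number of threads found on any rank.
summarise_xthi() {
    # $1: file containing xthi output, e.g. fractional.out
    grep -v ECMWF-INFO "$1" | awk '
        /MPI Rank=/ {
            # Locate the "Rank=<n>" field and count one thread per line
            for (i = 1; i <= NF; i++)
                if ($i ~ /^Rank=/) { split($i, kv, "="); threads[kv[2]]++ }
        }
        END {
            n = 0; min = 0; max = 0
            for (r in threads) {
                n++
                if (min == 0 || threads[r] < min) min = threads[r]
                if (threads[r] > max) max = threads[r]
            }
            printf "ranks=%d min_threads=%d max_threads=%d\n", n, min, max
        }'
}

# Usage: summarise the job output if it is present in the current directory
if [ -f fractional.out ]; then
    summarise_xthi fractional.out
fi
```

For the output shown above, this prints ranks=2 min_threads=2 max_threads=2, which matches the 2 tasks and 2 threads per task requested.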

Info

title	Srun automatic cpu binding

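You can see that srun automatically binds each task to a subset of the allocated CPUs, although perhaps not in the best way: here the two threads of each task share the two virtual cores of a single physical core. If you instructed srun to avoid any CPU binding with --cpu-bind=none, every task could run on any of the 4 CPUs allocated to the job. Since OMP_NUM_THREADS is not set in the job script, each task would then most likely start one thread per visible CPU, and you would see something like: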

No Format

$ grep -v ECMWF-INFO fractional.out
Host=aa6-203  MPI Rank=0  OMP Thread=0  CPU=136  NUMA Node=0  CPU Affinity=4,8,132,136
Host=aa6-203  MPI Rank=0  OMP Thread=1  CPU=  8  NUMA Node=0  CPU Affinity=4,8,132,136
Host=aa6-203  MPI Rank=0  OMP Thread=2  CPU=  8  NUMA Node=0  CPU Affinity=4,8,132,136
Host=aa6-203  MPI Rank=0  OMP Thread=3  CPU=  4  NUMA Node=0  CPU Affinity=4,8,132,136
Host=aa6-203  MPI Rank=1  OMP Thread=0  CPU=132  NUMA Node=0  CPU Affinity=4,8,132,136
Host=aa6-203  MPI Rank=1  OMP Thread=1  CPU=  4  NUMA Node=0  CPU Affinity=4,8,132,136
Host=aa6-203  MPI Rank=1  OMP Thread=2  CPU=132  NUMA Node=0  CPU Affinity=4,8,132,136
Host=aa6-203  MPI Rank=1  OMP Thread=3  CPU=132  NUMA Node=0  CPU Affinity=4,8,132,136

Can you ensure that each of those processes and threads runs on its own physical core, without using hyperthreading, for optimal performance?
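
To check an attempt at the question above, it helps to count how many distinct physical cores the threads actually landed on. The sketch below assumes that logical CPUs n and n+128 are the two hyperthreads of the same physical core, as the affinity pairs in the outputs above suggest (0,128 or 5,133). That offset is an assumption about this node type and is exposed as a parameter; the function name count_cores is hypothetical:

```shell
#!/bin/bash
# Count threads and distinct physical cores in xthi output.
# ASSUMPTION: logical CPUs n and n+offset are hyperthreads of the same core.
count_cores() {
    # $1: file with xthi output; $2: hyperthread offset (default 128)
    grep -v ECMWF-INFO "$1" | awk -v offset="${2:-128}" '
        match($0, /CPU= *[0-9]+/) {
            # The number after "CPU=" is the CPU the thread was running on
            cpu = substr($0, RSTART + 4, RLENGTH - 4) + 0
            threads++
            if (!((cpu % offset) in seen)) { seen[cpu % offset] = 1; cores++ }
        }
        END { printf "threads=%d distinct_cores=%d\n", threads, cores }'
}

# Usage: inspect the job output if it is present in the current directory
if [ -f fractional.out ]; then
    count_cores fractional.out
fi
```

For the automatically bound output shown earlier this reports threads=4 distinct_cores=2. A placement with one thread per physical core would report the same number for both values. Note that xthi shows where each thread was running at one instant, so a thread that is not bound to a single CPU may be reported on a different CPU in another run.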
