See AG: The Lmod Module system for a complete picture.

A number of software packages, libraries, compilers and utilities are made available through the Lmod module system.

If you want to use a specific software package, first check whether it is already provided as a module. If a package or utility is not provided, you may install it yourself in your account or, alternatively, report it as a "Problem on computing" through the ECMWF Support Portal.

Example: finding relevant modules for NetCDF libraries:

module spider netcdf

Note that the traditional module avail only shows the modules available for the currently loaded programming environment, whereas module spider searches across all of them.

Then load the desired module with:

module load netcdf4

You may need to load a prgenv module first if the desired package is compiler-sensitive, that is, if it provides libraries for users to build against.
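The steps above can be combined into one short sequence. This is a minimal sketch rather than a prescribed procedure: prgenv/gnu is assumed to be the name of the GNU programming environment module, and the guard only skips the commands on machines where Lmod is not installed.

```shell
# Sketch: find and load a compiler-sensitive library module.
# Assumption: prgenv/gnu is the name of the GNU programming environment module.
if type module >/dev/null 2>&1; then
    module load prgenv/gnu      # choose the compiler flavour first
    module spider netcdf4       # list the versions provided and how to load them
    module load netcdf4         # load the default version for that flavour
    module list                 # confirm what is now loaded
else
    echo "Lmod 'module' command not found; run this on the HPCF" >&2
fi
```

Running module list at the end is a quick way to confirm that the library was loaded for the programming environment you intended.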

AG: Building your programs

When you log in, the GNU prgenv with GCC is loaded by default, but this can easily be changed. For example, to use the NVIDIA Programming Environment or toolchain:

module load prgenv/nvidia

See AG: Compilers and AG: The Lmod Module system for more details on how to customise your build environment and toolchains.
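As an illustration, a session could switch to the NVIDIA toolchain and then check which compilers it ends up with. This is a sketch under assumptions: nvc and nvfortran are the C and Fortran compiler drivers shipped with the NVIDIA HPC SDK, but that prgenv/nvidia places them on your PATH in exactly this way has not been confirmed here.

```shell
# Sketch: switch programming environment and check the resulting compilers.
# Assumption: prgenv/nvidia puts the NVIDIA HPC SDK drivers (nvc, nvfortran) on PATH.
if type module >/dev/null 2>&1; then
    module load prgenv/nvidia   # switch away from the default GNU environment
    module list                 # show the resulting set of loaded modules
    nvc --version               # NVIDIA C compiler
    nvfortran --version         # NVIDIA Fortran compiler
else
    echo "Lmod 'module' command not found; run this on the HPCF" >&2
fi
```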

AG: Compilers

Several compiler families are installed on the Atos HPCF. In particular, you will find several versions of the GNU and NVIDIA compilers (see HPC2020: Compilers).

Support libraries are provided with a separate installation for each of those flavours.

Managing those flavours is easy with modules and the prgenv module.

AG: CUDA

CUDA is made available in two ways:

  • Via the dedicated cuda module
  • As part of the NVIDIA SDK, which also provides the compiler (nvidia module)

AG: MPI

Several MPI implementations are available on the Atos HPCF: Mellanox HPC-X (OpenMPI based), and OpenMPI (provided by Atos).

They are not compatible with each other, so you should use only one of them to build your entire software stack. Support libraries are provided for the different flavours to guarantee maximum compatibility.

AG: ECMWF software and libraries

Some ECMWF tools and libraries have been bundled together in the ecmwf-toolbox package. This now provides ecCodes, Magics, Metview and ODC from a single place.

Other packages such as ecFlow are still available in their classic standalone modules. If in doubt, you may run module spider <package> to find the right module for you.
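A quick way to check what the bundle gives you might look like the following. This is a sketch only: it assumes the module carries the same name as the ecmwf-toolbox package, and it uses codes_info, the ecCodes utility that reports the installed version and definitions path.

```shell
# Sketch: load the bundled ECMWF tools and look up a standalone package.
# Assumption: the module is called ecmwf-toolbox, like the package itself.
if type module >/dev/null 2>&1; then
    module load ecmwf-toolbox
    codes_info                  # ecCodes: print version and definitions path
    module spider ecflow        # standalone packages keep their own modules
else
    echo "Lmod 'module' command not found; run this on the HPCF" >&2
fi
```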

AG: Python support

Python 2 support

Only Python 3 is provided, as Python 2 was officially declared end-of-life on January 1st, 2020.

While Python 3 is available on the system as /usr/bin/python3, this version does not come with many of the extra modules you may need. In those cases, you may want to use the python3 module or the conda module (still under development).
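One way to decide between the system interpreter and the python3 module is to test whether a given package can be imported. The sketch below is illustrative: can_import is a hypothetical helper defined here, and numpy is only an example of an extra package you might need.

```shell
# Hypothetical helper: succeed if interpreter $1 can import package $2.
can_import() {
    "$1" -c "import $2" >/dev/null 2>&1
}

# numpy is only an example package name.
if can_import /usr/bin/python3 numpy; then
    echo "system python3 already provides numpy"
elif type module >/dev/null 2>&1; then
    module load python3         # module-provided Python with extra packages
    if can_import python3 numpy; then
        echo "python3 module provides numpy"
    fi
else
    echo "numpy not found and no 'module' command available" >&2
fi
```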

AG: Container support

Docker support

Docker is not supported on Atos HPCF directly for security reasons.

You may use Apptainer, the new name for Singularity, if you wish to run containerised workloads. It does not need root privileges to run containers, and it supports its own "SIF" container images as well as standard Docker images pulled from any registry such as Docker Hub. Docker images are translated automatically into a SIF image before they run.
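A minimal sketch of that workflow is shown here. It assumes that Apptainer is provided through a module named apptainer, and it uses the public alpine image from Docker Hub purely as an example.

```shell
# Assumption: Apptainer is made available by a module called "apptainer".
if type module >/dev/null 2>&1; then
    module load apptainer
fi

if command -v apptainer >/dev/null 2>&1; then
    # Pull an image from Docker Hub; it is converted into a SIF file.
    apptainer pull alpine.sif docker://alpine:latest
    # Run a command inside the container, without root privileges.
    apptainer exec alpine.sif cat /etc/os-release
else
    echo "apptainer not found in PATH" >&2
fi
```

For GPU workloads, Apptainer's --nv option (apptainer exec --nv ...) makes the host NVIDIA devices and driver libraries visible inside the container.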

AG: Monitoring GPU usage

nvidia-smi

nvidia-smi provides monitoring and management capabilities for the GPUs from the command line and will give you instantaneous information about your GPUs.

$ nvidia-smi
Wed Mar  8 14:39:45 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:03:00.0 Off |                    0 |
| N/A   62C    P0   351W / 400W |  39963MiB / 40960MiB |     93%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    181525      C   python                          39960MiB |
+-----------------------------------------------------------------------------+
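For scripts or job logs, the same information can be requested in CSV form instead of the table above. The sketch below uses the --query-gpu and --format options of nvidia-smi. The gpu_summary function is a hypothetical helper defined here to reformat each row.

```shell
# Hypothetical helper: turn "index, util, mem_used, mem_total" CSV rows
# read from standard input into one readable line per GPU.
gpu_summary() {
    awk -F', *' '{ printf "GPU %s: %s%% busy, %s of %s MiB used\n", $1, $2, $3, $4 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_summary
else
    echo "nvidia-smi not found; run this on a GPU node" >&2
fi
```

Adding -l 5 to the nvidia-smi command repeats the query every five seconds, which gives a simple view of usage over time rather than a single snapshot.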

AG: The Lmod Module system

Lmod is a Lua-based implementation of the module system, and is generally claimed to be backwards compatible with the original Tcl modules.

It is used to configure and manage your session or job environment with the tools and libraries required for your workload.

See HPC2020: The Lmod Module system for details on how to use this tool.