This is a series of exercises that will walk you through the basic tasks as you get on board the Atos HPCF.

Prerequisites

In order to follow this tutorial, these are the prerequisites you must fulfil before starting:

You must have a valid ECMWF account with privileges to access HPCF or ECS. If you only have access to ECS, you may need to skip certain exercises involving HPCF.
You must have 2 Factor Authentication enabled with TOTP.
You must be able to connect with at least one of the following methods:
- Using the Virtual Desktop Infrastructure (VDI). See the corresponding documentation to get started.
- Using Teleport SSH from your end user device.

Logging into Atos HPCF and ECS

Reference Documentation

HPC2020: How to connect

First of all, let's try to connect to the computing services via SSH:

Accessing a login node

Access the default login node of the Atos HPCF or ECS and take note of which node you are on:
ssh hpc-login hostname
ssh ecs-login hostname
Open a new tab in your terminal and connect again. Did you get the same hostname? Why is that?
hpc-login is an alias for a load-balanced service of login nodes. You may land on a different one every time you connect.
ecs-login is an alias to a specific login node of the ECS virtual cluster. It is not automatically load-balanced, so you will typically land on the same node on consecutive connections.
Both aliases will always point to a working login node, and the actual node and complex behind it may change depending on the load, system sessions or outages.
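One way to see the load balancing in action is to connect several times and tally the hostnames returned. The sketch below assumes SSH can connect without prompting on every iteration. The function name count_login_nodes is purely illustrative and not an ECMWF tool.

```bash
# Hypothetical helper: connect N times to a login alias and tally the nodes reached.
count_login_nodes() {
    local alias_name="$1" attempts="${2:-5}" i=0
    while [ "$i" -lt "$attempts" ]; do
        # Run "hostname" remotely; -n stops ssh from reading the loop's stdin.
        ssh -n "$alias_name" hostname
        i=$((i + 1))
    done | sort | uniq -c
}
```

Calling count_login_nodes hpc-login 5 would print one line per distinct node, preceded by the number of times it was reached.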
Now, from your open SSH session on Atos HPCF or ECS, connect to the main login alias again. Did it ask for a password? Can you set your account up so jumps between hosts are done without a password?
Password-less SSH between ECMWF hosts such as Atos HPCF or ECS nodes, or VDI hosts is not set up by default. If you were asked for a password, you can run the following command from your Atos HPCF, ECS or VDI session to set up key-based authentication:
ssh-key-setup
After this you should be able to jump between hosts without having to introduce your password.
Besides being convenient, this setup is also necessary for other tools such as ECACCESS or ecinteractive to work properly.
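To check whether key-based authentication is already working before running ssh-key-setup, a small test along these lines may help. It is only a sketch: the function name is hypothetical, and it relies on the standard OpenSSH BatchMode option, which makes ssh fail rather than prompt.

```bash
# Hypothetical helper: succeed only if key-based login works without any prompt.
can_ssh_without_password() {
    # BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
    ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}
```

For example, can_ssh_without_password hpc-login returns a non-zero status when a password would have been requested, which is the case where ssh-key-setup is needed.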

Interactive session

Reference documentation

HPC2020: Persistent interactive job with ecinteractive

Standard sessions on login nodes do not guarantee access to dedicated resources such as cpus or memory, and strict limits on those are imposed.

Can you get a dedicated interactive session with 10 GB of memory and 4 cpus for 8 hours?
You can use ecinteractive. It is installed and available on all the Atos HPCF and ECS nodes, as well as the VDI, so you can run it from there:
ecinteractive -c 4 -m 10 -t 8:00
This will create an interactive job with the requested configuration and land you on a shell in a given node.
If you are connecting from your own computer via teleport, you can download it and run it there (no Windows native support, only Mac, Linux or WSL supported).
Note that by default, and unless run directly on ECS, ecinteractive will use HPCF as the backend. If you wish to force the session to be on ECS, you can do:
ecinteractive -p ecs -c 4 -m 10 -t 8:00
Log out of that interactive session. Can you reattach to it?
Your job kept running in the background, and there can only be one interactive job per user. You can attach as many concurrent shells as you like to the same interactive session, for example in different terminal tabs, with:
ecinteractive
If on ECS, you may need to run:
ecinteractive -p ecs
Cancel your interactive session
ecinteractive -k
If on ECS, you may need to run:
ecinteractive -p ecs -k

Storage spaces

Reference documentation

HPC2020: Filesystems

We will now explore the different options when it comes to storing your data.

Main filesystems

Connect to the Atos HPCF or ECS main login node. What is your default filesystem? Can you try 4 different ways of accessing that space?
The default directory is your HOME directory, which is /home/$USER. It is a dedicated personal space for you, and you can always come back to that with either of the following commands:
cd
cd ~
cd $HOME
cd /home/$USER
Your HOME directory is accessible across all Atos HPCF, ECS, VDI and EcFlow services.
There are 3 more main storage spaces. Create an empty file called del.me on each one of them. Check that they have been created with ls, and then remove them with rm.
Besides HOME, you also have access to PERM, HPCPERM and SCRATCH. Like HOME, they are all dedicated personal spaces with their corresponding environment variable. Using those environment variables over hardcoded paths is strongly recommended.
You can use touch to create the test files:
touch $PERM/del.me
touch $HPCPERM/del.me
touch $SCRATCH/del.me
Check they exist with:
ls -l $PERM/del.me
ls -l $HPCPERM/del.me
ls -l $SCRATCH/del.me
Remove them with:
rm $PERM/del.me
rm $HPCPERM/del.me
rm $SCRATCH/del.me
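The three steps can also be combined into one loop. The following is a minimal sketch with a hypothetical function name. It takes the directories as arguments and skips any that are empty or missing, which is what happens when one of the variables is not defined on the current platform.

```bash
# Hypothetical helper: create, list and remove a test file in each directory given.
roundtrip_test_file() {
    local dir status=0
    for dir in "$@"; do
        if [ -z "$dir" ] || [ ! -d "$dir" ]; then
            echo "skipping '$dir': not defined or not a directory" >&2
            status=1
            continue
        fi
        # Any failing step marks the whole run as unsuccessful.
        touch "$dir/del.me" && ls -l "$dir/del.me" && rm "$dir/del.me" || status=1
    done
    return "$status"
}
```

It could be invoked as roundtrip_test_file "$PERM" "$HPCPERM" "$SCRATCH".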

How much space have you used in each of your main 4 filesystems? How much can you store?

All the filesystems have quotas enforced. You can check them with the quota command

quota

For HOME and PERM, the snippet should look similar to:

Quota for $HOME:
home_b             user    1234        <space used>   <space limit>       <number of files stored>       -   *

Quota for $PERM
POSIX User      1234    <space used>   <space limit>       <number of files stored>       none

For SCRATCH and HPCPERM the format is slightly different:

Project quota for $SCRATCH and $SCRATCHDIR:
Disk quotas for prj 1000001798 (pid 1000001798):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
       /ec/res4     XXX     YYY     YYY       -     ZZZ     WWW     WWW       -

Project quota for $HPCPERM:
Disk quotas for prj 2000001798 (pid 2000001798):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
       /ec/res4     XXX     YYY     YYY       -     ZZZ     WWW     WWW       -

If you are on the VDI, open a new terminal there. Can you access your HOME, PERM, SCRATCH and HPCPERM?
HOME and PERM are NFS-based Filesystems, which are mounted on all user computing platforms at ECMWF. You may access them with $HOME and $PERM environment variables:
ls $HOME
ls $PERM
However, SCRATCH and HPCPERM are Lustre-based filesystems only available on the Atos HPCF, so they are not available on other computing platforms such as VDI or ecFlow VMs, and the corresponding environment variables are therefore not defined there.
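A script meant to run on several platforms may want to check which of these variables are usable before relying on them. The sketch below, with a hypothetical function name, uses printenv, which returns a non-zero status for variables that are not in the environment.

```bash
# Hypothetical helper: report which storage variables are usable on this host.
report_storage_spaces() {
    local name path
    for name in HOME PERM HPCPERM SCRATCH; do
        # printenv fails when the variable is not in the environment.
        if path=$(printenv "$name") && [ -d "$path" ]; then
            echo "$name=$path"
        else
            echo "$name is not available here"
        fi
    done
}
```

Given the description above, SCRATCH and HPCPERM would be expected to show up as not available on the VDI.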
EXTRA: For long term archival purposes, users with access to HPCF may also use ECFS. Files will be stored in ECMWF's Data Handling System on Tape. Create a small text file and copy it to your ECFS space, then ensure it is there, retrieve it and remove it.
echo "hello world" > test_file.txt
ecp test_file.txt ec:
els -l ec:test_file.txt
ecp ec:test_file.txt retrieved_test_file.txt
diff test_file.txt retrieved_test_file.txt
erm ec:test_file.txt

Temporary spaces

There are a number of temporary spaces you can use in your session or job.

Create a file called testfile on the $TMPDIR, $SCRATCHDIR and /tmp/.
touch $TMPDIR/testfile
touch $SCRATCHDIR/testfile
touch /tmp/testfile
Open another session in the same login node with ssh $HOSTNAME. Can you find the files you have created earlier?
ls -l $TMPDIR/testfile
ls -l $SCRATCHDIR/testfile
ls -l /tmp/testfile
You will not see the files you created in any of those locations, since every session or job has a different location. This includes /tmp, which is also a dedicated ramdisk for each session.

Filesystem Usage

Can you decide what would be the best filesystem to use in the following cases? Why would you make that choice?

Store the source code, scripts and configuration of your programs and workflows
HOME would be the preferred choice. They are typically small but important files, so convenience of backups, snapshots and availability on all computing platforms is more important than parallel performance.
Store Climate Files to be used by your model runs on Atos HPCF.
HPCPERM is the right choice for big files that are going to be used concurrently by parallel applications such as NWP models.
Working directory for your jobs.
SCRATCH is the go-to place for your daily work. It offers plenty of space and good parallel performance for output data that is transient by nature. Remember to move the data you want to keep after your job somewhere else, since files not used for 30 days will be automatically deleted.
Store data that you use frequently, which is considerable in size.
PERM if accessibility from other computing platforms or the need of snapshots is important. You can see PERM as an extension to your HOME space.
HPCPERM, if I/O performance is more important than accessibility from other platforms or snapshots, especially if the data is going to be used in parallel jobs on Atos HPCF.
Store data for longer term which is considerable in size, such as experiment results. You are not going to use it often.
ECFS would be the right place for longer term archival or storing backups. This is by far the place where you can store the most data. However, data on tapes needs to be retrieved to another disk space before it can be used, so it is costly in terms of time.
In order to use ECFS efficiently, remember to store fewer but bigger files, so it is a good idea to use tools like tar or zip to bundle together big directories with lots of files.
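As an illustration of the bundling advice, a directory can be packed into a single compressed tarball before it is archived. This is only a sketch: the function name and the file names in the usage example are hypothetical.

```bash
# Hypothetical helper: bundle a directory into a single compressed tarball.
bundle_for_archive() {
    local src_dir="$1" tarball="$2"
    # -C keeps paths inside the archive relative to the parent of src_dir.
    tar -czf "$tarball" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}
```

After bundle_for_archive "$SCRATCH/results" results.tar.gz, the single file results.tar.gz could be copied with ecp results.tar.gz ec: as in the ECFS exercise above.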
Temporary files that you don't need beyond the end of the session or job
$TMPDIR if performance is important and size is small, since TMPDIR is either in memory (for parallel jobs on HPCF), or on SSD disk.
$SCRATCHDIR if size of the files is big and does not fit TMPDIR.
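The choice between the two temporary spaces could be automated roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and it treats the free space reported by df as the limit for TMPDIR, which may not reflect any additional limits imposed on the session or job.

```bash
# Hypothetical helper: print the temporary directory to use for a file of a given size.
pick_temp_space() {
    local needed_kb="$1" free_kb
    # df -Pk prints one POSIX-format line per filesystem; field 4 is free space in KB.
    free_kb=$(df -Pk "${TMPDIR:-/tmp}" | awk 'NR == 2 { print $4 }')
    if [ "${free_kb:-0}" -gt "$needed_kb" ]; then
        echo "${TMPDIR:-/tmp}"
    elif [ -n "${SCRATCHDIR:-}" ]; then
        echo "$SCRATCHDIR"
    else
        echo "no temporary space large enough" >&2
        return 1
    fi
}
```

For instance, workdir=$(pick_temp_space 500000) would select TMPDIR when roughly 500 MB are free there and fall back to SCRATCHDIR otherwise.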

Recovering Deleted files

Reference documentation

HPC2020: Filesystems

Imagine you have accidentally deleted ~/.profile in your HOME directory. Can you get back the latest version?
You can use the snapshots. You can list all the versions available with:
ls -l ~/.snapshot/*/.profile
To recover, you would just need to copy the file back into place.
For longer time spans, use the utility home_snap to get the locations
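Picking the most recent copy among the snapshots can be scripted. The sketch below assumes the layout implied by the listing command above (one subdirectory per snapshot under ~/.snapshot) and takes the copy with the newest modification time as the latest version. The function name is hypothetical.

```bash
# Hypothetical helper: print the snapshot copy of a file with the newest modification time.
latest_snapshot_copy() {
    local snap_root="$1" rel_path="$2" newest="" candidate
    for candidate in "$snap_root"/*/"$rel_path"; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$candidate" ] || continue
        if [ -z "$newest" ] || [ "$candidate" -nt "$newest" ]; then
            newest="$candidate"
        fi
    done
    # Fails with a non-zero status when no snapshot contains the file.
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

The file could then be restored with cp "$(latest_snapshot_copy ~/.snapshot .profile)" ~/.profile.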
Imagine you have accidentally deleted a file in your PERM directory. Can you get back the latest version?
You can use the snapshots. You can list all the versions available with:
perm_snap
Note that the snapshots are less frequent in PERM.
Imagine you have accidentally deleted a file in your SCRATCH or HPCPERM directories. Can you get back the latest version?
Unfortunately there are no snapshots or backups for those filesystems, so the data has been lost permanently.

Managing your software stack environment

Reference documentation

HPC2020: Software stack

Atos HPCF and ECS computing platforms offer a wide range of software, libraries and tools.

Basic software environment management

You want to use CDO, a popular tool to manipulate climate and NWP model data. What do you need to do to get the following result?
```
$ cdo --version
Climate Data Operators version X.Y.Z (https://mpimet.mpg.de/cdo)
System: x86_64-pc-linux-gnu
...
```
If you run the command without any prior action, you may get:
$ cdo --version
-bash: cdo: command not found
Many software packages and tools are not part of your default environment, and need to be explicitly loaded via modules.
So the following commands would be sufficient to get to the desired result:
module load cdo
cdo --version
ml shortcut
You can also use the ml shortcut to load the module

ml cdo
Note that we did not ask for any specific version. In those cases, you will get the one defined as default.
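In a script, it can be convenient to load a module only when the command it provides is missing. The following sketch assumes the module command is available in the shell running the script. The function name ensure_command is hypothetical.

```bash
# Hypothetical helper: load a module only if the command it provides is not yet available.
ensure_command() {
    local cmd="$1" mod="$2"
    if command -v "$cmd" >/dev/null 2>&1; then
        return 0
    fi
    # Load the module, then confirm that the command can now be found.
    module load "$mod" && command -v "$cmd" >/dev/null 2>&1
}
```

For example, ensure_command cdo cdo && cdo --version would load the default cdo module only when cdo is not already on the PATH.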
How many versions of CDO can be used at ECMWF? Can you pick the newest?
There are hundreds of different packages with their corresponding different versions installed at ECMWF. You can use:
module avail
To see what modules can be loaded at any time.
However, not all modules can be loaded at any time: some will only become available if a certain combination of modules is loaded.
You can also use the following command for an overview of all the packages that are installed, including those that may not be visible in module avail:
module spider
In this case we are only interested in CDO so we can do either:
module avail cdo
or
module spider cdo
To load the newest, you can explicitly pick the latest version. Assuming that it is "X.Y.Z":
module load cdo/X.Y.Z
But you can also use the module tag "new":
module load cdo/new
or also ask for the latest with:
module --latest load cdo
No swap needed
If you had another version of the module loaded, the system will automatically swap it for the new one requested.
Load the netcdf4 module. Can you see which modules you have loaded in your environment now?
To load the netcdf4 module just do:
module load netcdf4
Then, you can see what your software environment looks like with:
module list
or with just the shortcut:
ml
You should see both the CDO and netcdf4 modules, besides the default modules loaded in your environment.
Remove the netcdf4 module from your environment and check it is gone.
To unload the netcdf4 module just do:
module unload netcdf4
or with just the shortcut:
ml -netcdf4
Then, you can see what your software environment looks like with:
module list
Can you check what is the installation directory of the default netCDF4 library?
All modules at ECMWF will define a <PACKAGE_NAME>_DIR environment variable that can be useful to pass to configuration files or scripts. Packages providing libraries such as netCDF4 will also typically define <PACKAGE_NAME>_LIB and <PACKAGE_NAME>_INCLUDE.
You can check the values of all the variables that a module would define, without loading it, by running:
module show netcdf4
or with just the shortcut:
ml show netcdf4
You can then spot there the value of NETCDF4_DIR pointing to /usr/local/apps/netcdf4/X.Y.Z/COMPILER_FAMILY/COMPILER_VERSION
Can you restore the default environment you had when you logged in? Check that the environment is back to the desired state.
If you log out of your session, next time you log in you will start with a fresh default environment. Modules are only loaded for that specific session.
However, if you don't want to log out, you can also reset your module environment with:
module reset
You can then check the effects with
module list
reset vs purge
There is a subtle difference between module reset and module purge. While the former will go back to the default environment, which typically contains some default modules, the latter will completely unload all modules and leave you with a blank environment.

ECMWF tools

Reference documentation

HPC2020: ECMWF software and libraries

Can you run codes_info tool, which is part of ecCodes?
If you run the command without any prior action, you may get:
$ codes_info
-bash: codes_info: command not found
ecCodes, along with other ECMWF tools such as Metview or Magics are bundled into the ECMWF toolbox. You need to load that module in order to access them:
module load ecmwf-toolbox
Can you see what versions of ECMWF software are part of that module?
You can use the help option in modules to get additional information from the module, which in the case of the ecmwf-toolbox will include the versions of all the packages in the bundle:
module help ecmwf-toolbox
or with just the shortcut:
ml help ecmwf-toolbox
Can you run the ecflow_client command and get the version?
ecFlow is not part of the ecmwf-toolbox module. Since it has its own standalone module, you will need to load that separately:
module load ecflow
or with just the shortcut:
ml ecflow
Once the module is loaded, you can get the version with:
ecflow_client --version

Python and Conda

Reference documentation

HPC2020: Python support

Try to run the command below. Why does it fail? Can you make it work without installing pandas yourself?
```
$ python3 -c "import pandas as pd; print(pd.__version__)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'
```
The system Python 3 installation is very limited and does not come with many popular extra packages such as pandas. You may use the Python3 stack available in modules, which comes with almost 400 of those extra packages:
module load python3
After that, if you repeat the command it should complete successfully and print pandas version.
python3 -c "import pandas as pd; print(pd.__version__)"
You need to use the latest version of pandas to run a given application. What can you do (without using conda)?
In that case you could use pip to install it yourself. However, installing it directly into your user environment is highly discouraged since it may interfere with other applications you may run or after default software updates on the system side. Instead, for small additions to the default environment it is much more robust to use a python virtual environment.
In this case, you may create a virtual environment based on the installations provided, and just add the new version of pandas:
module load python3
mkdir -p $PERM/venvs
cd $PERM/venvs
python3 -m venv --system-site-packages myvenv
Then you can activate it only when you need it with:
source $PERM/venvs/myvenv/bin/activate
Note that we used $PERM/venvs as the location of these virtual environments, but you may decide to put them in another location.
With the environment activated, you can now install the new version of pandas:
pip install -U pandas
Then you can rerun the version command to check you got the latest
python3 -c "import pandas as pd; print(pd.__version__)"
When you have finished with your environment, you can deactivate it with:
deactivate
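A mistyped environment path is an easy mistake to make when activating a virtual environment. A small wrapper such as the sketch below, whose name is hypothetical, reports the problem explicitly instead of leaving you in the default environment without noticing.

```bash
# Hypothetical helper: activate a virtual environment, failing loudly on a wrong path.
activate_venv() {
    local venv_dir="$1"
    if [ ! -f "$venv_dir/bin/activate" ]; then
        echo "no virtual environment found at $venv_dir" >&2
        return 1
    fi
    # The activate script must be sourced so that it changes the current shell.
    . "$venv_dir/bin/activate"
}
```

With it, activate_venv "$PERM/venvs/myvenv" && pip install -U pandas only runs pip once the environment is really active.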
You may also use conda to create your own software stack with python packages and beyond. In order to use conda, you can load the corresponding module:
```
module load conda
```
What happened?
While conda may be seen as a way to set up custom Python environments, it also manages software beyond that, installing other packages and libraries not necessarily related to Python itself.
Because those may conflict with the software made available through modules, loading the conda module effectively disables all the other modules that may be loaded in your environment.
You have seen how the module system may have disabled a number of modules. You can also check it by running:
module list
You would then need to install everything you need to run your application or workflow in your conda environment.
If you want to go back to the previous environment without conda but with all the other modules, the recommended way is to reset the environment and then load explicitly all the necessary modules again
module reset
module load python3
Create your new conda environment with the latest pandas in it and check the version. Hint: you can also use mamba to speed up the environment creation process.
With the conda module loaded, you can create a new environment containing pandas, activate it and check the version. Here mamba is used in place of conda for the creation step:
mamba create -n mypandas -c conda-forge python pandas
conda activate mypandas
python3 -c "import pandas as pd; print(pd.__version__)"

Using Containerised applications

Reference documentation

HPC2020: Container support

The default psql command, part of the PostgreSQL package is not up to date. You need to run the latest version, but you do not want to build it from source. A possible solution is to use a containerised version of this application. Can you run this on Atos HPCF or ECS?
You can use Apptainer to run docker or any OCI-compatible container images.
module load apptainer
apptainer exec docker://postgres:latest psql --version
You can also download the image and run it directly later with:
apptainer pull docker://postgres:latest
./postgres_latest.sif psql --version
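To avoid downloading the same image repeatedly, the pull can be skipped when the image file already exists. The sketch below uses two hypothetical helpers. It passes an explicit output file name to apptainer pull, following the name_tag.sif pattern seen above, and assumes the tag latest when none is given.

```bash
# Hypothetical helper: build an image file name such as postgres_latest.sif from a URI.
sif_name_for() {
    local ref="${1##*/}"    # drop "docker://" and any registry or namespace
    case $ref in
        *:*) ;;
        *) ref="$ref:latest" ;;    # assume the "latest" tag when none is given
    esac
    echo "${ref%%:*}_${ref#*:}.sif"
}

# Hypothetical helper: pull the image only if its file is not already present.
pull_once() {
    local sif
    sif=$(sif_name_for "$1")
    [ -f "$sif" ] || apptainer pull "$sif" "$1"
}
```

Running pull_once docker://postgres:latest twice in the same directory would download the image only the first time.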
