This is a series of exercises that will walk you through the basic tasks as you get on board the Atos HPCF or ECS computing platforms.

Prerequisites

In order to follow this tutorial, these are the prerequisites you must fulfil before starting:

  • You must have a valid ECMWF account with privileges to access HPCF or ECS. If you only have access to ECS, you may need to skip certain exercises involving HPCF.
  • You must have two-factor authentication (2FA) enabled with TOTP.
  • You must be able to connect with at least one of the following methods:

Logging into Atos HPCF and ECS

Reference: HPC2020: How to connect

First of all, let's try to connect to the computing services via SSH:

Accessing a login node

Log in to the default login node of the Atos HPCF or ECS and take note of which node you are on.

Solution - HPCF
No Format
ssh hpc-login
hostname

Solution - ECS
No Format
ssh ecs-login
hostname

Open a new tab in your terminal and connect again. Did you get the same hostname? Why is that?

Answer

The hpc-login hostname is an alias for a load-balanced pool of login nodes, so you may land on a different one every time you connect.

ecs-login is an alias for a specific login node of the ECS virtual cluster. It is not automatically load-balanced, so you will typically land on the same node on consecutive connections.

Both aliases always point to a working login node, but the actual node and complex behind them may change depending on load, system sessions or outages.

Now, from your open SSH session on Atos HPCF or ECS, connect to the main login alias again. Did it ask for a password? Can you set your account up so jumps between hosts are done without a password?

Answer

Password-less SSH between ECMWF hosts, such as Atos HPCF or ECS nodes or VDI hosts, is not set up by default. If you were asked for a password, run the following command from your Atos HPCF, ECS or VDI session to set up key-based authentication:

No Format
ssh-key-setup

After this, you should be able to jump between hosts without having to type your password.

Besides being convenient, this setup is also necessary for other tools such as ECACCESS or ecinteractive to work properly.
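To check from a script whether key-based authentication is already in place, you can ask ssh to fail instead of prompting for a password. This is a minimal sketch assuming a standard OpenSSH client; the helper name check_passwordless_ssh is hypothetical, while BatchMode and ConnectTimeout are regular OpenSSH options.

```shell
#!/usr/bin/env bash
# Succeed only if we can run a command on <host> without any password prompt.
# BatchMode=yes makes ssh fail instead of prompting for a password, and
# ConnectTimeout keeps it from hanging on an unreachable host.
# Usage: check_passwordless_ssh <host>   (hypothetical helper)
check_passwordless_ssh() {
  local host="$1"
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
    echo "$host: password-less SSH works"
  else
    echo "$host: password-less SSH not working, try ssh-key-setup"
    return 1
  fi
}
```

For example, once ssh-key-setup has been run, check_passwordless_ssh hpc-login should report that password-less SSH works.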

Interactive session

Reference: HPC2020: Persistent interactive job with ecinteractive

Standard sessions on login nodes do not guarantee access to dedicated resources such as CPUs or memory, and strict limits are imposed on their usage.

Can you get a dedicated interactive session with 10 GB of memory and 4 CPUs for 8 hours?

Solution

You can use ecinteractive. It is installed and available on all the Atos HPCF and ECS nodes, as well as on the VDI, so you can run it from any of them:

No Format
ecinteractive -c 4 -m 10 -t 8:00

This will create an interactive job with the requested configuration and open a shell for you on the node allocated to it.


Log out of that interactive session. Can you reattach to it?

Solution

Your job keeps running in the background, and there can only be one interactive job per user. You can reattach to it, and attach as many concurrent shells to the same interactive session as you like (for example, in different terminal tabs), with:

No Format
ecinteractive

Cancel your interactive session.

Solution
No Format
ecinteractive -k

Storage spaces

Reference: HPC2020: Filesystems

We will now explore the different options when it comes to storing your data.

Main filesystems

Connect to the Atos HPCF or ECS main login node. What is your default filesystem? Can you find 4 different ways of accessing that space?

Answer

The default directory is your HOME directory, which is /home/$USER. It is a dedicated personal space, and you can always return to it with any of the following commands:

No Format
cd
cd ~
cd $HOME
cd /home/$USER

Your HOME directory is accessible across all Atos HPCF, ECS, VDI and ecFlow services.

There are 3 more main storage spaces. Create an empty file called del.me in each of them, check that the files have been created with ls, and then remove them with rm.

Answer

Besides HOME, you also have access to PERM, HPCPERM and SCRATCH. Like HOME, they are all dedicated personal spaces, each with its corresponding environment variable. Using those environment variables rather than hardcoded paths is strongly recommended.

You can use touch to create the test files:

No Format
touch $PERM/del.me
touch $HPCPERM/del.me
touch $SCRATCH/del.me

Check they exist with:

No Format
ls -l $PERM/del.me
ls -l $HPCPERM/del.me
ls -l $SCRATCH/del.me

Remove them with:

No Format
rm $PERM/del.me
rm $HPCPERM/del.me
rm $SCRATCH/del.me
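The same steps can be wrapped in a loop over the environment variables, which also copes with platforms where some of them are not defined, such as SCRATCH and HPCPERM on the VDI. This is a minimal bash sketch; check_storage_spaces is a hypothetical helper, and ${!var} is bash indirect expansion (the value of the variable whose name is stored in var).

```shell
#!/usr/bin/env bash
# Create, list and remove del.me in each of PERM, HPCPERM and SCRATCH.
# Variables that are unset or empty (e.g. SCRATCH and HPCPERM on the VDI)
# are reported and skipped.
# Usage: check_storage_spaces   (hypothetical helper)
check_storage_spaces() {
  local var dir status=0
  for var in PERM HPCPERM SCRATCH; do
    dir="${!var:-}"   # indirect expansion: value of the variable named in $var
    if [ -z "$dir" ]; then
      echo "$var: not defined on this platform, skipping"
      continue
    fi
    if touch "$dir/del.me" && ls -l "$dir/del.me" >/dev/null && rm "$dir/del.me"; then
      echo "$var: OK ($dir)"
    else
      echo "$var: FAILED ($dir)"
      status=1
    fi
  done
  return "$status"
}
```

Running check_storage_spaces prints one line per space, and the function returns a non-zero status if any of the defined spaces could not be written to.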

How much space have you used in each of your 4 main filesystems? How much can you store?

Answer

Quotas are enforced on all the filesystems. You can check them with the quota command:

No Format
quota

For HOME and PERM, the output should look similar to:

No Format
Quota for $HOME:
home_b             user    1234        <space used>   <space limit>       <number of files stored>       -   *

Quota for $PERM
POSIX User      1234    <space used>   <space limit>       <number of files stored>       none

For SCRATCH and HPCPERM the format is slightly different:

No Format
Project quota for $SCRATCH and $SCRATCHDIR:
Disk quotas for prj 1000001798 (pid 1000001798):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
       /ec/res4     XXX     YYY     YYY       -     ZZZ     WWW     WWW       -

Project quota for $HPCPERM:
Disk quotas for prj 2000001798 (pid 2000001798):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
       /ec/res4     XXX     YYY     YYY       -     ZZZ     WWW     WWW       -
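quota only reports totals. To find out what is actually taking up the space in a given directory, du can be combined with sort. This is a minimal bash sketch; largest_entries is a hypothetical helper, and on large directory trees du can take a while to run.

```shell
#!/usr/bin/env bash
# Print the <count> largest entries (files or directories, hidden ones
# included) directly under <dir>, smallest first, sizes in KiB.
# Usage: largest_entries <dir> [count]   (hypothetical helper)
largest_entries() (
  # The body runs in a subshell, so these glob options do not leak out.
  shopt -s nullglob dotglob   # match hidden entries; no match -> empty list
  dir="$1"
  count="${2:-10}"
  entries=("$dir"/*)
  if [ "${#entries[@]}" -eq 0 ]; then
    echo "No entries found in $dir" >&2
    exit 0
  fi
  # du -sk: total size of each entry in KiB; sort -n: numeric order.
  du -sk -- "${entries[@]}" | sort -n | tail -n "$count"
)
```

For example, largest_entries "$PERM" 5 would list the five biggest items at the top level of your PERM.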

If you are on the VDI, open a new terminal there. Can you access your HOME, PERM, SCRATCH and HPCPERM?

Answer

HOME and PERM are NFS-based filesystems, which are mounted on all user computing platforms at ECMWF. You can access them through the $HOME and $PERM environment variables:

No Format
ls $HOME
ls $PERM

However, SCRATCH and HPCPERM are Lustre-based filesystems only available on the Atos HPCF, so they are not available on other computing platforms such as the VDI or ecFlow VMs, and the corresponding environment variables are not defined there.
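Scripts that rely on SCRATCH or HPCPERM can therefore fail early, with a clear message, when they are run on a platform where these are not defined. This is a minimal sketch using the standard ${VAR:?message} parameter expansion; require_hpcf_storage is a hypothetical helper name. In a non-interactive script, a failed check stops the whole script.

```shell
#!/usr/bin/env bash
# Stop with a clear error if the Lustre-based spaces are not defined,
# e.g. when a script meant for the Atos HPCF is run on the VDI.
# ${VAR:?message} prints the message and, in a non-interactive shell,
# exits if VAR is unset or empty.
# Usage: require_hpcf_storage   (hypothetical helper)
require_hpcf_storage() {
  : "${SCRATCH:?SCRATCH is not defined, run this on the Atos HPCF}"
  : "${HPCPERM:?HPCPERM is not defined, run this on the Atos HPCF}"
}
```

Calling require_hpcf_storage near the top of a script makes it stop before doing any work on a platform such as the VDI.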

EXTRA: For long-term archival purposes, users with access to the HPCF may also use ECFS. Files are stored on tape in ECMWF's Data Handling System. Create a small text file and copy it to your ECFS space, then check that it is there, retrieve it and remove it.

Solution
No Format
echo "hello world" > test_file.txt
ecp test_file.txt ec:
els -l ec:test_file.txt
ecp ec:test_file.txt retrieved_test_file.txt
diff test_file.txt retrieved_test_file.txt
erm ec:test_file.txt

Temporary spaces

There are a number of temporary spaces you can use in your session or job.

Create a file called testfile in $TMPDIR, $SCRATCHDIR and /tmp.

Solution
No Format
touch $TMPDIR/testfile
touch $SCRATCHDIR/testfile
touch /tmp/testfile

Open another session on the same login node with ssh $HOSTNAME. Can you find the files you created earlier?

Solution
No Format
ls -l $TMPDIR/testfile
ls -l $SCRATCHDIR/testfile
ls -l /tmp/testfile

You will not see the files you created in any of those locations, since every session or job gets its own private location. This includes /tmp, which is also a dedicated ramdisk for each session.
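Because these locations differ for every session or job, scripts should always refer to them through the environment variables rather than through hardcoded paths. This is a minimal bash sketch; make_workdir is a hypothetical helper, and the fallback to /tmp is an assumption for environments where TMPDIR is not set. The trap removes the directory as soon as the script exits, instead of leaving it until the end of the session.

```shell
#!/usr/bin/env bash
# Create a private working directory, preferring $TMPDIR and falling
# back to /tmp if TMPDIR is not set (assumption, see above).
# Usage: make_workdir   (hypothetical helper)
make_workdir() {
  mktemp -d "${TMPDIR:-/tmp}/work.XXXXXX"
}

workdir=$(make_workdir)
# Clean up when the script exits, whether it succeeds or fails.
trap 'rm -rf "$workdir"' EXIT
echo "Working in $workdir"
```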

Filesystem Usage

Can you decide what would be the best filesystem to use in the following cases? Why would you make that choice?

Store the source code, scripts and configuration of your programs and workflows

Answer

HOME would be the preferred choice. These are typically small but important files, so the convenience of backups, snapshots and availability on all computing platforms matters more than parallel performance.

Store climate files to be used by your model runs on the Atos HPCF.

Answer

HPCPERM is the right choice for big files that are going to be used concurrently by parallel applications such as NWP models.

Working directory for your jobs.

Answer

SCRATCH is the go-to place for your daily work: plenty of space and good parallel performance for output data that is transient by nature. Remember to move any data you want to keep somewhere else after your job, since files not used for 30 days are automatically deleted.
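To see which files are at risk before they are removed, find can list files that have not been accessed for a given number of days. This is a minimal sketch, assuming that "not used" corresponds to the last access time of each file; stale_files is a hypothetical helper, and its default of 25 days is an arbitrary safety margin below the 30-day limit.

```shell
#!/usr/bin/env bash
# List regular files under <dir> whose last access was more than <days>
# whole days ago (default 25), as candidates to move elsewhere.
# Usage: stale_files <dir> [days]   (hypothetical helper)
stale_files() {
  local dir="$1" days="${2:-25}"
  # -atime +N matches files last accessed more than N full 24-hour periods ago.
  find "$dir" -type f -atime +"$days" -print
}
```

For example, stale_files "$SCRATCH" lists the files in SCRATCH that you may want to move to PERM, HPCPERM or ECFS.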

Store data that you use frequently and that is considerable in size.

Answer

PERM, if access from other computing platforms or the availability of snapshots is important. You can see PERM as an extension of your HOME space.

HPCPERM, if I/O performance matters more, especially if the data is going to be used in parallel jobs on the Atos HPCF.

Store data that is considerable in size for the longer term, such as experiment results that you are not going to use often.

Answer

ECFS would be the right place for longer-term archival or for storing backups, and it is by far the place where you can store the most data. However, data on tape needs to be retrieved to another disk space before it can be used, which is costly in terms of time.

To use ECFS efficiently, store fewer but bigger files: it is a good idea to use tools like tar or zip to bundle big directories with lots of files together.
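A minimal bash sketch of that advice, bundling a directory into a single compressed tarball and optionally copying it with ecp as in the ECFS exercise above. bundle_dir is a hypothetical helper; it assumes tar with gzip support and only calls ecp when that command is available.

```shell
#!/usr/bin/env bash
# Bundle <dir> into <dir-name>.tar.gz in the current directory and,
# if an ECFS target such as ec: is given, copy the archive there with ecp.
# Usage: bundle_dir <dir> [ecfs-target]   (hypothetical helper)
bundle_dir() {
  local src="${1%/}" target="${2:-}" archive
  archive="$(basename "$src").tar.gz"
  # -C switches to the parent first, so the archive stores relative paths.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "Created $archive"
  if [ -n "$target" ]; then
    if command -v ecp >/dev/null 2>&1; then
      ecp "$archive" "$target"
    else
      echo "ecp not found: copy $archive to ECFS from a host where ECFS is available" >&2
      return 1
    fi
  fi
}
```

For example, bundle_dir experiment_results ec: would create experiment_results.tar.gz and archive it in ECFS as a single file.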


Temporary files that you don't need beyond the end of the session or job

Answer

$TMPDIR, if performance is important and the files are small, since TMPDIR is either in memory (for parallel jobs on the HPCF) or on SSD.

$SCRATCHDIR, if the files are too big to fit in TMPDIR.
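The choice between the two can also be made inside a script. This is a minimal bash sketch, assuming that "fits" means the requested size is below the space currently free in $TMPDIR as reported by df; pick_temp_dir is a hypothetical helper, and other processes may still fill the space after the check.

```shell
#!/usr/bin/env bash
# Print a temporary location for about <size_kb> KiB of data:
# $TMPDIR if it currently has more free space than that, else $SCRATCHDIR.
# Usage: pick_temp_dir <size_kb>   (hypothetical helper)
pick_temp_dir() {
  local need_kb="$1" avail_kb
  if [ -n "${TMPDIR:-}" ]; then
    # df -Pk: portable one-line-per-filesystem output; field 4 is free KiB.
    avail_kb=$(df -Pk "$TMPDIR" 2>/dev/null | awk 'NR == 2 { print $4 }') || avail_kb=0
    if [ "${avail_kb:-0}" -gt "$need_kb" ]; then
      echo "$TMPDIR"
      return 0
    fi
  fi
  if [ -n "${SCRATCHDIR:-}" ]; then
    echo "$SCRATCHDIR"
    return 0
  fi
  echo "No suitable location: TMPDIR unset or too small, and SCRATCHDIR not set" >&2
  return 1
}
```

For example, workdir=$(pick_temp_dir 5000000) would pick TMPDIR only if more than about 5 GB are currently free there.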

Recovering Deleted files

Imagine you have accidentally deleted ~/.profile in your HOME directory. Can you get back the latest version?

Solution

You can use the snapshots. List all the available versions with:

No Format
ls -l ~/.snapshot/*/.profile

To recover it, copy the file back into place.

For longer time spans, use the home_snap utility to get the snapshot locations.
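Picking the most recent copy and putting it back can be scripted. This is a minimal sketch, assuming the snapshot layout used in the ls command above (~/.snapshot/<snapshot>/<path>) and that the most recently modified copy is the one you want; restore_latest_snapshot is a hypothetical helper, and it overwrites any existing file at the destination.

```shell
#!/usr/bin/env bash
# Restore <path> (relative to $HOME) from the snapshot copy with the
# newest modification time.
# Usage: restore_latest_snapshot <path>   (hypothetical helper)
restore_latest_snapshot() {
  local rel="$1" latest
  # ls -t sorts newest first; -d lists directories themselves, not contents.
  latest=$(ls -td -- "$HOME"/.snapshot/*/"$rel" 2>/dev/null | head -n 1) || true
  if [ -z "$latest" ]; then
    echo "No snapshot copy of $rel found" >&2
    return 1
  fi
  # -p keeps the original permissions and timestamps.
  cp -p -- "$latest" "$HOME/$rel"
  echo "Restored $HOME/$rel from $latest"
}
```

For the exercise above, restore_latest_snapshot .profile would bring back the newest snapshot copy of ~/.profile.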

Imagine you have accidentally deleted a file in your PERM directory. Can you get back the latest version?

Solution

You can use the snapshots. List all the available versions with:

No Format
perm_snap

Note that the snapshots are less frequent in PERM.

Imagine you have accidentally deleted a file in your SCRATCH or HPCPERM directories. Can you get back the latest version?

Solution

Unfortunately there are no snapshots or backups for those filesystems, so the data has been lost permanently.
