...

First of all, let's try to connect to the computing services via SSH:

Accessing a login node

  1. Access the default login node of the Atos HPCF or ECS and take note of which node you are on.

    Solution - HPCF


    No Format
    ssh hpc-login
    hostname



    Solution - ECS


    No Format
    ssh ecs-login
    hostname



  2. Open a new tab in your terminal and connect again. Did you get the same hostname? Why is that?

    Answer

    The hpc-login hostname is an alias for a load-balanced service of login nodes. You may land on a different one every time you connect.

    ecs-login is an alias to a specific login node of the ECS virtual cluster. It is not automatically load-balanced, so you will typically land on the same node on consecutive connections.

    Both aliases will always point to a working login node, but the actual node and complex behind them may change depending on the load, system sessions or outages.
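To see the load balancing in practice, one could connect several times and tally the hostnames returned. The following is only a sketch: the function names `tally_hosts` and `sample_login_nodes` are made up for illustration and are not part of any ECMWF tooling. Each connection will prompt for a password unless key-based authentication is already in place.

```shell
# Hypothetical helpers, not part of any ECMWF tooling.

# Read hostnames on stdin and print how many times each one appeared,
# most frequent first.
tally_hosts() {
    sort | uniq -c | sort -rn
}

# Connect to a login alias several times and tally where we landed.
# $1: SSH alias (for example hpc-login), $2: number of attempts (default 5)
sample_login_nodes() {
    login_alias=$1
    attempts=${2:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -n keeps ssh from reading the loop's standard input
        ssh -n "$login_alias" hostname
        i=$((i + 1))
    done | tally_hosts
}
```

Running `sample_login_nodes hpc-login` and then `sample_login_nodes ecs-login` should make the difference between the two aliases visible, although the exact nodes will vary.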


  3. Now, from your open SSH session on Atos HPCF or ECS, connect to the main login alias again. Did it ask for a password? Can you set your account up so jumps between hosts are done without a password?

    Answer

    Password-less SSH between ECMWF hosts, such as Atos HPCF nodes, ECS nodes or VDI hosts, is not set up by default. If you were asked for a password, you can run the following command from your Atos HPCF, ECS or VDI session to set up key-based authentication:

    No Format
    ssh-key-setup

    After this, you should be able to jump between hosts without having to enter your password.

    Besides being convenient, this setup is also necessary for other tools such as ECACCESS or ecinteractive to work properly.
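A quick way to check whether password-less access works is to ask ssh to fail rather than prompt. This is a minimal sketch using the standard OpenSSH `BatchMode` and `ConnectTimeout` options. The function name `can_ssh_without_password` is hypothetical. A host whose key has not been accepted yet will also be reported as failing.

```shell
# Hypothetical helper: return 0 if the host can be reached without any
# interactive prompt, non-zero otherwise.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# Example usage:
#   if can_ssh_without_password hpc-login; then
#       echo "key-based authentication works"
#   else
#       echo "not set up yet, consider running ssh-key-setup"
#   fi
```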


Interactive session

Info

Reference: HPC2020: Persistent interactive job with ecinteractive

Standard sessions on login nodes do not guarantee access to dedicated resources such as CPUs or memory, and strict limits on those are imposed.

  1. Can you get a dedicated interactive session with 10 GB of memory and 4 CPUs for 8 hours?

    Solution

    You can use ecinteractive. It is installed and available on all the Atos HPCF and ECS nodes, as well as the VDI, so you can run it from any of them:

    No Format
    ecinteractive -c 4 -m 10G -t 8:00:00

    This will create an interactive job with the requested configuration and open a shell on the node where it runs.

    If you are connecting from your own computer via Teleport, you can also download ecinteractive and run it locally (Linux, macOS and WSL are supported; there is no native Windows support).


  2. Log out of that interactive session. Can you reattach to it?

    Solution

    Your job kept running in the background, and there can only be one interactive job per user. You can attach as many concurrent shells as you like to the same interactive session, for example in different terminal tabs, with:

    No Format
    ecinteractive



  3. Cancel your interactive session

    Solution


    No Format
    ecinteractive -k



Storage spaces

Info

Reference: HPC2020: Filesystems

...

  1. Store the source code, scripts and configuration of your programs and workflows

    Answer

    HOME would be the preferred choice. These are typically small but important files, so the convenience of backups, snapshots and availability on all computing platforms matters more than parallel performance.


  2. Store climate files to be used by your model runs on Atos HPCF.

    Answer

    HPCPERM is the right choice for big files that are going to be used concurrently by parallel applications such as NWP models.


  3. Working directory for your jobs.

    Answer

    SCRATCH is the go-to place for your daily work. It offers plenty of space and good parallel performance for output data that is transient by nature. Remember to move any data you want to keep somewhere else after your job, since files not used for 30 days will be automatically deleted.
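To spot files that are approaching the automatic deletion, one could list those that have not been accessed for a while. This is a sketch under assumptions: it treats "not used" as the last access time, the 25-day default is an arbitrary safety margin, the function name `list_stale_files` is hypothetical, and `$SCRATCH` is assumed to point at your SCRATCH directory.

```shell
# Hypothetical helper: list regular files under a directory whose last
# access time is older than a given number of days.
# $1: directory (for example "$SCRATCH"), $2: age in days (default 25)
list_stale_files() {
    find "$1" -type f -atime +"${2:-25}" -print
}

# Example usage (assumes $SCRATCH is set in your session):
#   list_stale_files "$SCRATCH" 25
```

Note that `find -atime +25` counts whole 24-hour periods, so it matches files last accessed at least 26 days ago.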


  4. Store data that you use frequently and that is considerable in size.

    Answer

    PERM if accessibility from other computing platforms or the need of snapshots is important. You can see PERM as an extension to your HOME space.

    HPCPERM if I/O performance is more important than accessibility or snapshots, especially if the data is going to be used in parallel jobs on Atos HPCF.


  5. Store data for longer term which is considerable in size, such as experiment results. You are not going to use it often.

    Answer

    ECFS would be the right place for longer term archival or storing backups. This is by far the place where you can store the most data. However, data on tapes needs to be retrieved to another disk space before it can be used, so it is costly in terms of time.

    In order to use ECFS efficiently, remember to store fewer but bigger files. It is a good idea to use tools like tar or zip to bundle together big directories with lots of files.
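The bundling step could look like the sketch below. The function name `bundle_dir` and all paths are placeholders chosen for illustration. The commented `ecp` line assumes the standard ECFS client and its `ec:` prefix, so check the ECFS documentation for the exact syntax before relying on it.

```shell
# Hypothetical helper: bundle a directory into one compressed tarball.
# $1: directory to archive, $2: path of the tarball to create
bundle_dir() {
    src_dir=$1
    tarball=$2
    # -C makes the paths inside the archive relative to the parent directory
    tar -czf "$tarball" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example usage (placeholder paths; the ecp syntax is an assumption):
#   bundle_dir "$SCRATCH/my_experiment" "$SCRATCH/my_experiment.tar.gz"
#   ecp "$SCRATCH/my_experiment.tar.gz" ec:my_experiment.tar.gz
```

Archiving one tarball instead of thousands of small files means a single tape retrieval when the data is needed again.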


  6. Temporary files that you don't need beyond the end of the session or job

    Answer

    $TMPDIR if performance is important and the files are small, since TMPDIR is either in memory (for parallel jobs on HPCF) or on SSD.

    $SCRATCHDIR if the files are too big to fit in TMPDIR.
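A common pattern for such temporary files is to create a private working directory, run the task inside it, and remove it afterwards. The sketch below assumes only the standard `mktemp` utility. The function name `with_tmp_workdir` is hypothetical, and `/tmp` is used as a fallback when `TMPDIR` is not set.

```shell
# Hypothetical helper: run a command inside a private temporary directory
# under $TMPDIR (falling back to /tmp) and clean up afterwards.
# All arguments form the command to run.
with_tmp_workdir() {
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/work.XXXXXX") || return 1
    status=0
    # The subshell keeps the caller's current directory unchanged
    ( cd "$workdir" && "$@" ) || status=$?
    rm -rf "$workdir"
    return "$status"
}

# Example usage:
#   with_tmp_workdir sh -c 'echo scratch data > intermediate.txt; wc -c intermediate.txt'
```

The same function could be pointed at `$SCRATCHDIR` instead by setting `TMPDIR` for that call when the files are too large.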

Recovering deleted files

  1. Imagine you have accidentally deleted ~/.profile in your HOME directory. Can you get back the latest version?

    Solution

    You can use the snapshots. You can list all the available versions with:

    No Format
    ls -l ~/.snapshot/*/.profile

    To recover the file, you just need to copy it back into place.

    For longer time spans, use the home_snap utility to get the snapshot locations.
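Picking the latest version and copying it back could be sketched as follows. The function name `latest_snapshot_copy` is hypothetical, and the sketch assumes that the copy with the newest modification time is the version you want. It relies on `ls -t`, so it is not robust against file names containing newlines.

```shell
# Hypothetical helper: print the most recently modified snapshot copy
# of a file.
# $1: snapshot root (for example ~/.snapshot)
# $2: path of the file relative to each snapshot
latest_snapshot_copy() {
    # ls -t sorts by modification time, newest first
    ls -t "$1"/*/"$2" 2>/dev/null | head -n 1
}

# Example usage:
#   cp -p "$(latest_snapshot_copy ~/.snapshot .profile)" ~/.profile
```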


  2. Imagine you have accidentally deleted a file in your PERM directory. Can you get back the latest version?

    Solution

    You can use the snapshots. You can list all the available versions with:

    No Format
    perm_snap

    Note that the snapshots are less frequent in PERM.


  3. Imagine you have accidentally deleted a file in your SCRATCH or HPCPERM directories. Can you get back the latest version?

    Solution

    Unfortunately there are no snapshots or backups for those filesystems, so the data has been lost permanently.


Managing your software stack environment