Logging into Atos HPCF and ECS

Info
titleReference Documentation

HPC2020: How to connect

First of all, let's try to connect to the computing services via SSH:

...

  1. Access the default login node of the Atos HPCF or ECS and take note of which node you are on.

    Expand
    titleSolution - HPCF


    No Format
    ssh hpc-login
    hostname



    Expand
    titleSolution - ECS


    No Format
    ssh ecs-login
    hostname



  2. Open a new tab in your terminal and connect again. Did you get the same hostname? Why is that?

    Expand
    titleAnswer

    The hpc-login hostname is an alias for a load-balanced service of login nodes. You may land on a different one every time you connect.

    ecs-login is an alias for a specific login node of the ECS virtual cluster. It is not automatically load-balanced, so you will typically land on the same node on consecutive connections.

    Both aliases will always point to a working login node, and the actual node and complex behind them may change depending on the load, system sessions or outages.


  3. Now, from your open SSH session on Atos HPCF or ECS, connect to the main login alias again. Did it ask for a password? Can you set your account up so jumps between hosts are done without a password?

    Expand
    titleAnswer

    Password-less SSH between ECMWF hosts, such as Atos HPCF nodes, ECS nodes or VDI hosts, is not set up by default. If you were asked for a password, you can run the following command from your Atos HPCF, ECS or VDI session to set up key-based authentication:

    No Format
    ssh-key-setup

    After this you should be able to jump between hosts without having to enter your password.

    Besides being convenient, this setup is also necessary for other tools such as ECACCESS or ecinteractive to work properly.
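
To see the load balancing described above in practice, the following sketch connects to a login alias several times and counts how many distinct nodes answered. The helper names `sample_login_nodes` and `count_distinct_hosts` are hypothetical, and the sketch assumes key-based authentication is already in place so that repeated connections do not prompt for a password.

```shell
# Count how many distinct login nodes a series of connections landed on.
# Reads one hostname per line on standard input.
count_distinct_hosts() {
    sort -u | wc -l | tr -d ' '
}

# Hypothetical helper: connect several times to a login alias and print
# the hostname reported by each connection.
# Assumes password-less SSH has been configured (for example with ssh-key-setup).
sample_login_nodes() {
    login_alias=$1
    attempts=${2:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        ssh "$login_alias" hostname
        i=$((i + 1))
    done
}
```

For example, `sample_login_nodes hpc-login 5 | count_distinct_hosts` would be expected to print a number greater than 1 most of the time, whereas the same call with `ecs-login` would typically print 1.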


Interactive session

Info
titleReference documentation

HPC2020: Persistent interactive job with ecinteractive

...

  1. Can you get a dedicated interactive session with 10 GB of memory and 4 CPUs for 8 hours?

    Expand
    titleSolution

    You can use ecinteractive. It is installed and available on all the Atos HPCF and ECS nodes, as well as the VDI, so you can run it from there:

    No Format
    ecinteractive -c 4 -m 10 -t 8:00

    This will create an interactive job with the requested configuration and land you on a shell in a given node.

    If you are connecting from your own computer via teleport, you can download it and run it there (no Windows native support, only Mac, Linux or WSL supported).

    Note that by default, and unless run directly on ECS, ecinteractive will use HPCF as the backend. If you wish to force the session to be on ECS, you can run:

    No Format
    ecinteractive -p ecs -c 4 -m 10 -t 8:00



  2. Log out of that interactive session. Can you reattach to it?

    Expand
    titleSolution

    Your job kept running in the background, and there can only be one interactive job per user. You can attach as many concurrent shells as you like to the same interactive session, for example in different terminal tabs, with:

    No Format
    ecinteractive

    If on ECS, you may need to run:

    No Format
    ecinteractive -p ecs



  3. Cancel your interactive session.

    Expand
    titleSolution


    No Format
    ecinteractive -k

    If on ECS, you may need to run:

    No Format
    ecinteractive -p ecs -k
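
The way the options in this section combine can be sketched with a small helper that only prints the command line for a given request. The function name `ecinteractive_cmd` is hypothetical and not part of ecinteractive itself; it uses only the `-p`, `-c`, `-m` and `-t` options shown in the exercises above.

```shell
# Hypothetical helper: print the ecinteractive command for a given request.
# Arguments: CPUs, memory in GB, wall clock time, and optionally a platform.
ecinteractive_cmd() {
    cpus=$1
    memory_gb=$2
    walltime=$3
    platform=${4:-}   # optional, e.g. "ecs"; empty means the default backend

    cmd="ecinteractive"
    if [ -n "$platform" ]; then
        cmd="$cmd -p $platform"
    fi
    printf '%s -c %s -m %s -t %s\n' "$cmd" "$cpus" "$memory_gb" "$walltime"
}
```

Calling `ecinteractive_cmd 4 10 8:00 ecs` prints the ECS variant of the request from the first exercise, which you can inspect before running it.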



Storage spaces

Info
titleReference documentation

HPC2020: Filesystems

We will now explore the different options when it comes to storing your data.

...

Recovering Deleted files

Info
titleReference documentation

HPC2020: Filesystems

  1. Imagine you have accidentally deleted ~/.profile in your HOME directory. Can you get back the latest version?

    Expand
    titleSolution

    You can use the snapshots. You can list all the versions available with:

    No Format
    ls -l ~/.snapshot/*/.profile

    To recover, you would just need to copy the file back into place. 

    For longer time spans, use the home_snap utility to get the snapshot locations.


  2. Imagine you have accidentally deleted a file in your PERM directory. Can you get back the latest version?

    Expand
    titleSolution

    You can use the snapshots. You can list all the versions available with:

    No Format
    perm_snap

    Note that the snapshots are less frequent in PERM.


  3. Imagine you have accidentally deleted a file in your SCRATCH or HPCPERM directories. Can you get back the latest version?

    Expand
    titleSolution

    Unfortunately there are no snapshots or backups for those filesystems, so the data has been lost permanently.
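
As a sketch of the recovery step for HOME, the helper below picks the most recently modified copy of a file among the snapshot directories. The function name `latest_snapshot_copy` is hypothetical, and the sketch assumes the `<snapshot root>/<snapshot name>/<relative path>` layout implied by the `ls -l ~/.snapshot/*/.profile` command above. It relies on `ls -t`, so it is not suitable for file names containing newlines.

```shell
# Hypothetical helper: print the path of the most recently modified copy
# of a file found in any snapshot under the given snapshot root.
# Prints nothing if no snapshot contains the file.
latest_snapshot_copy() {
    snapshot_root=$1
    relative_path=$2
    # ls -t sorts by modification time, newest first
    ls -t "$snapshot_root"/*/"$relative_path" 2>/dev/null | head -n 1
}
```

With this in place, `cp "$(latest_snapshot_copy ~/.snapshot .profile)" ~/.profile` would copy the newest available version back into place.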


Managing your software stack environment

Info
titleReference documentation

HPC2020: Software stack

Atos HPCF and ECS computing platforms offer a wide range of software, libraries and tools.

Basic software environment management

  1. You want to use CDO, a popular tool to manipulate climate and NWP model data. What do you need to do to get the following result?

    No Format
    $ cdo --version
    Climate Data Operators version X.Y.Z (https://mpimet.mpg.de/cdo)
    System: x86_64-pc-linux-gnu
    ...


    Expand
    titleSolution

    If you run the command without any prior action, you may get:

    No Format
    $ cdo --version
    -bash: cdo: command not found

    Many software packages and tools are not part of your default environment, and need to be explicitly loaded via modules.

    So the following commands would be sufficient to get to the desired result:

    No Format
    module load cdo
    cdo --version


    Tip
    titleml shortcut

    You can also use the ml shortcut to load the module

    No Format
    ml cdo


    Note that we did not ask for any specific version. In those cases, you will get the one defined as default.


  2. How many versions of CDO can be used at ECMWF? Can you pick the newest?

    Expand
    titleSolution

    There are hundreds of different packages with their corresponding different versions installed at ECMWF. You can use:

    No Format
    module avail

    to see which modules can currently be loaded.

    However, not every installed module is visible at any given time: some only become available once a certain combination of modules is loaded.

    You can also use the following command for an overview of all the packages that are installed, including those that may not be visible in module avail:

    No Format
    module spider

    In this case we are only interested in CDO so we can do either:

    No Format
    module avail cdo

    or 

    No Format
    module spider cdo

    To load the newest, you can explicitly pick the latest version. Assuming that it was "X.Y.Z":

    No Format
    module load cdo/X.Y.Z

    But you can also use the module tag "new":

    No Format
    module load cdo/new

    or also ask for the latest with:

    No Format
    module --latest load cdo


    Tip
    titleNo swap needed

    If you had another version of the module loaded, the system will automatically swap it for the new one requested.



  3. Load the netcdf4 module. Can you see what modules you have loaded in your environment now?

    Expand
    titleSolution

    To load the netcdf4 module just do:

    No Format
    module load netcdf4

    Then, you can see what your software environment looks like with:

    No Format
    module list

    or with just the shortcut:

    No Format
    ml

    You should see both the cdo and netcdf4 modules, besides the default modules loaded in your environment.


  4. Remove the netcdf4 module from your environment and check it is gone.

    Expand
    titleSolution

    To unload the netcdf4 module just do:

    No Format
    module unload netcdf4

    or with just the shortcut:

    No Format
    ml -netcdf4

    Then, you can see what your software environment looks like with:

    No Format
    module list



  5. Can you check what the installation directory of the default netCDF4 library is?

    Expand
    titleSolution

    All modules at ECMWF will define a <PACKAGE_NAME>_DIR environment variable that can be useful to pass to configuration files or scripts. Packages providing libraries such as netCDF4 will also typically define <PACKAGE_NAME>_LIB and <PACKAGE_NAME>_INCLUDE.

    You can check the values of all the variables that a module would define, without loading it, by running:

    No Format
    module show netcdf4

    or with just the shortcut:

    No Format
    ml show netcdf4

    You can then spot there the value of NETCDF4_DIR pointing to /usr/local/apps/netcdf4/X.Y.Z/COMPILER_FAMILY/COMPILER_VERSION


  6. Can you restore the default environment you had when you logged in? Check that the environment is back to the desired state.

    Expand
    titleSolution

    If you log out of your session, next time you log in you will start with a fresh default environment. Modules are only loaded for that specific session.

    However, if you don't want to log out, you can also reset your module environment with:

    No Format
    module reset

    You can then check the effects with:

    No Format
    module list


    Tip
    titlereset vs purge

    There is a subtle difference between module reset and module purge. While the former will go back to the default environment, which typically contains some default modules, the latter will completely unload all modules and leave you with a blank environment.
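
Once a module is loaded, the <PACKAGE_NAME>_DIR variable mentioned above can also be read from a script. The sketch below uses a hypothetical helper, `module_dir`, and assumes the variable name is simply the upper-cased module name followed by `_DIR`, as in the netcdf4 and NETCDF4_DIR example. Modules whose names contain other characters may not follow this mapping, so treat it as an illustration rather than a rule.

```shell
# Hypothetical helper: print the value of <PACKAGE_NAME>_DIR for a module.
# Assumes the variable name is the upper-cased module name plus "_DIR".
# Returns a non-zero status and prints nothing if the variable is not set.
module_dir() {
    var_name="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')_DIR"
    # printenv only sees exported variables, which is how modules define them
    printenv "$var_name"
}
```

After `module load netcdf4`, running `module_dir netcdf4` should print the same path that `module show netcdf4` reports for NETCDF4_DIR.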



ECMWF tools

Info
titleReference documentation

HPC2020: ECMWF software and libraries

  1. Can you run codes_info tool, which is part of ecCodes?

    Expand
    titleSolution

    If you run the command without any prior action, you may get:

    No Format
    $ codes_info
    -bash: codes_info: command not found

    ecCodes, along with other ECMWF tools such as Metview or Magics, is bundled into the ECMWF toolbox. You need to load that module in order to access them:

    No Format
    module load ecmwf-toolbox



  2. Can you see what versions of ECMWF software are part of that module?

    Expand
    titleSolution

    You can use the help option in modules to get additional information from the module, which in the case of the ecmwf-toolbox will include the versions of all the packages in the bundle:

    No Format
    module help ecmwf-toolbox

    or with just the shortcut:

    No Format
    ml help ecmwf-toolbox



  3. Can you run the ecflow_client command and get the version?

    Expand
    titleSolution

    ecFlow is not part of the ecmwf-toolbox module. Since it has its own standalone module, you will need to load that separately:

    No Format
    module load ecflow

    or with just the shortcut:

    No Format
    ml ecflow

    Once the module is loaded, you can get the version with:

    No Format
    ecflow_client --version
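
A quick way to confirm which tools still need a module is to test several commands at once. The sketch below defines a hypothetical helper, `missing_commands`, that prints every command from its argument list that is not currently found on PATH.

```shell
# Hypothetical helper: print each command from the argument list that is
# not currently available in the environment, one per line.
missing_commands() {
    for command_name in "$@"; do
        if ! command -v "$command_name" >/dev/null 2>&1; then
            printf '%s\n' "$command_name"
        fi
    done
}
```

In a fresh session, `missing_commands codes_info ecflow_client` would be expected to list both commands, and to print nothing once the ecmwf-toolbox and ecflow modules have been loaded.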



Python and Conda

Info
titleReference documentation

HPC2020: Python support

  1. Try to run the command below. Why does it fail? Can you make it work without installing pandas yourself?

    No Format
    $ python3 -c "import pandas as pd; print(pd.__version__)"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'pandas'


    Expand
    titleSolution

    The system Python 3 installation is very limited and does not come with many popular extra packages such as pandas. You may use the Python3 stack available in modules, which comes with almost 400 of those extra packages:

    No Format
    module load python3

    After that, if you repeat the command it should complete successfully and print the pandas version.

    No Format
    python3 -c "import pandas as pd; print(pd.__version__)"



  2. You need to use the latest version of pandas to run a given application. What can you do (without using conda)?

    Expand
    titleSolution

    In that case you could use pip to install it yourself. However, installing it directly into your user environment is highly discouraged, since it may interfere with other applications you run, or break after default software updates on the system side. Instead, for small additions to the default environment, it is much more robust to use a Python virtual environment.

    In this case, you may create a virtual environment based on the installations provided, and just add the new version of pandas:

    No Format
    module load python3
    mkdir -p $PERM/venvs
    cd $PERM/venvs
    python3 -m venv --system-site-packages myvenv

    Then you can activate it only when you need it with:

    No Format
    source $PERM/venvs/myvenv/bin/activate

    Note that we used $PERM/venvs as the location of these virtual environments, but you may decide to put them in another location. 

    With the environment activated, you can now install the new version of pandas:

    No Format
    pip install -U pandas

    Then you can rerun the version command to check you got the latest:

    No Format
    python3 -c "import pandas as pd; print(pd.__version__)"

    When you have finished with your environment, you can deactivate it with:

    No Format
    deactivate



  3. You may also use conda to create your own software stack with python packages and beyond. In order to use conda, you can load the corresponding module:

    No Format
    module load conda

    What happened?

    Expand
    titleAnswer

    While conda may be seen as a way to set up custom Python environments, it also manages software beyond that, installing other packages and libraries not necessarily related to Python itself.

    Because those may conflict with the software made available through modules, loading the conda module effectively disables all the other modules that may be loaded in your environment.

    You have seen how the module system may have disabled a number of modules. You can also check it by running:

    No Format
    module list

    You would then need to install everything you need to run your application or workflow in your conda environment.

    If you want to go back to the previous environment without conda but with all the other modules, the recommended way is to reset the environment and then explicitly load all the necessary modules again:

    No Format
    module reset
    module load python3



  4. Create your new conda environment with the latest pandas in it and check the version. Hint: you can also use mamba to speed up the environment creation process.

    Expand
    titleSolution

    With the conda module loaded, you can create a new environment containing pandas from the conda-forge channel, activate it and check the version:

    No Format
    mamba create -n mypandas -c conda-forge python pandas
    conda activate mypandas
    python3 -c "import pandas as pd; print(pd.__version__)"
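
When switching between the virtual environment and the conda environment from the exercises above, it helps to know which one is active. The sketch below is a hypothetical helper, `active_python_env`, that inspects the `VIRTUAL_ENV` variable set by a venv `activate` script and the `CONDA_DEFAULT_ENV` variable set by `conda activate`.

```shell
# Hypothetical helper: report which kind of Python environment is active.
# VIRTUAL_ENV is set by a venv activate script, CONDA_DEFAULT_ENV by conda.
active_python_env() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        printf 'venv: %s\n' "$VIRTUAL_ENV"
    elif [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
        printf 'conda: %s\n' "$CONDA_DEFAULT_ENV"
    else
        printf 'none\n'
    fi
}
```

After `conda activate mypandas`, calling `active_python_env` should report the conda environment, and after `deactivate` in a venv session it should no longer report the venv.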


Using Containerised applications

Info
titleReference documentation

HPC2020: Container support

  1. The default psql command, part of the PostgreSQL package, is not up to date. You need to run the latest version, but you do not want to build it from source. A possible solution is to use a containerised version of this application. Can you run this on Atos HPCF or ECS?

    Expand
    titleSolution

    You can use Apptainer to run Docker or any other OCI-compatible container images.

    No Format
    module load apptainer
    apptainer exec docker://postgres:latest psql --version

    You can also download the image and run it directly later with:

    No Format
    apptainer pull docker://postgres:latest
    ./postgres_latest.sif psql --version
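
If the image is used repeatedly, for example from a job script, pulling it on every run is wasteful. The sketch below is a hypothetical helper, `pull_once`, that calls `apptainer pull` with an explicit output file only when that file does not exist yet. It assumes the apptainer module has already been loaded.

```shell
# Hypothetical helper: pull a container image into the given .sif file
# only if that file does not exist yet, so later runs reuse the local copy.
# Assumes the apptainer command is available (module load apptainer).
pull_once() {
    sif_file=$1
    image_uri=$2
    if [ ! -f "$sif_file" ]; then
        apptainer pull "$sif_file" "$image_uri"
    fi
}
```

For instance, `pull_once postgres_latest.sif docker://postgres:latest` followed by `./postgres_latest.sif psql --version` would download the image on the first run only. Note that an existing file is never refreshed, so you would need to delete it to pick up a newer image.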
