To understand the design of the new Linux Virtual Desktop system, please check System overview - Linux Virtual Desktop VDI and Software stack - Linux Virtual Desktop VDI for a more detailed explanation of how to work with the Linux Virtual Desktop.

Tasks that used to run on desktop-class Linux machines or other Reading-based platforms should typically be migrated to the Atos HPCF. In some cases, if that is not possible or practical, other options may be explored depending on the use case, such as using resources on the internal ECMWF-managed cloud or the European Weather Cloud.

Please contact User Services via the ECMWF Support Portal if you are not sure where to run your workflow and need some advice.

The Atos supercomputer is envisaged to absorb all the computing activities and workloads that have traditionally run not only on the HPCF, but also on other services such as ECGATE and internal ECMWF Linux clusters and workstations.

If you are testing your workload on TEMS, there are a number of things you should pay attention to when porting your activities. Below you will find both general advice and specific information for each of the origin platforms.


General considerations

No csh support

If you are still using csh, please move to a supported shell in advance: csh is no longer available on the Atos HPCF. You may choose bash, which is now the default for any newly created user, or alternatively ksh, which is also supported.
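
Most common csh constructs have direct bash equivalents. A minimal sketch (variable, alias and file names are illustrative):

  # csh / tcsh:   setenv MY_VAR value
  export MY_VAR=value

  # csh / tcsh:   set path = ($path $HOME/bin)
  export PATH=$PATH:$HOME/bin

  # csh / tcsh:   alias ll 'ls -l'
  alias ll='ls -l'

  # csh / tcsh:   source ~/.cshrc
  source ~/.bashrc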

See Linux Virtual Desktop VDI: Shells for more information.

No cross-mounted filesystems

The filesystems available (HOME, PERM, SCRATCH) have the same names and corresponding environment variables, but they are not the same filesystems as those on older platforms such as the Cray HPC, ECGATE or other Linux clusters and workstations. If you need to access data stored on those platforms, you will need to copy it over.
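
For example, data can be pulled over with standard tools such as rsync or scp; the remote host name and source path below are purely illustrative placeholders:

  # Copy a directory from an older platform into your Atos SCRATCH
  # ("ecgate" and the source path are hypothetical placeholders)
  rsync -av ecgate:/path/to/mydata/ $SCRATCH/mydata/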

See Linux Virtual Desktop VDI: Filesystems and Linux Virtual Desktop VDI: File transfers for more information.

New filesystem structure

You will notice that your filesystems now have a flatter and simpler name structure. If you port any scripts or code with filesystem paths hardcoded from older platforms, please make sure you update them. Where possible, try to use the environment variables provided, which work on both sides and point to the right location in each case.
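
For example, in a shell script (the hardcoded path below is a made-up placeholder for an old location):

  # Before: hardcoded path from an older platform (placeholder only)
  # OUTDIR=/old/system/scratch/project/mydata

  # After: environment variable, resolving to the right filesystem on each platform
  OUTDIR=$SCRATCH/mydata
  mkdir -p "$OUTDIR"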

See Linux Virtual Desktop VDI: Filesystems for more information.

Improved module system

TEMS uses Lmod as the module infrastructure, which brings a number of improvements and new features. Most basic commands for common tasks such as load, list or unload are the same, so in the majority of cases no action is needed. However, you may find some cosmetic changes in how module reports what it has done: modules are now less verbose and only generate output on failure or when other active modules are being modified.

Another key difference is that module avail will only display the modules that can be loaded with the active compiler/MPI environment (prgenv). If you can't find a module there, try module spider.
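
Everyday usage therefore stays the same; only module discovery changes slightly. A short sketch, using <package> as a placeholder:

  module load <package>       # quiet unless it fails or modifies other active modules
  module list                 # what is currently loaded
  module avail                # only modules loadable under the active prgenv
  module spider <package>     # search the full module tree if avail does not show it
  module unload <package>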

See The new module system: Lmod for more information.

Old and deprecated software

If you are using a legacy software package, check whether it is being discontinued in Bologna - New Data Centre.

Even if a certain package is available, you may still need to adapt your scripts or programs to more recent versions: for a number of software packages and libraries, only the default versions on other systems in Reading, or newer ones, will be made available.

Some environment variables corresponding to legacy packages may no longer be defined by the corresponding module. For example, the GRIB_API_* environment variables are not exported by the ecmwf-toolbox module, but you may use the ECCODES_* equivalents instead.
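
For example, a build line that used the old variables can usually be switched over directly; check module show ecmwf-toolbox for the exact names exported, as the ECCODES_* variables below are an assumption:

  module load ecmwf-toolbox
  module show ecmwf-toolbox      # lists the environment variables actually exported

  # Before (older platforms):  gfortran myprog.f90 $GRIB_API_INCLUDE $GRIB_API_LIB
  # After (assumed ECCODES_* equivalents):
  gfortran myprog.f90 $ECCODES_INCLUDE $ECCODES_LIB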

See Linux Virtual Desktop VDI: Software stack for more details.

New location for some ECMWF packages and libraries: ecmwf-toolbox

You may find that some modules that you used to load for ECMWF packages such as ecCodes, Magics or Metview are no longer available. They have been bundled together for greater inter-compatibility in the ecmwf-toolbox module, so you can simply replace any loads of the old modules with a single load of ecmwf-toolbox.
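
In practice this is a one-line change; the old module names below are just examples of what you may have been loading previously:

  # Before, on older platforms (illustrative module names):
  #   module load eccodes magics metview

  # Now, on the Atos HPCF:
  module load ecmwf-toolbox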

See Linux Virtual Desktop VDI: ECMWF software and libraries for more details.

No Python 2 support

Python 2 reached End of Life on 1 January 2020. Although a version of Python 2.7 is installed as part of the operating system, it does not include any extra modules and may not be sufficient for your needs. You must make sure your Python programs can run with version 3, which is fully supported. Currently you may choose between the traditional ECMWF-maintained Python 3 installation provided by the python3 module, or use conda for greater environment customisation.
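
A minimal sketch of both options; the environment name and packages are examples, and the way conda itself is made available may differ, so check the Python support page:

  # Option 1: ECMWF-maintained Python 3
  module load python3
  python3 my_script.py

  # Option 2: a custom conda environment (name and packages are examples)
  conda create -n myenv python=3.10 numpy xarray
  conda activate myenv
  python my_script.py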

See Linux Virtual Desktop VDI: Python support for more details.

Moving from ECGATE or Linux Clusters

Batch system differences

Slurm is the batch system on the Atos HPCF, so writing, submitting and managing jobs should feel very familiar. However, note that the queue names are different, so pay attention to those when porting existing jobs from older platforms. If you just want to run a simple serial job, your default queue will be enough.

The helper command sqos is not available, but you can get the same information using standard Slurm commands such as squeue or sacctmgr.
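
For example, two standard Slurm commands cover most of what sqos used to report:

  squeue -u $USER                                   # your queued and running jobs
  sacctmgr show qos format=Name,Priority,MaxWall    # limits of the available QoSs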

See Linux Virtual Desktop VDI: Batch system for more details.

Moving from Cray XC40 - CCA / CCB

Batch system differences

Slurm is the batch system on the Atos HPCF, so you will need to translate your PBS job headers and get used to a new set of commands for batch job management.

Main command line tools

The main Slurm user commands and their PBS equivalents are summarised below.

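A minimal sketch of the standard equivalents (the full list in the batch system documentation may include more):

  # PBS (Cray XC40)             Slurm (Atos HPCF)
  # qsub job.sh          ->     sbatch job.sh                  # submit a job
  # qstat                ->     squeue                         # list jobs
  # qstat -f <jobid>     ->     scontrol show job <jobid>      # detailed job info
  # qdel <jobid>         ->     scancel <jobid>                # cancel a job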

Queues

The queue names are very similar to those on Cray, but note that the serial queue ns has been merged into the fractional nf queue. 

Job geometries

The node configuration, in terms of number of cores and memory per core, changes with respect to the Cray XC40. If you run parallel workloads, make sure you take the Atos HPCF node configuration into account to use the allocated resources efficiently.

Example

If your parallel job on the Cray explicitly requests 72 total tasks and 36 tasks per node, it would effectively use 2 Cray nodes and all of their physical cores. Running with the same geometry on the Atos HPCF would also use 2 nodes. However, you would only be using 36 of the 128 physical cores in each node, wasting 92 of them per node.
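
A minimal sketch of how the geometry could be adjusted; whether you pack tasks onto fewer nodes or keep more nodes depends on the memory and threading needs of your application:

  # Geometry carried over unchanged from the Cray: 2 nodes, 92 idle cores on each
  # (commented out with ## so Slurm ignores these lines)
  ##SBATCH --ntasks=72
  ##SBATCH --ntasks-per-node=36

  # Adjusted for the Atos HPCF: the 72 tasks fit within a single 128-core node
  #SBATCH --ntasks=72
  #SBATCH --ntasks-per-node=72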

Directives

The table summarises the main Slurm directives and their PBS equivalents.

| PBS                               | Slurm                                                    | Description                                                                                                                                                             | Default |
|-----------------------------------|----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
| -l EC_billing_account=<account>   | --account=<account> / -A <account>                       |                                                                                                                                                                           |         |
| -l EC_job_name=<job_name>         | --job-name=<name> / -J <name>                            |                                                                                                                                                                           |         |
| -o <path>                         | --output=<path> / -o <path>                              |                                                                                                                                                                           |         |
| -e <path>                         | --error=<path> / -e <path>                               |                                                                                                                                                                           |         |
| -q <queue>                        | --qos=<qos> / -q <qos>                                   |                                                                                                                                                                           |         |
| -l walltime=<hh:mm:ss>            | --time=<time> / -t <time>                                | Wall clock limit of the job. Note that this is not a CPU time limit. The format can be: m, m:s, h:m:s, d-h, d-h:m or d-h:m:s                                             |         |
| -l EC_total_tasks=<tasks>         | --ntasks=<tasks> / -n <tasks>                            |                                                                                                                                                                           |         |
| -l EC_nodes=<nodes>               | --nodes=<nodes> / -N <nodes>                             |                                                                                                                                                                           |         |
| -l EC_threads_per_task=<threads>  | --cpus-per-task=<threads> / -c <threads>                 |                                                                                                                                                                           |         |
| -l EC_tasks_per_node=<tasks>      | --ntasks-per-node=<tasks>                                |                                                                                                                                                                           |         |
| -l EC_hyperthreads=2 / 1          | --threads-per-core=<threads> / --hint=[no]multithread    |                                                                                                                                                                           |         |
| -l EC_memory_per_task=<memory>    | --mem-per-cpu=<mem>                                      |                                                                                                                                                                           |         |
| -V                                | --export=<vars>                                          | Export variables to the job, as comma-separated entries of the form VAR=VALUE. ALL means export the entire environment from the submitting shell into the job. NONE means starting with a fresh session. | NONE    |
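
Putting the table into practice, a typical Cray PBS header and a sketch of its Slurm counterpart (job name, QoS and resource figures are illustrative):

  # Cray XC40 (PBS):
  #   #PBS -l EC_job_name=mytest
  #   #PBS -q nf
  #   #PBS -l walltime=01:00:00
  #   #PBS -l EC_total_tasks=8
  #   #PBS -l EC_threads_per_task=4

  # Atos HPCF (Slurm):
  #SBATCH --job-name=mytest
  #SBATCH --qos=nf
  #SBATCH --time=01:00:00
  #SBATCH --ntasks=8
  #SBATCH --cpus-per-task=4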

See Linux Virtual Desktop VDI: Batch system for more details.


Moving from Linux Workstations

If you use your Linux workstation at ECMWF to run some heavy computations, please note that you will not be able to run those on the interactive login nodes. As those nodes are shared, limits are in place to guarantee fair usage by all users.

You should aim to move those workloads into scripts that you can run as batch jobs, where you can request all the resources needed and have them allocated to perform the task.
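
A minimal sketch of such a batch job, with placeholder QoS, resources and command to adapt to your case:

  #!/bin/bash
  #SBATCH --job-name=heavy_task
  #SBATCH --qos=nf
  #SBATCH --time=02:00:00
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=8
  #SBATCH --output=heavy_task.%j.out

  # the computation that used to run on the workstation
  ./my_heavy_computation

  # Submit from a login node with:  sbatch heavy_task.sh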
