Missing Features

Comprehensive software stack

We have provided a basic software stack that should satisfy most users, but some software packages or libraries you require may not be present. If that is the case, let us know by raising a "Problem on computing" issue through the ECMWF Support Portal, mentioning "Atos HPCF" in the summary.

End of job information

A basic report with information about its execution is provided at the end of the job output.

No Format
[ECMWF-INFO -ecepilog] ---------------------------------------------------------------------------------------------
[ECMWF-INFO -ecepilog] This is the ECMWF job Epilogue.
[ECMWF-INFO -ecepilog] +++ Please report problems using the Support portal +++
[ECMWF-INFO -ecepilog] +++ https://support.ecmwf.int                       +++
[ECMWF-INFO -ecepilog] ---------------------------------------------------------------------------------------------
[ECMWF-INFO -ecepilog]
[ECMWF-INFO -ecepilog] Run at 2021-09-28T06:21:25 on aa
[ECMWF-INFO -ecepilog] Job Name                  : eci
[ECMWF-INFO -ecepilog] Job ID                    : 1009559
[ECMWF-INFO -ecepilog] Submitted                 : 2021-09-28T06:05:23
[ECMWF-INFO -ecepilog] Dispatched                : 2021-09-28T06:05:23
[ECMWF-INFO -ecepilog] Completed                 : 2021-09-28T06:21:25
[ECMWF-INFO -ecepilog] Waiting in the queue      : 0.0
[ECMWF-INFO -ecepilog] Runtime                   : 962
[ECMWF-INFO -ecepilog] Exit Code                 : 0:0
[ECMWF-INFO -ecepilog] State                     : COMPLETED
[ECMWF-INFO -ecepilog] Account                   : myaccount
[ECMWF-INFO -ecepilog] Queue                     : nf
[ECMWF-INFO -ecepilog] Owner                     : user
[ECMWF-INFO -ecepilog] STDOUT                    : slurm-1009559.out
[ECMWF-INFO -ecepilog] STDERR                    : slurm-1009559.out
[ECMWF-INFO -ecepilog] Nodes                     : 1
[ECMWF-INFO -ecepilog] Logical CPUs              : 8
[ECMWF-INFO -ecepilog] SBU                       : 20.460 units
[ECMWF-INFO -ecepilog]


No charge is made to project accounts for any SBUs used on the Atos HPCF system until it becomes operational.
Warning
  • We are unable to provide a figure for the memory used at this time.

Alternatively, you may use the sacct command to retrieve some of these statistics from SLURM once the job has finished, as in the sketch below.
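
For illustration only (the job ID and the selection of format fields below are hypothetical; adjust them to the statistics you need), a finished job can be queried from the SLURM accounting database like this, with MaxRSS giving an indication of the memory used:

No Format
# Query SLURM accounting for a finished job (job ID is illustrative)
sacct -j 1009559 --format=JobID,JobName,Elapsed,State,ExitCode,MaxRSS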

Connectivity

...

Filesystems

PERM is temporarily hosted on Lustre (no backups, no snapshots), but in the future it will be provided by external NFS services. Once those are ready, the contents will be migrated without user intervention.

The select/delete policy in SCRATCH has not been enforced yet.

See HPC2020: Filesystems for all the details.

prepIFS

The prepIFS environment for running IFS experiments on Atos is still under development and is not yet ready for general use. A further announcement will be made when users are invited to start running prepIFS experiments on Atos and to migrate their workflows.

ECACCESS and Time-Critical Option 1 (and 2) features

The ECACCESS web toolkit services, such as job submission (including Time-Critical Option 1 jobs), file transfers and ectrans, have not yet been set up to use the Atos HPCF.

Time-Critical Option 2 users enjoy a special setup with additional filesystem redundancy to minimise the impact of failures or planned maintenance. However, this setup has not been finalised yet, so we recommend not using these accounts until the configuration is complete.

ecFlow service

While the availability of virtual infrastructure to run ecFlow servers remains limited, you may start your ecFlow server on the interim dedicated HPCF node in order to run your suites.

At a later stage, those ecFlow servers will need to be moved to dedicated Virtual Machines outside the HPCF, where practically no local tasks will be able to run. All ecFlow tasks will need to be submitted to one of the HPCF complexes through the corresponding Batch system.
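
As an illustration only (the suite, task and submission command below are hypothetical, and the eventual setup may use a different submission wrapper), a suite can route its task jobs through the batch system by overriding ECF_JOB_CMD instead of letting the server spawn jobs locally; in practice the generated job would also set its SBATCH output file to %ECF_JOBOUT% so the output stays visible from ecFlow:

No Format
# Hypothetical suite.def fragment: submit task jobs via sbatch instead of
# running them locally on the ecFlow server host
suite my_suite
  edit ECF_JOB_CMD 'sbatch %ECF_JOB%'
  task my_task
endsuite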

Please do keep that in mind when migrating or designing your solution.

See HPC2020: Using ecFlow for all the details.

Known issues

Intel MKL

...

Versions greater than 19.0.5: performance issues on AMD chips

Recent versions of MKL do not use the AVX2 kernels for certain operations on non-Intel chips, such as the AMD Rome processors on our HPCF. The consequence is a significant drop in performance.
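
One workaround widely reported in the community (it is not an official fix, and the file and executable names below are purely illustrative) is to preload a small shim that makes MKL believe it is running on an Intel CPU so that the AVX2 kernels are selected; if you use it, verify your results:

No Format
# Build a tiny shim overriding MKL's internal CPU vendor check (names are illustrative)
cat > fakeintel.c << 'EOF'
int mkl_serv_intel_cpu_true(void) { return 1; }
EOF
gcc -shared -fPIC -o libfakeintel.so fakeintel.c

# Preload the shim when running an MKL-based executable
LD_PRELOAD=$PWD/libfakeintel.so ./my_mkl_program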