If you find any problem or missing feature that you think should be present and is not listed here, please let us know by reporting it as a "Problem on computing" through the ECMWF Support Portal, mentioning "Atos" in the summary.
The Atos HPCF is not an operational platform yet, and many features or elements may be added gradually as the complete setup is finalised. Here is a list of the known limitations, missing features and issues.
Missing Features
Comprehensive software stack
We have provided a basic software stack that should satisfy most users, but some software packages or libraries you require may not be present. If that is the case, let us know by reporting it as a "Problem on computing" through the ECMWF Support Portal, mentioning "Atos HPCF" in the summary.
End of job information
A basic report is provided at the end of the job with information about its execution.
## INFO ---------------------------------------------------------------------------------------------
## INFO This is the ECMWF job Epilogue. Please report problems to ServiceDesk, servicedesk@ecmwf.int
## INFO ---------------------------------------------------------------------------------------------
## INFO
## INFO Run at 2021-09-28T06:21:25 on aa
## INFO Job Name             : eci
## INFO Job ID               : 1009559
## INFO Submitted            : 2021-09-28T06:05:23
## INFO Dispatched           : 2021-09-28T06:05:23
## INFO Completed            : 2021-09-28T06:21:25
## INFO Waiting in the queue : 0.0
## INFO Runtime              : 962
## INFO Exit Code            : 0:0
## INFO State                : COMPLETED
## INFO Account              : myaccount
## INFO Queue                : nf
## INFO Owner                : user
## INFO STDOUT               : slurm-1009559.out
## INFO STDERR               : slurm-1009559.out
## INFO Nodes                : 1
## INFO Logical CPUs         : 8
## INFO SBU                  : 20.460 units
## INFO
Alternatively, you may use sacct to retrieve some of those statistics from SLURM once the job has finished.
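As a minimal sketch (the job ID shown is the illustrative one from the epilogue above, and the field list is just one possible selection), a typical sacct query might look like:

```shell
# Query SLURM accounting for a finished job (job ID is illustrative).
# --format selects fields similar to those reported by the job epilogue.
sacct -j 1009559 --format=JobID,JobName,State,ExitCode,Elapsed,NNodes,NCPUS
```

Run `sacct --helpformat` to list all available fields on the system.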
Connectivity
- Direct access to the Atos HPCF through ECACCESS or Teleport is not yet available. See HPC2020: How to connect for more information.
- SSH connections to/from VMs in Reading running ecFlow servers are not available. For more details on ecFlow usage, see HPC2020: Using ecFlow.
- Load balancing between Atos HPCF interactive login nodes is not ready yet. Once implemented, an ssh connection to the main alias for the HPCF may create a session on an arbitrary login node.
Filesystems
PERM is temporarily hosted on Lustre (no backups, no snapshots); in the future it will be provided by external NFS services. Once those are ready, the contents will be migrated without user intervention.
The automatic select/delete policy on SCRATCH is not enforced yet.
See HPC2020: Filesystems for all the details.
prepIFS
The prepIFS environment for running IFS experiments on Atos is still under development and is not yet ready for general use. A further announcement will be made when users are invited to start running prepIFS experiments on Atos and to migrate their workflows.
ECACCESS and Time-Critical Option 1 features
The ECACCESS web toolkit services, such as job submission (including Time-Critical Option 1 jobs), file transfers and ectrans, have not yet been set up to use the Atos HPCF.
Time-Critical Option 2
Time-Critical Option 2 users enjoy a special setup with additional filesystem redundancy to minimise the impact of failures or planned maintenance. However, this setup has not been finalised yet, so we recommend not using these accounts until the configuration is complete.
Known issues
Intel MKL > 19.0.5 performance issues on AMD chips
Recent versions of MKL do not use the optimised AVX2 kernels for certain operations on non-Intel chips, such as the AMD Rome processors on TEMS. The consequence is a significant drop in performance.
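As a workaround sketch (the module name and version below are assumptions; check `module avail` on the system for what is actually installed), one option is to pin the last unaffected MKL release, and another commonly reported mitigation for older MKL builds is the undocumented `MKL_DEBUG_CPU_TYPE` variable:

```shell
# Option 1: fall back to the last MKL release unaffected by the issue.
# (Module name/version are illustrative; verify with "module avail".)
module load intel-mkl/19.0.5

# Option 2 (MKL 2019 and earlier only): force the AVX2 code path on AMD.
# This undocumented variable was removed in MKL 2020, so it has no effect
# on newer releases.
export MKL_DEBUG_CPU_TYPE=5
```

Benchmark your own workload after applying either change, as the impact varies by routine and problem size.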