ECMWF scheduled a network system session for Wednesday 13 March 2024, which affected work on the Atos HPC. The session lasted 4.5 hours, from 12:00 UTC to 16:30 UTC.

During the session, there was NO login access to hpc-login, and NO user batch jobs submitted from hpc-login or via hpc-batch could run. Batch jobs submitted before the session that were not expected to complete before the session started were queued until after the session finished. Time-Critical Option 1 and 2 workloads were not affected and continued to run during the session.

Users submitting batch jobs on Wednesday morning that were expected to complete before 12:00 UTC should have set an appropriate wall-clock time limit via the "#SBATCH --time" directive so that the job could be scheduled to run before the session. Interactive jobs started with the ecinteractive command or running in JupyterHub before the system session may have been terminated, but it was possible to start new ones during the session.
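As a rough illustration of choosing a wall-clock limit that fits before the session, the sketch below computes the longest limit that would still end by 12:00 UTC for a given submission time. The function name max_walltime and the script name job.sh are hypothetical and not part of any ECMWF tooling. The calculation ignores queueing delay, so the real limit would need to be shorter.

```bash
#!/bin/bash
# Hypothetical helper (not an ECMWF tool): print the longest wall-clock
# limit, as HH:MM:SS, that still ends before the 12:00 UTC session start.
# Argument: current time in UTC as HH:MM. Queue wait time is NOT considered.
max_walltime() {
    local now_h=${1%%:*} now_m=${1##*:}
    local start_min=$((12 * 60))                    # session start, 12:00 UTC
    local now_min=$((10#$now_h * 60 + 10#$now_m))   # 10# avoids octal parsing of "09"
    local left=$((start_min - now_min))
    if (( left <= 0 )); then
        return 1                                    # session already started
    fi
    printf '%02d:%02d:00\n' $((left / 60)) $((left % 60))
}

# Example use (job.sh is a placeholder for your own job script):
#   sbatch --time="$(max_walltime 09:15)" job.sh
# The same value could instead be written in the script as "#SBATCH --time=HH:MM:SS".
```

Passing the limit on the sbatch command line or in the "#SBATCH --time" directive has the same purpose: it tells the scheduler how long the job may run so it can decide whether the job fits before the session.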

The ECS class of service (ecs-login and ecs-batch) remained available during the system session.

Check the ECMWF Service Status for up-to-date information on the service.

Please accept our apologies for any inconvenience caused. If you have any questions or concerns, please contact us through the ECMWF Support Portal.
