
Under the Framework for time-critical applications, Member States can run ecFlow suites monitored by ECMWF. Known as option 2 within that framework, these suites enjoy a special technical setup to maximise robustness and high availability, similar to ECMWF's own operational production. When moving from a standard user account to a time-critical one (typically starting with a "z" followed by two or three characters), there are a number of things you must be aware of:

Special filesystems

...

Time-critical option 2 users, or zids, have a special set of filesystems, different from those of regular users. They are served from different storage servers in different computing halls, and are not kept in sync automatically. It is the user's responsibility to ensure the required files and directory structures are present on both sides and to synchronise them if and when needed. This means, for example, that zids have two HOMEs, one on each storage host. All the following storage locations can be referenced by the corresponding environment variables, which are defined automatically for each session or job.
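As a quick illustration, the snippet below (assuming a bash session or batch job on the HPCF) prints those locations; the rsync line is only a hypothetical sketch of how the two HOMEs might be synchronised manually, and "other-storage-host" is a placeholder rather than a real ECMWF host name.

    # Print the time-critical storage locations defined for this session/job
    echo "HOME:       $HOME"
    echo "TCWORK:     $TCWORK"
    echo "SCRATCHDIR: $SCRATCHDIR"
    echo "TMPDIR:     $TMPDIR"

    # Hypothetical manual synchronisation of the two HOMEs: "other-storage-host"
    # is a placeholder, not an actual ECMWF endpoint
    # rsync -av "$HOME"/ other-storage-host:"$HOME"/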

HOME

  • Suitable for: permanent files, e.g. profile, utilities, sources, libraries, etc.
  • Technology: Lustre (on ws1 and ws2)
  • Features: no backup, no snapshots, no automatic deletion, unthrottled I/O bandwidth
  • Quota: 100 GB

TCWORK

  • Suitable for: permanent large files. Main storage for your jobs' and experiments' input and output files.
  • Technology: Lustre (on ws1 and ws2)
  • Features: no backup, no snapshots, no automatic deletion, unthrottled I/O bandwidth
  • Quota: 50 TB

SCRATCHDIR

  • Suitable for: big temporary data for an individual session or job; not as fast as TMPDIR, but higher capacity. Files accessible from the whole cluster.
  • Technology: Lustre (on ws1 and ws2)
  • Features: created per session/job and deleted at the end of the session or job
  • Quota: part of the TCWORK quota

TMPDIR

  • Suitable for: fast temporary data for an individual session or job, small files only. Local to every node.
  • Features: created per session/job and deleted at the end of the session or job
  • On shared nodes (*f QoSs): SSD-backed, 3 GB per session/job by default, customisable up to 40 GB with --gres=ssdtmp:<size>G (see the example after this table)
  • On exclusive compute nodes (*p QoSs): RAM-backed, no limit (up to the maximum memory of the node)
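As an example of using the --gres option above, the sketch below shows a minimal batch job that requests a larger SSD-backed TMPDIR on the shared nodes. The QoS name, input file and executable are placeholders, not actual ECMWF values.

    #!/bin/bash
    #SBATCH --qos=<your *f QoS>        # placeholder: one of the shared-node *f QoSs
    #SBATCH --gres=ssdtmp:20G          # request 20 GB of SSD TMPDIR (default 3 GB, up to 40 GB)

    # TMPDIR is created for this job and deleted when it ends, so copy any
    # results back to permanent storage (TCWORK) before the job finishes.
    cd "$TMPDIR"
    cp "$TCWORK/input/mydata" .        # hypothetical input file
    ./myprogram mydata > result.out    # hypothetical executable
    cp result.out "$TCWORK/results/"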

...


Note

Note that there is no PERM or SCRATCH, and the corresponding environment variables will not be defined.


Info

HOME, TCWORK and SCRATCHDIR are all based on the Lustre parallel filesystem for maximum reliability. They are not accessible from outside the HPCF, including from VDI instances or the VMs running the ecFlow servers.

...