...

Previous: https://confluence.ecmwf.int/display/ECFLOW/Limit-families
Up: https://softwareconfluence.ecmwf.int/wiki/display/ECFLOW/Advanced+Topics
Next: https://confluence.ecmwf.int/display/ECFLOW/Late+Attribute



In the real world, suites can have several thousand tasks. Not all of these tasks are required all the time.

Having a server with an extremely large number of tasks can cause performance issues.

  • The server writes to the checkpoint file periodically. With an excessively large number of tasks, this disk I/O can interfere with job scheduling.
  • Clients such as the GUI (ecflow_ui) are also adversely affected: memory requirements grow and the interactive experience becomes slow.
  • Network traffic is heavily affected.

...

  • Archives suite or family nodes *if* they have child nodes (otherwise it does nothing).
  • Saves the suite/family nodes to disk, and then removes the in-memory child nodes from the definition.
  • This reduces the time taken to checkpoint and the network bandwidth used.
  • If an archived node is re-queued or begun, its child nodes are restored automatically.
  • The nodes are saved to ECF_HOME/<host>.<port>.ECF_NAME.check, where '/' has been replaced with ':' in ECF_NAME.
  • Care must be taken if you have trigger references to the archived nodes.
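
To make the file naming rule above concrete, here is a minimal Python sketch that builds the archive file name from a literal reading of that rule. The function name `archive_path` and the sample values for ECF_HOME, host, and port are hypothetical and chosen only for illustration. ecFlow does this internally, and this sketch is not its implementation.

```python
def archive_path(ecf_home: str, host: str, port: int, ecf_name: str) -> str:
    """Build ECF_HOME/<host>.<port>.ECF_NAME.check for a node.

    ecf_name is the node's absolute path (ECF_NAME), e.g. "/s1/f1".
    Every '/' in it is replaced with ':' so it can form part of a file name.
    """
    flattened = ecf_name.replace("/", ":")
    return f"{ecf_home}/{host}.{port}.{flattened}.check"


# Hypothetical sample values, for illustration only.
print(archive_path("/home/user/course", "localhost", 3141, "/s1/f1"))
# prints: /home/user/course/localhost.3141.:s1:f1.check
```

Looking for a file of this form under ECF_HOME is one way to check that a node has actually been archived to disk.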

...

  • ecflow_client --restore=/s1/f1    # restore family /s1/f1
  • ecflow_client --restore=/s1 /s2   # restore suites /s1 and /s2
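
When restoring from a script, the argument layout shown above (first path attached to `--restore=`, further paths as separate arguments) can be assembled programmatically. The following is a small sketch under that assumption. The helper name `restore_command` is hypothetical, and it only builds the command line; it does not contact a server.

```python
import shlex


def restore_command(*node_paths: str) -> list:
    """Return the argv list for restoring one or more archived nodes.

    Mirrors the layout of the examples above:
    ecflow_client --restore=<first path> [<further paths> ...]
    """
    if not node_paths:
        raise ValueError("at least one node path is required")
    first, *rest = node_paths
    return ["ecflow_client", f"--restore={first}", *rest]


# Show the command lines that would be run.
print(shlex.join(restore_command("/s1/f1")))
print(shlex.join(restore_command("/s1", "/s2")))
```

With an ecFlow server running, the returned list could be passed to `subprocess.run(cmd, check=True)` to execute the restore.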

Text

Let us modify the suite definition file again. To avoid waiting, this exercise archives immediately.

...

  1. Type in the changes, then run: cp -r f5 lf1; cp -r f5 lf2; cp -r f5 lf3
  2. Replace the suite definition.
  3. Run the suite. In ecflow_ui you should see nodes being archived and then restored.
  4. Experiment with archive and restore in ecflow_ui.
  5. Experiment with archive and restore from the command line.

...

Note

Autoarchive(0) can take up to one minute to take effect, because the server has a 1-minute resolution.



...