ecFlow's documentation is now on readthedocs!


In the real world, suites can have several thousand tasks. These tasks are not required all the time.

Having a server with an extremely large number of tasks can cause performance issues.

  • The server writes to the checkpoint file periodically. This disk i/o can interfere with job scheduling when dealing with an excessively large number of tasks.
  • Clients like GUI(ecflow_ui), are also adversely affected by the memory requirements, and slow interactive experience 
  • Network traffic is heavily affected

This is where autoarchive becomes useful.

autoarchive example
autoarchive +01:00 # archive one hour after complete
autoarchive 01:00  # archive at 1 am in morning after complete
autoarchive 10     # archive 10 days after complete
autoarchive 0      # archive immediately after complete, can take up to a minute

autoarchive will write a portion of the definition to disk.

  • Archives suite or family nodes *IF* they have child nodes(otherwise does nothing).
  • Saves the suite/family nodes to disk, and then removes the in-memory child nodes from the definition.
  •  It improves time taken to checkpoint and reduces network bandwidth
  •  If archived node is re-queued or begun, the child nodes are automatically restored
  • The nodes are saved to ECF_HOME/<host>.<port>.ECF_NAME.check, where '/' has been replaced with ':' in ECF_NAME
  • Care must be taken if you have trigger reference to the archived nodes

Use  ecflow_client --archive to archive manually

  • ecflow_client --archive=/s1                       # archive suite s1
  • ecflow_client --archive=/s1/f1 /s2            # archive family /s1/f1 and suite /s2
  • ecflow_client --archive=force /s1 /s2      # archive suites /s1,/s2 even if they have active tasks

Autorestore can also be done automatically, but is only applied when a node completes.

To restore archived nodes manually use : 

  • ecflow_client --restore=/s1/f1     # restore family /s1/f1
  • ecflow_client  --restore=/s1 /s2  # restore suites /s1 and /s2

Text

Let us modify the suite definition file again. To avoid waiting this exercise will archive immediately.

# Definition of the suite test.
suite test
 edit ECF_INCLUDE "$HOME/course"
 edit ECF_HOME    "$HOME/course"
 edit SLEEP 20
 family lf1
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family lf2
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family lf3
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family restore
    trigger ./lf1<flag>archived and ./lf2<flag>archived and ./lf3<flag>archived 
    task t1
       autorestore ../lf1 ../lf2 ../lf3.   # restore when t1 completes
 endfamily
endsuite

Python

$HOME/course/test.py
import os
from ecflow import Defs,Suite,Family,Task,Edit,Trigger,Complete,Event,Meter,Time,Day,Date,Label, \
                   RepeatString,RepeatInteger,RepeatDate,InLimit,Limit,Autoarchive,Autorestore
         
def create_family(name) :
    return Family(name, 
                  Autoarchive(0),
                  [ Task('t{}'.format(i)) for i in range(1,10) ] )

def create_family_restore() :
    return Family("restore",
                 Trigger("./lf1<flag>archived and ./lf2<flag>archived and ./lf3<flag>archived"),
                 Task('t1', Autorestore(["../lf1","../lf2","../lf3"])))
     
print("Creating suite definition") 
home = os.path.join(os.getenv("HOME"),"course")
defs = Defs(
        Suite("test",
            Edit(ECF_INCLUDE=home,ECF_HOME=home,SLEEP=20),
            create_family("lf1"),create_family("lf2"),create_family("lf3"),
            create_family_restore()
        )
      )
print(defs)
 
print("Checking job creation: .ecf -> .job0") 
print(defs.check_job_creation())
 
print("Checking trigger expressions and inlimits")
assert len(defs.check()) == 0,defs.check()
 
print("Saving definition to file 'test.def'")
defs.save_as_defs("test.def")

What to do

  1. Type in the changes, cp -r f5 lf1; cp -r f5 lf2; cp -r f5 lf3 
  2. Replace the suite definition
  3. Run the suite, you should see nodes getting archived, then restored in ecflow_ui
  4. Experiment with archive and restore in ecflow_ui.
  5. Experiment with archive and restore from the command line.

The Autoarchive(0) can take up to one minute to take effect. The server has a 1-minute resolution.