In the real world, suites can have several thousand tasks, and these tasks are not all required all the time.
A server holding a large number of tasks can suffer performance issues:
- The server writes to the checkpoint file periodically. With an excessively large number of tasks, this disk I/O can interfere with job scheduling.
- Clients such as the GUI (ecflow_ui) are also adversely affected, through higher memory requirements and a slower interactive experience.
- Network traffic is heavily affected.
This is where autoarchive becomes useful.
autoarchive example
autoarchive +01:00  # archive one hour after complete
autoarchive 01:00   # archive at 1 am in the morning after complete
autoarchive 10      # archive 10 days after complete
autoarchive 0       # archive immediately after complete, can take up to a minute
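The four forms above can be read as rules for when a completed node becomes eligible for archiving. The following is a minimal sketch of that reading in plain Python. The function `archive_due` is a hypothetical helper written for illustration, not part of ecFlow, and it assumes that the absolute form `HH:MM` means the next occurrence of that clock time after completion.

```python
from datetime import datetime, timedelta


def archive_due(completed_at: datetime, spec: str) -> datetime:
    """Return the earliest time a node becomes eligible for archiving.

    Hypothetical helper: it models the four autoarchive forms shown in
    the example and is not ecFlow's own implementation.
    """
    spec = spec.strip()
    if spec.startswith("+"):
        # Relative form "+HH:MM": offset from the completion time.
        hours, minutes = map(int, spec[1:].split(":"))
        return completed_at + timedelta(hours=hours, minutes=minutes)
    if ":" in spec:
        # Absolute form "HH:MM": assumed to be the next occurrence of
        # that clock time at or after completion.
        hours, minutes = map(int, spec.split(":"))
        due = completed_at.replace(hour=hours, minute=minutes,
                                   second=0, microsecond=0)
        if due < completed_at:
            due += timedelta(days=1)
        return due
    # Plain integer: number of days after completion (0 means immediately).
    return completed_at + timedelta(days=int(spec))


done = datetime(2024, 1, 1, 14, 30)
for spec in ("+01:00", "01:00", "10", "0"):
    print(f"autoarchive {spec:<7} -> eligible from {archive_due(done, spec)}")
```

As the last comment in the example notes, the server may take up to a minute to act, so the time computed here is a lower bound rather than the exact moment of archiving.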
autoarchive writes a portion of the definition to disk:
- It archives suite or family nodes only if they have child nodes (otherwise it does nothing).
- It saves the suite/family nodes to disk, and then removes the in-memory child nodes from the definition.
- It reduces the time taken to checkpoint and the network bandwidth used.
- If an archived node is re-queued or begun, its child nodes are automatically restored.
- The nodes are saved to ECF_HOME/<host>.<port>.ECF_NAME.check, where '/' has been replaced with ':' in ECF_NAME.
- Care must be taken if you have trigger references to the archived nodes.
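To make the file naming rule concrete, here is a small sketch that applies it to a node path. The function `archive_file_path` and the example host, port and home directory are assumptions chosen for illustration. The sketch follows only the rule stated in the list above, so the path it prints should be checked against what your server actually writes.

```python
def archive_file_path(ecf_home: str, host: str, port: int, node_path: str) -> str:
    """Build the archive file path from the naming rule in the list above.

    Hypothetical helper: node_path stands in for ECF_NAME, e.g. "/test/lf1".
    """
    # Replace every '/' in the node path with ':' as the rule describes.
    flattened = node_path.replace("/", ":")
    return f"{ecf_home.rstrip('/')}/{host}.{port}.{flattened}.check"


# Example values (assumed): a server on localhost:3141 with ECF_HOME=/home/user/course
print(archive_file_path("/home/user/course", "localhost", 3141, "/test/lf1"))
```

A helper like this can be handy when checking ECF_HOME for the files left behind by archived families.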
Use ecflow_client --archive to archive manually
- ecflow_client --archive=/s1 # archive suite s1
- ecflow_client --archive=/s1/f1 /s2 # archive family /s1/f1 and suite /s2
- ecflow_client --archive=force /s1 /s2 # archive suites /s1,/s2 even if they have active tasks
Restoring can also be done automatically with autorestore, but it is only applied when the node carrying the autorestore attribute completes.
To restore archived nodes manually, use:
- ecflow_client --restore=/s1/f1 # restore family /s1/f1
- ecflow_client --restore=/s1 /s2 # restore suites /s1 and /s2
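The manual archive and restore commands above can also be driven from a script. The sketch below builds the same command lines as Python argument lists. The helper names `archive_args`, `restore_args` and `run` are hypothetical, and only the `--archive`, `--archive=force` and `--restore` forms shown above are used. By default the commands are only printed, so the sketch runs without a server.

```python
import shlex
import subprocess


def archive_args(paths, force=False):
    """Arguments for 'ecflow_client --archive', mirroring the examples above."""
    paths = list(paths)
    if not paths:
        raise ValueError("at least one node path is required")
    if force:
        # Forced form: archive even if the nodes have active tasks.
        return ["ecflow_client", "--archive=force", *paths]
    return ["ecflow_client", f"--archive={paths[0]}", *paths[1:]]


def restore_args(paths):
    """Arguments for 'ecflow_client --restore', mirroring the examples above."""
    paths = list(paths)
    if not paths:
        raise ValueError("at least one node path is required")
    return ["ecflow_client", f"--restore={paths[0]}", *paths[1:]]


def run(args, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(args))
    if not dry_run:
        subprocess.run(args, check=True)


run(archive_args(["/s1/f1", "/s2"]))
run(archive_args(["/s1", "/s2"], force=True))
run(restore_args(["/s1", "/s2"]))
```

Calling `run(..., dry_run=False)` assumes that ecflow_client is on the PATH and that the usual ECF_HOST and ECF_PORT settings point at your server.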
Text
Let us modify the suite definition file again
# Definition of the suite test.
suite test
  edit ECF_INCLUDE "$HOME/course"
  edit ECF_HOME "$HOME/course"
  edit SLEEP 20
  family lf1
    autoarchive 0
    task t1 ; task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
  endfamily
  family lf2
    autoarchive 0
    task t1 ; task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
  endfamily
  family lf3
    autoarchive 0
    task t1 ; task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
  endfamily
  family restore
    # from ecflow 5.3.2 we can have
    #   trigger ./lf1<flag>archived and ./lf2<flag>archived and ./lf3<flag>archived
    # and there will also be *no* need for the 'edit SLEEP 60'
    trigger ./lf1 == complete and ./lf2 == complete and ./lf3 == complete
    task t1
      edit SLEEP 60                     # wait for autoarchive to complete
      autorestore ../lf1 ../lf2 ../lf3  # restore when t1 completes
  endfamily
endsuite
Python
$HOME/course/test.py
import os
from ecflow import Defs,Suite,Family,Task,Edit,Trigger,Complete,Event,Meter,Time,Day,Date,Label, \
                   RepeatString,RepeatInteger,RepeatDate,InLimit,Limit,Autoarchive,Autorestore

def create_family(name) :
    return Family(name,
                  Autoarchive(0),
                  [ Task('t{}'.format(i)) for i in range(1,10) ] )

def create_family_restore() :
    return Family("restore",
                  # from ecflow 5.3.2 we can have
                  #   Trigger("./lf1<flag>archived and ./lf2<flag>archived and ./lf3<flag>archived")
                  # and there will also be *no* need for the SLEEP=60
                  Trigger("./lf1 == complete and ./lf2 == complete and ./lf3 == complete"),
                  Task('t1',
                       Edit(SLEEP=60),
                       Autorestore(["../lf1","../lf2","../lf3"])))

print("Creating suite definition")
home = os.path.join(os.getenv("HOME"),"course")
defs = Defs(
    Suite("test",
          Edit(ECF_INCLUDE=home,ECF_HOME=home,SLEEP=20),
          create_family("lf1"),create_family("lf2"),create_family("lf3"),
          create_family_restore()
    )
)
print(defs)

print("Checking job creation: .ecf -> .job0")
print(defs.check_job_creation())

print("Checking trigger expressions and inlimits")
assert len(defs.check()) == 0,defs.check()

print("Saving definition to file 'test.def'")
defs.save_as_defs("test.def")
What to do
- Type in the changes
- Replace the suite definition
- Run the suite; in ecflow_ui you should see nodes being archived and then restored.
- Experiment with archive and restore in ecflow_ui.
- Experiment with archive and restore from the command line.