...

Example: For tasks under suite “s1”, add a zombie attribute such that child label commands (i.e. ecflow_client --label) never block the job. (This is not strictly needed, as it is the default behaviour from release 4.0.5 onwards.)

  • python

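    import ecflow

    s1 = ecflow.Suite('s1')
    # Child commands this zombie attribute applies to
    child_list = [ecflow.ChildCmdType.label]
    # Fob ecf zombies for those commands; 300 is the zombie lifetime in the server
    zombie_attr = ecflow.ZombieAttr(ecflow.ZombieType.ecf, child_list, ecflow.ZombieUserActionType.fob, 300)
    s1.add_zombie(zombie_attr)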
    
  • text

    suite s1
       zombie ecf:fob:label:

  • alter

    ecflow_client --alter add zombie "ecf:fob:label:" /s1

Example: For tasks under suite “s1”, add a zombie attribute such that a job issuing the child commands (event, meter, label) never blocks. (This is not strictly needed, as it is the default behaviour from release 4.0.5 onwards.)

  • python

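    import ecflow

    s1 = ecflow.Suite('s1')
    # Child commands this zombie attribute applies to
    child_list = [ecflow.ChildCmdType.label, ecflow.ChildCmdType.event, ecflow.ChildCmdType.meter]
    # Fob ecf zombies for those commands; 300 is the zombie lifetime in the server
    zombie_attr = ecflow.ZombieAttr(ecflow.ZombieType.ecf, child_list, ecflow.ZombieUserActionType.fob, 300)
    s1.add_zombie(zombie_attr)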
    
  • text

    suite s1
       zombie ecf:fob:label,event,meter:

  • alter

    ecflow_client --alter add zombie "ecf:fob:label,event,meter:" /s1

...

Sometimes zombies arise for more obscure reasons. For example, the job sends an --init message to the server while the server is busy (e.g. processing jobs). By the time the server makes the task active and replies to the client/job, the ecflow_client has timed out. The ecflow_client therefore sends the same message again. This time, however, the server treats the command as a zombie, since the task is already active. Hence we get these false zombies.
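
The sequence described here can be illustrated with a toy model. This is a minimal sketch and not ecFlow code: the `ToyServer` class, its state names, and the `run_job_init` helper are hypothetical and exist only to show why a resent --init is flagged once the task is already active.

```python
class ToyServer:
    """Hypothetical stand-in for the server; tracks the state of one task."""

    def __init__(self):
        self.state = "submitted"   # job has been submitted, not yet active
        self.zombies = []          # child commands flagged as zombies

    def handle_init(self, task_path):
        """Process an --init child command for task_path."""
        if self.state == "active":
            # The task is already active, so a second --init looks like a zombie.
            self.zombies.append(("init", task_path))
            return "zombie"
        self.state = "active"
        return "ok"


def run_job_init(server, task_path, reply_arrives_in_time):
    """Send --init; resend once if the first reply came back too late."""
    replies = [server.handle_init(task_path)]
    if not reply_arrives_in_time:
        # The server did process the first message, but the client had
        # already timed out, so it sends the same message again.
        replies.append(server.handle_init(task_path))
    return replies


server = ToyServer()
print(run_job_init(server, "/s1/t1", reply_arrives_in_time=False))
print(server.zombies)
```

In this model the first --init succeeds and the resent one is recorded as a zombie, even though only one real job exists.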

These scenarios are rare, but tend to occur in the following situations:

  • High disk latencies (e.g. check pointing takes a long time, or job processing takes too long). This typically happens when using virtual machines with non-local data.
  • Very large scripts (i.e. in the megabytes). These can inflate the server memory and cause job processing to take longer.
  • Extremely large definitions that are requested by many users via the GUI. (The download size can be reduced by requesting only the suite you are interested in.)
  • A very busy machine, where the server is competing for resources and is overloaded.

To diagnose these cases, look at the log file. Typically you will see two or more child commands (--init or --complete) for the same task, where the second is then treated as a zombie.
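
A scan for repeated child commands could be sketched as follows. The log line layout is an assumption: the regular expression expects a token such as `chd:init` or `chd:complete` followed by the task path, and the sample lines are made up for illustration. Adjust the pattern to whatever your server log actually contains.

```python
import re
from collections import Counter

# Assumed pattern: a child command token ("chd:init" or "chd:complete")
# followed by the task path. This is not a confirmed log format.
CHILD_CMD = re.compile(r"chd:(init|complete)\s+(/\S+)")


def find_repeated_child_commands(lines):
    """Return {(command, task_path): count} for commands seen more than once."""
    counts = Counter()
    for line in lines:
        match = CHILD_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Made-up sample lines, for illustration only.
sample_log = [
    "MSG:[10:00:01] chd:init /s1/t1",
    "MSG:[10:00:21] chd:init /s1/t1",
    "MSG:[10:00:30] chd:complete /s1/t2",
]
print(find_repeated_child_commands(sample_log))
```

Repeated entries are only candidates: a task that was legitimately rerun also produces more than one --init, so compare the timestamps before concluding that a zombie is false.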

...