...

How can zombies be handled?

By default, a zombie init, complete, abort or wait child command blocks the job, whereas a zombie event, label or meter child command is allowed to continue (fob). This applies from version 4.0.4; previously, all zombie child commands blocked.

While blocked, the child command keeps attempting to contact the ecflow_server.
It does so for a period of 24 hours (this period is configurable; see ECF_TIMEOUT on ecflow_client).
Jobs can also be configured so that, if the server denies the communication,
the child command fails immediately (see ECF_DENIED on ecflow_client).
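The default behaviour and the retry rules described above can be summarised in a small Python model. This is a conceptual sketch only: the function names, the `retry_interval_s` parameter and the response strings are hypothetical and are not part of ecflow_client or the ecFlow Python API. It also assumes that a child command which is still blocked when the timeout expires gives up and fails.

```python
# Conceptual model only: these names are hypothetical and are not part of
# ecflow_client or the ecFlow Python API.

BLOCKING_COMMANDS = {"init", "complete", "abort", "wait"}
FOBBING_COMMANDS = {"event", "label", "meter"}


def default_zombie_action(command: str) -> str:
    """Default server response to a zombie child command (from version 4.0.4)."""
    if command in BLOCKING_COMMANDS:
        return "block"
    if command in FOBBING_COMMANDS:
        return "fob"
    raise ValueError(f"unknown child command: {command}")


def child_command_outcome(responses, retry_interval_s, timeout_s=24 * 3600,
                          denied_fails=False):
    """Replay the server responses seen by a retrying child command.

    responses: iterable of "ok", "fob", "block" or "denied", one per attempt.
    denied_fails: models ECF_DENIED being set for the job.
    Returns (result, elapsed_seconds).
    """
    elapsed = 0
    for response in responses:
        if response in ("ok", "fob"):
            return "continue", elapsed
        if response == "denied" and denied_fails:
            return "fail", elapsed  # fail immediately instead of retrying
        # Blocked (or denied without ECF_DENIED): wait, then try again.
        elapsed += retry_interval_s
        if elapsed >= timeout_s:
            return "fail", elapsed  # assumed behaviour once the timeout expires
    return "still waiting", elapsed
```

For example, `child_command_outcome(["block", "block", "ok"], 60)` returns `("continue", 120)`: the command was blocked twice before the zombie was resolved.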

ecflow_ui provides a tab that lists all the zombies and the actions that can be taken on them.

Note

The zombies tab is shown in the info panel when the server node (i.e. the top-most node) is selected.

The actions include:

  • Terminate:

    The child command is asked to fail.
    Depending on your scripts, this may cause the abort child command to be called,
    which will again be flagged as a zombie.
  • Fob:

    Allow the job to continue. The child command completes and hence no longer blocks the job.

    Take great care when choosing this action.
    If two jobs are running for the same task, they may cause data corruption.
    Even with a single job, issues can arise:
    for example, if the associated command was an event child command, the
    event is not set. If this event is used in a trigger expression, the trigger will never be satisfied.
  • Delete:

    Remove the zombie from the server. The job continues to block, so
    when the child command next contacts the ecflow_server, the zombie reappears.
    Use this option when the job has been killed manually.
  • Rescue:

    Adopt the zombie and update the node tree.
    The unique password (ECF_PASS) on the zombie is copied over to the task, so that the next child command continues as normal.
    Only use this when you are sure there are no additional jobs.
  • Kill:

    Apply the kill command (ECF_KILL_CMD), using the process id stored on the zombie.
    If the script traps signals correctly, this should result in the abort child command being called.
    Note: path zombies must be killed manually.
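The effect of each action can be sketched as a toy Python model. It assumes, for illustration only, that the zombie arose because the ECF_PASS sent by the job does not match the one the server holds for the task. The `Task`, `Zombie`, `accepted` and `apply_action` names are hypothetical; they mimic the behaviour described in the list and are not ecFlow's implementation or API.

```python
from dataclasses import dataclass

# Toy model of the zombie actions; all names here are hypothetical.
# Assumption: the zombie arose because the job's ECF_PASS differs from the task's.


@dataclass
class Task:
    password: str  # ECF_PASS the server currently expects for this task


@dataclass
class Zombie:
    password: str  # ECF_PASS sent by the zombie's child command
    pid: str  # process id recorded on the zombie
    listed: bool = True  # whether the zombie appears in the zombies tab


def accepted(child_password: str, task: Task) -> bool:
    """In this model, a child command is handled normally only if its password matches."""
    return child_password == task.password


def apply_action(action: str, zombie: Zombie, task: Task) -> str:
    """Apply a zombie action; return what the blocked child command does next."""
    if action == "terminate":
        # The child command fails; the script may then call abort,
        # which is flagged as a zombie again.
        return "fail"
    if action == "fob":
        # The child command completes without touching the node tree.
        return "continue"
    if action == "delete":
        # Only the server's record is removed; the job keeps blocking, so the
        # zombie reappears the next time the child command contacts the server.
        zombie.listed = False
        return "block"
    if action == "rescue":
        # Adopt the zombie: copy its password onto the task.
        task.password = zombie.password
        return "continue"
    if action == "kill":
        # ECF_KILL_CMD would be run against the recorded process id.
        return f"kill pid {zombie.pid}"
    raise ValueError(f"unknown action: {action}")
```

In this model only `rescue` modifies the `Task`, so only after it does `accepted(zombie.password, task)` become true; every other action leaves the task untouched.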

 

Warning
Of the five actions above, only Rescue allows the child command to change the state of the node tree.

 

What to do

  1. Create a zombie by starting a task and then immediately setting it to complete via ecflow_ui.
  2. Inspect the log file; it shows how the zombie arose.
  3. Inspect the zombies tab in ecflow_ui (select the server node, then select the zombies tab).
  4. Experiment with the different actions on the zombie.

        

...