A zombie is a running job that fails authentication when communicating with the ecflow_server.
How are zombies created?
- The node tree is deleted, replaced or reloaded whilst jobs are running
- A task is rerun whilst in a submitted or active state
- A job is forced to a new state, e.g. complete
Rarer causes might be:
- ecf script errors, where there are multiple calls to the init and complete child commands
- The child commands in the ecf script are placed in the background. In this case the order in which the child commands contact the server may be indeterminate.
- Load leveler submitting a job twice
- Server crash, where the recovered check point file is out of date
- Machine crash
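The authentication failure behind several of these causes can be pictured with a small toy model. This is only a sketch, not how ecflow_server is implemented: the ToyServer class, its methods and the password format are hypothetical. It assumes that the server issues a fresh password each time a task is submitted and compares it with the one presented by the job's child commands.

```python
class ToyServer:
    """Hypothetical stand-in for a server that authenticates child commands."""

    def __init__(self):
        self.passwords = {}  # task path -> password issued at the latest submission
        self.zombies = []    # (path, child command) pairs that failed authentication
        self._counter = 0

    def submit(self, path):
        """Submit (or rerun) a task: issue a fresh password and hand it to the job."""
        self._counter += 1
        password = f"pw{self._counter}"
        self.passwords[path] = password
        return password

    def delete(self, path):
        """Remove the task from the node tree while its job may still be running."""
        self.passwords.pop(path, None)

    def child_command(self, path, password, command):
        """Return True if the command is accepted, False if it is flagged as a zombie."""
        if self.passwords.get(path) != password:
            self.zombies.append((path, command))
            return False
        return True


server = ToyServer()

# Normal case: the job presents the password it was given.
first_job = server.submit("/suite/t1")
accepted = server.child_command("/suite/t1", first_job, "init")

# Rerun while the first job is still running: the first job's password is now stale.
second_job = server.submit("/suite/t1")
stale = server.child_command("/suite/t1", first_job, "complete")
fresh = server.child_command("/suite/t1", second_job, "init")

# Node deleted while the job is running: the path no longer exists on the server.
server.delete("/suite/t1")
orphan = server.child_command("/suite/t1", second_job, "complete")

print(accepted, stale, fresh, orphan)
print(server.zombies)
```

In this model the rerun and the deletion each leave a running job whose child command can no longer authenticate, which is the situation described as a zombie above.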
How can zombies be handled?
The default behaviour is to block the job.
ecflowview provides a dialog which lists all the zombies and the actions that can be taken. These include:
Terminate:
The child command is asked to fail. Depending on your scripts, this may cause the abort child command to be called, which will again be flagged as a zombie.
Fob:
Allow the job to continue. The child command completes and hence no longer blocks the job.
Great care should be taken when this action is chosen. If two jobs are running, they may cause data corruption. Even with a single job, issues can arise: e.g. if the associated command was an event child command, the event would never be set, so anything that depends on it would never evaluate.
Delete:
Remove the zombie from the server. The job will continue blocking. If the job is killed manually, then this option can be used.
Rescue:
Adopt the zombie and update the node tree. The ECF_PASS on the zombie is copied over to the task, so that the next child command will continue as normal.
Kill:
Applies the kill command (ECF_KILL_CMD) using the process id stored on the zombie. If the script has correct signal trapping, this should end up calling abort. Note: path zombies will need to be killed manually.
Warning
Of the actions above, only Rescue will allow a child command to change the state of the node tree.
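The difference between the actions can be summarised in a short sketch. It is a simplified model written for illustration, not ecFlow code: the Action, Task and Zombie names and the apply_action function are hypothetical, and the outcomes encode only what the descriptions above state. Kill is left out because it acts on the job's process rather than on the reply given to the child command.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BLOCK = "block"          # default: the child command keeps waiting
    TERMINATE = "terminate"  # the child command is asked to fail
    FOB = "fob"              # the child command completes without any update
    DELETE = "delete"        # the zombie entry is removed; the job keeps blocking
    RESCUE = "rescue"        # the zombie is adopted by the task


@dataclass
class Task:
    state: str
    password: str


@dataclass
class Zombie:
    password: str
    requested_state: str  # state the child command wants to set, e.g. "complete"


def apply_action(task, zombie, action):
    """Return what the child command sees; the task is modified only for RESCUE."""
    if action is Action.RESCUE:
        task.password = zombie.password      # later child commands now authenticate
        task.state = zombie.requested_state  # the node tree is updated
        return "succeeds"
    if action is Action.FOB:
        return "succeeds"
    if action is Action.TERMINATE:
        return "fails"
    return "blocks"  # BLOCK and DELETE


outcomes = {}
for action in Action:
    task = Task(state="queued", password="new")
    zombie = Zombie(password="old", requested_state="complete")
    reply = apply_action(task, zombie, action)
    outcomes[action] = (reply, task.state)
    print(f"{action.value:<10} child command {reply:<9} task state: {task.state}")
```

Only the rescue branch touches the task, which mirrors the warning: with Fob the child command returns successfully, yet the server never records the change it asked for.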
What to do:
- Create a zombie by starting a task and setting it to complete immediately via ecflowview
- Inspect the log file; it will show you how the zombie has arisen.
- Inspect the zombie dialog in ecflowview (right mouse button selection on the host node)
- Experiment with the different actions on the zombie
- Select the host node and invoke the option... menu selection. Select the Zombies button. This enables zombie notification via a pop-up window