...

  • ecflow_client --zombie_get
    Lists all the zombies in the server.
  • ecflow_client --zombie_fail <task-path>
    Asks the zombie to fail. This may result in another zombie, because the abort child command in the job will be called.
  • ecflow_client --zombie_fob <task-path>
    Unblocks the child command, allowing the job to proceed. However, this only works for zombies whose password does not match.
  • ecflow_client --zombie_adopt <task-path>
    Copies the password stored on the zombie onto the task. This allows the job to proceed and to update the state in the server (i.e. due to init, complete, abort). It is up to the user to ensure that the zombie has been dealt with before doing this.
  • ecflow_client --zombie_remove <task-path>
    Removes the zombie representation from the server. Typically this is done when we are sure we have handled the zombie. If we have not, the zombie will reappear the next time it communicates with the server.
  • ecflow_client --zombie_block <task-path>
    Asks the job to block at the child command, preventing it from proceeding. (This is the default behaviour for the init, complete and abort child commands.)
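When handling several zombies by hand it can help to wrap the commands above in a small function that prints what it would run before actually running it. The sketch below is a hypothetical bash helper (the names zombie_action and DRY_RUN are illustrative, not part of ecFlow); it only assembles the ecflow_client options listed above and, by default, prints the command instead of contacting the server.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of ecFlow) around the ecflow_client zombie commands.
# Usage: zombie_action ACTION TASK_PATH
#   ACTION is one of: fail, fob, adopt, remove, block
# With DRY_RUN unset or set to 1 the command is only printed; DRY_RUN=0 runs it.
zombie_action() {
  local action="$1" task="$2"
  if [ -z "$action" ] || [ -z "$task" ]; then
    echo "usage: zombie_action ACTION TASK_PATH" >&2
    return 2
  fi
  # Only accept the zombie actions described in this section.
  case "$action" in
    fail|fob|adopt|remove|block) ;;
    *) echo "unknown zombie action: $action" >&2; return 2 ;;
  esac
  local cmd=(ecflow_client "--zombie_${action}" "$task")
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "${cmd[*]}"   # dry run: show the command only
  else
    "${cmd[@]}"        # really send the request to the server
  fi
}
```

For example, `DRY_RUN=0 zombie_action fob /suite/family/task` would send the fob request, while the same call without DRY_RUN=0 just prints it. Defaulting to a dry run is a deliberate choice, since several of these actions (adopt in particular) are hard to undo.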

Sometimes we may want the job to proceed, but "ecflow_client --zombie_adopt <task-path>" does not work, i.e. when the zombie's password matches but the process IDs (ECF_RID) differ.

ecflow_client --zombie_adopt <task-path> will not allow this, due to the potential for data corruption. In this case the normal behaviour would be to kill both processes and re-queue the task.
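The rules described so far can be summarised as a small decision helper. This is a hypothetical bash sketch (suggest_zombie_action and its yes/no arguments are illustrative assumptions, not an ecFlow interface); it does not query the server, it only encodes which response this section recommends for a given combination of password and ECF_RID match.

```bash
#!/usr/bin/env bash
# Hypothetical helper encoding the rules described in this section.
# It does not talk to the server. Both arguments must be "yes" or "no":
#   $1  does the zombie's password match the task's password?
#   $2  does the zombie's process id (ECF_RID) match the task's?
suggest_zombie_action() {
  local pass_match="$1" rid_match="$2" arg
  for arg in "$pass_match" "$rid_match"; do
    case "$arg" in
      yes|no) ;;
      *) echo "expected yes or no, got '$arg'" >&2; return 2 ;;
    esac
  done
  if [ "$pass_match" = no ]; then
    # Password mismatch: --zombie_fob or --zombie_adopt can let the job proceed.
    echo "fob_or_adopt"
  elif [ "$rid_match" = no ]; then
    # Password matches but ECF_RID differs: adopt is refused, so the normal
    # response is to kill both processes and re-queue the task.
    echo "kill_and_requeue"
  else
    # Not one of the cases covered in this section.
    echo "unclassified"
  fi
}
```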

In the extreme case, we can bypass the authentication (i.e. allow the request to be handled by the server). This SHOULD only be done when the user is sure they have handled the zombie and does not want to re-queue the job.

    > ecflow_client --alter add variable ECF_PASS FREE <path to task>

This is also available from the GUI: select the task, then RMB->Special->Free password.

After the job has completed, be sure to delete this variable. Otherwise, if zombies arise again, there is a considerable risk of data corruption.
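Because forgetting the cleanup step is the main risk here, it can help to keep the "free" and "restore" commands side by side. The bash sketch below is hypothetical (the function names and DRY_RUN are illustrative); the add command is the one shown above, while the delete command assumes the "delete variable" form of --alter, so check ecflow_client's help for --alter against your ecFlow version. By default both functions only print the command.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "free password" work-around.
# With DRY_RUN unset or set to 1 the command is only printed; DRY_RUN=0 runs it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Bypass authentication for one task (same command as shown above).
free_password() {
  if [ -z "$1" ]; then echo "usage: free_password TASK_PATH" >&2; return 2; fi
  run_or_print ecflow_client --alter add variable ECF_PASS FREE "$1"
}

# Remove the override once the job has completed.
# Assumes the "--alter delete variable NAME PATH" form.
restore_password() {
  if [ -z "$1" ]; then echo "usage: restore_password TASK_PATH" >&2; return 2; fi
  run_or_print ecflow_client --alter delete variable ECF_PASS "$1"
}
```

A typical sequence would be `DRY_RUN=0 free_password /suite/family/task`, wait for the job to complete, then `DRY_RUN=0 restore_password /suite/family/task`.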

Automated

It is also possible to ask ecflow_server to make the same response in an automated fashion. However, very careful consideration should be given before doing this; otherwise it could mask a serious underlying problem.

...