...

       ecflow_client --alter=add zombie "ecf:adopt:complete:" /suiteZ

Semi-Automated

Zombies can sometimes arise for more obscure reasons. For example, the job sends an --init message to the server while the server is busy (e.g. processing jobs). By the time the server makes the task active and sends a reply back to the client/job, the ecflow_client has timed out, so it sends the same message again. This time, however, the server treats the child command as a zombie, since the task is already active. Hence we get these false zombies.

These scenarios are very rare, but they tend to occur in the following situations:

...

 

...

To diagnose these cases, look at the log file. Typically you will see two or more identical child commands (--init or --complete) for the same task, where the second is then treated as a zombie.
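As a rough aid for this kind of log inspection, the sketch below counts how often the same child command is logged for the same task path. It is not part of ecFlow: the function name find_repeated_child_cmds is hypothetical, and the log line layout shown in the comment (a "chd:init" or "chd:complete" token followed by the task path) is an assumption that should be checked against your own log file.

```shell
#!/bin/sh
# Sketch only: list task paths whose child command appears more than once.
# Assumed log line layout (verify against your own server log):
#   MSG:[08:33:14 22.6.2012] chd:init /suiteX/family/task
find_repeated_child_cmds() {
    awk '
        {
            # Find the child command token; the next field is taken as the task path.
            for (i = 1; i < NF; i++) {
                if ($i ~ /^chd:(init|complete)$/) {
                    count[$i " " $(i + 1)]++
                    break
                }
            }
        }
        END {
            # Report only command/path pairs that were logged more than once.
            for (key in count)
                if (count[key] > 1)
                    print count[key], key
        }
    ' "$1" | sort
}
```

Call it with the path to the log file as its only argument. A repeated entry is only a hint: a task that has been legitimately rerun also logs --init more than once, so compare the timestamps of the repeated lines before concluding that a false zombie occurred.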

To get round this issue you can add the variable ECF_NONSTRICT_ZOMBIES, which will reduce these false zombies.

       ecflow_client --alter=add variable ECF_NONSTRICT_ZOMBIES 1 /              # adds the variable at the root/server level, and hence affects all suites on the server

       ecflow_client --alter=add variable ECF_NONSTRICT_ZOMBIES 1 /suiteX      # adds the variable at the suite level, and hence affects only this suite