

When ecflow runs the default ECF_JOB_CMD, the command is spawned as a separate child process. This child process is then monitored for abnormal termination.

When the child terminates abnormally, ecflow calls abort and sets a special flag that prevents ECF_TRIES from working.

Problem

With pure Python tasks, the server does not honour ECF_TRIES.

Solution

When ecflow runs the default ECF_JOB_CMD, the command is spawned as a separate child process. This child process is then monitored for abnormal termination.

...

This abnormal job termination prevents the aborted job from rerunning. When the second process starts running, the task in the server is already aborted, which leads to zombies.
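To make the mechanism concrete, here is a minimal sketch of a parent process that spawns a job as a child and inspects how it ended. This is an illustration only and not ecflow's own code: the helper name run_job is hypothetical, and Python's subprocess module stands in for whatever the server uses internally.

```python
import subprocess
import sys


def run_job(python_source):
    """Run a snippet of Python as a child process and return its exit status.

    Illustration only: this mimics a parent watching a spawned job,
    it is not how the ecflow server is implemented.
    """
    result = subprocess.run(
        [sys.executable, "-c", python_source],
        stderr=subprocess.DEVNULL,  # hide the child's traceback in this demo
    )
    return result.returncode


# A job that exits cleanly looks like a normal termination to the parent.
clean_status = run_job("import sys; sys.exit(0)")

# A job that dies with an uncaught exception exits with a non-zero status,
# which a monitoring parent would treat as an abnormal termination.
failed_status = run_job("raise RuntimeError('job failed')")

print("clean job status:", clean_status)
print("failed job status:", failed_status)
```

The point of the sketch is that the parent only sees the exit status of the process it spawned, so a Python job that lets an exception escape looks abnormal even if it has already reported an abort.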


To fix the problem, use a dedicated script for job submission and invoke your Python scripts from a bash or Korn shell wrapper, using Korn shell trapping for robust error handling, as above.

Alternatively, if you want to stick with pure Python tasks, you need to detach from the spawned process. Modify your ECF_JOB_CMD:

Code Block
edit ECF_JOB_CMD "nohup python %ECF_JOB% > %ECF_JOBOUT% 2>&1 &"


Alternatively, always make sure your Python jobs exit cleanly after calling ecflow abort, by calling sys.exit(0):

Code Block
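def signal_handler(self, signum, frame):
    # Requires "import sys" at module level.
    # Report the abort to the server first, then exit with status 0 so the
    # job process is not seen as terminating abnormally.
    print('Aborting: Signal handler called with signal ' + str(signum))
    self.ci.child_abort("Signal handler called with signal " + str(signum))
    sys.exit(0)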
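The handler only takes effect once it is registered for a signal. The following is a minimal sketch of that wiring, assuming a POSIX system. StubClient and Job are hypothetical names introduced for this example: StubClient stands in for the ecflow client object (self.ci above) so that the sketch runs without a server, and it records the abort instead of sending it.

```python
import os
import signal
import sys
import time


class StubClient:
    """Hypothetical stand-in for the ecflow client; it only records the abort."""

    def __init__(self):
        self.abort_reason = None

    def child_abort(self, reason):
        self.abort_reason = reason


class Job:
    def __init__(self, ci):
        self.ci = ci
        # Route SIGTERM to the handler so the abort is reported before exit.
        signal.signal(signal.SIGTERM, self.signal_handler)

    def signal_handler(self, signum, frame):
        self.ci.child_abort("Signal handler called with signal " + str(signum))
        sys.exit(0)


job = Job(StubClient())
exit_code = None
try:
    # Simulate the job being killed by sending SIGTERM to this process.
    os.kill(os.getpid(), signal.SIGTERM)
    time.sleep(1)  # give the interpreter a chance to run the handler
except SystemExit as exc:
    # In a real job the SystemExit would end the process with this status.
    exit_code = exc.code

print("abort reason:", job.ci.abort_reason)
print("exit code:", exit_code)
```

In a real job you would not catch SystemExit; it is caught here only so the sketch can show the exit status that the process would end with.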


Code Block
    def __exit__(self, ex_type, value, tb):
        print("Client:__exit__: ex_type:" + str(ex_type) + " value:" + str(value) + "\n" + str(tb))
        if ex_type is not None:
            # Report the abort, then exit with status 0 so the job process ends cleanly.
            self.ci.child_abort("Aborted with exception type " + str(ex_type) + ":" + str(value))
            sys.exit(0)
        self.ci.child_complete()
        return False
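To show how such an __exit__ method behaves when used in a with statement, here is a minimal, self-contained sketch. StubClient, Client and run are hypothetical names for this example: StubClient replaces the real ecflow client so the code runs without a server, and run catches the SystemExit only so that the exit status can be inspected.

```python
import sys


class StubClient:
    """Hypothetical stand-in for the ecflow client; records calls only."""

    def __init__(self):
        self.calls = []

    def child_abort(self, reason):
        self.calls.append(("abort", reason))

    def child_complete(self):
        self.calls.append(("complete", None))


class Client:
    """Context manager that reports completion or abort for the task body."""

    def __init__(self, ci):
        self.ci = ci

    def __enter__(self):
        return self.ci

    def __exit__(self, ex_type, value, tb):
        if ex_type is not None:
            self.ci.child_abort(
                "Aborted with exception type " + str(ex_type) + ":" + str(value)
            )
            sys.exit(0)  # replaces the original exception with a clean exit
        self.ci.child_complete()
        return False


def run(task):
    """Run task inside the context manager; return recorded calls and exit code."""
    stub = StubClient()
    try:
        with Client(stub):
            task()
    except SystemExit as exc:
        return stub.calls, exc.code
    return stub.calls, None


ok_calls, ok_code = run(lambda: None)
failed_calls, failed_code = run(lambda: 1 / 0)

print("successful task:", ok_calls, ok_code)
print("failing task:", failed_calls, failed_code)
```

With this shape, a task body that raises still ends the process with status 0, so the only failure report the server receives is the explicit abort.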

