Step-by-step guide

There are several ways to let a family with a REPEAT keep advancing when some of its tasks fail. Two of them are described below.

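If the tasks are not very robust and are known to fail regularly, we can use a custom ERROR trap (in place of the ERROR function usually defined in head.h). Instead of aborting the task, it logs the failure and sets the task to complete. The logging can be anything; here the failure is written to the ecFlow server log with ecflow_client --msg.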

# Define an error handler (typically installed with "trap ERROR 0" in head.h)
ERROR() {
    echo "ERROR called"
    set +e                     # Clear the -e flag, so the handler itself cannot fail
    wait                       # Wait for any background processes to finish

    # Record the failure in the server log file for later analysis
    ecflow_client --msg="ERROR task %ECF_NAME% failed"

    ecflow_client --complete   # Report complete instead of abort
    trap 0                     # Remove the EXIT trap
    exit 0                     # End the script
}
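The mechanism can be tried locally without an ecFlow server. The sketch below is only an illustration under stated assumptions: ecflow_client is replaced by a hypothetical stub that just echoes its arguments, a literal path stands in for %ECF_NAME% (normally substituted by the ecFlow pre-processor), and the job body runs in a subshell so that its EXIT trap fires when it ends. With set -e, the failing command ends the job, the EXIT trap calls ERROR, and the job finishes with status 0 after reporting complete.

```bash
#!/usr/bin/env bash
# Local sketch of the ERROR-trap pattern; no ecFlow server is needed.
# ASSUMPTION: ecflow_client is a hypothetical stub that only echoes its arguments.
ecflow_client() { echo "ecflow_client $*"; }

ERROR() {
    echo "ERROR called"
    set +e                    # clear -e so the handler itself cannot fail
    wait                      # wait for any background processes
    # ASSUMPTION: literal path used instead of %ECF_NAME%
    ecflow_client --msg="ERROR task /suite/main/dodgy failed"
    ecflow_client --complete  # report complete instead of abort
    trap 0                    # remove the EXIT trap
    exit 0                    # end the job with success
}

# Simulated job body, run in a subshell so its EXIT trap fires when it ends.
run_job() (
    set -e                    # exit on the first failing command, as head.h does
    trap ERROR 0              # call ERROR on any exit, as head.h does
    echo "step 1 ok"
    false                     # simulated failure: -e exits, the EXIT trap runs ERROR
    echo "never reached"
)

output=$(run_job)
status=$?
echo "$output"
echo "exit status: $status"
```

Running it should print "step 1 ok", "ERROR called", the two stubbed ecflow_client calls and "exit status: 0"; the line after false is never executed.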

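Alternatively, add a special task whose job is to handle failures in the other tasks. If all the other tasks complete, its complete expression sets it complete without running it; otherwise it runs at a fixed time, logs the failure and forces the whole family to complete, so the REPEAT advances.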

suite suite
  family main
    repeat date YMD 20170101 20180101 1
    task dodgy    # this task may fail
    task ok
    task fix                                         # handles failures, so the REPEAT advances even if other tasks fail
      complete dodgy == complete and ok == complete  # if there are no failures, set fix complete so the REPEAT advances
      time 23:30                                     # run at 23:30: users can fix dodgy before then, otherwise the REPEAT is advanced automatically
  endfamily
endsuite
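One way to try the definition above is sketched below, assuming a running ecflow_server reachable through the usual ECF_HOST/ECF_PORT settings. The file name suite.def is hypothetical; the script writes the definition to it and, only if an ecflow_client executable is on the PATH, loads it and begins the suite with the client's --load and --begin commands.

```bash
#!/usr/bin/env bash
# Sketch: save the suite definition and, if a client is available, load and begin it.
# ASSUMPTIONS: the file name suite.def is hypothetical; ECF_HOST/ECF_PORT
# must point at a running ecflow_server for the load/begin steps to succeed.

cat > suite.def <<'EOF'
suite suite
  family main
    repeat date YMD 20170101 20180101 1
    task dodgy
    task ok
    task fix
      complete dodgy == complete and ok == complete
      time 23:30
  endfamily
endsuite
EOF

if command -v ecflow_client >/dev/null 2>&1; then
    ecflow_client --load=suite.def   # send the definition to the server
    ecflow_client --begin=suite      # start scheduling the suite named "suite"
else
    echo "ecflow_client not found; wrote suite.def only"
fi
```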

Script for task fix (fix.ecf):
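# task fix.ecf
%include <head.h>

# Log what is about to happen, for later analysis
ecflow_client --msg="%ECF_NAME%: forcing /%SUITE%/%FAMILY% to complete"

# Force the whole family (including this task) to complete, so the REPEAT advances.
# Node paths are absolute, hence the leading '/'.
ecflow_client --force=complete recursive /%SUITE%/%FAMILY%

# This task is now already complete: remove the EXIT trap set in head.h
# and exit before tail.h, which would normally call --complete again.
trap 0
exit 0

%include <tail.h>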