
Since SMS will not be ported to ECMWF's new Bologna data centre, users are required to migrate their active SMS suites to ecFlow.

Below is an example of migrating a time-critical SMS suite called "cleps_timecrit" to ecFlow, following the guidance given in the ecFlow documentation.

The conversion is described in the following steps:


  • Copy the suite's files and directory structure from the original directory to a test directory under your user name. In this example, both the suite's directory and the suite itself are named "cleps_timecrit"; adjust the paths to match your own:

cp  -r /home/ms/it/zcl/cleps_timecrit/cleps_timecrit/  /perm/us/usbk/ecflow/cleps_timecrit/


  • Go to your created directory:
cd /perm/us/usbk/ecflow/cleps_timecrit/ # enter your real path


  • Copy tools required for this conversion to your home directory:
cp  -r  /home/us/usbk/shared/  ~/sms2ecf/


  • Load SMS and ecFlow modules:

module load sms

module load ecflow


  • Start the ecFlow server

#!/bin/ksh

ecflow_start.sh


  • Play the SMS suite and export definition file:

export SMS_PROG=916924                                 # SMS server number; it is specific to each user and must be changed. It can be obtained from the sms_server log files.

export SMS_NODE=ecgb11                                  #  host name (ecgb11 in this case)

cdp

login ecgate usbk usbk                                           # usbk usbk are the user name and password, respectively; enter your own instead.

play cleps_timecrit.def                                           # cleps_timecrit.def is the name of SMS definition file. Only needed if the suite has not been loaded to SMS server.

begin /cleps_timecrit                                               # cleps_timecrit is the name of SMS suite. Only needed if the suite is not running in SMS.

get cleps_timecrit

show cleps_timecrit > cleps_timecrit_sms.def       # cleps_timecrit_sms.def is now the name of produced suite definition which is ready to be converted to ecFlow.

exit


  •  Replace SMS system variables with ecFlow counterparts by running a simple filter:

sed -f ~/sms2ecf/sms2ecf-min.sed cleps_timecrit_sms.def > cleps_timecrit.ecf   # where cleps_timecrit is a suite name.


  • The same filter can be used to convert task wrappers (.sms to .ecf). Change to the directory containing the .sms wrapper files and run the following commands:

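#!/bin/ksh

# Translate every .sms wrapper found below the current directory.
# File names are quoted so that paths containing spaces are handled.
find . -type f -name "*.sms" | while read -r f ; do

    ecf="$(basename "$f" .sms).ecf"                      ## ecf task name (written to the current directory)

    sed -f ~/sms2ecf/sms2ecf-min.sed "$f" > "$ecf"       ## translate

done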
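
After running the loop, it can be worth checking that every .sms wrapper actually produced a non-empty .ecf file. The following is a minimal Python sketch and is not part of the sms2ecf tools. The function name unconverted_wrappers is hypothetical, and it assumes, as the loop above does, that all .ecf files are written to a single output directory under the wrapper's base name.

```python
from pathlib import Path


def unconverted_wrappers(sms_root, ecf_dir):
    """Return .sms wrappers under sms_root with no non-empty .ecf in ecf_dir.

    Mirrors the ksh loop above: the .ecf name is the wrapper's base name
    with the .sms suffix replaced by .ecf, looked up in one output directory.
    """
    missing = []
    for sms in sorted(Path(sms_root).rglob("*.sms")):
        if not sms.is_file():
            continue
        ecf = Path(ecf_dir) / (sms.stem + ".ecf")
        # A missing or empty .ecf may indicate that the filter step failed.
        if not ecf.is_file() or ecf.stat().st_size == 0:
            missing.append(sms)
    return missing
```

Calling unconverted_wrappers(".", ".") from the wrappers directory returns an empty list when every wrapper has been translated.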


  • In ecFlow, like in SMS, it is also common to have header files that can be shared among multiple tasks.

    Convert SMS header files in task wrappers to ecFlow ones and replace/adjust include files. In this example, there is only one include file that can be replaced without any modification:
The SMS file ws_endt.h is shown first, followed by its ecFlow replacement tail.h:

SMS: ws_endt.h

%comment

#=============================================

#  Send end-task and exit.

#=============================================

%end

if [[ $HOST = @(cc*) ]]; then

  [[ -L $_running_output ]] && rm -f $_running_output

fi

smscomplete

trap 0;


exit

ecFlow: tail.h

wait

ecflow_client --complete    # Notify ecFlow of a normal end


# On the Cray HPC, remove the link to the PBS running output

if [[ $HOST = @(cc*) ]]; then

  [[ -L $_running_output ]] && rm -f $_running_output

fi


trap 0                 # Remove all traps

exit 0                 # End the shell



The tail.h file is already in your ~/sms2ecf/ directory.

The replacement can be done from the command line, inside the script wrappers directory (%ECF_FILES%):

sed -i 's/ws_endt.h/tail.h/g' *.ecf


In addition, the include files

ws_init_serial.h
ws_init_parallel.h
ws_init_onecgate.h

are also used in this example. Part of their content is common to all three files and is often placed in a separate include file, head.h.

It is up to the user to decide whether to extract this shared part into head.h or to leave it in the original files and only make the necessary modifications.

An example of head.h can also be found in the ~/sms2ecf/include/ directory.

Regardless of the approach, first apply sms2ecf-min.sed to these include files to replace the SMS variables with their ecFlow counterparts:

from a command line

sed -f ~/sms2ecf/sms2ecf-min.sed ws_init_serial.h > init_serial.h

sed -f ~/sms2ecf/sms2ecf-min.sed ws_init_parallel.h > init_parallel.h

sed -f ~/sms2ecf/sms2ecf-min.sed ws_init_onecgate.h > init_onecgate.h


Then, for each include file, replace the SMS part with the corresponding ecFlow part. The SMS version of init_serial.h is shown first, followed by the ecFlow version:

SMS (init_serial.h):

#==========================================================================

#  Tell sms that it started

#==========================================================================


    smsinit $SMSRID


    ERROR() {

      smsabort

      printenv

      trap 0


          exit

}

trap ERROR 0

trap '{ echo "Killed by a signal"; ERROR ; }' \

     1 2 3 4 5 6 7 8 10 12 13 14 15 24 30


if [[ $HOST = @(cc*) ]]; then

  _real_pbs_outputfile=/var/spool/PBS/spool/${PBS_JOBID}.OU

  _pbs_outputfile=/nfs/moms/$HOST${_real_pbs_outputfile}

  _running_output=${SMSJOBOUT}.running

  ln -sf $_pbs_outputfile $_running_output

fi

set -ex

ecFlow (init_serial.h):

module load ecflow/%ECF_VERSION%

#==========================================================================

#  Tell ecFlow we have started

#==========================================================================

export ECF_RID=$$ # allow all child to know process_id, for better zombie detection

ecflow_client --init=$$


# On the Cray HPC link the output to the PBS output file

if [[ $HOST = @(cc*) ]]; then

  _real_pbs_outputfile=/var/spool/PBS/spool/${PBS_JOBID}.OU

  _pbs_outputfile=/nfs/moms/$HOST${_real_pbs_outputfile}

  _running_output=%ECF_JOBOUT%.running

  ln -sf $_pbs_outputfile $_running_output

fi


# Define an error handler

ERROR() {

   set +e                      # Clear -e flag, so we don't fail

   wait

   ecflow_client --abort=trap   # Notify ecFlow that something went wrong, using 'trap' as the reason

   trap - 0 $SMS_SIGNAL_LIST   # Remove the traps

   echo "The environment was:"

   printenv | sort

   exit 0                      # End the script

}


# Trap any signal that may cause the script to fail

case $ARCH in

  hpia64 ) export SMS_SIGNAL_LIST='1 2 3 4 5 6 7 8 10 12 13 15 24 30 33';;

  ibm_power* ) export SMS_SIGNAL_LIST='1 2 3 4 5 6 7 8 10 12 13 15 24 30';;

  rs6000 ) export SMS_SIGNAL_LIST='1 2 3 4 5 6 7 8 10 12 13 15 24 30';;

  linux ) export SMS_SIGNAL_LIST='1 2 3 4 5 6 7 8 13 15 24 31';;

  *) export SMS_SIGNAL_LIST='1 2 3 4 5 6 7 8 13 15 24 31';;

esac


for signal in $SMS_SIGNAL_LIST

do

  name=$(kill -l $signal)

  trap "{ echo \"Signal $name ($signal) received \"; trap - 0 $SMS_SIGNAL_LIST ; ERROR ; }" $signal

done


# Trap any calls to exit and errors caught by the -e flag


trap "{ echo \"Signal EXIT (0) received \"; trap - 0 $SMS_SIGNAL_LIST ; ERROR ; }" 0


trap


set -ex


As mentioned above, another option is to remove the SMS part shown above from all init*.h files and to place the ecFlow part in a separate include file, head.h. This additional include file must then be included in all task wrappers.

example from a command line

sed -i "/%include <init_serial.h>/a%include <head.h>" *.ecf

sed -i "/%include <init_parallel.h>/a%include <head.h>" *.ecf

sed -i "/%include <init_onecgate.h>/a%include <head.h>" *.ecf
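
Running the sed commands above a second time would append a duplicate %include <head.h> line. As an illustration only, the sketch below performs the same insertion in Python but skips any init include that is already followed by head.h. The function name add_head_include is hypothetical, and unlike the sed patterns, the regular expression only matches lines that consist of the include directive alone.

```python
import re

INIT_INCLUDE = re.compile(r"^%include <init_(serial|parallel|onecgate)\.h>\s*$")
HEAD_INCLUDE = "%include <head.h>"


def add_head_include(text):
    """Insert '%include <head.h>' after each init_*.h include, once only."""
    lines = text.splitlines()
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if INIT_INCLUDE.match(line):
            following = lines[i + 1].strip() if i + 1 < len(lines) else ""
            # Skip the insertion when head.h is already included here.
            if following != HEAD_INCLUDE:
                out.append(HEAD_INCLUDE)
    # Preserve the trailing newline of the original text, if any.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Applying the function to the contents of each .ecf file and writing the result back gives the same outcome as the sed commands on a first run, and leaves the file unchanged on later runs.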

  • All tasks that call CDP directly must be rewritten to use ecflow_client, the command-line client on which ecFlow relies. In “cleps_timecrit”, CDP was used to get the family/task status and to force a task to complete:

CDP command

cdp << ENDCDP2

set  SMS_PROG 170290

login ecgate $USER 1

status -a

exit

ENDCDP2

This is replaced with its ecFlow counterpart (ecflow_client). The ECF_PORT variable is used to determine whether the job is running under ecFlow; %ECF_PORT:0% expands to 0 when the variable is not defined:

ecFlow command (universal for both: SMS and ecFlow)

if [ %ECF_PORT:0% -gt 0 ] ; then

ecflow_client --status /cleps_timecrit

else

cdp << ENDCDP2

set SMS_PROG 170290

login ecgate $USER 1

status -a

exit

ENDCDP2

fi


And similarly for the "force complete" child command:

ecFlow command (universal for both: SMS and ecFlow)

if [ %ECF_PORT:0% -gt 0 ] ; then

ecflow_client --force complete recursive /%SUITE%/cleps_00

else

cdp << EOF > $tmpf

login -t 60 %SMSNODE% %SMSNAME% %SMSPASS%

if(rc==0) then

echo "Login to %SMSNODE% failed."

exit 1

endif

force -r complete /%SUITE%/cleps_00

exit

EOF

fi


All other child commands inside CDP calls have to be replaced following the same approach. Some CDP child commands and their ecFlow equivalents are listed in the table below:

CDP command      ecFlow command
alter            ecflow_client --alter
requeue          ecflow_client --requeue
resume           ecflow_client --resume
suspend          ecflow_client --suspend
run              ecflow_client --run
force queue      ecflow_client --force queued
...


  • Search all .ecf scripts for usage of SMS specific commands and replace them with ecFlow counterparts:

    SMS         ecFlow
    smsevent    ecflow_client --event=
    smslabel    ecflow_client --label=
    smsmeter    ecflow_client --meter=
    ...

    The full list can be found at:

    Child commands comparison
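
The search for leftover SMS child commands can be automated. The sketch below is a hypothetical helper, not part of the sms2ecf tools. It covers only the commands that appear on this page (smsevent, smslabel, smsmeter from the table, and smsinit, smscomplete, smsabort from the include-file examples) and suggests the option name only, so the arguments still have to be adapted by hand. Treating everything after "#" as a comment is a simplification.

```python
import re

# SMS child commands seen on this page and the ecflow_client option that
# replaces each of them. Only the option name is given.
SMS_CHILD_COMMANDS = {
    "smsinit": "ecflow_client --init",
    "smscomplete": "ecflow_client --complete",
    "smsabort": "ecflow_client --abort",
    "smsevent": "ecflow_client --event",
    "smslabel": "ecflow_client --label",
    "smsmeter": "ecflow_client --meter",
}

_PATTERN = re.compile(r"\b(" + "|".join(SMS_CHILD_COMMANDS) + r")\b")


def find_sms_commands(text):
    """Return (line_number, sms_command, suggested_replacement) tuples."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive: drop everything after '#'
        for match in _PATTERN.finditer(code):
            name = match.group(1)
            hits.append((number, name, SMS_CHILD_COMMANDS[name]))
    return hits
```

Running find_sms_commands over the text of each .ecf script gives a list of lines that still need attention; an empty list means none of the listed commands remain.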

  • Check directories and paths. Also edit the source paths in the %manual section of the .ecf scripts.

  • Convert the suite’s .ecf definition file to a Python file:

    python ~/sms2ecf/def2def.py cleps_timecrit.ecf > cleps_timecrit.py


    The ECF_PORT and ECF_HOST are set automatically to values specific for the user running def2def.py ('18424' and ecgb11 in this case).

    print "loading on ecgb11@18424"

    client = ecf.Client("ecgb11", 18424)

    You can obtain your ECF_PORT number by typing:

    from the command line

    > ECF_PORT=$(($(id -u) + 1500))
    > echo $ECF_PORT

  • Adjust paths to ECF_HOME, ECF_INCLUDE, and ECF_FILES in the suite’s python file, in this example cleps_timecrit.py. It can be done from a text editor such as 'vim':

    :%s/\/perm\/us\/usbk\/sms\//\/perm\/us\/usbk\/ecflow\//g          # Note: use your own paths

    or from the command line:

    sed -i 's,/perm/us/usbk/sms/,/perm/us/usbk/ecflow/,g' cleps_timecrit.py        # Note: use your own paths


  • Set the ECF_JOB_CMD, ECF_KILL_CMD, and ECF_STATUS_CMD commands:

    on ecgate

    ECF_JOB_CMD= '/usr/local/apps/schedule/1.4/bin/schedule %USER% %SCHOST% %ECF_JOB% %ECF_JOBOUT%'

    ECF_KILL_CMD= '/usr/local/apps/schedule/1.4/bin/schedule %USER% %SCHOST% %ECF_RID% %ECF_JOB% %ECF_JOBOUT% kill'

    ECF_STATUS_CMD='/usr/local/apps/schedule/1.4/bin/schedule %USER% %HOST%  %ECF_RID% %ECF_JOB% %ECF_JOBOUT% status'

  • Set ECF_LOGPORT variable to a proper number:

    from the command line

    > ECF_LOGPORT=$((35000 + $(id -u)))
    > echo $ECF_LOGPORT
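
Both port numbers on this page are derived from the numeric user id by adding a fixed offset. The same arithmetic can be written as a small Python sketch; the function name user_ports is hypothetical, and the offsets are simply the ones used in the shell commands above.

```python
import os

ECF_PORT_OFFSET = 1500      # from: ECF_PORT=$(($(id -u) + 1500))
ECF_LOGPORT_OFFSET = 35000  # from: ECF_LOGPORT=$((35000 + $(id -u)))


def user_ports(uid=None):
    """Return (ECF_PORT, ECF_LOGPORT) for a numeric user id."""
    if uid is None:
        uid = os.geteuid()  # effective user id, as printed by `id -u`
    return uid + ECF_PORT_OFFSET, uid + ECF_LOGPORT_OFFSET
```

For instance, user_ports(16924) returns (18424, 51924); 18424 is the ECF_PORT value used throughout this example.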


  • Initial testing

The suite’s definition file can be tested without actually running the ecf scripts, see How can I test a definition without writing scripts.

To perform the test, copy the entire suite structure to a test directory:

cp -r ecflow/cleps_timecrit/ ecflow/test_cleps_timecrit/  

Go to the test directory and edit the Python file cleps_timecrit.py.

Change the suite name from cleps_timecrit to test_cleps_timecrit to avoid clashing with the original suite:

cleps_timecrit.py

suite0 = Suite('cleps_timecrit')                        ->                        suite0 = Suite('test_cleps_timecrit')

Also comment out the part of the script that uploads the suite to the ecFlow server, because at this point we only want to create the suite definition file for testing, not upload the entire suite.

The version of cleps_timecrit.py before the change is shown first, followed by the version after the change.

cleps_timecrit.py     (before)
if __name__ == '__main__':
    defs = Defs()
    defs.add(suite0);
    defs.auto_add_externs(True)
    if 0:
      import cli_proc, ecf
      cli_proc.process(ie.Seed(defs), compare=False)
    else:
        print "loading on localhost@18424"
        client = ecf.Client("localhost", 18424)
        client.replace("/%s" % suite0.name(), defs)

cleps_timecrit.py     (after)
if __name__ == '__main__':
    defs = Defs()
    defs.add(suite0);
    defs.auto_add_externs(True)
#    if 0:
#      import cli_proc, ecf
#      cli_proc.process(ie.Seed(defs), compare=False)
#    else:
#        print "loading on localhost@18424"
#        client = ecf.Client("localhost", 18424)
#        client.replace("/%s" % suite0.name(), defs)

        

        Now, it is safe to run the python file in the test directory.

python cleps_timecrit.py

This produces the definition file test_cleps_timecrit.def.


Finally, use that definition file to run the test:

~/sms2ecf/test_bench.py test_cleps_timecrit.def --port 18424   # Do not forget to add your own ecFlow port number

Track the progress using the ecFlowUI.



  • If the initial test was successful, the ecFlow definition file can be created and the ecFlow suite uploaded to the server by running:

cd ecflow/cleps_timecrit/

python cleps_timecrit.py

This will start the full suite.


Alternatively, the suite can be uploaded to the server from a definition file. The load.sh script below uploads the suite to the ecFlow server and starts it.

load.sh

#!/bin/bash

set -xe

export ECF_PORT=18424                                                                      # ECF_PORT number is specific for each user

ecflow_client --load=cleps_timecrit.def                                            # cleps_timecrit is the name of the suite in this example

#ecflow_client --replace=/cleps_timecrit cleps_timecrit.def

ecflow_client --begin cleps_timecrit

