
While the availability of virtual infrastructure to run ecFlow servers remains limited, you may start your ecFlow servers on a dedicated HPCF node in the interim so that you can run your suites.

At a later stage, those ecFlow servers will need to be moved to dedicated Virtual Machines outside the HPCF, where practically no local tasks will be able to run. All ecFlow tasks will need to be submitted to one of the HPCF complexes through the corresponding Batch system.

Please do keep that in mind when migrating or designing your solution.

Starting the ecFlow server

The server needs to be started using the usual procedure on one of the AA login nodes, not through an interactive job.

module load ecflow troika
ecflow_start.sh <options>

You may wish to pass extra options to configure the port or your ecFlow home directory.
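For example, a minimal sketch assuming the standard ecflow_start.sh options, where -p selects the port and -d the directory used as the server's home; the port number and directory below are only placeholders:

ecflow_start.sh -p 3141 -d $HOME/ecflow_server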

Preparing your suites and tasks

Remember that all tasks will need to be submitted as jobs through the batch system, so you should avoid running tasks locally on the node where the server runs. Make sure that your task header contains the necessary SBATCH directives to run the job. As a minimum:

head.h snippet
#!/bin/bash
#SBATCH --job-name=%ECF_JOB%
#SBATCH --qos=%QUEUE%
#SBATCH --output=%ECF_JOBOUT%
#SBATCH --error=%ECF_JOBOUT%

You may need to add more directives for parallel jobs to define the resources needed. See HPC2020: Batch system for more examples and potential options you may wish to include.
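For instance, a parallel task header might look like the sketch below; the qos, node and task counts are only illustrative and need to be adapted to your own job:

#!/bin/bash
#SBATCH --job-name=%ECF_JOB%
#SBATCH --qos=np
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --output=%ECF_JOBOUT%
#SBATCH --error=%ECF_JOBOUT%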

ecFlow delegates job management tasks such as submission, killing, or status monitoring to external applications. For your convenience, you may use troika, a tool that takes care of those tasks. To use it, just make sure you have the following variables defined at suite level:

Job management variables in your suite.def
edit QUEUE nf
edit SCHOST aa
edit ECF_JOB_CMD troika submit -o %ECF_JOBOUT% %SCHOST% %ECF_JOB%
edit ECF_KILL_CMD troika kill %SCHOST% %ECF_JOB%
edit ECF_STATUS_CMD troika monitor %SCHOST% %ECF_JOB%

Of course, you may change QUEUE to np if you are running bigger parallel jobs, or SCHOST if you eventually want to run on complexes other than aa.
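As an illustration only, you could also override those variables for just one part of your suite; the family and task names in the fragment below are hypothetical:

Overriding the queue for a family in your suite.def
family parallel_family
  # hypothetical family whose tasks need the parallel queue
  edit QUEUE np
  task big_parallel_task
endfamily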

Connecting to the ecFlow server 

Due to the current limitation in network connectivity to arbitrary ports between our Reading and Bologna Data Centres, it is not possible to connect to an ecFlow server running on AA from your usual ecflow_ui in Reading.

There are several ways to work around this issue:

Through a graphical VNC session

You may spin up a graphical VNC session on the HPCF with ecinteractive. Once in the VNC session, run the following from a terminal:

module load ecflow
ecflow_ui

Through an SSH tunnel 

You may alternatively use the native ecflow_ui client in your End User Device or VDI, but an additional step is required to ensure connectivity between both ends. You will need to create an SSH tunnel, forwarding the port where the ecflow server is running. 

  1. Start your ecflow server with your preferred settings on one of the login nodes of AA with ecflow_start.sh
  2. Once you know the hostname and port of the server, from your Linux Desktop or VDI create the SSH tunnel

    ssh -N -L<ecflow_port>:localhost:<ecflow_port> <ecflow_host>

    For example, if the server is started on the host aa6-100, port 34567:

    ssh -N -L34567:localhost:34567 aa6-100
  3. Open ecflow_ui on your End User Device or VDI and configure the new server, using "localhost" as the host and the ecflow port used above.

As the local port, you may use any other free port if that particular one is already in use.
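For instance, reusing the illustrative host and port from the example above, you could forward local port 40000 instead and then configure the server in ecflow_ui as "localhost" with port 40000:

ssh -N -L40000:localhost:34567 aa6-100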

X11 forwarding

This should be your last resort, since the experience running heavy graphical applications through X11 forwarding tends to be poor.

You may also run ecflow_ui remotely on the Atos HPCF, and use X11 forwarding to display on your screen:

ssh -X aa
module load ecflow
ecflow_ui

In this case, when adding the server remember it needs to be configured with the real name of the host running the ecflow server. 
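If in doubt, you can check that the server responds before adding it in ecflow_ui; the sketch below reuses the illustrative host and port from the SSH tunnel example above:

module load ecflow
ecflow_client --ping --host=aa6-100 --port=34567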
