
...

By doing that, you will get a dedicated allocation of CPUs and memory to run your application interactively. There are several ways to do this, depending on your use case:

Table of Contents

Using srun directly

If you have a single script or command you wish to run interactively, one way to do this through the batch system is with a direct call to srun from within a session on the login node. It feels as if you were running locally, but the command actually runs in a job with dedicated resources:

...

Check man srun for a complete list of options.
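
For illustration, a direct srun call could look like the following minimal sketch; the resource values are placeholders rather than site recommendations:

No Format
# Request 4 cores and 16 GB of memory for up to 2 hours;
# --pty attaches your terminal to the bash process in the job.
$ srun --cpus-per-task=4 --mem=16G --time=02:00:00 --pty /bin/bash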

Persistent interactive job with ecinteractive

However, you may want a persistent interactive session that you can leave running and reconnect to later. To facilitate that task, we provide the ecinteractive tool. Its main features are the following:

  • Only one interactive job is allowed at a time
  • Your job keeps on running after you exit the interactive shell, so you can reattach to it any time or open multiple interactive shells within the same job.
  • You may open a basic graphical desktop for X11 applications.
  • You may open a Jupyter Lab instance and connect to it through your browser.
  • By default it will submit to AA, but you can choose what complex (platform) to use.
  • You can run ecinteractive from any Atos HPCF complex or from the Red Hat Linux VDI. You may also copy the script to your end user device and use it from there. It should work from Linux, macOS, or WSL under Windows, and requires the Teleport tsh client to be installed.
No Format
$ ecinteractive -h
Usage :  /usr/local/bin/ecinteractive [options] [--]

    -d|desktop     Submits a vnc job (default is interactive ssh job)
    -j|jupyter     Submits a jupyter job (default is interactive ssh job)

    More Options:
    -h|help        Display this message
    -v|version     Display script version
    -p|platform    Platform (default aa. Choices: aa, ab, ac, ad)
    -u|user        ECMWF User (default user)
    -A|account     Project account
    -c|cpus        Number of CPUs (default 2)
    -m|memory      Requested Memory (default 8G)
    -s|tmpdirsize  Requested TMPDIR size (default 3 GB)
    -t|time        Wall clock limit (default 06:00:00)
    -r|reservation Submit the job into a SLURM reservation
    -k|kill        Cancel any running interactive job
    -q|query       Check running job
    -o|output      Output file for the interactive job (if any, default /dev/null)
    -x             set -x
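
As an illustration of combining several of the options above, a hypothetical call could look like this (the project account name is a placeholder):

No Format
# 8 CPUs, 32 GB of memory, 8 hours, on complex ab, charged to project "myproj" (hypothetical)
$ ecinteractive -p ab -A myproj -c 8 -m 32G -t 08:00:00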


...

Creating an interactive job

You can get an interactive shell running on an allocated node within the Atos HPCF by just calling ecinteractive. If no options are given, it will use the following default settings:

CPUs:         2
Memory:       8 GB
Time:         6 hours
TMPDIR size:  3 GB
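
Calling it with no arguments is therefore roughly equivalent to the following sketch with all defaults spelled out; the exact value format for -s is an assumption based on the help output above:

No Format
$ ecinteractive
# roughly equivalent to (assumed spelling of the defaults):
$ ecinteractive -c 2 -m 8G -t 06:00:00 -s 3G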


If you need more resources, you may use the ecinteractive options when creating the job. For example, to get a shell with 4 CPUs and 16 GB of memory for 12 hours:

No Format
[user@aa6-100 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00
Submitted batch job 10225018
Waiting 5 seconds for the job to be ready...
Using interactive job:
CLUSTER JOBID       STATE       EXEC_HOST  TIME_LIMIT  TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa      10225018    RUNNING     aa6-104      12:00:00   11:59:55        4        16G     ssdtmp:3G

To cancel the job:
        /usr/local/bin/ecinteractive -k

Last login: Mon Dec 13 09:39:09 2021
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on aa6-104 at 20211213_093914.052794, PID: 1736962, JOBID: 10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/ec/res4/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/ec/res4/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/ec/res4/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/ec/res4/scratchdir/user/08/10225018
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] Job 10225018 time left:   11:59:54
[user@aa6-104 ~]$ 


Note

If you log out, the job continues to run until it is explicitly cancelled or it reaches the time limit.

Reattaching to an existing interactive job

Once you have an interactive job running, you may reattach to it, or open several shells within that job calling ecinteractive again. 

Note

If you have a job already running, ecinteractive will always attach you to that one, regardless of the resource options you pass. If you wish to run a job with different settings, you will have to cancel the existing one first.
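
A hedged sketch of that cancel-and-resubmit flow, using only options documented above (the new resource values are illustrative):

No Format
$ ecinteractive -k                        # cancel the existing interactive job
$ ecinteractive -c 8 -m 32G -t 12:00:00   # resubmit with the new settings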


No Format
[user@aa6-100 ~]$ ecinteractive
Using interactive job:
CLUSTER JOBID       STATE       EXEC_HOST  TIME_LIMIT  TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa      10225018    RUNNING     aa6-104      12:00:00   11:57:56        4        16G     ssdtmp:3G

WARNING: Your existing job 10225018 may have a different setup than requested. Cancel the existing job and rerun if you wish to run with a different setup

To cancel the job:
        /usr/local/bin/ecinteractive -k

Last login: Mon Dec 13 09:39:14 2021 from aa6-100.bullx
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on aa6-104 at 20211213_094114.074197, PID: 1742608, JOBID: 10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/ec/res4/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/ec/res4/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/ec/res4/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/ec/res4/scratchdir/user/08/10225018
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] Job 10225018 time left:   11:57:54
[user@aa6-104 ~]$ 

Checking the status of a running interactive job

You may query ecinteractive for existing interactive jobs, and you can do so from within or outside the job. This may be useful to see how much time is left:

No Format
[user@aa6-100 ~]$ ecinteractive -q
CLUSTER JOBID       STATE       EXEC_HOST  TIME_LIMIT  TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa      10225018    RUNNING     aa6-104      12:00:00   11:55:40        4        16G     ssdtmp:3G
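
If you want to use that information in a script, the remaining time can be extracted from the output shown above; a small sketch assuming that two-line output format:

No Format
# TIME_LEFT is the 6th column of the job line (2nd line of the output)
$ ecinteractive -q | awk 'NR==2 {print $6}'
11:55:40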

Killing/Cancelling a running interactive job

Logging out of the interactive shells spawned through ecinteractive will not cancel the job. If you have finished working with it, you should cancel it with:

No Format
[user@aa6-100 ~]$ ecinteractive -k
cancelling job 10225018...
CLUSTER JOBID       STATE       EXEC_HOST  TIME_LIMIT  TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa      10225018    RUNNING     aa6-104      12:00:00   11:55:34        4        16G     ssdtmp:3G
Cancel job_id=10225018 name=user-ecinteractive partition=inter [y/n]? y
Connection to aa-login closed.

Opening graphical applications within your interactive job

If you need to run graphical applications, you can do so through standard X11 forwarding.

  • If running it from an Atos HPCF login node, make sure you have connected there with ssh -X and that you have a working X11 server on your end user device (e.g. XQuartz on macOS; MobaXterm, Xming or similar on Windows). See the sketch after this list.
  • If running it from the Red Hat Linux VDI, it should work out of the box.
  • If running it from your end user device, make sure you have a working X11 server running locally (e.g. XQuartz on macOS; MobaXterm, Xming or similar on Windows).
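
As a sketch under those assumptions (hostnames and the application are illustrative), the full X11 chain from an end user device could look like:

No Format
# On your end user device, with a local X11 server running:
$ ssh -X hpc-login                 # hostname illustrative
# On the login node, jump into the node allocated to your interactive job:
$ ssh -X aa6-104                   # node name as reported by ecinteractive
$ xclock &                         # any X11 application should now display locally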

Alternatively, you may use ecinteractive to open a basic window manager running on the allocated interactive node, which will open a VNC client on your end to connect to the running desktop in the allocated node:

No Format
[user@aa6-100 ~]$ ecinteractive -d
Submitted batch job 10225277
Waiting 5 seconds for the job to be ready...
Using interactive job:
CLUSTER JOBID       STATE       EXEC_HOST  TIME_LIMIT  TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa      10225277    RUNNING     aa6-104       6:00:00    5:59:55        2         8G     ssdtmp:3G

To cancel the job:
        /usr/local/bin/ecinteractive -k

Attaching to vnc session...
To manually re-attach:
        vncviewer -passwd ~/.vnc/passwd aa6-104:9598

Opening a Jupyter Lab instance

You can also use ecinteractive to open up a Jupyter Lab instance very easily:

No Format
[user@aa6-100 ~]$ ./ecinteractive -j
Using interactive job:
CLUSTER JOBID       STATE       EXEC_HOST  TIME_LIMIT  TIME_LEFT MAX_CPUS MIN_MEMORY TRES_PER_NODE
aa      10225277    RUNNING     aa6-104       6:00:00    5:58:07        2         8G     ssdtmp:3G

To cancel the job:
        ./ecinteractive -k

Attaching to Jupyterlab session...
To manually re-attach go to http://aa6-104.ecmwf.int:33698/?token=b1624da17308654986b1fd66ef82b9274401ea8982f3b747