If you wish to work interactively but are constrained by the limits on CPUs, CPU time or memory, you may run a small interactive job requesting the resources you need.

By doing that, you will get a dedicated allocation of CPUs and memory to run your application interactively.

Using srun directly

If you have a single script or command you wish to run interactively, one way to do this through the batch system is to call srun directly from within your session on the login node. It feels as if you were running locally, but the command actually runs in a job with dedicated resources:

$ cat myscript.sh 
#!/bin/bash
echo "This is my super script"
echo "Doing some heavy work on $HOSTNAME..."
$ ./myscript.sh 
This is my super script
Doing some heavy work on at1-11...
$ srun ./myscript.sh 
This is my super script
Doing some heavy work on at1-105...
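
To confirm from inside a script whether it is running in your login session or within a job, one option is to check the SLURM_JOB_ID environment variable, which Slurm sets for the processes it launches. The following is only a minimal sketch: the function name where_am_i is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: report whether we are inside a Slurm job.
# Assumes Slurm exports SLURM_JOB_ID to processes started through srun.
where_am_i() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "Running inside Slurm job ${SLURM_JOB_ID} on ${HOSTNAME:-unknown}"
    else
        echo "Running outside the batch system on ${HOSTNAME:-unknown}"
    fi
}

where_am_i
```

Adding such a check to myscript.sh would make the difference between the plain call and the srun call visible in the output.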

In the srun example above, the job ran with the default settings (default QoS, just 1 CPU and the default memory). You can pass additional options to srun to customise the resources allocated to this interactive job. For example, to run with 4 CPUs and 12 GB of memory, with a time limit of 6 hours:

$ srun -c 4 --mem=12G -t 06:00:00 ./myscript.sh

Check man srun for a complete list of options.
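
If you often request the same resources, the srun options shown above can be wrapped in a small shell function. The sketch below is an illustration only: the function name srun_interactive and the DRY_RUN variable are assumptions made for this example and are not part of Slurm or of the site configuration. It uses only the -c, --mem and -t options.

```shell
#!/bin/bash
# Hypothetical wrapper around srun; names are illustrative only.
# Usage: srun_interactive CPUS MEM TIME command [args...]
srun_interactive() {
    if [ "$#" -lt 4 ]; then
        echo "usage: srun_interactive CPUS MEM TIME command [args...]" >&2
        return 2
    fi
    local cpus=$1 mem=$2 time=$3
    shift 3
    # With DRY_RUN=1, print the command instead of submitting it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo srun -c "$cpus" --mem="$mem" -t "$time" "$@"
    else
        srun -c "$cpus" --mem="$mem" -t "$time" "$@"
    fi
}

# Print the command that would be run for 4 CPUs, 12 GB and 6 hours.
DRY_RUN=1 srun_interactive 4 12G 06:00:00 ./myscript.sh
```

With DRY_RUN=1 the call prints the srun command line rather than submitting it, which is a cheap way to check the options before using up an allocation.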

Persistent interactive job with ecinteractive

However, you may want a persistent interactive job that you can leave and reattach to later. To facilitate that task, we provide the ecinteractive tool:

$ ecinteractive -h
Usage :  /usr/local/bin/ecinteractive [options] [--]

  -d|desktop     Submits a vnc job (default is interactive ssh job)

  More Options:
  -h|help        Display this message
  -v|version     Display script version
  -A|account     Project account
  -c|cpus        Number of CPUs (default 2)
  -m|memory      Requested Memory (default 4G)
  -t|time        Wall clock limit (default 06:00:00)
  -r|reservation Submit the job into a SLURM reservation
  -g|cgroups     Launch cgroups watcher
  -k|kill        scancel the running job (if any). To cancel vnc jobs, use together with -d
  -x             set -x

Main features

  • You may specify the project account the job will be charged to, as well as the resources needed (CPUs, memory and time). Defaults are used for any of those options that are not specified.
  • Only one interactive job is allowed at a time. If you run ecinteractive again, it will reattach to your existing job.
  • You may manually reattach with ssh to the node allocated to you by ecinteractive.
  • You can use ecinteractive to kill an existing interactive job with the -k option.
  • You may open a basic graphical desktop for X11 applications.

Getting a shell with 4 CPUs and 16 GB of memory for 12 hours

[user@at1-11 ~]$ ecinteractive -k
cancelling job...
    JOBID       NAME  USER   QOS    STATE       TIME TIME_LIMIT NODES      FEATURES NODELIST(REASON)
    63769 user-ecint  user    ni  RUNNING       0:56   12:00:00     1        (null) at1-103
[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00

Interactive batch job is launched with following resources:
  Maximum run time (hours:min:sec): 12:00:00
  Maximum memory (MB): 16G
  Number of cores/threads: 4
Submitted batch job 63770
Found 1 interactive job running on at1-103 ... attaching to it
To manually re-attach:
        ssh at1-103
To cancel the job on tems:
        /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -k
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on at1-103 at 20210319_174542.052, PID: 428874, JOBID: 63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/lus/pfs1/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/lus/pfs1/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/lus/pfs1/scratchdir/user/0/63770
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[user@at1-103 ~]$ 

Reattaching to an existing interactive job

[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00
Found 1 interactive job running on at1-103 ... attaching to it
To manually re-attach:
        ssh at1-103
To cancel the job on tems:
        /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -k
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on at1-103 at 20210319_174956.074, PID: 429252, JOBID: 63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/lus/pfs1/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/lus/pfs1/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/lus/pfs1/scratchdir/user/0/63770
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[user@at1-103 ~]$ 
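
If you want to script the manual ssh reattachment, the node name can be picked out of the ecinteractive output. The sketch below assumes the "Found 1 interactive job running on NODE" message keeps the wording shown above, which may differ between versions of the tool. The function name ecint_node is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read ecinteractive output on stdin and print the
# node name from the "... interactive job running on NODE ..." line.
# Assumes the message format shown in the examples on this page.
ecint_node() {
    sed -n 's/^Found [0-9][0-9]* interactive job running on \([^ ]*\).*/\1/p' | head -n 1
}

# Example with a line copied from the output above.
echo "Found 1 interactive job running on at1-103 ... attaching to it" | ecint_node
```

The printed node name can then be passed to ssh, as in the "To manually re-attach" hint given by the tool.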

Killing a running interactive job

[user@at1-11 ~]$ ecinteractive -k
cancelling job...
    JOBID       NAME  USER   QOS    STATE       TIME TIME_LIMIT NODES      FEATURES NODELIST(REASON)
    63770 user-ecint  user    ni  RUNNING       5:31   12:00:00     1        (null) at1-103

Opening a graphical desktop within your interactive job

[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00 -d

Interactive batch job is launched with following resources:
  Maximum run time (hours:min:sec): 12:00:00
  Maximum memory (MB): 16G
  Number of cores/threads: 4
Submitted batch job 63771
A vnc session job is running on tems node at1-103 - this tool will re-attach to it.
To manually re-attach:
        vncviewer -passwd ~/.vnc/passwd at1-103:9598
To cancel the job on tems:
        /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -d -k

TigerVNC Viewer 64-bit v1.10.1
Built on: 2020-10-06 13:51
Copyright (C) 1999-2019 TigerVNC Team and many others (see README.rst)
See https://www.tigervnc.org for information on TigerVNC.

Fri Mar 19 17:52:35 2021
 DecodeManager: Detected 256 CPU core(s)
 DecodeManager: Creating 4 decoder thread(s)
 CConn:       Connected to host at1-103 port 9598
 CConnection: Server supports RFB protocol version 3.8
 CConnection: Using RFB protocol version 3.8
 CConnection: Choosing security type VeNCrypt(19)
 CVeNCrypt:   Choosing security type VncAuth (2)
 CConn:       Using pixel format depth 24 (32bpp) little-endian rgb888
 CConnection: Enabling continuous updates