...

By doing that, you will get a dedicated allocation of CPUs and memory to run your application interactively. There are several ways to do this, depending on your use case:

Using srun directly

If you have a single script or command that you wish to run interactively, one way to do this through the batch system is with a direct call to srun from within a session on the login node. It will feel as if you were running locally, but the command is instead running in a job with dedicated resources:

...

Check man srun for a complete list of options.
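As a minimal sketch of this approach, the hypothetical helper run_interactive below runs a single command inside a job, using the standard srun options --cpus-per-task, --mem and --time for the resources and --pty to attach the command to your terminal. The helper name, the DRY_RUN switch and the resource values are illustrative assumptions, not part of the ECMWF tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an ECMWF tool): run one command through srun with
# dedicated CPUs, memory and wall time, attached to your terminal via --pty.
# Usage: run_interactive CPUS MEMORY TIME COMMAND [ARGS...]
# Set DRY_RUN=1 to print the srun command line instead of executing it.
run_interactive() {
  if (( $# < 4 )); then
    echo "usage: run_interactive CPUS MEMORY TIME COMMAND [ARGS...]" >&2
    return 2
  fi
  local cpus="$1" mem="$2" walltime="$3"
  shift 3
  # Build the command as an array so arguments with spaces stay intact
  local cmd=(srun --cpus-per-task="$cpus" --mem="$mem" --time="$walltime" --pty "$@")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example: a Python session with 2 CPUs and 4 GB of memory for one hour
# run_interactive 2 4G 01:00:00 python3
```

For instance, DRY_RUN=1 run_interactive 2 4G 01:00:00 python3 only prints the srun command line; without DRY_RUN the job is actually submitted and the command starts once resources are allocated.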

Persistent interactive job with ecinteractive

However, you may want a persistent interactive job that you can keep using and reattach to later. To facilitate that, we provide the ecinteractive tool:

$ ecinteractive -h
Usage :  /usr/local/bin/ecinteractive [options] [--]

  -d|desktop     Submits a vnc job (default is interactive ssh job)

  More Options:
  -h|help        Display this message
  -v|version     Display script version
  -A|account     Project account
  -c|cpus        Number of CPUs (default 2)
  -m|memory      Requested Memory (default 4G)
  -t|time        Wall clock limit (default 06:00:00)
  -r|reservation Submit the job into a SLURM reservation
  -g|cgroups     Launch cgroups watcher
  -k|kill        scancel the running job (if any). To cancel vnc jobs, use together with -d
  -x             set -x
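If you always charge interactive work to the same project, a small shell function can add the -A option for you and forward every other option unchanged. This is a minimal sketch: the function name ecint and the environment variable MY_PROJECT_ACCOUNT are hypothetical, and ecinteractive itself does not read that variable.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around ecinteractive: if MY_PROJECT_ACCOUNT is set,
# pass it with -A so interactive jobs are always charged to that project.
# All other arguments (-c, -m, -t, -d, -k, ...) are forwarded unchanged.
# Set DRY_RUN=1 to print the command line instead of running it.
ecint() {
  local cmd=(ecinteractive)
  if [[ -n "${MY_PROJECT_ACCOUNT:-}" ]]; then
    cmd+=(-A "$MY_PROJECT_ACCOUNT")
  fi
  cmd+=("$@")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

With MY_PROJECT_ACCOUNT=myproject exported, ecint -c 4 -m 16G -t 12:00:00 runs ecinteractive -A myproject -c 4 -m 16G -t 12:00:00. The function does not check for an explicit -A, so avoid passing one as well.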

Main features

  • You may specify the project account to which the job will be charged, as well as the resources needed (CPUs, memory and time). If you do not specify CPUs, memory or time, the defaults shown in the help message above are used.
  • Only one interactive job is allowed at a time, but if you run ecinteractive again, it will reattach to your existing job.
  • You may also reattach manually by using ssh to connect to the node that ecinteractive allocated to you.
  • You can use ecinteractive to kill an existing interactive job with the -k option.
  • You may open a basic graphical desktop for X11 applications with the -d option.
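The manual reattach step can be scripted. The sketch below looks up the node of your running interactive job with squeue (using its standard -u, -h, -t and -o options) and connects to it with ssh. The function name reattach_interactive is hypothetical, and it assumes the job name starts with your user name followed by -ecint, which is only inferred from the NAME column in the outputs on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ecinteractive): reattach manually to your
# running interactive job by looking up its node with squeue and ssh-ing to it.
# Assumption: the job name starts with "<user>-ecint", as suggested by the
# NAME column in the sample outputs on this page.
# Set DRY_RUN=1 to print the ssh command instead of connecting.
reattach_interactive() {
  local prefix="${USER}-ecint" node
  # squeue: -u user, -h no header, -t job state, -o "%j %N" = job name and node list
  node=$(squeue -u "$USER" -h -t RUNNING -o '%j %N' |
         awk -v p="$prefix" 'index($1, p) == 1 { print $2; exit }')
  if [[ -z "$node" ]]; then
    echo "no running interactive job found for $USER" >&2
    return 1
  fi
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "ssh $node"
  else
    ssh "$node"
  fi
}
```

Running ecinteractive again achieves the same result; a helper like this is only convenient if you prefer a plain ssh connection.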

Getting a shell with 4 CPUs and 16 GB of memory for 12 hours

[user@at1-11 ~]$ ecinteractive -k
cancelling job...
    JOBID       NAME  USER   QOS    STATE       TIME TIME_LIMIT NODES      FEATURES NODELIST(REASON)
    63769 user-ecint  user    ni  RUNNING       0:56   12:00:00     1        (null) at1-103
[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00

Interactive batch job is launched with following resources:
  Maximum run time (hours:min:sec): 12:00:00
  Maximum memory (MB): 16G
  Number of cores/threads: 4
Submitted batch job 63770
Found 1 interactive job running on at1-103 ... attaching to it
To manually re-attach:
        ssh at1-103
To cancel the job on tems:
        /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -k
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on at1-103 at 20210319_174542.052, PID: 428874, JOBID: 63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/lus/pfs1/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/lus/pfs1/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/lus/pfs1/scratchdir/user/0/63770
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[user@at1-103 ~]$ 

Reattaching to an existing interactive job

[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00
Found 1 interactive job running on at1-103 ... attaching to it
To manually re-attach:
        ssh at1-103
To cancel the job on tems:
        /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -k
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on at1-103 at 20210319_174956.074, PID: 429252, JOBID: 63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/lus/pfs1/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/lus/pfs1/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/lus/pfs1/scratchdir/user/0/63770
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[user@at1-103 ~]$ 

Killing a running interactive job

[user@at1-11 ~]$ ecinteractive -k
cancelling job...
    JOBID       NAME  USER   QOS    STATE       TIME TIME_LIMIT NODES      FEATURES NODELIST(REASON)
    63770 user-ecint  user    ni  RUNNING       5:31   12:00:00     1        (null) at1-103
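The TIME and TIME_LIMIT columns in the listing above report elapsed time and the wall-clock limit in SLURM's time notation. If you want to turn such values into remaining time, for example to decide whether to cancel a job and start a fresh one, a minimal sketch of the arithmetic follows. The function names to_seconds and remaining_time are hypothetical; the sketch assumes the [days-]hours:minutes:seconds and minutes:seconds forms and does not handle an UNLIMITED limit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a SLURM time string ([D-]H:MM:SS, M:SS or S)
# into a number of seconds.
to_seconds() {
  local t="$1" days=0 h=0 m=0 s=0
  local -a parts
  if [[ "$t" == *-* ]]; then
    days="${t%%-*}"   # part before the dash is the day count
    t="${t#*-}"
  fi
  IFS=: read -r -a parts <<< "$t"
  case "${#parts[@]}" in
    3) h="${parts[0]}"; m="${parts[1]}"; s="${parts[2]}" ;;
    2) m="${parts[0]}"; s="${parts[1]}" ;;
    1) s="${parts[0]}" ;;
  esac
  # 10# forces base 10 so values such as "08" are not read as octal
  echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

# Hypothetical helper: print the remaining time as HH:MM:SS,
# given the elapsed TIME and the TIME_LIMIT of a job.
remaining_time() {
  local used limit left
  used=$(to_seconds "$1")
  limit=$(to_seconds "$2")
  left=$(( limit - used ))
  if (( left < 0 )); then
    left=0
  fi
  printf '%02d:%02d:%02d\n' $(( left / 3600 )) $(( left % 3600 / 60 )) $(( left % 60 ))
}
```

For the job cancelled above, remaining_time 5:31 12:00:00 prints 11:54:29.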

Opening a graphical desktop within your interactive job

...

HPC2020: Persistent interactive job with ecinteractive