Let's explore how to use the Slurm Batch System on the ATOS HPCF or ECS.

Basic job submission

Access the default login node of the ATOS HPCF or ECS.

  1. Create a directory for this tutorial so all the exercises and outputs are contained inside:

    No Format
    mkdir ~/batch_tutorial
    cd ~/batch_tutorial


  2. Create and submit a job called simplest.sh with just default settings that runs the command hostname. Can you find the output and inspect it? Where did your job run?

    Expand
    titleSolution

    Using your favourite editor, create a file called simplest.sh with the following content

    Code Block
    languagebash
    titlesimplest.sh
    #!/bin/bash
    hostname

    You can submit it with:

    No Format
    sbatch simplest.sh

    The job should run shortly. When it finishes, a new file called slurm-<jobid>.out should appear in the same directory. You can check the output with:

    No Format
    $ cat $(ls -r1 slurm-*.out | head -n1)
    ab6-202.bullx
    [ECMWF-INFO -ecepilog] ----------------------------------------------------------------------------------------------------
    [ECMWF-INFO -ecepilog] This is the ECMWF job Epilogue
    [ECMWF-INFO -ecepilog] +++ Please report issues using the Support portal +++
    [ECMWF-INFO -ecepilog] +++ https://support.ecmwf.int                     +++
    [ECMWF-INFO -ecepilog] ----------------------------------------------------------------------------------------------------
    [ECMWF-INFO -ecepilog] Run at 2023-10-25T11:31:53 on ecs
    [ECMWF-INFO -ecepilog] JobName                   : simplest.sh
    [ECMWF-INFO -ecepilog] JobID                     : 64273363
    [ECMWF-INFO -ecepilog] Submit                    : 2023-10-25T11:31:36
    [ECMWF-INFO -ecepilog] Start                     : 2023-10-25T11:31:51
    [ECMWF-INFO -ecepilog] End                       : 2023-10-25T11:31:53
    [ECMWF-INFO -ecepilog] QueuedTime                : 15.0
    [ECMWF-INFO -ecepilog] ElapsedRaw                : 2
    [ECMWF-INFO -ecepilog] ExitCode                  : 0:0
    [ECMWF-INFO -ecepilog] DerivedExitCode           : 0:0
    [ECMWF-INFO -ecepilog] State                     : COMPLETED
    [ECMWF-INFO -ecepilog] Account                   : myaccount
    [ECMWF-INFO -ecepilog] QOS                       : ef
    [ECMWF-INFO -ecepilog] User                      : user
    [ECMWF-INFO -ecepilog] StdOut                    : /etc/ecmwf/nfs/dh1_home_a/user/slurm-64273363.out
    [ECMWF-INFO -ecepilog] StdErr                    : /etc/ecmwf/nfs/dh1_home_a/user/slurm-64273363.out
    [ECMWF-INFO -ecepilog] NNodes                    : 1
    [ECMWF-INFO -ecepilog] NCPUS                     : 2
    [ECMWF-INFO -ecepilog] SBU                       : 0.011
    [ECMWF-INFO -ecepilog] ----------------------------------------------------------------------------------------------------

    The first line of the output shows that the script ran on a different node from the one you are logged in to.

    If you repeat the operation, your job may run on a different node each time, depending on which one happens to be free.


  3. Configure your simplest.sh job to direct the output to simplest-<jobid>.out, the error to simplest-<jobid>.err both in the same directory, and the job name to just "simplest". Note you will need to use a special placeholder for the -<jobid>.

    Expand
    titleSolution

    Using your favourite editor, open the simplest.sh job script and add the relevant #SBATCH directives:

    Code Block
    languagebash
    titlesimplest.sh
    #!/bin/bash
    #SBATCH --job-name=simplest
    #SBATCH --output=simplest-%j.out
    #SBATCH --error=simplest-%j.err
    hostname

    You can submit it again with:

    No Format
    sbatch simplest.sh

    After a few moments, you should see the new files appear in your directory (your job ID will differ from the one shown here):

    No Format
    $ ls simplest-*.*
    simplest-64274497.err  simplest-64274497.out

    You can check that the job name was also changed in the end-of-job report:

    No Format
    $ grep -i jobname $(ls -r1 simplest-*.err | head -n1)
    [ECMWF-INFO -ecepilog] JobName                   : simplest
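
The solutions above rely on two conventions: sbatch confirms a submission with a message ending in the job ID, and the %j placeholder in the --output and --error directives is replaced by that ID. The following is a minimal sketch of both, assuming the usual confirmation message "Submitted batch job <jobid>". The helper names job_id_from_sbatch and expand_pattern are hypothetical and not part of Slurm; expand_pattern only imitates the substitution of %j (job ID) and %x (job name) for illustration, since Slurm performs it itself when the job starts.

```bash
#!/bin/bash
# Hypothetical helper: take sbatch's confirmation message, assumed to read
# "Submitted batch job <jobid>", and print only the job ID (the last word).
job_id_from_sbatch() {
    printf '%s\n' "${1##* }"
}

# Hypothetical helper: imitate how %j (job ID) and %x (job name) are expanded
# in --output/--error file name patterns. Illustration only.
#   $1 = pattern, $2 = job ID, $3 = job name
expand_pattern() {
    printf '%s\n' "$1" | sed -e "s/%j/$2/g" -e "s/%x/$3/g"
}

# Example message typed in by hand, not the result of a real submission.
msg="Submitted batch job 64274497"
jobid=$(job_id_from_sbatch "$msg")

expand_pattern "simplest-%j.out" "$jobid" simplest   # prints simplest-64274497.out
expand_pattern "%x-%j.err" "$jobid" simplest         # prints simplest-64274497.err
```

On a real system the message would come from msg=$(sbatch simplest.sh). Using %x in the pattern keeps the file names in step with whatever --job-name is set to.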


Basic job management

  1. Create a new job script sleepy.sh with the following contents and submit it. Then check :


    No Format
    #!/bin/bash
    sleep 120
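
Because this job sleeps for two minutes, it stays in the system long enough to be inspected. The following is a minimal sketch of how its state could be queried, assuming the standard Slurm commands squeue and scancel are available on the login node. The function job_state and the SQUEUE override variable are hypothetical names introduced here so that the helper can be tried out without a live cluster.

```bash
#!/bin/bash
# SQUEUE is a hypothetical override hook; by default the real squeue is used.
SQUEUE=${SQUEUE:-squeue}

# Print the state of one job (for example PENDING or RUNNING).
#   -h       suppress the header line
#   -j ID    restrict the listing to this job
#   -o "%T"  print only the job state
# Nothing is printed once the job has left the queue.
job_state() {
    "$SQUEUE" -h -j "$1" -o "%T" 2>/dev/null
}

# Typical sequence on the cluster, shown as comments because it needs Slurm:
#   sbatch sleepy.sh        # note the job ID in the confirmation message
#   job_state <jobid>       # query the state of that job
#   squeue -u "$USER"       # list all of your queued and running jobs
#   scancel <jobid>         # cancel the job if it is no longer needed
```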
