In this article we explain how to use OpenIFS in a container environment. At present this has been tested in two different ways using Docker to run the model interactively on a Linux workstation:

  1. The user works interactively from inside the container, and the external experiment directory is mounted as a sub-directory inside the container environment. Depending on the setup, the user either has access to the entire OpenIFS installation inside the container or is prevented from accessing the source code.
     
  2. The user only works from the experiment directory, and instead of executing the model binary program the OpenIFS run script starts up a container environment wherein the experiment runs in isolation. Immediately after the experiment has completed the container is removed. The user has no access to any part of the model installation. 

Motivation

Setting up the computing environment (the libraries, directory structure, etc) required by OpenIFS can present a challenge when it is necessary to run the model on a new hardware infrastructure, for instance during workshops and training events. It is time consuming to install and compile the model and all of its required software packages. Also, the libraries that are available on the local system may not be compatible with the model requirements.

Many of these issues can be avoided by running a containerised version of the model, which is a self-sufficient code package that can be used in a consistent way on different hardware platforms. The computational overhead (the "cost") of the container environment itself is often outweighed by performance increases due to the local availability of, and instant access to, all required libraries and data within the container.

We have used the Docker platform to produce a container image for OpenIFS. This requires the design of a "Dockerfile" which describes the build process for the model code and all its dependent libraries, and which results in a binary Docker image. This image can be uploaded onto other computers that make use of the Docker platform or which use other compatible software. A “container” is the running instance of the Docker image. 

Pre-compiled OpenIFS Docker images can be made available for download from container image repositories (e.g. Docker Hub or Harbor) and should be able to run on any computer that uses the Docker platform without the need to install and compile the model or any additional software. The OpenIFS Docker images should also work with other container software compatible with the Open Container Initiative (OCI) standards such as Singularity or Sarus.


Dockerfiles

The Dockerfile describes the build process of the container image. Several example Dockerfiles are provided in the OpenIFS git repository. The naming convention for Dockerfiles is as follows:

Dockerfile.oifs<MODELRELEASE>.<GITHASH>.<NOTE>.<ARCH>.<TYPE>

MODELRELEASE:   A string generated from the IFS cycle, release number and OpenIFS version, e.g. 40r1v2 for CY40R1 OpenIFS release v2.

GITHASH:   The first five to seven characters of the hash of the OpenIFS repository git commit from which this image is built.

NOTE:          An optional comment string that describes features of this build.

ARCH:         The architecture for which this image is built, e.g. x86_64, amd64, i386 etc.

TYPE:          dev, test or bld.  bld (build) to be used only for production images, such as from existing OpenIFS releases. dev should be used for images created from development branch commits.

Example:    Dockerfile.oifs40r1v2.41537.user.x86_64.bld
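The naming convention can be sketched as a small shell function. This is only an illustration: the function name oifs_dockerfile_name is hypothetical, and dropping the NOTE field when it is empty is an assumption about how the optional field is handled.

```shell
# Compose a Dockerfile name from its components.
# Usage: oifs_dockerfile_name MODELRELEASE GITHASH NOTE ARCH TYPE
# (hypothetical helper; pass an empty string as NOTE to leave it out)
oifs_dockerfile_name() {
    release=$1; githash=$2; note=$3; arch=$4; kind=$5
    # Only the three documented build types are accepted
    case $kind in
        dev|test|bld) ;;
        *) echo "unknown TYPE: $kind" >&2; return 1 ;;
    esac
    name="Dockerfile.oifs${release}.${githash}"
    # NOTE is optional: add the field only when one is given (assumption)
    if [ -n "$note" ]; then
        name="${name}.${note}"
    fi
    echo "${name}.${arch}.${kind}"
}

# Reproduces the example name given above
oifs_dockerfile_name 40r1v2 41537 user x86_64 bld
```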

Example Build Process

This section describes the generation process of a container image from the Dockerfile. You need the following files in your build directory:

lrwxrwxrwx 1   45  Dockerfile -> Dockerfile.oifs40r1v2.415374.root.x86_64.bld 
-rw-r----- 1 2.1K  Dockerfile.oifs40r1v2.415374.root.x86_64.bld 
-rw-r----- 1  21M  oifs40r1v2.415374d.tar.gz

The Dockerfile can be obtained from the git repository. As its name indicates, it generates an image of OpenIFS 40r1 version 2. The partial git hash relates the Dockerfile (and image) to a specific git commit in the OpenIFS repository (in this case 415374d, which is tagged as model release v2). The note 'root' indicates that the user will have root privileges when the image is run as a container. This allows us to explore the directory structure of the container image. For convenience it is recommended to create a symbolic link with the generic name Dockerfile.

The tar archive oifs40r1v2.415374d.tar.gz is created from the model sources after they have been checked out from the git repository (again the partial commit hash is part of the name). The Dockerfile expects the tar archive in the same directory, under the file name that is specified in the Dockerfile.
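The preparation of the build directory could be scripted along the following lines. This is a minimal sketch: prepare_build_dir is a hypothetical helper, not part of OpenIFS, and it only checks that the two files are present before it creates the symbolic link.

```shell
# Link the generic name "Dockerfile" to a specific Dockerfile and check
# that the source archive it expects is present in the same directory.
# Usage: prepare_build_dir DIR DOCKERFILE ARCHIVE   (hypothetical helper)
prepare_build_dir() {
    dir=$1; dockerfile=$2; archive=$3
    [ -f "$dir/$dockerfile" ] || { echo "missing $dockerfile" >&2; return 1; }
    [ -f "$dir/$archive" ]    || { echo "missing $archive" >&2; return 1; }
    # -s creates a symbolic link, -f replaces an existing link
    ( cd "$dir" && ln -sf "$dockerfile" Dockerfile )
}
```

For the files listed above the call would be: prepare_build_dir . Dockerfile.oifs40r1v2.415374.root.x86_64.bld oifs40r1v2.415374d.tar.gz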

The following command builds the image oifs40r1v2.415374.root. The generic command is  docker build -t <image_name> .  (the trailing dot is the build directory); at ECMWF, however, four variables also need to be set for the network proxies in order to access the internet from within the container.

docker build -t oifs40r1v2.415374.root --build-arg http_proxy="$http_proxy" --build-arg ftp_proxy="$ftp_proxy" --build-arg https_proxy="$https_proxy" --build-arg no_proxy="$no_proxy" .
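On systems where only some, or none, of the proxy variables are defined, the build arguments could be assembled conditionally. The following is a sketch under that assumption; build_oifs_image and the DRY_RUN switch are hypothetical names used for illustration only.

```shell
# Run "docker build", forwarding only those proxy variables that are set
# in the calling environment. With DRY_RUN set, the command is printed
# instead of executed (hypothetical switch, for illustration).
build_oifs_image() {
    image=$1
    set -- build -t "$image"
    for var in http_proxy https_proxy ftp_proxy no_proxy; do
        # Look up the value of the variable whose name is stored in $var
        eval "value=\${$var:-}"
        if [ -n "$value" ]; then
            set -- "$@" --build-arg "$var=$value"
        fi
    done
    # The trailing dot is the build context (the current directory)
    set -- "$@" .
    if [ -n "${DRY_RUN:-}" ]; then
        echo docker "$@"
    else
        docker "$@"
    fi
}

# Show the command that would be run for the image built above
DRY_RUN=1 build_oifs_image oifs40r1v2.415374.root
```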

This starts the generation process of the image, which contains the minimum of software that is required to run OpenIFS. The image is based on an Ubuntu Linux LTS version. In a first step the necessary developer tools are installed (e.g. GNU compiler, MPI and maths libraries). Afterwards the ecCodes library is downloaded from the web and compiled with minimum settings. Then the OpenIFS sources are added from the tar archive, the required environment variables are set and the model binaries are compiled. In a last step various file permissions are set and the model executable is moved to a globally accessible location.

At the end of the build process the successful image creation is shown as: 
Successfully tagged oifs40r1v2.415374.root:latest

Now we can verify that the image is available, and we run it as a container using the  docker run  command:

$ docker images
REPOSITORY                TAG                 IMAGE ID            CREATED             SIZE
oifs40r1v2.415374.root    latest              982f6e82bb93        39 minutes ago      873MB
ubuntu                    latest              72300a873c2c        13 days ago         64.2MB

$ docker run -it oifs40r1v2.415374.root
root@38b1649e05b9:/#

Our command line prompt has changed as we are now 'root' inside the container. A file listing shows the directory structure inside the container. 

root@38b1649e05b9:/# ls -F
bin/   dev/  home/  lib64/  mnt/   opt/   root/  sbin/  sys/  usr/
boot/  etc/  lib/   media/  oifs/  proc/  run/   srv/   tmp/  var/

The OpenIFS model is installed in /oifs. The ecCodes library is found in its default destination under /usr/local/lib.

In order to run the acceptance test the file /oifs/t21test/job needs editing:

EXPID=epc8
MASTER=/usr/local/bin/master.exe

When setting GRIB_SAMPLES_PATH replace grib_api with eccodes.

In order to run the executable with mpirun as root the option --allow-run-as-root needs to be added to the run command:  $OIFS_RUNCMD --allow-run-as-root $MASTER -e $EXPID
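The edits to the job file could also be applied with sed instead of by hand. The sketch below makes assumptions that the source does not confirm: that EXPID and MASTER are assigned at the start of a line, that GRIB_SAMPLES_PATH is set on a single line, and that the original run command reads $OIFS_RUNCMD $MASTER. The function name patch_t21_job is hypothetical.

```shell
# Apply the job file edits described above. The unmodified file is kept
# as <job>.orig, and writing back into the existing file preserves its
# permissions. (Hypothetical helper; patterns are assumptions.)
patch_t21_job() {
    job=$1
    cp -p "$job" "$job.orig" &&
    sed -e 's|^EXPID=.*|EXPID=epc8|' \
        -e 's|^MASTER=.*|MASTER=/usr/local/bin/master.exe|' \
        -e '/GRIB_SAMPLES_PATH/s|grib_api|eccodes|g' \
        -e 's|\$OIFS_RUNCMD \$MASTER|$OIFS_RUNCMD --allow-run-as-root $MASTER|' \
        "$job.orig" > "$job"
}
```

Inside the container this would be called as patch_t21_job /oifs/t21test/job. It is worth comparing the result with the saved .orig file before running the test.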

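The command 'exit' stops the container. Files created or changed inside it are not written back to the image, so they are absent from any new container started with docker run, and they are lost for good once the stopped container is removed. The next section shows how results can be retained when OpenIFS experiments are run in a container.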

Docker Image with User Account

Dockerfiles with the note 'user' instead of 'root' in their filename contain an additional build step wherein a user account is created in the Docker image.

In this case, once the image has been run as a container, we have taken on the identity 'oifs_user' and no longer have root privileges. We are now in the home directory /home/oifs_user and we no longer have access to the files and sub-directories under /oifs. As the model binaries were moved to /usr/local/bin they can still be accessed, and an experiment can be run from the home directory. The model sources are hidden from the user and only a copy of the t21test directory is available in the home directory.

Running OpenIFS Experiments in a Container

In this section we describe how the containerised version of OpenIFS can be used to run a case study interactively on the user's workstation (i.e. no batch job submission). Due to the temporary nature of containers, all model results that are created in an experiment need to be stored outside the container. One possible method is to mount an external experiment directory inside the container. Data written to the mounted directory will be retained once the container is removed.

We assume an experiment directory has been created at /scratch/rd/user/exp/. Some preparation is required as this directory needs to contain all the necessary experiment data. Sub-directories are allowed; however, symbolic links to other file system locations will not work inside the container. Hence the symbolic links that oifs_run creates at its first run need to be replaced manually by real sub-directories.
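One way to replace the symbolic links is to copy the data they point to into the experiment directory. The following is a minimal sketch of that approach; materialise_links is a hypothetical helper name, and copying (rather than, for example, moving the data) is an assumption about what is acceptable for a given experiment.

```shell
# Replace every symbolic link directly inside an experiment directory with
# a real copy of its target, so that the content is visible in the container.
# Usage: materialise_links EXPDIR   (hypothetical helper)
materialise_links() {
    expdir=$1
    for entry in "$expdir"/*; do
        # Only symbolic links are of interest
        [ -L "$entry" ] || continue
        if [ ! -e "$entry" ]; then
            echo "skipping dangling link: $entry" >&2
            continue
        fi
        # cp -RL follows links and copies the data they point to
        cp -RL "$entry" "$entry.copy" &&
        rm "$entry" &&
        mv "$entry.copy" "$entry"
    done
}
```

For the directory assumed above the call would be materialise_links /scratch/rd/user/exp. Note that this duplicates the linked data, which can require considerable disk space for large input data sets.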

This experiment directory is mounted to the container when it is invoked:

docker run -v /scratch/rd/user/exp:/exp:rw -it oifs40r1v2.415374.root

A linked duplicate of the experiment directory can now be found inside the container in sub-directory /exp with read and write permissions.  If a Dockerfile with user account is used then the experiment directory needs to be mounted within the user home directory of the container.

In order to mount the external experiment directory successfully all the files or sub-directories therein need to have full read-write-executable access:  chmod -R 777 /scratch/rd/user/exp

All the files in the mounted directory that were newly created or modified are owned by the container user, and seen from outside the container their file ownership will be different. 

Invoking the Container from the OpenIFS run script

An alternative method of using OpenIFS in a container is to place the docker call inside the oifs_run script, where it replaces the execution of the model binary through mpirun. This method is also only suitable for running the model interactively (i.e. no batch job submission with aprun or srun). The script is modified as follows:

  1. set variable:  export OIFS_EXE=/usr/local/bin/master.exe
  2. comment out the code block that checks for the OIFS executable:  ###if [ -d "$OIFS_EXE" ]; then
  3. do not copy the executable:  ##\cp -f "$OIFS_EXE" . || true
  4. replace the call of the RUNCMD with:
    -  $RUNCMD ./$(basename "$OIFS_EXE") || {
    +  docker run -v /scratch/rd/damk/exp/:/home/oifs_user/exp:rw <oifs_image> bash -c "cd exp && ulimit -s unlimited && $OIFS_EXE" || {
    <oifs_image> is the name of the OpenIFS docker image

  With this method the Docker container environment remains largely concealed from the model user, who does not need to interact with it any further.
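The replacement call in step 4 could be wrapped in a small function, sketched below. The names run_oifs_in_container and DRY_RUN are hypothetical. The --rm option of docker run is added here on the assumption that the container should be removed as soon as the experiment has completed; it is not part of the command shown in step 4.

```shell
# Run the model executable inside a container, with the experiment
# directory mounted under the home directory of the container user.
# Usage: run_oifs_in_container EXPDIR IMAGE   (hypothetical helper)
run_oifs_in_container() {
    expdir=$1; image=$2
    exe=${OIFS_EXE:-/usr/local/bin/master.exe}
    # --rm removes the container when the command exits (assumption)
    set -- run --rm -v "$expdir:/home/oifs_user/exp:rw" "$image" \
        bash -c "cd exp && ulimit -s unlimited && $exe"
    if [ -n "${DRY_RUN:-}" ]; then
        # Printed for inspection only; the quoting of the bash -c
        # argument is not shown in this output
        echo docker "$@"
    else
        docker "$@"
    fi
}
```

In oifs_run the call would then read run_oifs_in_container /scratch/rd/damk/exp <oifs_image> || { in place of the docker run line of step 4.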

Batch Job Submission

The use of Docker containers when running OpenIFS on HPC facilities has been tested successfully and with good scalability on the Piz Daint Cray XC50 at the Swiss National Supercomputing Centre in December 2019 using local computing support. At present we do not yet offer this capability at ECMWF. This is work in progress and any updates will be reported here when available.


Crib Sheet: Important Docker commands

This section lists frequently used Docker commands, some of which are specific to the ECMWF computing environment.


Start the Docker daemon on your machine (ECMWF):

sudo systemctl start docker
sudo systemctl restart docker
sudo systemctl status docker

which is actually:    sudo /usr/bin/systemctl status docker


Which images are on my machine:

docker images
docker rmi oifs                       remove image oifs, might need -f option 
docker rmi $(docker images -qa)       removes all images, might need -f option 
docker save -o oifs_image.tar oifs    saves image oifs to a tar file 
docker load -i oifs_image.tar         loads saved docker image into the local image store


Which containers are running:

docker ps
docker ps -a             show all containers
docker rm 6skd897asd     removes container beginning with 6sk...
docker rm $(docker ps -qa)   removes all containers, might need -f option


Build docker image:

docker build -t <image name> .        uses file called Dockerfile 
docker build -t <image name> -f <docker file> .

At ECMWF:    docker build -t oifs --build-arg http_proxy="$http_proxy" --build-arg ftp_proxy="$ftp_proxy" --build-arg https_proxy="$https_proxy" --build-arg no_proxy="$no_proxy" .


Run docker images in container:

docker run -it ubuntu        run interactively with tty output
docker run -it oifs          run image oifs interactively
docker run -v /scratch/rd/damk:/scratch:rw -it oifs        mount volume $SCRATCH inside container
docker run -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY metview metview    allows Metview to open X Window from inside the container


Use Harbor online container registry:

Do this first:   docker login eccr.ecmwf.int

The build command below makes an image that can be pushed to Harbor:   docker build -t eccr.ecmwf.int/openifs/oifs:0.0.1 -f <docker file> --build-arg http_proxy="$http_proxy" --build-arg ftp_proxy="$ftp_proxy" --build-arg https_proxy="$https_proxy" --build-arg no_proxy="$no_proxy" .

Then push it to Harbor, manually specifying the version number.  Careful: Existing version numbers are overwritten!     docker push eccr.ecmwf.int/openifs/oifs:0.0.1

Pull image from the repository onto the local machine:    docker pull eccr.ecmwf.int/openifs/oifs:0.0.1
