Prerequisites

 You should have completed:

  • Installation of data files
  • Successful compilation of OpenIFS.

Introduction

The OpenIFS tarfile distribution includes a simple low-resolution (T21) test job, in the directory oifs/t21test, that can be used to verify the model is working correctly. It is strongly recommended that these short tests are run after compiling the model successfully and before any development or production work is started. They are also a good way to become familiar with the model. The tests cover:

  • Single task / single thread.
  • 2 x OpenMP threads
  • 2 x MPI tasks
  • 2 x threads + 2 tasks.
  • Acceptance (reference) tests.

These are described below.

Test directory

The directory oifs/t21test contains a number of files:

% cd oifs
% ls t21test
ICMGGepc8      ICMGGepc8INIUA  ICMSHepc8INIT  ifsdata  namelists
ICMGGepc8INIT  ICMSHepc8       README         job      ref_021_0072

Files beginning with 'ICM'.
These are the input files for this T21 experiment. They are in GRIB format. Do not move them from this directory. OpenIFS expects to find its input files in the same directory as the main executable.

epc8 - this is the Experiment ID.
ICMGGepc8 - 'GG' indicates these contain gridpoint fields.
ICMSHepc8 - 'SH' indicates these contain spherical harmonic fields.

You can use the 'grib_ls' and 'grib_dump' commands to see the contents of these files (the grib_ls & grib_dump commands will be in the 'bin' directory of your grib_api installation).
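As a small illustration of the naming convention above, the following shell sketch splits an input filename into its field type and experiment ID. The function name describe_icm is hypothetical, and it assumes the experiment ID is always the four characters following the 'GG' or 'SH' marker, which is inferred from the 'epc8' example only.

```shell
# Hypothetical helper: describe an OpenIFS input file from its name alone.
# Assumed layout: ICM + (GG|SH) + 4-character experiment ID + optional suffix.
describe_icm() {
    icm_name=$(basename "$1")
    icm_expid=$(printf '%s' "$icm_name" | cut -c6-9)
    case "$icm_name" in
        ICMGG????*) echo "gridpoint $icm_expid" ;;
        ICMSH????*) echo "spectral $icm_expid" ;;
        *)          echo "unknown"; return 1 ;;
    esac
}

# Describe every ICM file in the current directory, if any exist.
for f in ICM*; do
    if [ -e "$f" ]; then
        describe_icm "$f"
        # To list the GRIB messages as well (requires grib_api on your PATH):
        # grib_ls "$f"
    fi
done
```

Run from inside oifs/t21test, this should print one line per ICM file; the commented grib_ls call can be enabled once grib_api is installed.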

job
Simple shell script to run the model. Described in more detail below.

ifsdata
Climate data fields used for T21 test integration. You should not move or rename this directory as the model will expect to find the climate files it needs in a directory of this name.

namelists
This file contains all of the model's input Fortran namelists. Not every namelist has its variables listed; only the commonly changed variables appear here. Users should copy this file and modify it for the tests described below.

ref_021_0072
This file is reference output for the model tests. The model can be run in 'reference' mode where it checks it is working correctly by comparing some mathematical norms against this file. Reference runs are described in more detail under 'Acceptance testing' below.

Test integrations

A number of short model runs are recommended to verify the model is working correctly. Once you have compiled the model without errors, follow these steps.

These tests will ensure the model can run with multiple OpenMP threads, with MPI tasks and in mixed OpenMP/MPI mode. A further acceptance test can be run which compares the model output on your machine with reference data obtained from machines at ECMWF.

Serial : single task, single thread.

a) Build the model in the oifs/make directory. Use the optimized ('opt') build configuration. If you find these tests do not work with this configuration, use the 'noopt' configuration and then experiment by raising the optimization level.
    Copy the model executable, master.exe, from the make directory 'make/gnu-opt/bin/master.exe' to the t21test directory.

b) In directory oifs/t21test  edit the file 'job' to change the line beginning:

export GRIB_SAMPLES_PATH=...

to give the location of the directory 'grib1_mlgrib2', found under 'ifs_samples' in your grib_api installation. For example, if you followed the walk-through on the grib_api install page for the gnu compilers this would be:

export GRIB_SAMPLES_PATH=$HOME/ecmwf/grib_api_gcc/share/grib_api/ifs_samples/grib1_mlgrib2

c) Copy the namelists and run the model with a single task and single thread by executing the job script:

% cp namelists fort.4
% ./job -e epc8 -x ./master.exe

The model will expect to find a file called fort.4 in the same directory as the executable. This script copies the executable from oifs/make/build/bin.

If the run works you will see output like:

...
signal_drhook(SIGSYS=31): New handler installed at 0x4d06cf; old preserved at 0x0
MPL_BUFFER_METHOD:  2           0
   16:03:46 STEP    0 H=   0:00 +CPU=  3.598
   16:03:46 STEP    1 H=   0:10 +CPU=  0.535
   16:03:47 STEP    2 H=   0:20 +CPU=  0.537
   16:03:48 STEP    3 H=   0:30 +CPU=  0.537
   16:03:48 STEP    4 H=   0:40 +CPU=  0.527
   16:03:49 STEP    5 H=   0:50 +CPU=  0.526
   16:03:49 STEP    6 H=   1:00 +CPU=  0.530

This test runs only 6 timesteps.
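If you capture the job's screen output in a file, the step lines shown above can be checked automatically. This is a minimal sketch: the function name last_step and the log filename run.log are hypothetical, and it assumes each step line has the time in the first field, the word STEP in the second and the step number in the third, as in the sample output.

```shell
# Hypothetical helper: print the number of the last timestep reported in a log.
# Returns non-zero if the log contains no STEP lines at all.
last_step() {
    awk '$2 == "STEP" { n = $3 } END { if (n == "") exit 1; print n }' "$1"
}

# Example usage (run.log is an assumed name for the captured output):
# ./job -e epc8 -x ./master.exe > run.log 2>&1
# last_step run.log
```

For the short test described here, the value printed should equal NSTOP, that is 6.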

If the job command can't find the model executable, make sure you have copied the master.exe file from the 'make/gnu-opt/bin' directory to the t21test directory.
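A quick pre-flight check can catch a missing executable or namelist before the job is started. The sketch below is illustrative only: check_run_dir is a hypothetical function name, and the list of required items is taken from the steps above (master.exe, fort.4, ifsdata and the job script).

```shell
# Hypothetical pre-flight check for the test directory (default: current directory).
# Reports each missing item and returns non-zero if anything is missing.
check_run_dir() {
    crd_dir=${1:-.}
    crd_missing=0
    [ -x "$crd_dir/master.exe" ] || { echo "missing or not executable: $crd_dir/master.exe"; crd_missing=1; }
    [ -f "$crd_dir/fort.4" ]     || { echo "missing: $crd_dir/fort.4 (cp namelists fort.4)"; crd_missing=1; }
    [ -d "$crd_dir/ifsdata" ]    || { echo "missing: $crd_dir/ifsdata"; crd_missing=1; }
    [ -x "$crd_dir/job" ]        || { echo "missing or not executable: $crd_dir/job"; crd_missing=1; }
    if [ "$crd_missing" -eq 0 ]; then
        echo "run directory looks complete: $crd_dir"
    fi
    return "$crd_missing"
}

# Example usage from inside oifs/t21test:
# check_run_dir . && ./job -e epc8 -x ./master.exe
```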

Model output

The model writes its output to several files.

NODE_001.01 contains the text output (WRITE/PRINT statements). The numbers refer to task number and thread number. Normally only the master task and thread write output, but this can be changed for debugging purposes.

ICM*epc8+0000 is the model output in GRIB format, split into two files: one for the gridpoint fields, the other for the spectral fields. These contain only a few output variables in this test. The output is a mix of GRIB1 and GRIB2 messages. See the Documentation for how to process this output.

ifs.stat is a small file that prints the model steps, time taken for each step and a 'norm' measure. This file can usually be ignored but is useful for debugging.

Likely errors

The model will fail with an error if it cannot find the 'ifs_samples' directory in the grib_api installation:

signal_drhook(SIGSYS=31): New handler installed at 0x4d06cf; old preserved at 0x0
MPL_BUFFER_METHOD:  2           0
GRIB_API ERROR   :  Unable to locate sample file gg_sfc_grib1.tmpl
                    in /home/rd/openifs/software/grib_api/1.9.18/grib_api-gcc-4.5.0/ifs_samples/grib1_mlgrib2
 GRIB_NEW_FROM_TEMPLATE gg_sfc_grib1 FAILED          -2

Check the location of the grib samples and correct the 'job' script.
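The samples location can be verified before running the model. The following is a sketch under stated assumptions: check_grib_samples is a hypothetical function name, and the only template it looks for is gg_sfc_grib1.tmpl, the file named in the error message above. Other templates may also be needed.

```shell
# Hypothetical check: does the given directory (default: $GRIB_SAMPLES_PATH)
# exist and contain the template named in the GRIB_API error message?
check_grib_samples() {
    cgs_dir=${1:-${GRIB_SAMPLES_PATH:-}}
    if [ -z "$cgs_dir" ]; then
        echo "GRIB_SAMPLES_PATH is not set"; return 1
    fi
    if [ ! -d "$cgs_dir" ]; then
        echo "not a directory: $cgs_dir"; return 1
    fi
    if [ ! -f "$cgs_dir/gg_sfc_grib1.tmpl" ]; then
        echo "gg_sfc_grib1.tmpl not found in $cgs_dir"; return 1
    fi
    echo "ok: $cgs_dir"
}

# Example usage with the path from the walk-through:
# check_grib_samples "$HOME/ecmwf/grib_api_gcc/share/grib_api/ifs_samples/grib1_mlgrib2"
```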

If you see an error like this:

mpirun noticed that job rank 0 with PID 7429 on node elvira exited on
signal 11 (Segmentation fault).

or

MEMORY FAULT

it is most likely that OpenIFS requires more 'stack' memory than your default setting allows. This can happen when increasing the number of OpenMP threads.

To solve the problem add the line:

ulimit -s unlimited

to the file 'job' just before the mpirun line. This will increase the per-process stack memory limit to the maximum the operating system allows.
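On some systems a request for 'unlimited' is refused because the hard limit is lower. The sketch below, which is a suggestion and not part of the distributed 'job' script, shows the limit before and after and falls back to the hard limit if 'unlimited' is not accepted.

```shell
# Show the current soft stack limit.
echo "stack limit before: $(ulimit -s)"

# Try 'unlimited' first; if the system refuses, raise the soft limit
# to whatever the hard limit allows instead.
ulimit -s unlimited 2>/dev/null || ulimit -S -s "$(ulimit -H -s)"

echo "stack limit after:  $(ulimit -s)"
```

The change applies only to the shell that runs these lines and to the processes it starts, which is why it belongs inside 'job' before the mpirun line.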

If the model still fails, contact openifs-support@ecmwf.int for assistance.

Parallel: 2 threads and 2 tasks

These next short tests follow on from the serial test above and verify that the model works correctly with OpenMP threads, with MPI tasks, and with both together.

a) Edit the file 'job' and change the line: export OMP_NUM_THREADS=1 to export OMP_NUM_THREADS=2 and then run ./job as above.

OpenMP threading is only enabled for optimized 'opt' builds.

If this works, look in the NODE_001.01 output file for the line:

NUMBER OF THREADS                 2

to verify the model ran with 2 OpenMP threads.

b) Edit the file 'job' and change OMP_NUM_THREADS back to 1. Change the line: NPROC=1 to NPROC=2. Also, edit the fort.4 file and change NPROC to 2. Note that increasing the number of tasks requires changing the number of tasks in two places.
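Because NPROC must agree in both files, a small consistency check can save a failed run. This is a minimal sketch: the function names nproc_in and check_nproc are hypothetical, and it assumes NPROC is assigned at the start of a line (after optional spaces) in both 'job' and fort.4, for example "NPROC=2" or " NPROC=2,".

```shell
# Hypothetical helper: print the first integer assigned to NPROC in a file.
nproc_in() {
    sed -n 's/^[[:space:]]*NPROC[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Compare NPROC in the job script ($1) and the namelist file ($2).
check_nproc() {
    cn_job=$(nproc_in "$1")
    cn_nml=$(nproc_in "$2")
    if [ -n "$cn_job" ] && [ "$cn_job" = "$cn_nml" ]; then
        echo "NPROC consistent: $cn_job"
    else
        echo "NPROC mismatch: job=${cn_job:-?} fort.4=${cn_nml:-?}"
        return 1
    fi
}

# Example usage from inside oifs/t21test:
# check_nproc job fort.4
```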

Rerun the job:

./job

and again look in the NODE_001.01 output file for the line: "NUMBER OF TASKS   2"  to verify that two MPI tasks were used.

Mixed mode: OpenMP and MPI

If the short tests above succeed, edit 'job' again and change OMP_NUM_THREADS back to 2 without changing NPROC. Rerun 'job' and confirm in the NODE file that 2 threads and 2 tasks were used and that the run was successful. You may not be able to do this test if your computer does not have sufficient processing power.
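The thread and task counts can be pulled out of the NODE file with a short helper instead of searching by eye. This is a sketch only: node_value is a hypothetical function name, and it assumes the count is the last field on the first line containing the label, as in the "NUMBER OF THREADS" and "NUMBER OF TASKS" lines quoted above.

```shell
# Hypothetical helper: print the last field of the first line containing a label.
# $1 = label text, $2 = NODE output file
node_value() {
    awk -v label="$1" 'index($0, label) { print $NF; exit }' "$2"
}

# Example usage after a mixed-mode run:
# node_value "NUMBER OF THREADS" NODE_001.01
# node_value "NUMBER OF TASKS"   NODE_001.01
```

Both commands should report 2 after the mixed-mode test.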

If all these technical tests work, perform the acceptance test below to ensure that the numerical results from the model are as expected. It's strongly recommended this test is completed before proceeding to work with the model for development or production.

Acceptance testing

The final step is to check that the model produces numerical answers within acceptable limits, even if it runs the short tests above without failing. OpenIFS includes code that computes internal statistical norms and compares them against numbers supplied by ECMWF. The file ref_021_0072 in the t21test directory contains statistical norms computed by the model run at ECMWF.

OpenIFS CY38 releases used a longer run of 144 steps (24hrs). In later releases this was changed to 12hrs.

Before running the test, change the number of tasks NPROC and threads OMP_NUM_THREADS back to 1.  It is prudent to run the test without any parallel execution because experience shows that some compiler libraries have internal threading which can cause differences in the results.

Reset number of tasks and threads in 'job'
NPROC=1                     # turn off parallel execution, edit fort.4, and ..
export OMP_NUM_THREADS=1    # .. turn off OpenMP threading

Remember to set NPROC=1 in the fort.4 namelist file.

To do the acceptance test, edit the namelists in fort.4 and look for the NAMCT0 namelist:

&NAMCT0
 LREFOUT=false,
 NSTOP=6,

Change the number of timesteps to 72 to run the model for 12hrs (72 steps x 10mins = 720mins, assuming you have not changed the default timestep of 10mins at T21) and set LREFOUT to true:

&NAMCT0
 LREFOUT=true,
 NSTOP=72,
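The same edit can be scripted. The sketch below is not part of the distribution: nstop_for and set_acceptance are hypothetical function names, and the sed patterns assume the entries appear exactly as "LREFOUT=false" and "NSTOP=<number>" with no spaces around the '=' sign, as in the namelist excerpt above.

```shell
# Number of timesteps for a forecast length in hours ($1)
# and a timestep in minutes ($2).
nstop_for() {
    echo $(( $1 * 60 / $2 ))
}

# Write a copy of a namelist file with LREFOUT switched on and a new NSTOP.
# $1 = input file, $2 = new NSTOP value, $3 = output file
set_acceptance() {
    sed -e 's/LREFOUT=false/LREFOUT=true/' \
        -e "s/NSTOP=[0-9][0-9]*/NSTOP=$2/" "$1" > "$3"
}

# Example usage: 12 hours at a 10 minute timestep gives NSTOP=72.
# set_acceptance fort.4 "$(nstop_for 12 10)" fort.4.new && mv fort.4.new fort.4
```

Writing to a new file and renaming it afterwards keeps the original fort.4 intact if the substitution goes wrong.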

With LREFOUT=true, at the last timestep OpenIFS will read the ref_021_0072 file and produce a new file: res_021_0072 (note the similar filenames!). The contents of the file should be similar to:

% cat res_021_0072
 
               Results of ERROR calculation
 
 The error calculated from the results shows
 that the calculations are correct
 
 The maximum error is =         0.11345 %

The maximum error should be below 2-3%. The value of 0.11345 is illustrative.

As long as the model reports 'calculations are correct' the model is behaving satisfactorily in your compilation and run environment. However, note that the ref_021_0072 file was generated by using the GNU compilers. If you use a different compiler such as Intel, you will see a larger maximum error value.
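The result file can also be checked from a script. This is a minimal sketch: check_res is a hypothetical function name, the default limit of 3 percent is an assumption based on the "2-3%" guidance above, and the parsing assumes the file has the layout shown in the example, with the error value as the second-to-last field on the "maximum error" line.

```shell
# Hypothetical check of a res_* file.
# $1 = result file, $2 = maximum acceptable error in percent (default 3, an assumption)
check_res() {
    if ! grep -q 'calculations are correct' "$1"; then
        echo "FAIL: 'calculations are correct' not found in $1"
        return 1
    fi
    awk -v max="${2:-3}" '
        /maximum error/ { err = $(NF - 1); found = 1 }
        END {
            if (!found) { print "FAIL: no maximum error line"; exit 1 }
            printf "maximum error %s %% (limit %s %%)\n", err, max
            if (err + 0 >= max + 0) exit 1
        }' "$1"
}

# Example usage after the acceptance run:
# check_res res_021_0072
```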

Generating validation tests

To generate additional validation tests to produce your own ref* files, use the namelist switch:

&NAMCT0
 LREFGEN=.true.,

With this set, run the model for a short forecast. At the end of the run, a ref_*_* file will be created with the resolution value and the total number of steps in the filename.
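To know which file to look for after such a run, the name can be built from the resolution and step count. This sketch is an inference from the single example ref_021_0072 (T21, 72 steps): ref_name is a hypothetical function name, and the zero-padding widths of three and four digits are assumptions that may not hold at other resolutions.

```shell
# Hypothetical helper: build the expected reference filename.
# $1 = spectral resolution (e.g. 21), $2 = total number of steps (e.g. 72)
# Pass plain decimal numbers without leading zeros.
ref_name() {
    printf 'ref_%03d_%04d\n' "$1" "$2"
}

# Example usage: check that the T21, 72-step file was written.
# ls -l "$(ref_name 21 72)"
```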

The model should not be run for long because this test relies on a linear growth of errors. A 12 hour run is generally recommended, particularly at higher resolutions.


If you have any questions or problems, please contact openifs-support@ecmwf.int.



