IBM / AIX

IBM (xlf) issues with grib-api

When compiling grib-api on IBM architectures with the XLF compiler, we recommend disabling the creation of shared libraries and using static libraries only. Shared libraries can cause runtime errors.

For more information, please see Installing grib-api.

Compilation of bindproc.c fails with XLF V12 compiler

This is caused by missing lines in the bindproc.c file for OpenIFS version 38r1. Please add the following code lines to bindproc.c:

#include <unistd.h> /* for _SC_NPROCESSORS_ONLN */
#include <sys/processor.h> /* for BINDTHREAD */

Please contact openifs-support@ecmwf.int for further assistance.

JIRA Issue: OIFSSUP-12

Intel compiler

Use of MKL library can cause irreproducible results

OpenIFS includes a compilation configuration for the Intel compiler with the Intel MKL library (for optimized LAPACK/BLAS). However, please be aware that using this library can make model results irreproducible, even between successive runs on the same core count. We recommend not using it if reproducibility is a concern.

OpenIFS only provides an MKL compilation configuration for the Intel compiler. Linking MKL with other compilers is possible, though complicated, and has not been tried or tested with OpenIFS.

For help with linking the MKL library with other compilers, please see: https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor

OpenIFS can fail with Intel compiler at -O2

There is an issue with OpenIFS when compiling with the Intel compiler at optimization level -O2 or above on chipsets that support SSE4.1 & AVX instructions. Intel compilers are generally more aggressive at optimisations for -O2 than other compilers.

Users will see failure with the T21 test job similar to the following:

Sample failure message
signal_harakiri(SIGALRM=14): New handler installed at 0x432110; old preserved at 0x0
 ***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time = 6.18
 myproc#1,tid#1,pid#27600,signal#8(SIGFPE): Received signal :: 123MB (heap), 125MB (rss), 0MB (stack), 0 (paging), nsigs 1, time 6.18
 tid#1 starting drhook traceback, time = 6.18
 myproc#1,tid#1,pid#27600: MASTER 
 myproc#1,tid#1,pid#27600: CNT0<1> 
 myproc#1,tid#1,pid#27600: CNT1 
 myproc#1,tid#1,pid#27600: CNT2 
 myproc#1,tid#1,pid#27600: CNT3 
 myproc#1,tid#1,pid#27600: CNT4 
 myproc#1,tid#1,pid#27600: STEPO 
 myproc#1,tid#1,pid#27600: SCAN2H 
 myproc#1,tid#1,pid#27600: SCAN2M 
 myproc#1,tid#1,pid#27600: GP_MODEL 
 myproc#1,tid#1,pid#27600: EC_PHYS_DRV 
 myproc#1,tid#1,pid#27600: >OMP-PHYSICS CLDPP T/S (1002) 
 myproc#1,tid#1,pid#27600: EC_PHYS 
 myproc#1,tid#1,pid#27600: CALLPAR 
 myproc#1,tid#1,pid#27600: SLTEND 

It arises because the Intel compiler makes use of 2-way vectorization when compiling both branches of IF statements. This can generate a floating point exception if a divide by zero is possible in the branch that would not otherwise be executed and the IFS internal signal handler (DRHOOK) is enabled.

There are several possible workarounds:

  1. Compile the routines that cause the problem with lower optimisation, -O1. The routines affected are: sltend.F90, vsurf_mod.F90, vdfmain.F90, vdfhghtn.F90.
  2. Run with the environment variable: DR_HOOK_IGNORE_SIGNALS=8 to disable trapping of floating point exception signals (SIGFPE) by the model. This is not ideal as it will not catch other causes of floating point exceptions.
  3. Edit the code and insert the line:

     !DEC$ OPTIMIZE:1

    directly after the SUBROUTINE statement into the routines: sltend.F90, vsurf_mod.F90, vdfmain.F90, vdfhghtn.F90.

  4. Edit the intel-*.cfg configuration files in make/cfg and add lines to change the compile options specifically for these files.

OpenIFS uses a default of -O1 in the configuration files. If you increase the optimisation level, please be aware of this issue.
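Workaround 3 above can be scripted rather than applied by hand. The following is a minimal sketch, not part of OpenIFS: the function name insert_opt_directive is hypothetical, it only handles the first SUBROUTINE statement in a file, and it assumes continuation lines end with a bare '&' (no trailing comment). Check the result by eye before compiling.

```shell
# Sketch: insert '!DEC$ OPTIMIZE:1' directly after the first SUBROUTINE
# statement in a Fortran source file (hypothetical helper, not in OpenIFS).
insert_opt_directive() {
    src="$1"
    awk '
        BEGIN { state = 0 }   # 0 = searching, 1 = inside SUBROUTINE statement, 2 = done
        {
            print
            # Start of the first SUBROUTINE statement (case-insensitive).
            if (state == 0 && toupper($0) ~ /^[ \t]*SUBROUTINE[ \t]/) state = 1
            # The statement ends on the first line that does not end with "&".
            if (state == 1 && $0 !~ /&[ \t]*$/) {
                print "!DEC$ OPTIMIZE:1"
                state = 2
            }
        }
    ' "$src" > "$src.tmp" && mv "$src.tmp" "$src"
}

# Example use (paths depend on your source tree):
#   insert_opt_directive path/to/sltend.F90
```

The directive is placed after the last continuation line so that it does not split the SUBROUTINE statement itself.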

For more help with this issue, please contact openifs-support@ecmwf.int.

OpenIFS fails writing GRIB if grib_api compiled with Intel and -O2

We are aware of a problem in grib_api when using the Intel compiler that seems to affect several versions of grib_api and causes the model to fail with a floating point exception (SIGFPE). This is known to happen in the routine PRESET_GRIB_TEMPLATE or in GRIB_F_SET_REAL8_ARRAY in the grib_api library. The advice is to reduce the optimization level when compiling grib_api to -O1 rather than -O2, or to try a more recent version of the Intel compiler.

The error message that typifies this problem is:

OpenIFS log file
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time =    3.10
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0xabfa00; old preserved at 0x0
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time =    3.10
[myproc#1,tid#1,pid#14063]:  MASTER
[myproc#1,tid#1,pid#14063]:   CNT0<1>
[myproc#1,tid#1,pid#14063]:    SU0YOMB
[myproc#1,tid#1,pid#14063]:     SU_GRIB_API
[myproc#1,tid#1,pid#14063]:      PRESET_GRIB_TEMPLATE
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0xabfa00; old preserved at 0x0

or a traceback like this:

Traceback
[gdb__sigdump] : Received signal#8(SIGFPE), pid=-1
[LinuxTraceBack]: Backtrace(s) for program 'oifs38r1/make/intel_mkl-opt_conv/oifs/bin/master.exe' (pid=38451) :
(pid=38451): oifs38r1/src_conv/ifsaux/utilities/linuxtrbk.c:109  :  master.exe() [0xc14a2d]
(pid=38451):      oifs38r1/src_conv/ifsaux/support/drhook.c:884  :  master.exe() [0xac8ddb]
(pid=38451):                                          <Unknown>  :  libpthread.so.0(+0xf7e0) [0x7f59e215b7e0]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(log.L+0x23c) [0x7f59e60db98c]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xa7de4) [0x7f59e60a7de4]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0x9c6d4) [0x7f59e609c6d4]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(grib_pack_double+0x18) [0x7f59e6079847]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xc4814) [0x7f59e60c4814]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xc4890) [0x7f59e60c4890]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(grib_set_double_array_internal+0x68) [0x7f59e60c4921]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xa3a4a) [0x7f59e60a3a4a]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(grib_pack_double+0x18) [0x7f59e6079847]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xc4814) [0x7f59e60c4814]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xc4890) [0x7f59e60c4890]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xc4b3f) [0x7f59e60c4b3f]
(pid=38451):                                          <Unknown>  :  libgrib_api_f90.so.0(grib_f_set_real8_array_+0x51) [0x7f59e6380aea]
(pid=38451):                                          <Unknown>  :  libgrib_api_f90.so.0(grib_api_mp_grib_set_real8_array_+0x8a) [0x7f59e63858af]
(pid=38451): oifs38r1/src_conv/ifsaux/module/grib_api_interface.F90:358  :  master.exe() [0xb03bbd]

Note that the grib packing can also fail if the model has produced fields with a very large range of values, such that the grib library can't pack the values into a smaller bit range. For further help, please contact openifs-support@ecmwf.int.
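As an illustration of the advice above to rebuild grib_api at -O1, the sketch below wraps the rebuild in a shell function. It assumes grib_api is built with its autotools configure script and that the standard autoconf variables CC, FC, CFLAGS and FCFLAGS are honoured. The function name build_grib_api_o1 and the variables GRIB_API_SRC and GRIB_API_PREFIX are hypothetical placeholders, and icc/ifort stand for whichever Intel compiler commands are installed on your system.

```shell
# Sketch: rebuild grib_api with the Intel compilers at -O1.
# GRIB_API_SRC    = unpacked grib_api source directory (hypothetical variable)
# GRIB_API_PREFIX = installation directory (hypothetical variable)
build_grib_api_o1() {
    if [ -z "${GRIB_API_SRC:-}" ] || [ -z "${GRIB_API_PREFIX:-}" ]; then
        echo "set GRIB_API_SRC and GRIB_API_PREFIX first" >&2
        return 1
    fi
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$GRIB_API_SRC" || exit 1
        ./configure --prefix="$GRIB_API_PREFIX" \
            CC=icc FC=ifort CFLAGS="-O1" FCFLAGS="-O1" &&
        make &&
        make install
    )
}

# Example use:
#   GRIB_API_SRC=$HOME/src/grib_api GRIB_API_PREFIX=$HOME/grib_api build_grib_api_o1
```

Remember to relink OpenIFS against the rebuilt library afterwards.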

GNU compilers

OpenIFS 38r1 fails with gfortran version 5 compiler

OpenIFS 38r1 is known to fail when using the gfortran/gcc version 5.2 compiler. The error is:

SUDIM1; after call to read(namgfl), nmfdiaglev =            0
		Error in `../make/gnu-noopt/oifs/bin/master.exe': double free or corruption (out): 0x0000000009fafd90 ***

If this occurs, we recommend using version 4.8.1 of the GNU compilers. There is currently no fix for this issue with OpenIFS based on the 38r1 release.

Later versions of OpenIFS (40r1+) do not fail.
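Before building 38r1 it can help to check which gfortran is on the PATH. The sketch below is a hypothetical helper, not part of OpenIFS. It only flags version 5.x, which is the failure reported above, and makes no claim about any other version.

```shell
# Sketch: warn if the gfortran version is 5.x, which is reported to fail
# with OpenIFS 38r1 (hypothetical helper).
# Usage: check_gfortran_for_38r1 [version-string]
# With no argument the version is taken from 'gfortran -dumpversion'.
check_gfortran_for_38r1() {
    ver="${1:-$(gfortran -dumpversion)}"
    case "$ver" in
        5|5.*)
            echo "gfortran $ver: OpenIFS 38r1 is known to fail with version 5; consider 4.8.1"
            return 1
            ;;
        *)
            return 0
            ;;
    esac
}
```

The optional argument makes the check usable on a login node where the compiler module is not yet loaded.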

Cray

Cray ATP does not work

This is caused by the way IFS creates its own signal handler. To enable Cray ATP set:

export DR_HOOK_IGNORE_SIGNALS=-1

in the job script to completely disable any signal trapping by the IFS signal handler code 'DrHook'.

Contact openifs-support@ecmwf.int for assistance.

CrayPAT does not work

This is a result of the way in which OpenIFS is compiled. More information on this and the resolution is described here.

MacOS X

Gfortran compiler fails with missing reference fedisableexcept

This compilation failure may be seen on MacOS X when compiling drhook.c with the GNU compilers (gfortran/gcc).

To compile OpenIFS correctly on a Mac requires a small change to the configuration files to add a pre-processor symbol. In addition, please be aware that the FCM command will hang if more than one thread is being used (see below).

For the required compiler/build option, make a copy of the configuration. e.g. assuming the use of the GNU compilers:

cd cfg
cp gnu-opt.cfg mac-opt.cfg

Edit mac-opt.cfg and find the line which defines the pre-processor symbols for the C compiler (gcc):

$OIFS_CCDEFS{?} = BLAS LITTLE LINUX INTEGER_IS_INT _ABI64

add 'DARWIN' to this line:

$OIFS_CCDEFS{?} = BLAS LITTLE LINUX INTEGER_IS_INT _ABI64 DARWIN

and save the file.

To compile OpenIFS with this new configuration file set:

export OIFS_COMP=mac
export OIFS_BUILD=opt

which tells the build system to use the file: mac-opt.cfg. The same change can be made to the un-optimized compilation configuration: gnu-noopt.cfg.
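The copy-and-edit steps above can also be done in one go. The following is a minimal sketch under the assumption that the $OIFS_CCDEFS{?} line starts at the beginning of a line, as in the example above. The function name add_darwin_symbol is hypothetical. awk is used rather than sed so that the same command behaves identically with the BSD tools shipped on MacOS X.

```shell
# Sketch: copy a configuration file, appending DARWIN to the
# $OIFS_CCDEFS{?} line unless it is already present (hypothetical helper).
# Usage: add_darwin_symbol <source.cfg> <destination.cfg>
add_darwin_symbol() {
    src="$1"
    dst="$2"
    awk '
        # Only touch the C pre-processor definitions line, and only once.
        index($0, "$OIFS_CCDEFS{?}") == 1 && $0 !~ /(^|[ \t])DARWIN([ \t]|$)/ {
            sub(/[ \t]*$/, "")      # drop trailing whitespace
            $0 = $0 " DARWIN"
        }
        { print }
    ' "$src" > "$dst"
}

# Example use, from the directory holding the configuration files:
#   add_darwin_symbol gnu-opt.cfg mac-opt.cfg
```

Running it a second time on its own output leaves the file unchanged, so it is safe to re-run.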

Using FCM with 2 or more threads hangs

There is a known issue where more than one thread causes the fcm command to hang. Until this is fixed only use 1 thread for compilation e.g.

fcm make -j1

This will unfortunately result in longer compile times.

 

 

 


 
