
IBM / AIX

IBM (xlf) issues with grib-api

When compiling grib-api on IBM architectures with the XLF compiler, we recommend disabling the creation of shared libraries and using static libraries only, because shared libraries can cause runtime errors.

For more information, please see Installing grib-api.
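
As a rough sketch of the recommendation above: the fragment below assembles a static-only configure line. It assumes the grib-api autotools build accepts the standard libtool option `--disable-shared`; `GRIB_API_PREFIX` is a hypothetical install location, not something defined by OpenIFS. Check the options against the Installing grib-api page before use.

```shell
# Sketch only: build grib-api with static libraries (no shared libraries).
# GRIB_API_PREFIX is a hypothetical install location; adjust to your system.
GRIB_API_PREFIX="${GRIB_API_PREFIX:-$HOME/grib_api}"

# --disable-shared is the standard libtool switch (assumed to apply here).
CONFIGURE_ARGS="--prefix=${GRIB_API_PREFIX} --disable-shared"

# Print the command for review rather than running it directly.
echo "./configure ${CONFIGURE_ARGS}"

# Once the arguments look right, run from the grib-api source directory:
# ./configure ${CONFIGURE_ARGS} && make && make install
```

The command is printed rather than executed so that it can be reviewed and adapted to the local XLF setup first.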

Compilation of bindproc.c fails with XLF V12 compiler

This is caused by missing lines in the bindproc.c file for OpenIFS version 38r1. Please add the following code lines to bindproc.c:

Code Block
#include <unistd.h> /* for _SC_NPROCESSORS_ONLN */
#include <sys/processor.h> /* for BINDTHREAD */

Please contact openifs-support@ecmwf.int for further assistance.

JIRA Issue: OIFSSUP-12

 

 


Intel compiler

Use of MKL library can cause irreproducible results

OpenIFS includes a compilation configuration for the Intel compiler with the Intel MKL library (for optimized LAPACK/BLAS). However, be aware that using this library can make model results irreproducible, even between successive runs on the same core count. We recommend not using it if reproducibility is a concern.

OpenIFS provides an MKL compilation configuration for the Intel compiler only. Linking MKL with other compilers is possible but complicated, and has not been tested with OpenIFS.

For help with linking the MKL library with other compilers, please see: https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor

OpenIFS can fail with Intel compiler at -O2

OpenIFS can fail when compiled with the Intel compiler at optimization level -O2 or above on chipsets that support SSE4.1 and AVX instructions. At -O2 the Intel compilers generally optimize more aggressively than other compilers.

Users will see a failure in the T21 test job similar to the following:

Code Block: Sample failure message
signal_harakiri(SIGALRM=14): New handler installed at 0x432110; old preserved at 0x0
 ***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time = 6.18
 myproc#1,tid#1,pid#27600,signal#8(SIGFPE): Received signal :: 123MB (heap), 125MB (rss), 0MB (stack), 0 (paging), nsigs 1, time 6.18
 tid#1 starting drhook traceback, time = 6.18
 myproc#1,tid#1,pid#27600: MASTER 
 myproc#1,tid#1,pid#27600: CNT0<1> 
 myproc#1,tid#1,pid#27600: CNT1 
 myproc#1,tid#1,pid#27600: CNT2 
 myproc#1,tid#1,pid#27600: CNT3 
 myproc#1,tid#1,pid#27600: CNT4 
 myproc#1,tid#1,pid#27600: STEPO 
 myproc#1,tid#1,pid#27600: SCAN2H 
 myproc#1,tid#1,pid#27600: SCAN2M 
 myproc#1,tid#1,pid#27600: GP_MODEL 
 myproc#1,tid#1,pid#27600: EC_PHYS_DRV 
 myproc#1,tid#1,pid#27600: >OMP-PHYSICS CLDPP T/S (1002) 
 myproc#1,tid#1,pid#27600: EC_PHYS 
 myproc#1,tid#1,pid#27600: CALLPAR 
 myproc#1,tid#1,pid#27600: SLTEND 

The failure arises because the compiler uses 2-way vectorization for IF statements, which evaluates both branches. If a division by zero is possible in the branch that would not otherwise be executed, this can generate a floating point exception, which is trapped when the IFS internal signal handler (DRHOOK) is enabled.

There are several possible workarounds:

  1. Compile the routines that cause the problem with lower optimisation, -O1. The routines affected are: sltend.F90, vsurf_mod.F90, vdfmain.F90, vdfhghtn.F90.
  2. Run with the environment variable DR_HOOK_IGNORE_SIGNALS=8 set, which disables trapping of floating point exception signals (SIGFPE) by the model. This is not ideal because floating point exceptions from other causes will no longer be caught.
  3. Edit the code and insert the line:

    Code Block
     !DEC$ OPTIMIZE:1

    directly after the SUBROUTINE statement in each of the routines: sltend.F90, vsurf_mod.F90, vdfmain.F90, vdfhghtn.F90.

  4. Edit the intel-*.cfg configuration files in make/cfg and add lines to change the compile options specifically for these files.
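
Workaround 2 can be sketched as a job-script fragment. The variable name and value come from the list above; the launch line is hypothetical and should be replaced by whatever the job script already uses to start the model.

```shell
# Workaround 2: stop DrHook trapping SIGFPE (signal 8) for this job.
export DR_HOOK_IGNORE_SIGNALS=8

# Record the setting in the job output so it is visible in the log.
echo "DR_HOOK_IGNORE_SIGNALS=${DR_HOOK_IGNORE_SIGNALS}"

# Then launch the model exactly as the job script already does, for example:
# ./master.exe    (hypothetical launch line)
```

Setting the variable inside the job script, rather than in a login profile, keeps the change limited to runs where it is wanted.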

OpenIFS uses a default of -O1 in the configuration files. If you increase the optimisation level, please be aware of this issue.

For more help with this issue, please contact openifs-support@ecmwf.int.

OpenIFS fails writing GRIB if grib_api compiled with Intel and -O2

We are aware of a problem in grib_api when using the Intel compiler that seems to affect several versions of grib_api and causes the model to fail with a floating point exception (SIGFPE). The failure is known to occur in the OpenIFS routine PRESET_GRIB_TEMPLATE or in GRIB_F_SET_REAL8_ARRAY in the grib_api library. We advise compiling grib_api at optimization level -O1 rather than -O2, or trying a more recent version of the Intel compiler.
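
A minimal sketch of lowering the optimization level, assuming the grib_api autotools build honours the usual `CC`, `FC`, `CFLAGS` and `FCFLAGS` variables (an assumption to verify against your grib_api version). `GRIB_API_OPT` is a hypothetical helper variable used only in this example.

```shell
# Sketch only: request -O1 for grib_api instead of the default optimization.
# Assumes an autotools build that honours CC, FC, CFLAGS and FCFLAGS.
GRIB_API_OPT="-O1"

# Print the configure line for review rather than running it directly.
echo "./configure CC=icc FC=ifort CFLAGS=${GRIB_API_OPT} FCFLAGS=${GRIB_API_OPT}"

# Run the printed command from the grib_api source directory, then rebuild:
# make clean && make && make install
```

OpenIFS must be relinked against the rebuilt library for the change to take effect.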

The error message that typifies this problem is:

Code Block: OpenIFS log file
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time =    3.10
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0xabfa00; old preserved at 0x0
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time =    3.10
[myproc#1,tid#1,pid#14063]:  MASTER
[myproc#1,tid#1,pid#14063]:   CNT0<1>
[myproc#1,tid#1,pid#14063]:    SU0YOMB
[myproc#1,tid#1,pid#14063]:     SU_GRIB_API
[myproc#1,tid#1,pid#14063]:      PRESET_GRIB_TEMPLATE
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0xabfa00; old preserved at 0x0

or a traceback like this:

Code Block: Traceback
[gdb__sigdump] : Received signal#8(SIGFPE), pid=-1
[LinuxTraceBack]: Backtrace(s) for program 'oifs38r1/make/intel_mkl-opt_conv/oifs/bin/master.exe' (pid=38451) :
(pid=38451): oifs38r1/src_conv/ifsaux/utilities/linuxtrbk.c:109  :  master.exe() [0xc14a2d]
(pid=38451):      oifs38r1/src_conv/ifsaux/support/drhook.c:884  :  master.exe() [0xac8ddb]
(pid=38451):                                          <Unknown>  :  libpthread.so.0(+0xf7e0) [0x7f59e215b7e0]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(log.L+0x23c) [0x7f59e60db98c]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xa7de4) [0x7f59e60a7de4]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0x9c6d4) [0x7f59e609c6d4]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(grib_pack_double+0x18) [0x7f59e6079847]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xc4814) [0x7f59e60c4814]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xc4890) [0x7f59e60c4890]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(grib_set_double_array_internal+0x68) [0x7f59e60c4921]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xa3a4a) [0x7f59e60a3a4a]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(grib_pack_double+0x18) [0x7f59e6079847]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xc4814) [0x7f59e60c4814]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xc4890) [0x7f59e60c4890]
(pid=38451):                                          <Unknown>  :  libgrib_api.so.0(+0xc4b3f) [0x7f59e60c4b3f]
(pid=38451):                                          <Unknown>  :  libgrib_api_f90.so.0(grib_f_set_real8_array_+0x51) [0x7f59e6380aea]
(pid=38451):                                          <Unknown>  :  libgrib_api_f90.so.0(grib_api_mp_grib_set_real8_array_+0x8a) [0x7f59e63858af]
(pid=38451): oifs38r1/src_conv/ifsaux/module/grib_api_interface.F90:358  :  master.exe() [0xb03bbd]

Note that GRIB packing can also fail if the model has produced fields with a very large range of values, such that the GRIB library cannot pack the values into a smaller bit range. For further help, please contact openifs-support@ecmwf.int.

 

 


GNU compilers

OpenIFS 38r1 fails with gfortran version 5 compiler

OpenIFS 38r1 is known to fail when using the gfortran/gcc version 5.2 compiler. The error is:

Code Block
SUDIM1; after call to read(namgfl), nmfdiaglev =            0
		Error in `../make/gnu-noopt/oifs/bin/master.exe': double free or corruption (out): 0x0000000009fafd90 ***

If this occurs, we recommend using version 4.8.1 of the GNU compilers. There is currently no fix for this issue in OpenIFS based on the 38r1 release.

Later versions of OpenIFS (40r1+) do not fail.
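
To check whether the installed compiler is in the affected series, a small sketch such as the following may help. The helper `gcc_major` is a hypothetical function written for this example; `-dumpversion` is a standard GCC driver option.

```shell
# Extract the major version from a compiler version string such as "5.2.0".
gcc_major() {
  printf '%s\n' "$1" | cut -d. -f1
}

# Report the installed gfortran version, if gfortran is on the PATH.
if command -v gfortran >/dev/null 2>&1; then
  version="$(gfortran -dumpversion)"
  if [ "$(gcc_major "$version")" = "5" ]; then
    echo "gfortran $version: OpenIFS 38r1 is known to fail with version 5"
  else
    echo "gfortran $version"
  fi
else
  echo "gfortran not found on PATH"
fi
```

On systems that use environment modules, the reported version depends on which compiler module is loaded when the check is run.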

 

 


Cray

Cray ATP does not work

This is caused by the way IFS creates its own signal handler. To enable Cray ATP set:

Code Block
export DR_HOOK_IGNORE_SIGNALS=-1

in the job script to completely disable any signal trapping by the IFS signal handler code 'DrHook'.

Contact openifs-support@ecmwf.int for assistance.

CrayPAT does not work

This is a result of the way in which OpenIFS is compiled. More information on this and its resolution is described here.

 

 

 

