
IBM (xlf) issues

When compiling grib-api on IBM architectures with the XLF compiler, we recommend disabling the creation of shared libraries and using static libraries only, as shared libraries can cause runtime errors.

For more information, please see Installing grib-api.
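A static-only grib_api build, as recommended above, might look like the following sketch. It rests on several assumptions that are not stated in this page:

- The grib_api release is autotools-based, and its configure script uses libtool, so it accepts the standard `--disable-shared` and `--enable-static` options.
- The IBM XL compiler drivers are named `xlc` and `xlf90` on your system.
- The function name `build_grib_api_static` and the example paths are hypothetical.

Check the Installing grib-api page for the options your grib_api version actually supports.

```shell
#!/bin/sh
# Sketch only: configure and build grib_api with static libraries only.
# Assumptions (not taken from the OpenIFS documentation):
#   - autotools/libtool-based grib_api source tree, so ./configure accepts
#     the standard libtool options --disable-shared and --enable-static;
#   - IBM XL compiler drivers named xlc and xlf90; adjust for your site.
build_grib_api_static() {
  srcdir=$1   # path to the unpacked grib_api source tree
  prefix=$2   # installation directory
  (
    # Work in a subshell so the cd does not affect the caller.
    cd "$srcdir" || exit 1
    # Disable shared libraries and build static ones only.
    CC=xlc FC=xlf90 ./configure --prefix="$prefix" \
        --disable-shared --enable-static || exit 1
    make || exit 1
    make install
  )
}

# Example (hypothetical paths):
# build_grib_api_static "$HOME/src/grib_api" "$HOME/opt/grib_api"
```

Running the build in a subshell keeps the `cd` local, and any failing step stops the build before `make install`.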

Portland compiler issues (pgi)

We are aware of a number of problems using OpenIFS with the Portland compiler (multiple versions). Reports indicate that either the compilation or the model run itself can hang.

Support for PGI will be withdrawn from OpenIFS version 38r1v05 onwards. We encourage users to use GNU (gfortran), Intel or Cray compilers instead.

OpenIFS can fail with the Intel compiler at -O2

There is an issue with OpenIFS when compiling with the Intel compiler at optimization level -O2 or above on chipsets that support SSE4.1 and AVX instructions.

Users will see a failure in the T21 test job similar to the following:

Sample failure message
signal_harakiri(SIGALRM=14): New handler installed at 0x432110; old preserved at 0x0
 ***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time = 6.18
 myproc#1,tid#1,pid#27600,signal#8(SIGFPE): Received signal :: 123MB (heap), 125MB (rss), 0MB (stack), 0 (paging), nsigs 1, time 6.18
 tid#1 starting drhook traceback, time = 6.18
 myproc#1,tid#1,pid#27600: MASTER 
 myproc#1,tid#1,pid#27600: CNT0<1> 
 myproc#1,tid#1,pid#27600: CNT1 
 myproc#1,tid#1,pid#27600: CNT2 
 myproc#1,tid#1,pid#27600: CNT3 
 myproc#1,tid#1,pid#27600: CNT4 
 myproc#1,tid#1,pid#27600: STEPO 
 myproc#1,tid#1,pid#27600: SCAN2H 
 myproc#1,tid#1,pid#27600: SCAN2M 
 myproc#1,tid#1,pid#27600: GP_MODEL 
 myproc#1,tid#1,pid#27600: EC_PHYS_DRV 
 myproc#1,tid#1,pid#27600: >OMP-PHYSICS CLDPP T/S (1002) 
 myproc#1,tid#1,pid#27600: EC_PHYS 
 myproc#1,tid#1,pid#27600: CALLPAR 
 myproc#1,tid#1,pid#27600: SLTEND 

The failure arises because, at these optimization levels, the compiler uses 2-way vectorization and computes both branches of an IF statement. This can generate a floating point exception when a division by zero is possible in the branch that is not actually taken. The exception is then trapped if the IFS internal signal handler (DrHook) is enabled.

There are several possible workarounds:

  1. Compile the routines that cause the problem at a lower optimisation level, -O1. The routines affected are: sltend.F90, vsurf_mod.F90, vdfmain.F90, vdfhghtn.F90.
  2. Run with the environment variable DR_HOOK_IGNORE_SIGNALS=8 set, to stop the model trapping floating point exception signals (SIGFPE). This is not ideal, as floating point exceptions from any other cause will then also go undetected.
  3. Edit the code and insert the line:

     !DEC$ OPTIMIZE:1

    directly after the SUBROUTINE statement in the routines: sltend.F90, vsurf_mod.F90, vdfmain.F90, vdfhghtn.F90.
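Workaround 3 can be applied to all four files with a small script. The following is a sketch, not an ECMWF-provided tool, and the function name `add_optimize_directive` is hypothetical.

It assumes free-form Fortran source. It follows `&` continuation lines so that the directive lands after the complete SUBROUTINE statement, which often spans several lines. The original of each edited file is kept with a `.orig` suffix.

```shell
#!/bin/sh
# Sketch only: insert "!DEC$ OPTIMIZE:1" directly after every SUBROUTINE
# statement in a free-form Fortran source file (workaround 3). Lines ending
# in '&' are followed so the directive goes after the complete statement.
# Limitation: a line ending in '&' followed by a trailing comment is not
# recognised as a continued line.
add_optimize_directive() {
  src=$1
  cp "$src" "$src.orig" || return 1   # keep the unmodified file as a backup
  awk '
    # Start of a SUBROUTINE statement (any case, optionally RECURSIVE);
    # END SUBROUTINE lines do not match because they begin with END.
    toupper($0) ~ /^[ \t]*(RECURSIVE[ \t]+)?SUBROUTINE[ \t]/ { in_stmt = 1 }
    { print }
    # The statement is complete once a line no longer ends in "&".
    in_stmt && $0 !~ /&[ \t]*$/ { print "!DEC$ OPTIMIZE:1"; in_stmt = 0 }
  ' "$src.orig" > "$src"
}

# Apply to the affected routines; run from the directory that contains them.
for f in sltend.F90 vsurf_mod.F90 vdfmain.F90 vdfhghtn.F90; do
  if [ -f "$f" ]; then add_optimize_directive "$f"; fi
done
```

Using awk rather than a one-line sed insert matters here. Inserting after the first line of a continued SUBROUTINE statement would split the statement and break compilation.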

For more help with this issue, please contact openifs-support@ecmwf.int.

This issue has been fixed in oifs38r1v05 and above.

OpenIFS fails in PRESET_GRIB_TEMPLATE if grib_api is compiled with Intel at -O2

We are aware of a problem with grib_api built with the Intel compiler that appears to affect several versions of grib_api. It causes the model to fail with a floating point exception (SIGFPE) in the routine PRESET_GRIB_TEMPLATE. The advice is to compile grib_api at optimization level -O1 rather than -O2.
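A rebuild of grib_api at -O1 might look like the following sketch. It makes these assumptions:

- The grib_api source tree is autotools-based, and its configure script honours CFLAGS and FCFLAGS.
- The classic Intel compiler drivers are named `icc` and `ifort`.
- The function name `rebuild_grib_api_O1` and the example paths are hypothetical.

```shell
#!/bin/sh
# Sketch only: rebuild grib_api with the Intel compilers at -O1 instead of -O2.
# Assumptions (not from the OpenIFS documentation): autotools-based grib_api
# source tree whose configure honours CFLAGS/FCFLAGS; drivers icc and ifort.
rebuild_grib_api_O1() {
  srcdir=$1   # path to the unpacked grib_api source tree
  prefix=$2   # installation directory
  (
    cd "$srcdir" || exit 1
    # Remove objects left over from an earlier -O2 build, if any.
    if [ -f Makefile ]; then make distclean; fi
    CC=icc FC=ifort CFLAGS='-O1' FCFLAGS='-O1' \
        ./configure --prefix="$prefix" || exit 1
    make || exit 1
    make install
  )
}

# Example (hypothetical paths):
# rebuild_grib_api_O1 "$HOME/src/grib_api" "$HOME/opt/grib_api"
```

If OpenIFS links grib_api statically, it must be relinked against the rebuilt library for the change to take effect.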

The error message that typifies this problem is:

OpenIFS log file
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time =    3.10
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0xabfa00; old preserved at 0x0
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time =    3.10
[myproc#1,tid#1,pid#14063]:  MASTER
[myproc#1,tid#1,pid#14063]:   CNT0<1>
[myproc#1,tid#1,pid#14063]:    SU0YOMB
[myproc#1,tid#1,pid#14063]:     SU_GRIB_API
[myproc#1,tid#1,pid#14063]:      PRESET_GRIB_TEMPLATE
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0xabfa00; old preserved at 0x0

tail NODE.001_01
 - Set up F-post processing, part 2----------------------------------
 YDSL%CVER=FP YDSL%NASLB1=   1053 YDSL%NASLB1_TRUE=     79
 *** YRFP%NASLB1 RESET TO NPROMA*NGPBLKS=          48
 THE POST-PROCESSING RESOLUTION IS NEVER COARSER THAN THE MODEL RESOLUTION
 ARRAY  SSEC2     ALLOCATED      132     132
 SUBFPOS: case LFPDISTRIB=F
 NFPROMA=NFPROMA_DEP; NFPBLOCS=NFPBLOCS_DEP
 NFPSTART=NFPSTART_DEP; NFPEND=NFPEND_DEP
 NFPSORT=NFPSORT_DEP; NFPBLOFF=NFPBLOFF_DEP

 SUFPIOS PRINTS OUT
 NFPXFLD =   -999
 - Set up GRIB API usage----------------------------------
 ABOR1 CALLED
 Dr.Hook calls ABOR1 ...

Cray ATP does not work

This is caused by the way IFS installs its own signal handler. To enable Cray ATP, set:

export DR_HOOK_IGNORE_SIGNALS=-1

in the job script to completely disable any signal trapping by DrHook.

This issue has been fixed in OpenIFS releases 38r1v05 and beyond. For previous releases, either use the fix above or contact openifs-support@ecmwf.int for assistance.

CrayPAT does not work

This is a result of the way in which OpenIFS is compiled. More information on this, and the resolution, is available here.

 

 

