I saw some odd behaviour today with cy40r1 of OpenIFS. I am running an aqua-planet configuration at both Tq63 and Tq106 horizontal resolution, and both experiments worked fine. I then added GRIB codes 121 and 122 (maximum and minimum temperature at 2 metres in the last 6 hours) to the MFPPHY namelist option. This was the only change. At both resolutions the model ran successfully (and wrote output files) for 64 days and then crashed. The NODE file contained this error:

 NSTEP =  3072 SCAN2M_HPOS  P

 FIELD IN BUFFER BUT PP NOT REQUESTED ...******

 IO-STREAM SETUP - IOTYPE =           2  NUMIOPROCS =           1  CPATH =

 ICMGGguiu+003072 MODE=w


and the error file contained:

 13:44:02 STEP 3071 H=1535:30 +CPU=  0.077
 GRIB_SET_INT           7 endStep     5529600  FAILED         -25
 GRIB_API ERROR MSG: Unable to set step
 GRIB_SET_INT           7 endStep     5529600  FAILED         -25
 GRIB_API ERROR MSG: Unable to set step
MPL_ABORT: CALLED FROM PROCESSOR      3 THRD     1
MPL_ABORT: CALLED FROM PROCESSOR      4 THRD     1
 MPL_ABORT: THRD           1   GRIB_SET_VALUE FAILED
 MPL_ABORT: THRD           1   GRIB_SET_VALUE FAILED

I have no idea what is causing this. However, it is not a big problem for me, as I don't really need these two diagnostics. I am just posting to let others know about the problem.

Victoria

7 Comments

  1. Unknown User (nagc)

    Hi Victoria,

    Thanks for noting this. It is a known issue, though we don't have a fix yet. The problem is that the 'endStep' integer value is too large to be encoded in the GRIB file, so it's related to the length of the run.

    When we have a fix for it, I'll post an update here.

    Glenn
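
To relate the failing 'endStep' values to run length, here is a rough Python sketch of the arithmetic. It assumes 'endStep' in the GRIB_SET_INT message is expressed in seconds, and that the failing write belongs to the step after the last STEP line in the log (the NODE file above shows NSTEP = 3072 after STEP 3071). Both are inferences from the logs in this thread, not confirmed behaviour, and the function name is made up for illustration.

```python
# Assumption: endStep in the GRIB_SET_INT message is given in seconds.
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def describe_failure(end_step_seconds, last_logged_step):
    """Return (hours, days, implied model timestep in seconds).

    Assumption: the failing output belongs to step last_logged_step + 1.
    """
    hours = end_step_seconds / SECONDS_PER_HOUR
    days = hours / HOURS_PER_DAY
    timestep = end_step_seconds / (last_logged_step + 1)
    return hours, days, timestep


# (endStep, last STEP line) pairs copied from the logs in this thread.
cases = {
    "cy40r1, STEP 3071": (5529600, 3071),
    "cy40r1, STEP 2047": (5529600, 2047),
    "43r3 T255L91, STEP 791": (475200, 791),
}

for label, (end_step, step) in cases.items():
    hours, days, timestep = describe_failure(end_step, step)
    print(f"{label}: {hours:g} h = {days:g} days, timestep {timestep:g} s")
```

Under these assumptions, 5529600 corresponds to 1536 hours (64 days), matching the 64-day crash reported in the original post, and 475200 corresponds to 132 hours (5.5 days). The two failing values differ by more than a factor of ten, so this arithmetic alone does not point to a single threshold.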

  2. Unknown User (jstreffi)

    I just ran into the same error.

    000:   13:43:45 STEP     2047 H=     1535:15 +CPU=  0.358
    028:  GRIB_SET_INT           7 endStep     5529600  FAILED         -25
    028:  GRIB_API ERROR MSG: Unable to set step

    Thank you for providing the information about it here. That made it easy to guess that, for me, it's GRIB codes 201 and 202.

     201  MX2T          Maximum 2 metre temperature since previous post-processing [K]
     202  MN2T          Minimum 2 metre temperature since previous post-processing [K]

    Best regards,

    Jan

  3. I have just had the same problem again. Same time step (step 2047), but this time the cause was GRIB code 49, "10 metre wind gust since previous post-processing" (short name 10fg).

    This was with cy40r1. Is there a fix for this yet?

    Victoria


  4. Unknown User (nagc)

    Hi Victoria,

    I never completely tracked down the problem, but I think it's related to a GRIB packing problem with the fields that require a difference between successive timesteps, which shows up on longer runs. One option is to follow what EC-Earth does: restart the model and reset the date offset so that the GRIB packing doesn't see such big numbers. At least, I think that's the problem.

    Glenn

  5. Strangely, I had the same error today with OpenIFS 43r3 at T255L91 resolution in an 11-day forecast (no restarts).

    I was able to run the forecast after removing variables 201 and 202 from the output namelist.

    I think these variables are fairly standard and cannot be recovered from raw output, so it would be nice to fix it soon.

    This happened on day 6 of the forecast. Maybe the initial condition file has something funny? Please let me know if you need more information about the setup.

    Here is the output of the job runscript:

     13:50:36 STEP  791 H= 131:50 +CPU=  0.336
    GRIB_SET_INT           8  endStep       475200  FAILED         -25
    GRIB_API ERROR MSG: Unable to set step
    MPL_ABORT: CALLED FROM PROCESSOR     17 THRD     1



    1. Actually, I also had to disable the following variable (in addition to 201 and 202 mentioned above). It now works fine!

       49  10fg          10 metre wind gust since previous post-processing [m s-1]
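
The workaround used in this thread (dropping the affected codes from the output request) can be sketched in Python as below. The set of codes is taken from the reports above; the `requested` list and the function name are hypothetical, and this only filters a list of integers. It does not read or write an actual namelist file.

```python
# GRIB codes reported in this thread as triggering the endStep failure.
PROBLEM_CODES = {49, 121, 122, 201, 202}


def drop_problem_codes(requested):
    """Return the requested codes in order, without the problematic ones."""
    return [code for code in requested if code not in PROBLEM_CODES]


# Hypothetical list of codes requested via MFPPHY.
requested = [167, 165, 166, 49, 201, 202]
kept = drop_problem_codes(requested)
print("keep:", kept)
print("count:", len(kept))
```

If the namelist also carries a separate count of requested fields (an assumption about the setup), that count would need updating to match the shortened list.
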
  6. Unknown User (nagc)

    Hi Etienne,

    Yes, this is a known problem with the variables you've identified. It affects the accumulated variables, which compute a difference between steps. This was originally a problem with OpenIFS 40 but also occurs with 43.

    The IFS and OpenIFS code is identical in computing these fields. The big difference is the output. OpenIFS writes the fields through the master task only, using the 'older' I/O approach. The IFS writes them out by a different method, using its parallel I/O server to the Field Database (FDB). When I looked at this before, my suspicion was that the different parts of the output code accounted for the behaviour. I've not had time to get back to this, as more urgent things tend to take over, and unfortunately I won't have time to look at it any time soon either.

    Glenn