Hi all

We've got a new HPC system at Kiel University, an SX vector machine from NEC, and I was asked whether we can compile and run OpenIFS on it. I have to admit I have no idea. 

There are compiled NEC math libraries (including LAPACK and BLAS) as well as grib_api 1.23. Only NEC Fortran/C compilers are available. 

I see in the OpenIFS source code (cy40r1v2) that there are flags for "NECSX", so that sounds like someone has used OpenIFS on NEC SX machines before. Does anyone know who? 

My initial attempt was to take the cfg file for intel-opt and just "translate" from Intel to NEC flags (e.g. -O1 etc), but that kept failing. Then I activated the NECSX compile option, but that did not work either. 

Apart from not finding grib_api (might be a problem with the grib_api installation which I think has not really been tested), I get a lot of errors from the WAM code, e.g. 

FAIL mpinfort -oo/cigetdeac.o -c -DLINUX -DINTERCEPT_ALLOC -DNECSX -I./include -g -traceback -O1 -fopenmp /gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/wam/Wam_oper/cigetdeac.F # rc=1
FAIL Warning: /gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/wam/Wam_oper/cigetdeac.F, line 571: Unused local variable IH
FAIL Warning: /gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/wam/Wam_oper/cigetdeac.F, line 571: Unused local variable IT
FAIL Warning: /gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/wam/Wam_oper/cigetdeac.F, line 571: ITEST explicitly imported into CIGETDEAC but not used
FAIL Warning: /gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/wam/Wam_oper/cigetdeac.F, line 571: IU06 explicitly imported into CIGETDEAC but not used
FAIL Obsolescent: /gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/wam/Wam_oper/cigetdeac.F, line 1: Fixed source form
FAIL Error: /gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/wam/Wam_oper/cigetdeac.F, line 44: No specific match for reference to generic DR_HOOK
FAIL Error: /gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/wam/Wam_oper/cigetdeac.F, line 569: No specific match for reference to generic DR_HOOK
FAIL NEC Fortran Compiler error termination, 2 errors, 5 warnings

and similarly for many other WAM files. 

Does anyone have experience in compiling OpenIFS on NEC compilers and could share a cfg file? 


Many thanks for any help!

/Joakim  

Files: compile_openifs.sh, nec-opt.cfg, compile_oifs.txt


7 Comments

  1. Unknown User (nagc)

    Hi Joakim,

    IFS has been run on a NEC machine in the past. See, for instance, this article I found from previous HPC workshops: https://www.ecmwf.int/en/elibrary/14450-performance-ecmwf-ifs-t799-l91-spectral-model-nec-sx-8-vector-supercomputer. I believe Meteo-France also ran their version of IFS on a NEC.

    You will need to change the NPROMA & NRPROMA values, as these set the inner loop (vector) length. I can talk to internal people here to help with performance. But first, we need to get the model compiling. I have no experience building on a NEC specifically but a lot of experience with vector machines.

    Looking at your compile_oifs.txt log, the first set of errors comes from compiling sgemmx.F. I think these are generated because the compiler is not running the preprocessor first to resolve the '#if defined' lines in this file. Is there an extra compile option to enable the preprocessor? Often a Fortran file with a capital letter in the extension, that is, '.F' rather than '.f', is run through the preprocessor automatically, but it looks like this is not happening with the NEC compiler. Check the compiler's man page for how to enable it. Failing that, you can always run the C preprocessor 'cpp' yourself to resolve these lines (keeping the -DNECSX). OpenIFS only uses preprocessor directives in the lower level routines so there are not many of them.

    The same applies to the [FAIL] lines for the code in emos/gribex & emos/pbio.

    For the WAM code (and all fixed format '.F' Fortran files), make sure you are using the autopromote '-r8' or equivalent. I suspect this is what's causing the problem with the WAM code when it says it can't find a specific match for generic DR_HOOK: the call's real arguments have the wrong bit-length, so none of the specific subroutines behind the generic interface matches. In the make/oifs.cfg file you'll see lines like this:

    oifs.prop{fc.flags}[wam]                        = $OIFS_FFIXED $OIFS_FFLAGS

    and in the make/cfg/intel-opt.cfg file (for example):

    $OIFS_FFIXED{?} = -r8

    which sets all fixed format F77 files to be auto-promoted to real*8.  I see in your nec-opt.cfg file you are not using any kind of real*8 flag?
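
    As a rough sketch of what that could look like for the NEC build: the snippet below writes a hypothetical nec-opt.cfg excerpt and checks that $OIFS_FFIXED is set to something. The flag value -fdefault-real=8 is an assumption and needs checking against the NEC compiler manual; the $OIFS_FFLAGS line is only filler.

```shell
#!/bin/sh
# Sketch: check that a cfg file sets a promotion flag for fixed-form sources.
set -eu
workdir=$(mktemp -d)

# Hypothetical cfg excerpt; the flag values are assumptions, not tested settings.
cat > "$workdir/nec-opt.cfg" <<'EOF'
$OIFS_FFLAGS{?} = -g -O0
$OIFS_FFIXED{?} = -fdefault-real=8
EOF

# Look for a non-empty $OIFS_FFIXED assignment at the start of a line.
if grep -Eq '^\$OIFS_FFIXED\{\?\} *= *[^ ]' "$workdir/nec-opt.cfg"; then
    ffixed_status="set"
else
    ffixed_status="missing"
fi
echo "OIFS_FFIXED is $ffixed_status"
```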

    I think the above will fix a lot of the errors. The grib-api ones could be because the grib-api library was not compiled with the same compilers (assuming the paths to the library & module are correct).

    I suggest compiling with no optimization to start with and see if the model will run a few timesteps before increasing the optimization level.

    Let us know how you get on.  Hope that helps.

                 Glenn

  2. Unknown User (joakimkjellsson@gmail.com)

    Hi Glenn

    Many thanks for the help! I played around with -fdefault-real=8, but I think I had placed it on the wrong line in the cfg file. I also added OIFS_GRIB_API_INCLUDE to the "WAM" files, and now WAM does not complain during compile anymore. Not sure why this was not a problem on x86 with Intel or GNU Fortran before. 

    The only compile error I've got left now is

    FAIL mpincc -oo/linuxtrbk.o -c -DLINUX -D_ABI64 -DINTERCEPT_ALLOC -DNECSX -I./include -g -O0 -I//sfs/fs5/sw/Aurora/grib-api/grip-api1.23.1/usr//include /gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/ifsaux/utilities/linuxtrbk.c # rc=1
    FAIL "/gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/ifsaux/util
    FAIL ities/linuxtrbk.c", line 43: catastrophic error: cannot open source
    FAIL file "execinfo.h"
    FAIL #include <execinfo.h>
    FAIL ^
    FAIL
    FAIL 1 catastrophic error detected in the compilation of "/gpfs/fs6/home-geomar/smomw352/models/openifs-cy40r1/oifs-40r1/src/ifsaux/utilities/linuxtrbk.c".
    FAIL compile 0.1 ! linuxtrbk.o <- ifsaux/utilities/linuxtrbk.c

    I see in the code that "execinfo" gets included due to the compile flag LINUX. 

    Should I tell the code to include some other library on a NECSX machine? 

    Many thanks for the link to the performance report, I'll check it out! 

    Cheers

    Joakim 

  3. Unknown User (nagc)

    Hi Joakim,

    I'd forgotten to talk to the IFS engineers about the NEC code. I will do that next week and see if I can get you a traceback code suitable for the NEC and other tips for running OpenIFS on a vector machine.

    I think you can live without the traceback code for the time being.  I suggest making a new src file with dummy subroutines to get it compiled. A quick way to do this is to use the C preprocessor and don't define any symbols.  e.g.

    cd src/ifsaux/utilities/
    mv linuxtrbk.c linuxtrbk.c.orig
    # comment out the #include lines at the top otherwise the preprocessor will include the files
    gcc -E linuxtrbk.c.orig > linuxtrbk.c

    This will generate the new linuxtrbk.c file with dummy subroutines that should compile fine. You'll then need to copy the #include lines back to the new linuxtrbk.c.

    There is another part of the model code where you need to make a change. There is a logical switch called 'LOPT_SCALAR' in NAMELIST NAMCT0. It sets some internal dimensions suitable for cache-based machines, so you will need to set it to FALSE. See the code in ifs/setup/sumpini.F90: around line 174 you'll see variables set depending on the hardware model specified by the #defines. I suggest you add a branch for NECSX that sets LOPT_SCALAR=false.
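
    Since the switch lives in NAMCT0, it may also be possible to set it from the namelist without touching the source. A minimal sketch, assuming the namelist file is called fort.4 and that a value given there is not overridden by the defaults in sumpini.F90 (neither is confirmed here). The toy namelist and its NFRPOS/NPROMA entries are placeholders:

```shell
#!/bin/sh
# Sketch: add LOPT_SCALAR=.FALSE. to the NAMCT0 group of a namelist file.
set -eu
workdir=$(mktemp -d)

# Toy namelist; a real OpenIFS namelist has many more groups and entries.
cat > "$workdir/fort.4" <<'EOF'
&NAMCT0
  NFRPOS=1,
/
&NAMDIM
  NPROMA=16,
/
EOF

# Insert the switch right after the '&NAMCT0' header unless it is already there.
if ! grep -q 'LOPT_SCALAR' "$workdir/fort.4"; then
    awk '{ print } /^&NAMCT0/ { print "  LOPT_SCALAR=.FALSE.," }' \
        "$workdir/fort.4" > "$workdir/fort.4.new"
    mv "$workdir/fort.4.new" "$workdir/fort.4"
fi

cat "$workdir/fort.4"
```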

    Keep in touch. I'll write again when I have more info about the traceback and configuration.

    Cheers,  Glenn


  4. Unknown User (joakimkjellsson@gmail.com)

    Hi Glenn

    Thanks for the reply! Following your instructions, though, I don't seem to get the same result as you. I'm attaching the file. I've tried to comment out (using "!!") the "define" statements at lines 23-24, but when I try to preprocess I get

    smomw352@nesh-fe4-adm utilities$ gcc -E linuxtrbk.c.orig > linuxtrbk.c
    gcc: warning: linuxtrbk.c.orig: linker input file unused because linking not done

    and all I end up with is a completely empty linuxtrbk.c file. Am I doing this all wrong? Should I remove all "define" statements throughout the file? 

    Best regards

    Joakim


  5. Unknown User (nagc)

    Hi Joakim,

    Sorry, I've just noticed I said comment out the #define lines above, when I meant to say comment out the #include lines. Otherwise the preprocessor (gcc -E) will include the files listed which is not what you need.

    As this is C code you need to comment out using C-style rather than fortran '!!':

    /*
    #include ...
    */

    Simplest to delete the #include lines, run gcc -E and then copy back in (so it compiles correctly).

    I've attached the resulting code to this comment. We've also sent a request to MeteoFrance to ask about NEC specific changes. Can't promise we'll get a quick response.

    Cheers,  Glenn

    File: linuxtrbk.c

  6. Unknown User (joakimkjellsson@gmail.com)

    Hi Glenn

    I still can't get the dummy linuxtrbk.c file to work. I've commented out the "include" lines in linuxtrbk.c.orig and then run 

    gcc -E linuxtrbk.c.orig > linuxtrbk.c

    but as before I keep getting an empty file. So I just took the file you included in the previous post instead. 

    With that file, and making sure I load the correct LAPACK/BLAS libraries, I can now compile OpenIFS cy40r1 on our NEC SX Aurora machine :) I'm attaching my cfg file and the script to compile if anyone is interested. 

    I haven't tried running yet, but I'll get to it soon. Do you have any recommendations for changes in the namelist? You mentioned that NPROMA and a few other things may have to change. 

    Cheers

    Joakim

    Files: nec-opt.cfg, compile_openifs.sh

  7. Unknown User (nagc)

    Hi Joakim,

    I'm not sure why that doesn't work for you.

    Yes, you'll need to experiment with the NPROMA and NRPROMA settings and try increasing them to test larger vector lengths. We sent a request to MeteoFrance for their suggestions on using NEC hardware but have not had anything back. There may be some more help in the presentation link I posted above.
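
    A rough sketch of how such a sweep could be set up, one run directory per candidate value. The namelist file name fort.4, the group names NAMDIM and NAERAD, the candidate lengths, and setting NRPROMA equal to NPROMA are all assumptions for illustration, not tuned or recommended settings:

```shell
#!/bin/sh
# Sketch: generate one namelist per candidate NPROMA value for timing tests.
set -eu
workdir=$(mktemp -d)

# Toy base namelist (hypothetical excerpt).
cat > "$workdir/fort.4" <<'EOF'
&NAMDIM
  NPROMA=16,
/
&NAERAD
  NRPROMA=16,
/
EOF

# Candidate vector lengths; purely illustrative.
for n in 64 256 1024; do
    rundir="$workdir/nproma_$n"
    mkdir -p "$rundir"
    # Rewrite both values in a per-run copy; the base file is left untouched.
    sed -e "s/NPROMA=[-0-9]*/NPROMA=$n/" \
        -e "s/NRPROMA=[-0-9]*/NRPROMA=$n/" \
        "$workdir/fort.4" > "$rundir/fort.4"
    # A short timed model run in $rundir would go here.
    echo "prepared $rundir/fort.4"
done
```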

    Cheers,  Glenn