Hi all

I'm running OpenIFS T511L91 coupled to NEMO ORCA05L46 via OASIS3-MCT2.8. I have an issue where the model runs out of memory when I give it too many compute nodes. 

The "OOM Killer" (Out Of Memory) stops the model if I use ~ 2000 CPUs for OpenIFS, and it is always the OpenIFS processes that run out of memory. However, if I reduce down to ~1500 CPUs, it's not a problem anymore. This sounds like a memory leak to me, and this is what the support team suggested. The error message does not point to a specific routine. 

It happens about 2-3 months into the run (step 8400 * 900 s = 87.5 days). More CPUs make it crash sooner; fewer CPUs allow a longer run, or no crash at all. 
This sounds like OpenIFS is accumulating memory over time, and that the accumulated memory is somehow a function of the number of MPI tasks. 

I'm running an essentially unchanged namelist from the source code at ftp.ecmwf.int; I've only activated some additional output and run longer runs. 

The hardware (https://www.hlrn.de/home/view/System4/AtosQuickstartGuide#The_HLRN_IV_System) is 2x Intel Skylake Gold 6148 CPUs (40 cores in total) and 192 GB of memory per node. 
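Fully populated, that works out to 192 GB / 40 cores ≈ 4.8 GB of memory per MPI task.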

I've never really understood what the NPROMA parameter does, but I know it sort of splits the grid on each MPI task into "chunks". Could this be the issue? 
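
As far as I can tell, NPROMA is set in the NAMDIM block of the namelist (fort.4); the value below is just an illustration of where it lives, not a recommendation:

&NAMDIM
  NPROMA=-32,
/

My (possibly wrong) understanding is that it only controls how the grid points on each task are blocked for the grid-point computations, so maybe it affects peak memory per task rather than memory growing over time?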

Does anyone have similar configurations? I think EC-Earth will run T511 + ORCA025 for a few CMIP6 runs, and obviously ECMWF runs forecasts at much higher resolution, but maybe not for as long as 2 months? 

It's not a terrible problem. T511L91 + ORCA05 runs at 4-5 hours per simulated year with 1500 + 400 CPUs for OpenIFS + NEMO, which is completely fine, but it would be great to speed things up even further. 


Best regards
Joakim 

2 Comments

  1. Unknown User (nagc)

    Hi Joakim,

    I've seen OOM errors before. It's not impossible there's a memory leak in the IFS code itself, but it's unlikely (I looked at this when I was preparing the code). Memory leaks might also come from other sources, the MPI library for instance. I suspect the high-water memory use occurs while OpenIFS is running (memory use often peaks in the radiation code), and that's why it's the OpenIFS processes that get killed.

    How are you distributing the MPI tasks on the nodes?  Have you tried "under-populating" the nodes, that is, putting fewer MPI tasks on each node to give each task more memory?  You don't say what parallel combination you are using: MPI tasks vs. OpenMP threads?  Try using more threads and fewer tasks, say 4 MPI tasks & 8 OpenMP threads per node (rough sketch below). 
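
    Something along these lines is what I mean, as a rough sketch only (I'm assuming a Slurm-type batch system here; the node count and executable name are just placeholders):

    # rough sketch: 4 MPI tasks x 8 OpenMP threads on each 40-core node (8 cores left idle)
    #SBATCH --nodes=50
    #SBATCH --ntasks-per-node=4
    #SBATCH --cpus-per-task=8
    export OMP_NUM_THREADS=8
    srun ./oifs_master.exe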

    I also recently saw a post on the EC-Earth portal about problems running the EC-Earth model (IFS based + NEMO) on a skylake machine with Omni-path connectivity fabric. See Issue #533. That might help, or you could contact the person on the issue.

    I don't have a skylake machine to try unfortunately.

    HTH,    Glenn

  2. Unknown User (joakimkjellsson@gmail.com)

    Hi Glenn

    Interesting! Sounds like it could be a hardware issue then? I'm not too familiar with what "fabrics" are, but I know I had to define a bunch of environment variables, e.g.

    # use shared memory within a node and OFI (libfabric) between nodes
    export I_MPI_FABRICS=shm:ofi

    # use the PSM2 provider (Omni-Path) for OFI
    export I_MPI_OFI_PROVIDER=psm2

    to get OpenIFS + NEMO to run at all on the new machine, and I think this selects the "Omni-Path" fabric. It is also a brand-new machine, so there could be some issues they haven't worked out yet. 

    So far I've always used 1 thread per core, but I can definitely try with 2 or 4 per core. I tried "under-populating" the nodes, but apparently this cannot be done for only one executable on our machine, so I would also have to under-populate the NEMO nodes, and NEMO does not use OpenMP at all if I remember correctly (see the hypothetical sketch below for what I mean). 
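
    To illustrate what I mean (a purely hypothetical sketch, assuming Slurm heterogeneous jobs were available and allowed per-executable placement; executable names and node counts are made up, loosely based on my 1500 + 400 task layout):

    # hypothetical: half-populate the OpenIFS nodes, fully populate the NEMO nodes
    #SBATCH --nodes=75 --ntasks-per-node=20
    #SBATCH hetjob
    #SBATCH --nodes=10 --ntasks-per-node=40
    srun ./oifs_master.exe : ./nemo.exe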

    I'll check some more and see how it goes!

    Thanks for the suggestions! 

    Joakim