Good evening

I'm wondering if there is a tool or guide on how to "glue" and "split" the restart files (srf) so that the restart can be run on a different number of processors than the previous leg.

For example, I've got a run with 143 processors that produces 143 "srf" files. Is there a way to make this into 287 "srf" files so that I can continue the experiment with 287 processors?

NEMO has an option to "glue" all restart files into one global file, and then restart from the global file. That way each restart can be run on any number of processors.

Best wishes
Joakim


  1. Unknown User (nagc)

    Hi Joakim,

    I'm not aware of any tool that allows the srf (restart) files to be rearranged in that way. The advice I've had in the past is not to change the decomposition across restarts. I think it can affect mass conservation, but my memory is vague on this now. It's probably not easy because of the spectral nature of the dynamics, and you also have to deal with the wave model restart files. I'll ask someone at ECMWF who might know, but I suspect the answer is no on this one.

    Cheers,  Glenn

  2. Unknown User (nagc)

    Hi Joakim,

    I asked someone from the team who work with seasonal coupled runs and the answer was (slightly paraphrased):

    "I don't think it is really possible to restart a run with a different number of MPI tasks. In principle it would be possible to write a tool which takes a set of N restart files and reshuffles the data to another parallel distribution with M restart files, but that would be a major task, since the tool would have to know the parallel distribution in both grid-point and spectral space for all possible grids and parallel distributions."

    Plus it would need to deal with the wave model restarts.

    Personally, I'd rather just live with the smaller number of MPI tasks and remember to do it differently next time!

    Cheers,  Glenn
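To make the "reshuffle" idea concrete, here is a minimal Python sketch of gluing N per-task pieces of one grid-point field into a global field and splitting it again for M tasks (e.g. 143 to 287). It assumes a single field and a simple contiguous 1D partition of the grid points; the function names are hypothetical. The real IFS distribution (roughly, latitude/longitude sets in grid-point space and distributed wavenumbers in spectral space, plus the wave model) is far more involved, which is why a general tool would be a major task.

```python
from typing import List, Sequence


def partition(npoints: int, ntasks: int) -> List[range]:
    """Split npoints into ntasks contiguous ranges whose sizes differ by at most one.

    Hypothetical decomposition for illustration only; IFS does not partition this way.
    """
    base, extra = divmod(npoints, ntasks)
    ranges = []
    start = 0
    for task in range(ntasks):
        size = base + (1 if task < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def glue(chunks: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-task chunks into one global field.

    Plain concatenation is only valid because the assumed old decomposition is
    contiguous and the chunks are given in task order.
    """
    global_field: List[float] = []
    for chunk in chunks:
        global_field.extend(chunk)
    return global_field


def split(global_field: Sequence[float], ntasks: int) -> List[List[float]]:
    """Cut a global field into per-task chunks for a new task count."""
    return [list(global_field[r.start:r.stop]) for r in partition(len(global_field), ntasks)]


def reshuffle(chunks: Sequence[Sequence[float]], new_ntasks: int) -> List[List[float]]:
    """Go from N per-task pieces to M per-task pieces via a global field."""
    return split(glue(chunks), new_ntasks)


if __name__ == "__main__":
    field = [float(i) for i in range(10)]
    old = split(field, 3)           # pretend these came from 3 restart files
    new = reshuffle(old, 4)         # redistribute for 4 tasks
    print([len(c) for c in old])    # [4, 3, 3]
    print([len(c) for c in new])    # [3, 3, 2, 2]
```

Keeping `glue` and `split` separate mirrors the two halves of the problem: one side must know the old decomposition, the other the new one.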

  3. Thanks for the clarifications, Glenn. This topic is highly relevant for climate prediction, and it is unfortunate that we do not (and, by the looks of it, probably never will) have the ability to "rebuild" the IFS restarts.

  4. Unknown User (nagc)

    I'm not sure I understand why you would want to rearrange the restart files, and hence the parallel decomposition, once the run has started?  Is it so you can start the run on one machine and restart on another?

    Another approach is to convert the restart files to 'initial files' and treat the model run as a brand new one. I believe this is what EC-Earth do with version 3, though you lose some accuracy doing it this way because the model files only have 1 time level.

    Glenn

    1. Hi Glenn. I spent a lot of time working on this feature for EC-Earth3, which saves special output on model levels and generates an initial condition file. You lose the tendencies, and it's not a proper restart. We would like to be able to save the restart to initialize another run for perfect-model studies, among other use cases.

  5. Unknown User (nagc)

    p.s.  I don't think we're saying it can't be done. Only that it's a lot of work to develop the tool to do it and we don't have one.

      1. Sorry if I sounded pessimistic. I meant to say that this is not important for ECMWF, so it may never happen. This is not the first time people have asked about this, and they are usually surprised, given that other models (like NEMO) can do it. But NEMO doesn't have to deal with spectral space as well as grid-point space, so for IFS it's probably very challenging.

      1. Unknown User (nagc)

        IFS was never intended to run as a climate model, so I'm not surprised at all by these kinds of issues around long runs.

        There's more than just spectral & gridpoint data. There are two time levels of grid-pt fields, surface fields, cumulative & instantaneous fluxes for the physics. It's a nasty fix to convert to initial files because these are lost.

  6. Unknown User (de3j)

    Thanks for checking, Glenn. I had a strong feeling the answer would be "no", but felt it could not hurt to ask.

    My other option would be to write output in GRIB and use the last day as initial conditions for a new run.

    Etienne, would you be willing to share a script or a hint on the method you use in ECE3?
    I'm really not worried about losing precision here.
    Are you just writing output to GRIB and then modifying it in some way to treat it as an initial condition for the "restart"?

    The reason I'd like to do this is that I'd like to branch off new runs from an existing control, but these runs might be run on other machines where the number of CPUs per node is different.

    Many thanks both!

    /Joakim

    1. Hi Joakim. The process is a bit complex and requires a few steps. It's much easier if you can do a 1-day run with a special output configuration that dumps the important variables into the normal IFS output. Then there is a script to extract the necessary output into the ICMGG/SH files used to initialize IFS. I can share with you the output configuration and the script required to produce the initial condition files.

      However, I haven't tried this with openifs43 yet, so there will surely be some missing pieces.

      1. Unknown User (de3j)

        Hi Etienne

        Could you please share this script? I might be able to work out what the differences are between 40r1 and 43r3 and get it working.
        Having this kind of script would help a lot.

        Cheers
        Joakim

  7. Unknown User (nagc)

    I wonder if it's possible to add some extra code that forces the restarts to be written out only by task 0, and consequently read by task 0 on a restart with a different parallel decomposition. This would not be the normal restart step, but it could be enabled, say, at the end of every month, so that only a single restart file is written. It would mean adding some MPI calls to transfer the fields, but code that does something similar already exists in other parts of the model. If it's not done frequently, the added overhead would be small.

    I think that's how I would first approach this as a low-cost route to getting something more flexible and workable.

    Glenn
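This gather-to-task-0 idea can be sketched in Python without MPI by letting a list stand in for the tasks: the "master" collects every task's piece, writes one global file, and on restart reads it back and hands out pieces for a new task count. It is a simplified sketch under several assumptions: one grid-point field only, a made-up binary layout (an int64 point count followed by float64 values) rather than the GRIB-based srf format, a contiguous partition, and hypothetical function and file names. In the model, the gather and scatter would be MPI collectives such as MPI_Gatherv/MPI_Scatterv.

```python
import os
import struct
import tempfile
from typing import List, Sequence


def contiguous_partition(npoints: int, ntasks: int) -> List[range]:
    """Hypothetical decomposition: contiguous ranges whose sizes differ by at most one."""
    base, extra = divmod(npoints, ntasks)
    ranges, start = [], 0
    for task in range(ntasks):
        size = base + (1 if task < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def write_single_restart(path: str, local_fields: Sequence[Sequence[float]]) -> None:
    """Gather every task's piece on the master and write one global file.

    local_fields[t] stands in for the data held by task t; in the model this
    would be an MPI gather to task 0 followed by I/O on task 0 only.
    """
    global_field = [x for piece in local_fields for x in piece]  # "gather" in task order
    with open(path, "wb") as f:
        f.write(struct.pack("<q", len(global_field)))  # header: number of points
        f.write(struct.pack(f"<{len(global_field)}d", *global_field))


def read_single_restart(path: str, ntasks: int) -> List[List[float]]:
    """Master reads the global file and 'scatters' it over a (possibly new) task count."""
    with open(path, "rb") as f:
        (npoints,) = struct.unpack("<q", f.read(8))
        global_field = list(struct.unpack(f"<{npoints}d", f.read(8 * npoints)))
    return [global_field[r.start:r.stop] for r in contiguous_partition(npoints, ntasks)]


if __name__ == "__main__":
    pieces = [[1.0, 2.0], [3.0], [4.0, 5.0]]  # written by 3 tasks
    path = os.path.join(tempfile.mkdtemp(), "restart_global.bin")
    write_single_restart(path, pieces)
    print(read_single_restart(path, 2))  # [[1.0, 2.0, 3.0], [4.0, 5.0]]
```

Because only the master touches the file, the reader is free to choose any task count, which is the flexibility being asked for; the cost is the extra communication at each such restart.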

  8. That's quite a nice idea, Glenn! This would probably work for the main topic here (changing the domain decomposition/number of MPI processes). Initializing a new experiment from the restarts of another run would also require starting IFS from scratch with restarts instead of the ICM{GG,SH} files normally used to initialize IFS. Do you know if this is easy?

    1. Unknown User (nagc)

      Not sure I understand what you mean. Couldn't you treat it as a model restart rather than starting from the ICM* files? The cumulative fluxes in the restarts would need to be reset to zero, but otherwise I wonder if it's just as easy to do a restart? I've never done this, but otherwise I guess it requires taking the normal upper-air and surface fields found in the ICM* files from the restarts, inserting them into ICM* files and resetting the dates in the GRIB?? (I'm sure you know this better than me!)

      1. I think I need to think this through a bit. I guess if you make sure the same files are there as when you start from 0, all you need is an appropriate rcf file and the associated restarts (or restart in this case). I will keep this in mind when I get to implementing this in EC-Earth4.

        1. Unknown User (nagc)

          Yes, agreed, it needs thinking through carefully and in more detail. In general, though, I have found it easier to get the model to do what I need rather than to create some solution afterwards from the output.

          1. Unknown User (de3j)

            I looked at the code and found that the routine "SETUP_IOSTREAM", which is used to write both the ICM* files for output as well as the "srf" files for restart, has an optional argument "KIOMASTER".
            I tried to set KIOMASTER=1 in the call to SETUP_IOSTREAM when writing restarts and I did end up with only one "srf" file for processor 1.

            However, this file seems to contain just the restart for processor 1, i.e. the restarts for the other processors are simply missing.
            The srf file is a lot smaller than a normal ICMGG file, indicating that it does not hold the global grid.

            wroutspgb has the following line to write ICM* output (one file for the global grid):
            CALL SETUP_IOSTREAM(YL_IOSTREAM,'CIO',TRIM(CLFNSH),CDMODE=CLMODE,KIOMASTER=1)
            while wrresf has the following line to write restarts (one file per processor):
            CALL SETUP_IOSTREAM(YL_IOSTREAM,'CIO',TRIM(CLFN),CDMODE='w')

            I don't really understand why adding the argument KIOMASTER=1 to SETUP_IOSTREAM does not lead to one file for the global grid for the srf files, but I'm also way in over my head here...

            Is there an easy change we can do here?

            /Joakim

            1. Unknown User (nagc)

              I've not looked at the code but maybe it's doing a check if (mytask == kiomaster) write out file...

              I'll try and find time later today to have a look.  As far as I am aware, there is no MPI communication between the tasks for the writing & reading of the restart files. It would need to be added.    The restart file mechanism is best thought of as a memory dump to guard against hardware failures.
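The "if (mytask == kiomaster) write out file" guess can be modeled in a few lines of Python to show why KIOMASTER=1 produced a single but incomplete srf file: restricting I/O to one task does not move any data to that task. The tasks are simulated as a list, numbering starts at 1 as in the KIOMASTER=1 test above, and the file names are hypothetical; this is an illustration of the behaviour, not the SETUP_IOSTREAM code itself.

```python
from typing import Dict, List, Optional, Sequence


def write_restart_files(local_fields: Sequence[Sequence[float]],
                        kiomaster: Optional[int] = None) -> Dict[str, List[float]]:
    """Simulate per-task restart writing and return {file name: data written}.

    Tasks are numbered from 1. Without kiomaster every task writes its own
    file (one srf file per task). With kiomaster only that task performs I/O,
    and because nothing is sent to it, its file holds only its own piece.
    """
    files: Dict[str, List[float]] = {}
    for task, local in enumerate(local_fields, start=1):
        if kiomaster is not None and task != kiomaster:
            continue  # I/O is switched off on this task; its data is never written
        files[f"srf_task{task:04d}"] = list(local)  # hypothetical file name
    return files


if __name__ == "__main__":
    pieces = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three tasks
    print(sorted(write_restart_files(pieces)))
    # ['srf_task0001', 'srf_task0002', 'srf_task0003']
    print(write_restart_files(pieces, kiomaster=1))
    # {'srf_task0001': [1.0, 2.0]}
```

The missing step is a send of every task's data to the I/O master before the write; without it, the master can only write what it already holds.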

              1. Unknown User (de3j)

                Hi Glenn

                Just to follow up: Did you manage to have a look at reading / writing restarts on just the master task?

                I've got some workarounds in mind, but your solution would be the cleanest by far.

                Many thanks
                Joakim

                1. Unknown User (nagc)

                  Hi Joakim,

                  Yes, I looked at the code, and it's as I thought. If you add KIOMASTER=1 to the call to SETUP_IOSTREAM, that routine essentially just limits I/O to the MPI task specified by KIOMASTER. Unfortunately, there is no code that does any kind of MPI send to KIOMASTER. That would need to be added, but I don't think it should go into the iostream_mix.F90 module. It's specific to the restarts, so it should reside with wrresf.F90 (or wrappers around them).

                  There was a lot of discussion of this today at the EC-Earth meeting. Unfortunately, the IO code in IFS is rather impenetrable. I am coming round to the idea of writing an external, separate code (but one using relevant code from OpenIFS) to read in the separate restarts and combine them after the run.

                  Cheers,  Glenn


                  1. Unknown User (de3j)

                    Glad to hear the discussion is still going. I hope you or someone else finds a way to do this.

                    Best wishes
                    Joakim

  9. According to Glenn, writing restarts is done in ifs/utility/wrresf.F90 and reading in ifs/control/reresf.F90.

    It would be great to develop an option to write only one restart (either in ifs or as a standalone utility), and then to be able to read this single restart file in ifs.

  10. Hi Joakim,

    Could you tell me how to "glue" all the NEMO restart files into one global file, and then restart from the global file? 

    I'd like to start a new experiment from existing NEMO restart files, but, as you said, these existing restart files were produced with a different number of processors.

    Best regards.

    Zhenqian

    1. Unknown User (de3j)

      There is a tool that comes with NEMO
      https://salishsea-meopar-tools.readthedocs.io/en/latest/nemo-tools/rebuild_nemo.html

      I usually use nocs_combine. It's fine for low-res grids, e.g. global 1° etc.

      https://salishsea-meopar-tools.readthedocs.io/en/latest/legacy_docs/nocscombine/

      Cheers
      Joakim

      1. Thanks for your answer.

        In fact, I encountered the above problem when using EC-Earth3.

        After combining the restart_*.nc files, I set the restart directory to the directory containing the merged file, but it doesn't work.

        Could you tell me in detail what needs to be done after combining?


        Best regards.

        Zhenqian

  11. Unknown User (de3j)

    Thanks to Etienne's script that I got some time ago, I've now put together a script to take OpenIFS GRIB output and create new initial conditions (attached). oifs_grib_output_to_restart.sh

    My strategy to create restart files from a long piControl run is to continue the same run for one more day, then take the output (ICMGGECE3+400001, ICMUAECE3+400001, ICMSHECE3+400001) and create new initial conditions (ICMGGECE3INIT, ICMGGECE3INIUA, ICMSHECE3INIT).

    This requires a bit of manual work, but it's not very often that one wishes to restart a model run using a different number of tasks/threads.
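The file-naming part of this strategy can be sketched in Python: copy the three last-day output files to the initial-condition names listed above (note that the upper-air output ICMUA...+... becomes ICMGG...INIUA). This is only the file-staging step, assuming the experiment ID and output suffix follow this example; the helper names are hypothetical, and any GRIB processing done by the attached script is not reproduced here, so copying alone may not be enough to start a run.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def init_file_names(expid: str, out_suffix: str) -> Dict[str, str]:
    """Map last-day output file names to initial-condition file names (hypothetical helper)."""
    return {
        f"ICMGG{expid}+{out_suffix}": f"ICMGG{expid}INIT",
        f"ICMUA{expid}+{out_suffix}": f"ICMGG{expid}INIUA",  # upper-air output -> ICMGG...INIUA
        f"ICMSH{expid}+{out_suffix}": f"ICMSH{expid}INIT",
    }


def stage_init_files(src_dir: Path, dst_dir: Path, expid: str, out_suffix: str) -> None:
    """Copy the output files into dst_dir under their initial-condition names.

    Only copies/renames; no GRIB fields or dates are modified.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src_name, dst_name in init_file_names(expid, out_suffix).items():
        shutil.copyfile(src_dir / src_name, dst_dir / dst_name)


if __name__ == "__main__":
    src = Path(tempfile.mkdtemp())
    dst = src / "init"
    for name in init_file_names("ECE3", "400001"):
        (src / name).write_bytes(b"")  # empty placeholders standing in for GRIB files
    stage_init_files(src, dst, "ECE3", "400001")
    print(sorted(p.name for p in dst.iterdir()))
    # ['ICMGGECE3INIT', 'ICMGGECE3INIUA', 'ICMSHECE3INIT']
```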

    The following is required in the fort.4 namelist:

    &NAMFPC
        CFPFMT = 'MODEL'
        NFP3DFS = 10
        MFP3DFS = 133, 75, 76, 246, 247, 248, 138, 155, 130, 203
        NRFP3S = -99
        NFP2DF = 2
        MFP2DF = 152, 129
        MFP3DFP = 75, 76, 246, 247, 248, 138, 155, 130, 133, 203
    MFPPHY =  15,  16,  17,  18,  26,  27,  28,  29,  30,  31,
              32,  33,  34,  35,  36,  37,  38,  39,  40,  41,
              42,  43,  66,  67,  74, 139, 141, 148, 160, 161,
             162, 163, 170, 172, 173, 174, 183, 198, 234, 235,
             236, 238, 228008, 228009, 228010, 228011, 228012, 228013, 228014,
             228007
        NFP3DFP = 10
        NFPPHY = 50
        RFP3P = 85000.0, 50000.0, 5000.0
        NFP3DFT = 0
        NFP3DFV = 0
        NFPCLI = 0
        LFPQ = .false.
        LTRACEFP = .false.
        RFPCORR = 60000.0
    /
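Since each NFP* count should match the length of its MFP* list, a quick consistency check can catch editing slips in this namelist before a run fails. Below is a rough Python sketch; it assumes upper-case variable names, a single group per string, and no quoted strings containing '=', commas, '/' or blanks, so it is not a general Fortran namelist parser. The list/count pairs are the ones used in the NAMFPC block above; the function names are hypothetical.

```python
import re
from typing import Dict, List

# (list variable, count variable) pairs from the NAMFPC block above.
COUNTED_LISTS = {
    "MFP3DFS": "NFP3DFS",
    "MFP2DF": "NFP2DF",
    "MFP3DFP": "NFP3DFP",
    "MFPPHY": "NFPPHY",
}


def parse_namelist_group(text: str) -> Dict[str, List[str]]:
    """Parse one '&GROUP ... /' block into {NAME: [value tokens]}.

    Rough sketch: upper-case names only; values separated by commas and/or
    blanks (both are legal separators in namelist input); no array sections.
    """
    lines = [ln.strip() for ln in text.strip().splitlines()]
    body = " ".join(ln for ln in lines if ln and not ln.startswith("&") and ln != "/")
    parts = re.split(r"\b([A-Z][A-Z0-9_]*)\s*=", body)
    values: Dict[str, List[str]] = {}
    for name, raw in zip(parts[1::2], parts[2::2]):
        values[name] = [tok for tok in re.split(r"[,\s]+", raw) if tok]
    return values


def check_counts(values: Dict[str, List[str]]) -> List[str]:
    """Report every list whose length differs from its declared count."""
    problems = []
    for list_name, count_name in COUNTED_LISTS.items():
        if list_name in values and count_name in values:
            declared = int(values[count_name][0])
            actual = len(values[list_name])
            if declared != actual:
                problems.append(f"{list_name}: {actual} entries but {count_name} = {declared}")
    return problems


if __name__ == "__main__":
    example = """&NAMFPC
        NFP2DF = 2
        MFP2DF = 152, 129
        NFPPHY = 3
        MFPPHY = 15, 16
    /"""
    print(check_counts(parse_namelist_group(example)))
    # ['MFPPHY: 2 entries but NFPPHY = 3']
```

For the NAMFPC block in this post the four lists have 10, 2, 10 and 50 entries, matching NFP3DFS, NFP2DF, NFP3DFP and NFPPHY respectively.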
    

    Appears to work with OpenIFS 43r3v2. Should also work for 43r3v1, but I haven't checked.

    Note: My script assumes L91 vertical grid, so it's hard coded in there, but shouldn't be too hard to change to L62 or L137.

    Cheers and thanks all for the advice!
    Joakim