In the first stage of training, standard practice is to train the model on a single 6-hour interval.

In the second stage of training, we use 'rollout' to fine-tune the model on forecast errors at lead times of up to 72 hours (i.e. 6, 12, 18, 24 hours, etc.).

To do this, change your batch script as follows:

srun aifs-train --config-name=**config-file** \
    hardware.files.warm_start=last.ckpt \
    training.run_id=8bdfd7a1-294c-41f5-8ffc-952d6170c9e9 \
    training.max_epochs=**epoch_num from first run + 12**
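
The max_epochs value is the number of epochs completed in the first run plus 12. As a minimal sketch of that arithmetic, the snippet below computes the value in the shell before it is passed to the command. The function name compute_max_epochs and the value 100 are hypothetical placeholders, not part of aifs-train; substitute the epoch count from your own first run.

```shell
# Hypothetical helper: adds the rollout fine-tuning epochs to the
# epoch count reached in the first training stage.
compute_max_epochs() {
  echo $(( $1 + $2 ))
}

first_run_epochs=100   # placeholder: use the epoch_num from your first run
rollout_epochs=12      # extra epochs for the rollout stage, as in the text above

max_epochs=$(compute_max_epochs "$first_run_epochs" "$rollout_epochs")
echo "training.max_epochs=${max_epochs}"
```

The printed string can then be used as the last override on the srun line.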

Here, training.run_id is the name of the folder where the checkpoints are stored on scratch. You can find this run_id through Weights & Biases.

Then change the following options in your config file:

defaults:
  - dataloader: rollout
training:
  rollout:
    epoch_increment: 1
    max: 12
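
To see how these two settings relate to the 72-hour target, the sketch below prints one possible rollout schedule. It assumes that rollout starts at 1 step, grows by epoch_increment once per epoch, and stops at max, with each step covering 6 hours. The starting value and the per-epoch growth are assumptions made for illustration, and rollout_schedule is a hypothetical helper, not part of the training code.

```shell
# Hypothetical helper: prints the rollout length and the matching lead
# time for each fine-tuning epoch, under the assumptions stated above.
# Arguments: epoch_increment, max rollout, hours per step.
rollout_schedule() {
  increment=$1
  max=$2
  step_hours=$3
  rollout=1   # assumed starting rollout length
  while [ "$rollout" -le "$max" ]; do
    echo "rollout=${rollout} lead_time=$((rollout * step_hours))h"
    rollout=$((rollout + increment))
  done
}

rollout_schedule 1 12 6
```

Under these assumptions the schedule has 12 entries and ends at a lead time of 72 hours, which is consistent with adding 12 epochs to max_epochs.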

If you prefer, you can also set run_id, warm_start and max_epochs in your config file instead of passing them on the command line.