Output-file:
JOB STATISTICS
==============
Job ID: 809697
Cluster: snellius
User/Group: batelaant/batelaant
State: OUT_OF_MEMORY (exit code 0)
Nodes: 3
Cores per node: 128
CPU Utilized: 1-02:06:48
CPU Efficiency: 87.75% of 1-05:45:36 core-walltime
Job Wall-clock time: 00:04:39
Memory Utilized: 2.86 TB (estimated maximum)
Memory Efficiency: 104.03% of 2.75 TB (7.32 GB/core)

Error-file:
14:11:16 STEP 0 H= 0:00 +CPU= 95.788
STEP 0 :## EC_MEMINFO | TC | MEMORY USED(MB) | MEMORY FREE(MB) INCLUDING CACHED | %USED %HUGE
STEP 0 :## EC_MEMINFO | Malloc| Inc Heap | Numa node 0 | Numa node 1 | |
STEP 0 :## EC_MEMINFO Node Name | Heap | RSS(tsk) | Small Huge or | Small Huge or | Total |
STEP 0 :## EC_MEMINFO | (tsk) | Small Huge | Only Small | Only Small | Memfree+Cached |
STEP 0 :## EC_MEMINFO 1 fcn20 922 6347 0 564 79626 645 81896 659398 1507 1.0 0.0 s/p
14:12:46 STEP 1 H= 0:07 +CPU= 88.252
STEP 1 :## EC_MEMINFO 1 fcn20 2134 6402 0 486 60384 1473 61020 493588 1512 1.3 0.0 s/p
slurmstepd: error: mpi/pmix_v2: _errhandler: fcn20 [0]: pmixp_client_v2.c:211: Error handler invoked: status = -25, source = [slurm.pmix.809697.0:47]
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 809697.0 ON fcn20 CANCELLED AT 2022-04-27T14:13:23 ***
slurmstepd: error: Detected 5 oom-kill event(s) in StepId=809697.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=809697.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: fcn27: tasks 128-255: Out Of Memory
srun: launch/slurm: _step_signal: Terminating StepId=809697.0
slurmstepd: error: mpi/pmix_v2: _errhandler: fcn20 [0]: pmixp_client_v2.c:211: Error handler invoked: status = -25, source = [slurm.pmix.809697.0:35]
srun: error: fcn29: tasks 256-383: Killed

ifs.stat:
14:09:18 000000000 CNT3 -999 4.152 4.152 5.373 0:00 0:00 0.00000000000000E+00 0GB 0MB
14:09:41 A00000000 STEPO 0 26.161 26.161 28.175 0:26 0:28 0.49588412029193E-04 0GB 0MB
14:09:41 0AA000000 STEPO 0 0.000 0.000 0.003 0:26 0:28 0.49588412029193E-04 0GB 0MB
14:09:53 FULLPOS-S DYNFPOS 0 11.570 11.570 11.665 0:38 0:40 0.49588412029193E-04 0GB 0MB
14:10:09 0AAA00AAA STEPO 0 16.458 16.458 16.528 0:54 0:56 0.49588412029193E-04 0GB 0MB
14:11:16 0AAA00AAA STEPO 1 66.713 66.713 67.120 2:01 2:03 0.49394979039009E-04 0GB 0MB
14:12:47 0AAA00AAA STEPO 2 88.209 88.209 90.534 3:29 3:34 0.49186674044295E-04 768GB 0MB
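
For anyone cross-checking the job statistics above, here is a rough Python sketch that recomputes the efficiency figures from the raw numbers in the output file. It is only a sanity check, not how the statistics were produced. The helper name `slurm_time_to_seconds` is made up for illustration, and the conversion 1 TB = 1024 GB is an assumption about the units used in the report.

```python
def slurm_time_to_seconds(text):
    """Convert a Slurm-style duration ([D-]HH:MM:SS) to seconds (hypothetical helper)."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Values copied from the JOB STATISTICS block.
nodes = 3
cores_per_node = 128
cores = nodes * cores_per_node

wall_seconds = slurm_time_to_seconds("00:04:39")
cpu_used_seconds = slurm_time_to_seconds("1-02:06:48")

# Core-walltime is the wall-clock time multiplied by the number of cores.
core_walltime_seconds = cores * wall_seconds
cpu_efficiency = 100 * cpu_used_seconds / core_walltime_seconds

# Memory limit implied by the per-core figure (assumption: 1 TB = 1024 GB).
mem_per_core_gb = 7.32
mem_limit_tb = cores * mem_per_core_gb / 1024

# Ratio of the rounded values printed in the report; the report itself
# presumably uses unrounded values, so the last digits can differ.
mem_efficiency = 100 * 2.86 / 2.75

print(f"cores:            {cores}")
print(f"core-walltime:    {core_walltime_seconds} s")
print(f"CPU efficiency:   {cpu_efficiency:.2f} %")
print(f"memory limit:     {mem_limit_tb:.3f} TB")
print(f"memory efficiency (from rounded values): {mem_efficiency:.1f} %")
```

The recomputed numbers agree with the report to within rounding: 384 cores over 279 s of wall-clock time give the 1-05:45:36 core-walltime, and the estimated peak memory sits about 4% above the limit implied by 7.32 GB/core, which is consistent with the oom-kill events in the error file.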