Hi,

I have been building a simple 2-node "cluster" to play around with OpenIFS. Each node has 4 cores. I'm able to run OpenIFS (the T21 test) just fine on both nodes separately with 4 processes. I'm also able to invoke the executable on node 2 from node 1 using mpirun, so I'm confident that the MPI connection/network settings etc. are configured correctly and that both nodes can talk to each other.

However, when I try to run OpenIFS with 8 processes across both nodes, it hangs with no output - not even a node file. I've tried the solutions to the "OpenIFS hangs and I don't get any output?" question in the FAQ but the problem remains. Are there any other common causes of this problem?

When I Ctrl-C the executable I can see from the stack trace that it always seems to be stuck in SUMPINI, but I don't know which line. Also, I only get back 4 copies of the stack trace, not 8 as I would expect from an 8-process invocation.

Other details about the system that might be relevant:

  • There is no shared storage yet. Both nodes have their own filesystems and OpenIFS is installed identically on both.
  • Each node has ulimit -s unlimited set in the global bashrc, so I don't believe there are any stack size issues; there would probably be a segfault if that were the case.
  • I'm running the executable as mpirun -np 8 --hostfile machinefile -x LD_LIBRARY_PATH. machinefile contains the IP addresses of both nodes.
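
For reference, machinefile just lists the two nodes, one per line (the addresses below are placeholders, not my real ones):

    192.168.0.101
    192.168.0.102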

Any ideas?

Thanks!

5 Comments

  1. Unknown User (nagc)

    Hi Sam,

    Shame the traceback doesn't give a line number. It would be useful to know whether SUMPINI has got past the CALL MPL_INIT and the c_drhook_init_signals() lines. You could try putting a WRITE statement in (with a call to FLUSH immediately after it to force the output) to see where the code has got to. My guess is it's stuck in the MPL_INIT call.

    When the model starts up, do you see 8 separate invocations of OpenIFS running, 4 on each node? If not, that suggests only one node has started its MPI tasks correctly. Are the filesystem pathnames the same on each node?

    Maybe try writing a simple MPI program that does something trivial, like each task writing its task number to an output file, and then run that on 8 cores to see if it initializes correctly.
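
    Something along these lines would do - just a rough sketch, and the program/file names are only an example:

    program task_check
        use mpi
        implicit none
        integer :: rank, nproc, ierr
        character(len=32) :: fname

        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
        call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)

        ! each task writes its own file, so you can see which tasks actually started
        write(fname,'(a,i3.3,a)') 'task', rank, '.txt'
        open(unit=10, file=fname, status='replace')
        write(10,*) 'task ', rank, ' of ', nproc
        close(10)

        call MPI_FINALIZE(ierr)
    end program task_check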

    Cheers,  Glenn

  2. Hi Glenn,

    I should have mentioned that I do at least get the standard list of signal handlers installed by DrHook printed to the screen. Also, I can see 4 processes running on each node through top. Each one seems to spend about 99% of CPU doing a whole lot of nothing.

    The filesystem pathnames are identical on both systems.

    I'll have a go at adding some write statements in SUMPINI. Should I do something like this?

    USE YOMLUN, ONLY: NULOUT

    WRITE(NULOUT,*) "Here"
    CALL FLUSH(NULOUT)

    I'll try your minimal MPI program as well.

    Thanks,

    Sam

  3. I've narrowed the problem down to this line in ifsaux/module/mpl_groups.f90:

    CALL MPI_CART_CREATE(MPL_COMM_OML(1), 2, idims, ltorus, lreorder, &
                       & MPL_COMM_GRID, ierr)

    Here are the values of the arguments (for 4 tasks across 2 nodes):

    MPL_COMM_OML(1) = 3
    idims = (/ 2, 2 /)
    ltorus = (/ .false., .false. /)
    lreorder = .false.
    MPL_COMM_GRID = 0
    ierr = 512

    I printed numbers in various subroutines leading up to this call to MPI_CART_CREATE to see where it's getting stuck. However, I only see the output of the print statements from the master node tasks, not the slave(s).

    Here are the setups that work:

    • 1-4 MPI tasks on 1 node
    • 2 MPI tasks across 2 nodes (!!)

    Here are the setups that don't work:

    • more than 2 MPI tasks across 2 or more nodes

    Any ideas now? I'm really stumped by this.

    Incidentally, I now have OpenIFS installed in an NFS directory mounted on all nodes, so every node sees the same directory/path structure.

  4. By the way, here is a simple program that does NOT hang:

    program main
        USE mpi
        implicit none
        integer :: old_comm, new_comm, ierr
        integer, DIMENSION(2) :: dim_size
        logical ::  reorder
        logical, DIMENSION(2) :: periods
    
        call MPI_INIT(ierr)
        old_comm = MPI_COMM_WORLD
        dim_size = (/ 2, 2 /)
        periods = (/ .false., .false. /)
        reorder = .false.
        call MPI_CART_CREATE(old_comm, 2, dim_size, periods, reorder, new_comm, ierr)
        print *, ierr
        call MPI_Finalize(ierr)
    end program

    Invoked with:

    mpirun --np 4 --map-by node --hostfile machinefile a.out

    Thanks!

    Sam

  5. Unknown User (nagc)

    Hi Sam,

    I checked and OpenIFS will run fine with 3 MPI tasks. Have you still got this problem?

    The MPI initialization is done in ifsaux/module/mpl_init_mod.F90, after line 150. The code uses MPI_INIT_THREAD rather than MPI_INIT; I'm not sure whether that would make a difference in your case, but it might be worth trying.
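
    For example, in your small test program you could replace the MPI_INIT call with something like the following (MPI_THREAD_MULTIPLE here is just an example thread level, not necessarily what the model requests):

    integer :: iprovided   ! add to the declarations

    call MPI_INIT_THREAD(MPI_THREAD_MULTIPLE, iprovided, ierr)   ! instead of call MPI_INIT(ierr)
    print *, 'asked for MPI_THREAD_MULTIPLE, got level ', iprovided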

    MPI_CART_CREATE is essentially a collective operation, so you could try adding an MPI_BARRIER call just before that line to see whether that works ok.
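
    That is, something like this just above the existing call in mpl_groups.f90 (reusing the same communicator and error variable, purely as a test):

    CALL MPI_BARRIER(MPL_COMM_OML(1), ierr)
    CALL MPI_CART_CREATE(MPL_COMM_OML(1), 2, idims, ltorus, lreorder, &
                       & MPL_COMM_GRID, ierr)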

    Another option would be to try another MPI implementation (MPICH or OpenMPI); I test with both.

    Perhaps when the model hangs, you could send an ABORT signal to one of the processes and then look in the traceback to see where in the MPI library it's stuck, to get some more clues.
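
    For example, from another terminal (where <pid> is the process id of one of the hung tasks, as shown by top):

    kill -ABRT <pid>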

    Hope that helps; not sure what the issue is.

    Cheers,   Glenn