Introduction

This guide will help you monitor the performance of your parallel application. We have tuned and installed IPM (Integrated Performance Monitoring) on the HPC system.

IPM is a set of call wrappers that will monitor the performance of your application in terms of:

  • MPI calls: count, bytes and time
  • POSIX calls: count, bytes and time
  • Hardware counters (PAPI)
  • GBytes of memory used per task

The information is written to the job output as a banner and also to an XML file that can be processed afterwards. You will be able to retrieve a table in text format or graphical plots.

Some considerations before use:

  • You do not have to recompile your code; it works using the LD_PRELOAD mechanism
  • Note that to be able to profile your code, it must contain MPI_Init and MPI_Finalize calls.

Load the ipm module
Enable ipm
module load ipm

This module will add a special aprun wrapper as well as some utilities to read the output afterwards.

aprun ...

From now on, any aprun command will use the ipm aprun wrapper. This wrapper will set the proper environment variables to monitor the performance of your parallel application.


If you want to disable ipm in some parallel executions you can use:

Disable ipm
module load ipm
aprun parallel_program args
#this aprun will be traced

unset USE_IPM
aprun parallel_program2 args
#this aprun will not be traced
export USE_IPM=1

Set environment variables

You can control the level of monitoring of your execution using some environment variables.

Control the text banner in the output: IPM_REPORT

variable        value    description
IPM_REPORT      none     No report
                terse    (default) Aggregate wallclock time, memory usage and flops are reported along with the percentage of wallclock time spent in MPI calls.
                full     Each HPM counter is reported, as are all of wallclock, user, system, and MPI time. The contribution of each MPI call to the communication time is given.
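For example, to get the full banner in the job output you could set IPM_REPORT before launching. This is a minimal sketch; the task count and <mpi_program> are placeholders:

module load ipm
export IPM_REPORT=full   # request the full per-call banner instead of the terse default
aprun -n 48 <mpi_program>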

Whether or not to print memory information in the banner: IPM_REPORT_MEM

variable          value    description
IPM_REPORT_MEM    no       No report
                  yes      (default) The memory consumption per rank will be displayed in the stdout of the job
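For example, to suppress the per-rank memory lines in the banner:

export IPM_REPORT_MEM=no   # skip the memory consumption section of the banner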

Control the level of output written in the XML log file: IPM_LOG

variable    value    description
IPM_LOG     none     No report
            terse    (default) MPI and POSIX calls information, memory usage and flops. Counts, bytes and time of each MPI/POSIX call. Aggregation of PAPI counters per task
            full     (currently not used) Extended information of the PAPI counters and MPI calls.
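For example, to keep the text banner but skip writing the XML log file:

export IPM_LOG=none   # no XML log will be generated for the following aprun commands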

Comma-separated hardware counter event selection: IPM_HPM

You can see the information about the available hardware counters by running:

module load papi
papi_avail

By default it will count: Level 1 data cache misses, Floating point operations, Instructions completed, Total cycles.

variable    value                                                 description
IPM_HPM     none                                                  No hardware counter
            PAPI_L1_DCM,PAPI_FP_OPS,PAPI_TOT_INS,PAPI_TOT_CYC     (default)
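For example, to count level 2 instead of level 1 data cache misses. This particular selection is only an illustration; any preset events reported by papi_avail can be combined, provided the hardware can count them together:

export IPM_HPM=PAPI_L2_DCM,PAPI_FP_OPS,PAPI_TOT_INS,PAPI_TOT_CYC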

Where to place the XML file: IPM_LOGDIR

variable      value           description
IPM_LOGDIR    .               (default) will be the job working dir
              /custom/path    Will generate the XML file in this path (useful if you want to have them centralized)

Select directory for the output log

By default, the ipm log output dir will be PBS_O_WORKDIR, which in some experiments is $HOME. We suggest using $SCRATCH or $PERM to keep these files.

Example:

export IPM_LOGDIR=$SCRATCH/ipm-logs
mkdir -p $IPM_LOGDIR

The directory must exist.

Then ipm will generate a file per parallel command in the directory. The file name will be something like:

rdx_uscs_ifsMASTER_gedz_id4930673.ccbpar.1440488618.ipm.xml

<user>_<realuser>_<executable>_<experimentID>_id<JOB_ID>_<epochTime>.ipm.xml
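Since the file name ends with the epoch time, one way to locate the most recently written log in the log directory is, for instance:

ls -t $IPM_LOGDIR/*.ipm.xml | head -1   # newest XML log first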

Use IPMsummary to parse the log file

We have created a special tool to parse the file and retrieve the useful information in text format. This tool is also able to generate graphical plots using Python:

usage: IPMsummary [-h] [-d] [-f FUNCTIONS] [-l] [-s] [-c] [-g] ipmfile

Python tool to get IPM summary

positional arguments:
  ipmfile

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           enable debug mode
  -f FUNCTIONS, --functions FUNCTIONS
                        comma separated list of functions to print (-f all
                        will print all the functions)
  -l, --list            list all available functions
  -s, --summary         print summary per rank
  -c, --counters        print PAPI counters summary per rank
  -g, --graphics        generate plots instead of table data

Example:

  • This example will print the default summary of MPI, POSIX and memory.
$> IPMsummary uscs_exec.x_id5926.ccapar.1440508348.ipm.xml
##################################################################################################
###################################JOB RESUME#####################################################
Executable:     /path/to/exec.x 
Nprocs (-n):    439
Nnodes:         19
JOB ID:         5926.ccapar
User:           uscs
Start Time:     2015-08-25 13:12:28
End Time:       2015-08-25 13:30:50
Run Time:       1102.14940596 seconds

Command:        IPMsummary uscs_exec.x.x_id5926.ccapar.1440508348.ipm.xml
###################################################################################################

Summary per rank
---------------------------------------------------------------------------------------------------
 rank         MPI [%]       MPI [sec]     POSIXIO [%]   POSIXIO [sec]     memory [GB]  mem [GB] |       node | executable                    
    0           95.08         1047.94            2.53           27.92            0.68      0.68 |   nid00522 | exec.x                 
    1           95.11         1048.25            2.52           27.78            0.68      0.68 |   nid00522 | exec.x                 
    2           95.07         1047.81            2.54           28.02            0.68      0.68 |   nid00522 | exec.x                 
    3           95.09         1048.05            2.55           28.12            0.68      0.68 |   nid00522 | exec.x                 
    4           95.16         1048.85            2.56           28.22            0.68      0.68 |   nid00522 | exec.x                 
    5           98.26         1082.94            0.82            9.08            0.40      0.40 |   nid00522 | exec.x                 
    6           98.29         1083.34            0.81            8.98            0.40      0.40 |   nid00522 | exec.x                 
    7           23.44          258.37            3.54           39.05            0.82      0.82 |   nid03190 | ifs -v ecmwf
    8           25.01          275.68            0.89            9.81            0.67      0.67 |   nid03190 | ifs -v ecmwf
    9           25.51          281.17            0.91            9.99            0.66      0.66 |   nid03190 | ifs -v ecmwf
   10           25.50          281.02            0.91           10.05            0.67      0.67 |   nid03190 | ifs -v ecmwf 
   11           25.73          283.58            0.94           10.38            0.67      0.67 |   nid03190 | ifs -v ecmwf
...

Example PAPI counters (-c):

  • This example shows the hardware counters of your program
$> IPMsummary uscs_exec.x_id5926.ccapar.1440508348.ipm.xml -c
##################################################################################################
###################################JOB RESUME#####################################################
Executable:     /path/to/exec.x 
Nprocs (-n):    439
Nnodes:         19
JOB ID:         5926.ccapar
User:           uscs
Start Time:     2015-08-25 13:12:28
End Time:       2015-08-25 13:30:50
Run Time:       1102.14940596 seconds

Command:        IPMsummary uscs_exec.x.x_id5926.ccapar.1440508348.ipm.xml
###################################################################################################
Hardware counters per rank
---------------------------------------------------------------------------------------------------
If you want to get information about the PAPI Hardware counters:

module load papi
papi_avail -e EVENT

 rank           gflop    PAPI_TOT_INS    PAPI_TOT_CYC     PAPI_L1_DCM     PAPI_FP_OPS |       node | executable                    
    0    2.857770e-03    9.280690e+12    3.207549e+12    1.126418e+09    3.149391e+09 |   nid00522 | exec.x                 
    1    2.855410e-03    8.932103e+12    3.212797e+12    1.128875e+09    3.146975e+09 |   nid00522 | exec.x                 
    2    2.855290e-03    8.940195e+12    3.212259e+12    1.128509e+09    3.146948e+09 |   nid00522 | exec.x                 
    3    2.856690e-03    8.916870e+12    3.212129e+12    1.125382e+09    3.148645e+09 |   nid00522 | exec.x                 
    4    2.855620e-03    8.927689e+12    3.212122e+12    1.057265e+09    3.147696e+09 |   nid00522 | exec.x                 
    5    9.354010e-04    9.315839e+12    3.269574e+12    6.555938e+08    1.030865e+09 |   nid00522 | exec.x                 
    6    9.354640e-04    9.336792e+12    3.270578e+12    6.566418e+08    1.031119e+09 |   nid00522 | exec.x                 
    7    1.606180e-01    4.162576e+12    3.004313e+12    6.855685e+10    1.815371e+11 |   nid03190 | ifs -v ecmwf 
    8    1.635820e-01    4.315565e+12    3.093403e+12    6.913764e+10    1.848861e+11 |   nid03190 | ifs -v ecmwf
    9    1.561060e-01    4.321139e+12    3.068875e+12    6.854915e+10    1.764533e+11 |   nid03190 | ifs -v ecmwf
   10    1.575230e-01    4.314031e+12    3.064538e+12    6.829505e+10    1.780490e+11 |   nid03190 | ifs -v ecmwf
   11    1.568620e-01    4.337071e+12    3.093300e+12    6.871042e+10    1.772904e+11 |   nid03190 | ifs -v ecmwf
...

Example function count, bytes and time (-f):

Using the option -l you can list the functions that have been monitored:

lseek,MPI_Bcast,fread,fclose,ftruncate64,MPI_Gatherv,MPI_Finalize,MPI_Comm_create,close,fflush,fopen,open,lseek64,MPI_Comm_size,
MPI_Buffer_detach,fwrite,getc,MPI_Init,MPI_Allreduce,write,open64,MPI_Comm_dup,MPI_Waitall,read,MPI_Recv,MPI_Barrier,MPI_Alltoallv,
MPI_Send,MPI_Isend,MPI_Comm_group,MPI_Buffer_attach,MPI_Comm_split,MPI_Allgatherv,MPI_Wait,MPI_Comm_rank,MPI_Bsend,MPI_Allgather,
MPI_Irecv

Then you can pass this list (or a subset) to the -f option:
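For instance, a command along the following lines (re-using the log file from the examples above) would produce output like the one below for MPI_Bcast and fread:

IPMsummary uscs_exec.x_id5926.ccapar.1440508348.ipm.xml -f MPI_Bcast,fread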

********** MPI_Bcast
 rank     count       bytes        time | executable
    0         -           -           - | exec.x 
    1         -           -           - | exec.x 
    2         -           -           - | exec.x 
    3         -           -           - | exec.x 
    4         -           -           - | exec.x 
    5         -           -           - | exec.x 
    6         -           -           - | exec.x 
    7      7780  4.6241e+07  9.0050e-01 | ifs -v ecmwf 
    8      7780  4.6241e+07  3.1880e+01 | ifs -v ecmwf
    9      7780  4.6241e+07  3.5567e+01 | ifs -v ecmwf
...

********** fread
 rank     count       bytes        time | executable
    0        11  4.4000e+01  2.2830e-01 | exec.x 
    1        11  4.4000e+01  1.9233e-01 | exec.x 
    2        11  4.4000e+01  1.4722e-01 | exec.x 
    3        11  4.4000e+01  1.6860e-01 | exec.x 
    4        11  4.4000e+01  1.5498e-01 | exec.x 
    5        11  4.4000e+01  3.1470e-01 | exec.x 
    6        11  4.4000e+01  2.3308e-01 | exec.x 
    7     73215  6.6663e+08  2.0519e+01 | ifs -v ecmwf 
    8        28  7.3908e+04  2.0275e-02 | ifs -v ecmwf
    9        28  7.3908e+04  2.1103e-02 | ifs -v ecmwf
...

Graphic plots (-g):

Any of the previous options can be used with (-g) :

IPMsummary uscs_exec.x_id5926.ccapar.1440508348.ipm.xml -s -g


Examples
#!/bin/bash
#PBS -N IPMH_TEST
#PBS -q np
#PBS -l EC_total_tasks=48
#PBS -l EC_threads_per_task=1
#PBS -l EC_hyperthreads=2
#PBS -l walltime=01:00:00

cd $SCRATCH/...
module load ipm

export IPM_LOGDIR=$SCRATCH/ipm-logs
mkdir -p $IPM_LOGDIR

aprun -N $EC_tasks_per_node -n $EC_total_tasks -d $EC_threads_per_task -j $EC_hyperthreads <mpi_program>
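
After the job has finished, you could inspect the resulting log with IPMsummary, for example (the actual XML file name will depend on your executable, job ID and time stamp, so <logfile> is a placeholder here):

module load ipm
IPMsummary $SCRATCH/ipm-logs/<logfile>.ipm.xml -s -c   # per-rank summary and PAPI counters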