New version installed on the HPC based on Darshan version 3.4.1

http://www.mcs.anl.gov/research/projects/darshan/

Git repository: https://xgitlab.cels.anl.gov/csimarro/darshan

Introduction

This guide explains how to profile the I/O of parallel programs and how to read the resulting output. Darshan is the selected tool. It works by wrapping POSIX, MPI-IO and HDF5 I/O calls. These wrappers intercept the calls and record performance information, which is written to a log file at the end of the execution. It works with C and Fortran programs.

  • You do not have to recompile your code; Darshan is injected using the LD_PRELOAD mechanism.
  • To be profiled, your code must call MPI_Init and MPI_Finalize.
  • By default, Darshan will not trace files located in system directories: /usr/, /proc/, /etc/, ... However, you can override this behaviour.

Load the Darshan module
Enable DARSHAN
module unload atp
module load darshan

This module provides both a special aprun and some utilities to read the output afterwards.

aprun ...

From now on, any aprun command will use Darshan's aprun, which sets the proper environment variables to trace all the I/O of the parallel program.


If you want to disable Darshan for some parallel executions, you can use:

Disable DARSHAN
module load darshan
aprun parallel_program args
# this aprun will be traced
unset USE_DARSHAN
aprun parallel_program2 args
# this aprun will not be traced
export USE_DARSHAN=1
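
The unset/export pair above changes the environment for every later command in the script. A minimal sketch of an alternative is to scope the change to a single command with a subshell. The name without_darshan is a hypothetical helper, not something the module provides, and the sketch assumes (as described above) that an unset USE_DARSHAN is what disables tracing.

```shell
# Hypothetical helper: run one command with USE_DARSHAN unset.
# The function body is a subshell "( ... )", so the unset does not leak
# into the calling script and no re-export is needed afterwards.
without_darshan() (
    unset USE_DARSHAN
    "$@"
)

# Intended use on the HPC (commented out so the sketch runs anywhere):
#   aprun parallel_program args                     # traced
#   without_darshan aprun parallel_program2 args    # not traced
```

Because the variable is only removed inside the subshell, any aprun that follows in the same script is traced again without further action.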

Select directory for the output log

By default, Darshan writes its logs to $PBS_O_WORKDIR. In some experiments, this is $HOME. We suggest using $SCRATCH to keep these files.

Example:

export DARSHAN_LOG_DIR=$SCRATCH/darshan-logs
mkdir -p $DARSHAN_LOG_DIR

The directory must exist.
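
Since the directory must exist before the run starts, it can help to check it explicitly at the top of a job script. The following is a minimal sketch: prepare_darshan_log_dir is a hypothetical helper name, and the fallback to $TMPDIR or /tmp when $SCRATCH is unset is an assumption for illustration only.

```shell
# Hypothetical helper: set DARSHAN_LOG_DIR, create it, and verify it is writable.
prepare_darshan_log_dir() {
    # $SCRATCH is the location suggested in this guide; the fallback is an assumption.
    DARSHAN_LOG_DIR="${SCRATCH:-${TMPDIR:-/tmp}}/darshan-logs"
    export DARSHAN_LOG_DIR
    mkdir -p "$DARSHAN_LOG_DIR" || return 1
    if [ ! -w "$DARSHAN_LOG_DIR" ]; then
        echo "Darshan log directory is not writable: $DARSHAN_LOG_DIR" >&2
        return 1
    fi
}

prepare_darshan_log_dir && echo "Darshan logs will go to $DARSHAN_LOG_DIR"
```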

Darshan will then generate one log file per parallel command in that directory. The file name will look something like:

rdx_g91h_ifsMASTER_id7170195_2-2-30782-15280438332034793663_1.darshan.gz

<user>_<experimentID>_<executable>_id<JOB_ID>_<month>-<day>-<seconds>-<randomnum>.darshan.gz
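
When many logs accumulate, it can be handy to pull the job ID and timestamp fields out of the file name. The sketch below is one possible way to do this with sed; parse_darshan_log_name is a hypothetical helper. It keeps everything before "_id<JOB_ID>" as a single prefix, and it assumes that the trailing "_1" seen in the sample name (which the pattern above does not list) is optional.

```shell
# Hypothetical helper: split a Darshan log file name into its trailing fields.
# Everything before "_id<JOB_ID>" is reported as one "prefix" field.
# An optional trailing "_<n>" (as in the sample name) is accepted but not reported.
parse_darshan_log_name() {
    basename "$1" | sed -n -E \
        's/^(.*)_id([0-9]+)_([0-9]+)-([0-9]+)-([0-9]+)-([0-9]+)(_[0-9]+)?\.darshan\.gz$/prefix=\1 job_id=\2 month=\3 day=\4 seconds=\5/p'
}

parse_darshan_log_name rdx_g91h_ifsMASTER_id7170195_2-2-30782-15280438332034793663_1.darshan.gz
# prints: prefix=rdx_g91h_ifsMASTER job_id=7170195 month=2 day=2 seconds=30782
```

Names that do not match the pattern produce no output, so the helper can be used as a filter in a loop.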

(optional) Set the directories to be excluded

 By default, darshan will not trace files in the system directories:

"/etc/"
"/dev/"
"/usr/"
"/bin/"
"/boot/"
"/lib/"
"/opt/"
"/sbin/"
"/sys/"
"/proc/"

You can override this list using:

# syntax is: "dir1,dir2,dir3"
export DARSHAN_EXCLUDE_DIRS="/etc/,/proc/"
# this will exclude only /etc/ and /proc/; files in all other directories will be traced.

You can also disable the exclusion list entirely and track everything that your program accesses:

export DARSHAN_EXCLUDE_DIRS=none

This will generate huge darshan log files. Use it carefully.
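
For longer lists, the comma-separated value can be assembled from individual directory names. This is a minimal sketch: join_exclude_dirs is a hypothetical helper, and appending a trailing "/" is only an assumption made so that the entries look like the defaults listed above.

```shell
# Hypothetical helper: build the "dir1,dir2,dir3" value from separate arguments.
# A trailing "/" is appended where missing, to mirror the default entries.
join_exclude_dirs() (
    list=""
    for dir in "$@"; do
        case $dir in
            */) ;;
            *)  dir="$dir/" ;;
        esac
        list="${list:+$list,}$dir"
    done
    printf '%s\n' "$list"
)

DARSHAN_EXCLUDE_DIRS=$(join_exclude_dirs /etc /proc/ /usr)
export DARSHAN_EXCLUDE_DIRS
echo "$DARSHAN_EXCLUDE_DIRS"
# prints: /etc/,/proc/,/usr/
```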

Use darshansummary to parse the log file

Darshan ships some utilities of its own to parse the log and retrieve information. You can also generate a summary PDF using darshan-job-summary.pl.

We have created a dedicated tool, darshansummary, to parse the log file and present the useful information in plain text:

darshansummary -h
usage: darshansummary <file_darshan.gz>

Arguments:
    -a  this enable all the reports
    -f  enable report each rank all files (default 10 per rank)
    -t  enable report aggregated per MPI rank
    -s  enable summary of all IO operations
    -i  enable print list of all shared files
    -j  enable summary of shared files
    -p  enable report for shared files
    -h  shows this help message

Extra arguments:
     --extended         shows all the files per rank   
                                (default: 10)
     --threshold=N.N    will change the default threshold to N.N seconds  
                                (default 5.0 seconds)
                        this means that the table will show all the files which Meta + Read + Write time is lower than N.N
     --ntasks=N         minimum number of tasks to consider a file shared
                                (default: 4)
     --systemfiles      this special flag will enable the report of system files a.k.a. /etc/, /usr/, /proc/... if you have  
                        asked to report without excluding these dirs

Example:

  • This example reports all I/O files per task, shows files with more than 10 seconds of Meta + Read + Write time, and considers a file shared when it has been accessed by 5 or more tasks.
$> darshansummary --ntasks=5 --extended --threshold=10 user_program_id84383_3-10-34795-2377591297034330_1.darshan.gz

##################################################################################################
###################################JOB RESUME#####################################################
Executable:    mpi_program
Nprocs (-n):    48
JOB ID:         4323232
Start Time:     Tue Mar 10 08:46:25 2015
End Time:       Tue Mar 10 09:39:55 2015
Run Time:       3211
SHOW INFO:
  - Showing all IO files per task
  - Showing files with more than 10.0 seconds of Meta + Read + Write time , you can change it using --threshold=N.N
  - Considering shared files those that have been accessed by 5 or more ranks
###################################################################################################

...

Example IFS:

  • This example is from a real IFS execution. It shows the report and the summary for all the shared files (by default, any file accessed by 4 or more tasks is considered shared).
$> darshansummary -p -j ./rdx-uscs_gkv0_ifsMASTER_id9809390_6-6-57473-5375414089742733424_1.darshan.gz
##################################################################################################
###################################JOB RESUME#####################################################
Executable: 	/.../tmp.gkv0_an_main_lw00_fc.model.1.5807/ifsMASTER -f h18 -t 900.0 -v ecmwf -e gkv0 
Nprocs (-n):	52
JOB ID:     	809390
Start Time: 	Tue Jun  6 15:57:53 2017
End Time:   	Tue Jun  6 16:05:42 2017
Run Time:   	470

Command:    	darshansummary -p -j ./rdx-uscs_gkv0_ifsMASTER_id9809390_6-6-57473-5375414089742733424_1.darshan.gz
SHOW INFO:
  - Showing 10 most expensive IO files per task
  - Showing files with more than 5.0 seconds of Meta + Read + Write time , you can change it using --threshold=N.N
  - Considering shared files those that have been accessed by 4 or more ranks
###################################################################################################

Summary of all IO
---------------------------------------------------------------------------------------------------
         47 different files
        122 file opens for read
        128 file opens for write

        525 total number of POSIX read
    1603869 total number of STDIO fread
        276 total number of POSIX write
      12800 total number of STDIO fwrite
        139 total number of POSIX open
        222 total number of STDIO fopen
      18104 total number of POSIX seek
        779 total number of stat
         16 total number of fsync
        410 total number of STDIO flush
          0 files opened but no read/write action
          0 files stat/seek but not opened

        7.0  read time (aggregated total from all ranks)
       11.7 write time (aggregated total from all ranks)
        4.0  meta time (aggregated total from all ranks)
        0.0 fsync only time
        0.0 stat/seek but no open time
        0.0 open but no read/write time

     1579.6 Mbytes read
     9167.5 Mbytes written



Report of shared files IO
---------------------------------------------------------------------------------------------------
(Considering shared files those that have been accessed by 4 or more ranks, you can change it using --ntasks=N)

 rank opens   stats   seeks  fsyncs flushes   reads  writes    File    Meta |         Read           |          Write         | Type | Filename
                                                               size    time |   time      MB    MB/s |   time      MB    MB/s |      |
    4     4       4       0       4       0       0      12       -     0.0 |      -       -       - |    1.0     0.4     0.4 |    P | :fc:0600::::::3::.
   48    96     474    4800       0       0   73824       0       -     0.9 |    0.1    10.0   198.1 |      -       -       - |    F | wam_namelist_coupled_000
   48    96     144   13152       0       0 1387200       0       -     0.4 |    0.2    42.4   208.7 |      -       -       - |    F | namelistfc
    4     4       4       0       4       0       0     126       -     0.0 |      -       -       - |    0.2     2.5    14.9 |    P | :fc:0600::::::3::.
   52    52       0       0       0      48       0     311       -       - |      -       -       - |    0.1     0.0     0.1 |    F | <STDERR>
   52    52       0       0       0     116       0     632       -       - |      -       -       - |    0.0     0.0     0.9 |    F | <STDOUT>
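
When a job produces one log per parallel command, the shared-file report shown above can be generated for every log in one go. The following is a sketch only: summarize_logs and the SUMMARY_TOOL override are hypothetical names introduced here, and the ".summary.txt" output suffix is an arbitrary choice. The -p and -j flags are the ones used in the IFS example.

```shell
# Hypothetical helper: run darshansummary -p -j on every log in a directory
# and store each report next to its log as <name>.summary.txt.
summarize_logs() (
    log_dir=$1
    # SUMMARY_TOOL is a hypothetical override, useful for dry runs.
    tool=${SUMMARY_TOOL:-darshansummary}
    for log in "$log_dir"/*.darshan.gz; do
        [ -e "$log" ] || continue    # the glob matched nothing
        report="${log%.darshan.gz}.summary.txt"
        "$tool" -p -j "$log" > "$report" || echo "summary failed for $log" >&2
    done
)

# Intended use after a run (commented out so the sketch runs anywhere):
#   summarize_logs "$DARSHAN_LOG_DIR"
```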

Examples
#!/bin/bash
#PBS -N DSH_TEST
#PBS -q np
#PBS -l EC_total_tasks=48
#PBS -l EC_threads_per_task=1
#PBS -l EC_hyperthreads=2
#PBS -l walltime=01:00:00

cd $SCRATCH/...
module load darshan
module unload atp
export DARSHAN_LOG_DIR=$SCRATCH/darshan-logs
mkdir -p $DARSHAN_LOG_DIR
####export DARSHAN_EXCLUDE_DIRS="/etc/,/proc/"

aprun -N $EC_tasks_per_node -n $EC_total_tasks -d $EC_threads_per_task -j $EC_hyperthreads <mpi_program>
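
After the aprun in a job script like the one above, the log that was just written can be located by modification time. This is a minimal sketch: latest_darshan_log is a hypothetical helper, and it relies on the "-nt" file test, which bash and most other shells support although it is not strictly POSIX.

```shell
# Hypothetical helper: print the most recently modified Darshan log in a directory.
# Returns non-zero when the directory contains no *.darshan.gz file.
latest_darshan_log() (
    newest=""
    for log in "$1"/*.darshan.gz; do
        [ -e "$log" ] || continue    # the glob matched nothing
        if [ -z "$newest" ] || [ "$log" -nt "$newest" ]; then
            newest=$log
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
)

# Intended use at the end of the job script (commented out so the sketch runs anywhere):
#   darshansummary -a "$(latest_darshan_log "$DARSHAN_LOG_DIR")"
```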