This guide describes how to use the Extrae and Paraver performance tools, developed at the Barcelona Supercomputing Center (BSC).
These tools allow users to study how efficiently an application uses the available computational resources.
Performing HPC studies such as performance analysis
Identifying bottlenecks and optimising parallel applications
Extrae is the package that generates the Paraver trace files for post-mortem analysis. It is installed on cca/ccb.
Paraver is the trace visualisation and analysis browser. It is installed on the desktops.
Supported programming models:
OpenMP (Intel, GNU or IBM runtimes)
The default behaviour of Extrae is to use the LD_PRELOAD mechanism. Using the interposition and/or sampling mechanisms, the following performance data is gathered:
Timestamp: When analysing the behaviour of an application, it is important to have a fine-grained timestamping mechanism (up to nanoseconds). Extrae provides a set of clock functions that are specifically implemented for different target machines in order to provide the most accurate timing possible. Some systems have daemons that inhibit the usage of these timers, or lack a machine-specific timer implementation; in such cases, Extrae falls back to advanced POSIX clocks, which still provide nanosecond-resolution timestamps at low cost.
Performance and other counter metrics: Extrae uses the PAPI and PMAPI interfaces to collect information on microprocessor performance. With the advent of components in the PAPI software, Extrae can collect information not only on how the microprocessor is behaving, but also on multiple other components of the system (disk, network, operating system, among others), and can extend the study beyond the microprocessor (power consumption and thermal information). Extrae mainly collects these counter metrics at the parallel programming calls and at samples. It can also capture them at the entry and exit points of instrumented user routines.
References to the source code: Analysing the performance of an application requires relating the measurements to the code responsible for them. This way the analyst can locate performance bottlenecks and suggest improvements to the application code. Extrae provides information on the source code being executed (function name, file name and line number) at specific points such as programming model calls or sampling points.
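As a minimal sketch of the LD_PRELOAD mechanism described above (the install path is illustrative; on cca/ccb the extrae module sets up the environment, and Fortran codes need libmpitracef.so instead of libmpitrace.so):

```shell
# Illustrative path: the real one is set up by "module load extrae".
EXTRAE_HOME=${EXTRAE_HOME:-/usr/local/apps/extrae}

# Interpose the Extrae MPI tracing library at load time; every MPI
# call made by the application is then wrapped and recorded.
export LD_PRELOAD=$EXTRAE_HOME/lib/libmpitrace.so
```

The dynamic linker loads the tracing library before the application's own dependencies, so Extrae can intercept the programming-model calls without recompiling the program.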
By default, Extrae will choose the pure MPI configuration.
The module prepares the environment for trace extraction. Extrae uses an XML file for its configuration. We have prepared two default XML files in this location:
There is one for pure MPI applications and another for hybrid MPI+OpenMP.
To activate the OpenMP trace:
# remember that only Intel, GNU or IBM runtimes are supported
There are some XML examples in:
In this directory you will find one folder per programming model, and inside each folder several extrae_explained.xml files explaining each section. If you want to use your own XML file, you can use the environment variable:
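Extrae reads its configuration from the file named by the EXTRAE_CONFIG_FILE environment variable; the path below is illustrative:

```shell
# Point Extrae at a custom XML configuration (illustrative path).
export EXTRAE_CONFIG_FILE=$HOME/extrae/my_extrae.xml
```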
Set the wrapper script in the aprun line
To enable tracing, we have prepared two wrapper scripts, to be chosen according to the source language of your parallel program:
To enable it, place the wrapper script between "aprun <args>" and the executable. For example, to trace a Fortran parallel program:
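A sketch of where the wrapper goes (trace_f.sh is a placeholder name for the site-provided Fortran wrapper script, and the core count is illustrative; the commands are built as strings here because aprun only exists on the Cray compute nodes):

```shell
LAUNCH="aprun -n 128"             # illustrative aprun arguments
APP=./my_fortran_app              # your Fortran executable (placeholder)

echo "$LAUNCH ./trace_f.sh $APP"  # traced run: wrapper sits between the aprun args and the executable
echo "$LAUNCH $APP"               # the same run without tracing
```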
The Extrae library will generate several .mpit files in the directory from which aprun was run.
Note that these files can be large, so it is recommended to place them on a Lustre filesystem (i.e. $SCRATCH).
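The output location can be set in the storage section of the XML configuration, as in this fragment (element names from the Extrae manual; the paths are illustrative):

```xml
<storage enabled="yes">
  <!-- Where intermediate and final trace files are written; point the
       final directory at Lustre, e.g. under $SCRATCH (illustrative paths). -->
  <temporal-directory enabled="yes">/tmp</temporal-directory>
  <final-directory enabled="yes">/scratch/your_user/traces</final-directory>
</storage>
```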
If you have <merge enabled="yes" ...> in the XML file (enabled in the defaults), the library will merge these files at the end of the execution. Once the files have been merged, three different files will appear in your directory:
These are the merged files (.prv, .pcf and .row) that will later be read with Paraver.
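A minimal merge section in the XML configuration looks like the fragment below (the element content names the output trace; attributes other than "enabled" are common defaults from the Extrae manual and may differ in the site-provided files):

```xml
<!-- Fragment of extrae.xml: merge the per-process .mpit files at the
     end of the run into a single Paraver trace. -->
<merge enabled="yes"
       keep-mpits="yes"
       overwrite="yes">
  output_trace.prv
</merge>
```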
If everything went well, you should see the Extrae output in your job's stdout. This output shows the configuration used to extract the performance information.
Welcome to Extrae 3.4.3
Extrae: Parsing the configuration file (/usr/local/apps/extrae/xml/MPI/extrae.xml) begins
Extrae: Generating intermediate files for Paraver traces.
Extrae: MPI routines will collect HW counters information.
Extrae: Tracing 4 level(s) of MPI callers: [ 1 2 3 4 ]
Extrae: Warning! change-at-time time units not specified. Using seconds
Extrae: PAPI domain set to ALL for HWC set 1
Extrae: HWC set 1 contains following counters < PAPI_TOT_INS (0x80000032) PAPI_TOT_CYC (0x8000003b) PAPI_L1_DCM (0x80000000) > - never changes
Extrae: Resource usage is disabled at flush buffer.
Extrae: Memory usage is disabled at flush buffer.
Extrae: Tracing buffer can hold 500000 events
Extrae: Circular buffer disabled.
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (/usr/local/apps/extrae/xml/MPI/extrae.xml) has ended
Extrae: Intermediate traces will be stored in <tmpdir>
Extrae: Temporal directory (tmpdir) is shared among processes.
Extrae: Final directory (/scratch/...) is shared among processes.
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 300 tasks and 1 threads