Performance properties

Time

Description:
Total time spent on program execution, including the idle times of CPUs reserved for slave threads during OpenMP sequential execution. This pattern assumes that every thread of a process is allocated a separate CPU during the entire runtime of the process.
Unit:
Seconds
Parent:
None
Children:
Execution Time, Overhead Time, OpenMP Idle threads Time

Visits

Description:
Number of times a certain call path has been visited.
Unit:
Counts
Parent:
None
Children:
None

Execution Time

Description:
Time spent on program execution but without the idle times of slave threads during OpenMP sequential execution. For pure MPI applications, this pattern is equal to Time.
Unit:
Seconds
Parent:
Time
Children:
MPI Time, OpenMP Time

Overhead Time

Description:
Time spent performing major tasks related to trace generation, such as time synchronization or dumping the trace-buffer contents to a file. Note that the normal per-event overhead is not included.
Unit:
Seconds
Parent:
Time
Children:
None

MPI Time

Description:
This pattern refers to the time spent in (instrumented) MPI calls.
Unit:
Seconds
Parent:
Execution Time
Children:
MPI Synchronization Time, MPI Communication Time, MPI File I/O Time, MPI Init/Exit Time

MPI Synchronization Time

Description:
This pattern refers to the time spent in MPI synchronization calls, i.e., barriers.
Unit:
Seconds
Parent:
MPI Time
Children:
MPI Collective Synchronization Time

MPI Communication Time

Description:
This pattern refers to the time spent in MPI communication calls.
Unit:
Seconds
Parent:
MPI Time
Children:
MPI Point-to-point Communication Time, MPI Collective Communication Time

MPI File I/O Time

Description:
This pattern refers to the time spent in MPI file I/O calls.
Unit:
Seconds
Parent:
MPI Time
Children:
MPI Collective File I/O Time

MPI Collective File I/O Time

Description:
This pattern refers to the time spent in collective MPI file I/O calls.
Unit:
Seconds
Parent:
MPI File I/O Time
Children:
None

MPI Init/Exit Time

Description:
This pattern refers to the time spent in MPI initialization and finalization calls, i.e., MPI_Init(), MPI_Init_thread() or MPI_Finalize().
Unit:
Seconds
Parent:
MPI Time
Children:
None

MPI Collective Synchronization Time

Description:
This pattern refers to the total time spent in MPI barriers.
Unit:
Seconds
Parent:
MPI Synchronization Time
Children:
Wait at Barrier Time, Barrier Completion Time

Wait at Barrier Time

Description:
This pattern covers the time spent waiting in front of an MPI barrier, which is the time inside the barrier call until the last process has reached the barrier. A large amount of waiting time spent in front of barriers can be an indication of load imbalance.


Wait at Barrier Example

Unit:
Seconds
Parent:
MPI Collective Synchronization Time
Children:
None
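
As a rough illustration of the Wait at Barrier definition above, the following sketch computes the waiting time of each process from the times at which the processes entered one barrier instance. The function name and the plain-list input are assumptions made for this example only; they are not taken from the Scalasca analysis itself.

```python
def wait_at_barrier(enter_times):
    """Return the waiting time of each process in one barrier instance.

    enter_times[i] is the (hypothetical) time, in seconds, at which
    process i entered the barrier call.
    """
    last_enter = max(enter_times)
    # Each process waits from its own entry until the last process arrives.
    return [last_enter - t for t in enter_times]


print(wait_at_barrier([1.0, 2.5, 4.0]))  # [3.0, 1.5, 0.0]
```

The process that arrives last has no waiting time, while the earliest process accumulates the most, which is why large values hint at load imbalance before the barrier.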

Barrier Completion Time

Description:
This pattern refers to the time spent in MPI barriers after the first process has left the operation.


Barrier Completion Example

Unit:
Seconds
Parent:
MPI Collective Synchronization Time
Children:
None
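
The Barrier Completion definition above can be sketched in the same style, this time from the times at which the processes left the barrier. The function name and input layout are again assumptions for illustration.

```python
def barrier_completion(leave_times):
    """Return the completion time of each process in one barrier instance.

    leave_times[i] is the (hypothetical) time, in seconds, at which
    process i left the barrier call.
    """
    first_leave = min(leave_times)
    # Time each process still spends in the barrier after the first one left.
    return [t - first_leave for t in leave_times]


print(barrier_completion([5.0, 5.5, 6.0]))  # [0.0, 0.5, 1.0]
```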

MPI Point-to-point Communication Time

Description:
This pattern refers to the total time spent in MPI point-to-point communication calls.
Unit:
Seconds
Parent:
MPI Communication Time
Children:
Late Sender Time, Late Receiver Time

Late Sender Time

Description:
This pattern refers to the time lost waiting in a blocking receive operation (e.g., MPI_Recv() or MPI_Wait()) that is posted earlier than the corresponding send operation.


Late Sender Example

If the receiving process is waiting for multiple messages to arrive (e.g., in a call to MPI_Waitall()), the maximum waiting time is accounted for, i.e., the waiting time due to the latest sender.
Unit:
Seconds
Parent:
MPI Point-to-point Communication Time
Children:
Late Sender, Wrong Order Time
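
The Late Sender rule above, including the case of a receiver waiting for several messages, might be sketched as follows. The function and its arguments are hypothetical names chosen for this example; only the use of the latest sender comes from the description.

```python
def late_sender_wait(recv_enter, send_enters):
    """Waiting time of one blocking receive-side call.

    recv_enter  -- time the receiver entered the call (e.g., MPI_Recv()
                   or MPI_Waitall()), in seconds
    send_enters -- times at which the matching send operations started
    """
    # With several pending messages, the latest sender determines the wait.
    latest_send = max(send_enters)
    # No waiting time is lost if every send started before the receive.
    return max(0.0, latest_send - recv_enter)


print(late_sender_wait(1.0, [3.0]))            # 2.0
print(late_sender_wait(1.0, [0.5, 2.0, 4.0]))  # 3.0
print(late_sender_wait(5.0, [3.0]))            # 0.0
```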

Late Sender, Wrong Order Time

Description:
A Late Sender situation may be the result of messages that are received in the wrong order. If a process expects messages from one or more processes in a certain order while these processes send them in a different order, the receiver may need to wait if it tries to receive early a message that has been sent late. This situation can possibly be avoided by receiving messages in the order in which they are sent. This pattern refers to the time spent in a wait state as a result of this situation.

This pattern comes in two different flavors; see the descriptions of the corresponding specializations for more details.
Unit:
Seconds
Parent:
Late Sender Time
Children:
Late Sender, Wrong Order Time / Different Sources, Late Sender, Wrong Order Time / Same Source
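
A much simplified way to picture the wrong-order condition described above is to list the send times of the messages in the order in which the receiver receives them and to check whether that list ever goes backwards. This is only a sketch under that assumption; the function name is hypothetical and the check is not the detection logic used by the analyzer.

```python
def received_in_wrong_order(send_times_in_receive_order):
    """True if some message is received before one that was sent earlier.

    The argument lists the send time of each message, arranged in the
    order in which the receiver receives the messages.
    """
    seq = send_times_in_receive_order
    # Any adjacent pair that goes backwards in send time means the
    # receive order differs from the send order.
    return any(later < earlier for earlier, later in zip(seq, seq[1:]))


print(received_in_wrong_order([1.0, 2.0, 3.0]))  # False
print(received_in_wrong_order([2.0, 1.0, 3.0]))  # True
```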

Late Sender, Wrong Order Time / Different Sources

Description:
This specialization of the Late Sender, Wrong Order pattern refers to wrong order situations due to messages received from different source locations.


Messages from different sources Example

Unit:
Seconds
Parent:
Late Sender, Wrong Order Time
Children:
None

Late Sender, Wrong Order Time / Same Source

Description:
This specialization of the Late Sender, Wrong Order pattern refers to wrong order situations due to messages received from the same source location.


Messages from same source Example

Unit:
Seconds
Parent:
Late Sender, Wrong Order Time
Children:
None

Late Receiver Time

Description:
A send operation may be blocked until the corresponding receive operation is called. This can happen for several reasons. Either the MPI implementation is working in synchronous mode by default or the size of the message to be sent exceeds the available MPI-internal buffer space and the operation is blocked until the data can be transferred to the receiver. The pattern refers to the time spent waiting as a result of this situation.


Late Receiver Example

Note that this pattern currently does not apply to nonblocking sends waiting in the corresponding completion call, e.g., MPI_Wait().
Unit:
Seconds
Parent:
MPI Point-to-point Communication Time
Children:
None
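
For the Late Receiver case above, a minimal sketch of the waiting time of a blocked send could look like this. It assumes that the send really is blocked until the receive is posted (one of the situations named in the description); the function name is made up for the example.

```python
def late_receiver_wait(send_enter, recv_enter):
    """Waiting time of a blocking send that cannot proceed until the
    matching receive is posted. Both arguments are times in seconds."""
    # The sender loses time only if the receive is posted after the send.
    return max(0.0, recv_enter - send_enter)


print(late_receiver_wait(1.0, 2.5))  # 1.5
print(late_receiver_wait(3.0, 2.0))  # 0.0
```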

MPI Collective Communication Time

Description:
This pattern refers to the total time spent in MPI collective communication calls.
Unit:
Seconds
Parent:
MPI Communication Time
Children:
Early Reduce Time, Early Scan Time, Late Broadcast Time, Wait at N x N Time, N x N Completion Time

Early Reduce Time

Description:
Collective communication operations that send data from all processes to one destination process (i.e., n-to-1) may suffer from waiting times if the destination process enters the operation earlier than its sending counterparts, that is, before any data could have been sent. The pattern refers to the time lost as a result of this situation. It applies to the MPI calls MPI_Reduce(), MPI_Gather() and MPI_Gatherv().


Early Reduce Example

Unit:
Seconds
Parent:
MPI Collective Communication Time
Children:
None
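
The Early Reduce situation above might be sketched as follows. The sketch assumes that the destination's waiting ends as soon as the first sender enters the operation, which is one reading of "before any data could have been sent"; the function name and arguments are hypothetical.

```python
def early_reduce_wait(root_enter, sender_enters):
    """Waiting time of the destination (root) process in an n-to-1
    collective such as MPI_Reduce(), under the assumption that the
    wait ends when the first sending process enters the operation."""
    first_sender = min(sender_enters)
    # No time is lost if some sender was already in the operation.
    return max(0.0, first_sender - root_enter)


print(early_reduce_wait(1.0, [2.0, 3.5]))  # 1.0
print(early_reduce_wait(4.0, [2.0, 3.5]))  # 0.0
```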

Early Scan Time

Description:
MPI_Scan operations may suffer from waiting times if the process with rank n enters the operation earlier than its sending counterparts (i.e., ranks 0..n-1). The pattern refers to the time lost as a result of this situation.


Early Scan Example

Unit:
Seconds
Parent:
MPI Collective Communication Time
Children:
None

Late Broadcast Time

Description:
Collective communication operations that send data from one source process to all processes (i.e., 1-to-n) may suffer from waiting times if destination processes enter the operation earlier than the source process, that is, before any data could have been sent. The pattern refers to the time lost as a result of this situation. It applies to the MPI calls MPI_Bcast(), MPI_Scatter() and MPI_Scatterv().


Late Broadcast Example

Unit:
Seconds
Parent:
MPI Collective Communication Time
Children:
None
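
The Late Broadcast description above translates into a per-destination waiting time, sketched here with hypothetical names. Each destination that enters before the source waits until the source enters.

```python
def late_broadcast_waits(root_enter, dest_enters):
    """Waiting time of each destination process in a 1-to-n collective
    such as MPI_Bcast(). All times are in seconds."""
    # A destination waits only if it entered before the source process.
    return [max(0.0, root_enter - t) for t in dest_enters]


print(late_broadcast_waits(3.0, [1.0, 2.0, 4.0]))  # [2.0, 1.0, 0.0]
```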

Wait at N x N Time

Description:
Collective communication operations that send data from all processes to all processes (i.e., n-to-n) exhibit an inherent synchronization among all participants, that is, no process can finish the operation until the last process has started it. This pattern covers the time spent in n-to-n operations until all processes have reached them. It applies to the MPI calls MPI_Reduce_scatter(), MPI_Allgather(), MPI_Allgatherv(), MPI_Allreduce(), MPI_Alltoall() and MPI_Alltoallv().


Wait at N x N Example

Note that the time reported by this pattern is not necessarily completely waiting time since some processes could -- at least theoretically -- already communicate with each other while others have not yet entered the operation.
Unit:
Seconds
Parent:
MPI Collective Communication Time
Children:
None

N x N Completion Time

Description:
This pattern refers to the time spent in MPI n-to-n collectives after the first process has left the operation.


N x N Completion Example

Note that the time reported by this pattern is not necessarily completely waiting time since some processes could -- at least theoretically -- still communicate with each other while others have already finished communicating and exited the operation.
Unit:
Seconds
Parent:
MPI Collective Communication Time
Children:
None

OpenMP Idle threads Time

Description:
Idle time on CPUs that may be reserved for teams of threads when the process is executing sequentially before and after OpenMP parallel regions, or with fewer threads than the full team within OpenMP parallel regions.


OMP Example

Unit:
Seconds
Parent:
Time
Children:
OpenMP Limited parallelism Time

OpenMP Limited parallelism Time

Description:
Idle time on CPUs that may be reserved for threads within OpenMP parallel regions where not all of the thread team participates.


OMP Example

Unit:
Seconds
Parent:
OpenMP Idle threads Time
Children:
None
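
The two OpenMP idle-time properties above can be pictured with a small calculation. It relies on the assumption stated for Time that every thread has its own CPU for the whole run; the function names and parameters are invented for this sketch.

```python
def idle_threads_time(sequential_seconds, team_size):
    """Idle CPU time accumulated while only the master thread runs."""
    # All CPUs except the master's are idle during sequential execution.
    return sequential_seconds * (team_size - 1)


def limited_parallelism_time(region_seconds, team_size, active_threads):
    """Idle CPU time inside a parallel region that uses only part of
    the thread team."""
    return region_seconds * (team_size - active_threads)


print(idle_threads_time(2.0, 4))            # 6.0
print(limited_parallelism_time(1.0, 4, 3))  # 1.0
```

For a team of four threads, two seconds of sequential execution therefore account for six seconds of idle time in this model.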

OpenMP Time

Description:
Time spent in OpenMP API calls and code generated by the OpenMP compiler.
Unit:
Seconds
Parent:
Execution Time
Children:
OpenMP Flush Time, OpenMP Management Time, OpenMP Synchronization Time

OpenMP Flush Time

Description:
Time spent in OpenMP flush directives.
Unit:
Seconds
Parent:
OpenMP Time
Children:
None

OpenMP Management Time

Description:
Time spent managing teams of threads, creating and initializing them when forking a new parallel region and clearing up afterwards when joining.


Management Example

Unit:
Seconds
Parent:
OpenMP Time
Children:
OpenMP Management Fork Time

OpenMP Management Fork Time

Description:
Time spent creating and initializing teams of threads.


Fork Example

Unit:
Seconds
Parent:
OpenMP Management Time
Children:
None

OpenMP Synchronization Time

Description:
Time spent in OpenMP synchronization, whether barriers or mutual exclusion via critical sections, atomics or lock API calls.
Unit:
Seconds
Parent:
OpenMP Time
Children:
OpenMP Barrier Synchronization Time, OpenMP Critical Synchronization Time, OpenMP Lock API Synchronization Time

OpenMP Barrier Synchronization Time

Description:
Time spent in implicit (compiler-generated) or explicit (user-specified) OpenMP barrier synchronization. Note that during measurement implicit barriers are treated similarly to explicit ones. The instrumentation procedure replaces an implicit barrier with an explicit barrier enclosed by the parallel construct. This is done by adding a nowait clause and a barrier directive as the last statement of the parallel construct. In cases where the implicit barrier cannot be removed (i.e., at the end of a parallel region), the explicit barrier is executed in front of the implicit barrier, which will then be negligible because the team will already be synchronized when reaching it. The synthetic explicit barrier appears as a special implicit barrier construct.
Unit:
Seconds
Parent:
OpenMP Synchronization Time
Children:
OpenMP Explicit Barrier Synchronization Time, OpenMP Implicit Barrier Synchronization Time

OpenMP Explicit Barrier Synchronization Time

Description:
Time spent in explicit (i.e., user-specified) OpenMP barrier synchronization.
Unit:
Seconds
Parent:
OpenMP Barrier Synchronization Time
Children:
None

OpenMP Implicit Barrier Synchronization Time

Description:
Time spent in implicit (i.e., compiler-generated) OpenMP barrier synchronization.
Unit:
Seconds
Parent:
OpenMP Barrier Synchronization Time
Children:
None

OpenMP Critical Synchronization Time

Description:
Time spent waiting to enter OpenMP critical sections and in atomics, where mutual exclusion restricts access to a single thread at a time.
Unit:
Seconds
Parent:
OpenMP Synchronization Time
Children:
None

OpenMP Lock API Synchronization Time

Description:
Time spent in OpenMP API calls dealing with locks.
Unit:
Seconds
Parent:
OpenMP Synchronization Time
Children:
None

Synchronizations

Description:
This metric provides the total number of MPI synchronization operations that were executed. This includes not only barrier calls, but also communication operations that transfer no data (i.e., zero-sized messages are considered to be used for synchronization).
Unit:
Counts
Parent:
None
Children:
Point-to-point Synchronizations, Collective Synchronizations

Point-to-point Synchronizations

Description:
Provides the total number of MPI point-to-point synchronization operations, i.e., point-to-point transfers of zero-sized messages.
Unit:
Counts
Parent:
Synchronizations
Children:
Point-to-point Send Synchronizations, Point-to-point Receive Synchronizations

Point-to-point Send Synchronizations

Description:
Provides the number of MPI point-to-point synchronization operations sending a zero-sized message.
Unit:
Counts
Parent:
Point-to-point Synchronizations
Children:
Late Receiver Instances (Synchronizations)

Point-to-point Receive Synchronizations

Description:
Provides the number of MPI point-to-point synchronization operations receiving a zero-sized message.
Unit:
Counts
Parent:
Point-to-point Synchronizations
Children:
Late Sender Instances (Synchronizations)

Collective Synchronizations

Description:
Provides the number of MPI collective synchronization operations. This includes not only barrier calls, but also calls to collective communication operations that neither send nor receive any data.
Unit:
Counts
Parent:
Synchronizations
Children:
None

Communications

Description:
Provides the total number of MPI communication operations, excluding calls transferring no data (which are considered synchronizations).
Unit:
Counts
Parent:
None
Children:
Point-to-point Communications, Collective Communications
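
The split between Synchronizations and Communications described above depends only on whether an operation transfers data. A minimal sketch of that counting rule for point-to-point messages, using a hypothetical list of message sizes in bytes, could be:

```python
def count_operations(message_sizes):
    """Return (synchronizations, communications) for a list of
    point-to-point message sizes given in bytes."""
    # Zero-sized messages are counted as synchronizations.
    syncs = sum(1 for n in message_sizes if n == 0)
    # Every message that carries data is counted as a communication.
    comms = len(message_sizes) - syncs
    return syncs, comms


print(count_operations([0, 1024, 0, 8]))  # (2, 2)
```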

Point-to-point Communications

Description:
Provides the number of MPI point-to-point communication operations, excluding calls transferring zero-sized messages.
Unit:
Counts
Parent:
Communications
Children:
Point-to-point Send Communications, Point-to-point Receive Communications

Point-to-point Send Communications

Description:
Provides the number of MPI point-to-point send operations, excluding calls transferring zero-sized messages.
Unit:
Counts
Parent:
Point-to-point Communications
Children:
Late Receiver Instances (Communications)

Point-to-point Receive Communications

Description:
Provides the number of MPI point-to-point receive operations, excluding calls transferring zero-sized messages.
Unit:
Counts
Parent:
Point-to-point Communications
Children:
Late Sender Instances (Communications)

Collective Communications

Description:
Provides the number of MPI collective communication operations, excluding calls neither sending nor receiving any data.
Unit:
Counts
Parent:
Communications
Children:
Collective Exchange Communications, Collective Communications as Source, Collective Communications as Destination

Collective Exchange Communications

Description:
Provides the number of MPI collective communication operations which are both sending and receiving data.
Unit:
Counts
Parent:
Collective Communications
Children:
None

Collective Communications as Source

Description:
Provides the number of MPI collective communication operations that are only sending but not receiving data.
Unit:
Counts
Parent:
Collective Communications
Children:
None

Collective Communications as Destination

Description:
Provides the number of MPI collective communication operations that are only receiving but not sending data.
Unit:
Counts
Parent:
Collective Communications
Children:
None
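
The three Collective Communications children above, together with Collective Synchronizations, partition collective calls by whether the calling process sends data, receives data, both, or neither. The following sketch expresses that rule; the function name and the returned labels are assumptions made for illustration.

```python
def classify_collective(bytes_sent, bytes_received):
    """Classify one collective call from the caller's point of view."""
    if bytes_sent > 0 and bytes_received > 0:
        return "exchange"         # both sending and receiving data
    if bytes_sent > 0:
        return "source"           # only sending data
    if bytes_received > 0:
        return "destination"      # only receiving data
    return "synchronization"      # no data transferred at all


print(classify_collective(64, 64))  # exchange
print(classify_collective(64, 0))   # source
print(classify_collective(0, 64))   # destination
print(classify_collective(0, 0))    # synchronization
```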

Bytes Transferred

Description:
Provides the total number of bytes that were processed in MPI communication operations (i.e., the sum of the bytes that were sent and received).
Unit:
Bytes
Parent:
None
Children:
Point-to-point Bytes Transferred, Collective Bytes Transferred

Point-to-point Bytes Transferred

Description:
Provides the total number of bytes that were processed by MPI point-to-point communication operations.
Unit:
Bytes
Parent:
Bytes Transferred
Children:
Point-to-point Bytes Sent, Point-to-point Bytes Received

Point-to-point Bytes Sent

Description:
Provides the number of bytes that were sent using MPI point-to-point communication operations.
Unit:
Bytes
Parent:
Point-to-point Bytes Transferred
Children:
None

Point-to-point Bytes Received

Description:
Provides the number of bytes that were received using MPI point-to-point communication operations.
Unit:
Bytes
Parent:
Point-to-point Bytes Transferred
Children:
None

Collective Bytes Transferred

Description:
Provides the total number of bytes that were processed in MPI collective communication operations.
Unit:
Bytes
Parent:
Bytes Transferred
Children:
Collective Bytes Outgoing, Collective Bytes Incoming

Collective Bytes Outgoing

Description:
Provides the number of bytes that were sent by MPI collective communication operations.
Unit:
Bytes
Parent:
Collective Bytes Transferred
Children:
None

Collective Bytes Incoming

Description:
Provides the number of bytes that were received by MPI collective communication operations.
Unit:
Bytes
Parent:
Collective Bytes Transferred
Children:
None

Late Sender Instances (Communications)

Description:
Provides the total number of Late Sender instances found in communication operations.
Unit:
Counts
Parent:
Point-to-point Receive Communications
Children:
Late Sender, Wrong Order Instances (Communications)

Late Sender, Wrong Order Instances (Communications)

Description:
Provides the total number of Late Sender instances found in communication operations where messages were sent in the wrong order (see also Messages in Wrong Order).
Unit:
Counts
Parent:
Late Sender Instances (Communications)
Children:
None

Late Receiver Instances (Communications)

Description:
Provides the total number of Late Receiver instances found in communication operations.
Unit:
Counts
Parent:
Point-to-point Send Communications
Children:
None

Late Sender Instances (Synchronizations)

Description:
Provides the total number of Late Sender instances found in synchronization operations (i.e., zero-sized message transfers).
Unit:
Counts
Parent:
Point-to-point Receive Synchronizations
Children:
Late Sender, Wrong Order Instances (Synchronizations)

Late Sender, Wrong Order Instances (Synchronizations)

Description:
Provides the total number of Late Sender instances found in synchronization operations (i.e., zero-sized message transfers) where messages are received in the wrong order (see also Messages in Wrong Order).
Unit:
Counts
Parent:
Late Sender Instances (Synchronizations)
Children:
None

Late Receiver Instances (Synchronizations)

Description:
Provides the total number of Late Receiver instances found in synchronization operations (i.e., zero-sized message transfers).
Unit:
Counts
Parent:
Point-to-point Send Synchronizations
Children:
None

MPI File Operations

Description:
Number of MPI file operations of any type.
Unit:
Counts
Parent:
None
Children:
MPI File Individual Operations, MPI File Collective Operations

MPI File Individual Operations

Description:
Number of individual MPI file operations.
Unit:
Counts
Parent:
MPI File Operations
Children:
MPI File Individual Read Operations, MPI File Individual Write Operations

MPI File Individual Read Operations

Description:
Number of individual MPI file read operations.
Unit:
Counts
Parent:
MPI File Individual Operations
Children:
None

MPI File Individual Write Operations

Description:
Number of individual MPI file write operations.
Unit:
Counts
Parent:
MPI File Individual Operations
Children:
None

MPI File Collective Operations

Description:
Number of collective MPI file operations.
Unit:
Counts
Parent:
MPI File Operations
Children:
MPI File Collective Read Operations, MPI File Collective Write Operations

MPI File Collective Read Operations

Description:
Number of collective MPI file read operations.
Unit:
Counts
Parent:
MPI File Collective Operations
Children:
None

MPI File Collective Write Operations

Description:
Number of collective MPI file write operations.
Unit:
Counts
Parent:
MPI File Collective Operations
Children:
None

Computational load imbalance heuristic

Description:
This simple heuristic helps to identify computational load imbalances and is calculated for each (call-path, process/thread) pair. Its value represents the absolute difference from the average exclusive execution time. This average value is the aggregated exclusive time spent by all processes/threads in this call-path, divided by the number of processes/threads visiting it.

Note: A high value for a collapsed call tree node does not necessarily mean that there is a load imbalance in this particular node, but the imbalance can also be somewhere in the subtree underneath.
Unit:
Seconds
Parent:
None
Children:
Computational load imbalance heuristic (values below average), Computational load imbalance heuristic (values above average)

Computational load imbalance heuristic (values below average)

Description:
This metric is provided as a convenience to identify processes/threads where the exclusive execution time spent for a particular call tree node was below the average value.

Please see Computational load imbalance for details on how this heuristic is calculated.
Unit:
Seconds
Parent:
Computational load imbalance heuristic
Children:
None

Computational load imbalance heuristic (values above average)

Description:
This metric is provided as a convenience to identify processes/threads where the exclusive execution time spent for a particular call tree node was above the average value.

Please see Computational load imbalance for details on how this heuristic is calculated.
Unit:
Seconds
Parent:
Computational load imbalance heuristic
Children:
None
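
The heuristic and its two specializations above can be sketched for a single call path as follows. The input is a hypothetical list holding the exclusive execution time of each process/thread that visits the call path; the function name is an assumption for this example.

```python
def load_imbalance(exclusive_times):
    """Return (total, below, above) imbalance values for one call path.

    exclusive_times[i] is the exclusive execution time, in seconds, of
    the i-th process/thread visiting the call path.
    """
    # Aggregated exclusive time divided by the number of visitors.
    average = sum(exclusive_times) / len(exclusive_times)
    total, below, above = [], [], []
    for t in exclusive_times:
        diff = abs(t - average)  # absolute difference from the average
        total.append(diff)
        below.append(diff if t < average else 0.0)
        above.append(diff if t > average else 0.0)
    return total, below, above


total, below, above = load_imbalance([2.0, 4.0, 6.0])
print(total)  # [2.0, 0.0, 2.0]
print(below)  # [2.0, 0.0, 0.0]
print(above)  # [0.0, 0.0, 2.0]
```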

SCALASCA    Copyright © 1998-2009 Forschungszentrum Jülich