MADbench2

MADbench2 is a tool for testing the overall integrated performance of the I/O, communication and calculation subsystems of massively parallel architectures under the stresses of a real scientific application.

MADbench2 is based on the MADspec code, which calculates the maximum likelihood angular power spectrum of the Cosmic Microwave Background radiation from a noisy pixelized map of the sky and its pixel-pixel noise correlation matrix.

MADbench2 retains the full computational complexity of its parent scientific application, but uses self-generated pseudo-data so that the many computationally irrelevant details of handling real CMB datasets can be bypassed.

MADbench2 can be run in two modes:

(i) regular mode, in which the full code is run.

(ii) IO mode, in which all calculation/communication is replaced with busy-work.

In addition, MADbench2 can be run in a single-gang or multi-gang configuration. In the single-gang case, all of the matrix operations are distributed over all of the processors. In the multi-gang case, the matrices are built, summed and inverted over all of the processors (S & D), but are then redistributed over subsets of processors (gangs) for their subsequent manipulation (W & C). This gang-parallelism keeps the data dense on the processors during the dominant matrix-matrix multiplication (W) phase, even with very large numbers of processors.

Compiling MADbench2


To run in regular mode, MADbench2 needs to be linked to the ScaLAPACK & LAPACK libraries and their dependencies (BLAS, PBLAS, BLACS). The MADbench2.h file contains system-specific definitions and declarations; this file should be augmented as needed and the code compiled with -D SYSTEM.

To run in IO mode, MADbench2 should be compiled with -D IO (in addition to -D SYSTEM) whereupon all of the library calls are redefined to busy-work so that none of the libraries are needed.

Running MADbench2

Running MADbench2 requires:
  • a square number of processors
  • a uniform square number of processors per gang
  • a uniform number of bins per gang
  • a ScaLAPACK blocksize that distributes some data to every processor
  • a file blocksize that is a whole number of doubles
  • a number of gangs that is exactly divisible by the read-modulus and the write-modulus
each of which is checked on initialization.
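
The divisibility and squareness requirements above can be sketched as a pre-flight check. This is an illustrative sketch only, not MADbench2's own initialization code: the function and parameter names (no_pe, no_gang, no_bin, fblocksize, rmod, wmod) are hypothetical, the file blocksize is assumed to be given in bytes with 8-byte doubles, and the ScaLAPACK blocksize requirement is omitted because it depends on the matrix distribution.

```python
import math


def is_square(n: int) -> bool:
    """Return True if n is a positive perfect square."""
    if n <= 0:
        return False
    r = math.isqrt(n)
    return r * r == n


def check_run_parameters(no_pe, no_gang, no_bin, fblocksize, rmod, wmod):
    """Return a list of violated requirements (empty if all checks pass).

    All names are hypothetical; fblocksize is assumed to be in bytes.
    """
    errors = []
    # A square number of processors.
    if not is_square(no_pe):
        errors.append("number of processors is not square")
    # A uniform square number of processors per gang.
    if no_gang <= 0 or no_pe % no_gang != 0 or not is_square(no_pe // no_gang):
        errors.append("processors per gang is not a uniform square number")
    # A uniform number of bins per gang.
    if no_gang <= 0 or no_bin % no_gang != 0:
        errors.append("bins per gang is not uniform")
    # A file blocksize that is a whole number of (8-byte) doubles.
    if fblocksize <= 0 or fblocksize % 8 != 0:
        errors.append("file blocksize is not a whole number of doubles")
    # A number of gangs exactly divisible by both moduli.
    if rmod <= 0 or wmod <= 0 or no_gang % rmod != 0 or no_gang % wmod != 0:
        errors.append("number of gangs is not divisible by read/write modulus")
    return errors


if __name__ == "__main__":
    # 16 processors in 4 gangs of 4, 8 bins, 1 MiB file blocks.
    print(check_run_parameters(16, 4, 8, 1048576, 1, 1))
```

For example, 16 processors split into 4 gangs passes every check, whereas 12 processors split into 4 gangs fails both the total and the per-gang squareness checks.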

In addition, MADbench2 requires 5 x NO_PIX^2 x 8 bytes of memory per gang.
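
As a worked example of this memory requirement, the short sketch below evaluates the formula, reading it as five times NO_PIX squared 8-byte doubles (that reading, and the function name, are assumptions rather than part of MADbench2).

```python
def memory_per_gang_bytes(no_pix: int) -> int:
    """Bytes of memory needed per gang: 5 x NO_PIX^2 x 8.

    Assumes the formula counts NO_PIX * NO_PIX doubles of 8 bytes each,
    five times over.
    """
    return 5 * no_pix * no_pix * 8


if __name__ == "__main__":
    for no_pix in (5000, 10000, 25000):
        gb = memory_per_gang_bytes(no_pix) / 1e9
        print(f"NO_PIX = {no_pix:6d}: {gb:.1f} GB per gang")
```

With NO_PIX = 5000 this gives 1,000,000,000 bytes (1 GB) per gang; because the requirement grows with the square of NO_PIX, doubling NO_PIX quadruples it.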

Error checking

All mallocs and IO calls are explicitly checked for success and MADbench2 aborts if any one fails.
In case of failure, the processor ID and attempted action are reported before exiting.

Output

MADbench2 reports the mean, minimum and maximum times spent in calculation/communication, busy-work, reading and writing in each function.

In addition, the first element of the MADspec solution vector is reported to check that the code performed correctly. In regular mode, NO_PIX = 5000 & NO_BIN = 4 should return dC[0] = -9.22431e-01; IO mode always returns dC[0] = 0.00000.
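
A correctness check against these reference values might look like the following sketch. The helper name, the mode labels and the tolerance are assumptions chosen for illustration; the regular (full) mode reference value applies only to NO_PIX = 5000 and NO_BIN = 4.

```python
import math

# Reference values quoted above; the regular-mode value is only valid
# for NO_PIX = 5000 and NO_BIN = 4.
EXPECTED_DC0 = {"regular": -9.22431e-01, "io": 0.0}


def dc0_matches(reported: float, mode: str, tol: float = 1e-5) -> bool:
    """Compare a reported dC[0] with the reference value for the given mode.

    The tolerance is an assumption, chosen to allow for the value being
    printed to six significant figures.
    """
    expected = EXPECTED_DC0[mode]
    return math.isclose(reported, expected, rel_tol=tol, abs_tol=tol)


if __name__ == "__main__":
    print(dc0_matches(-0.922431, "regular"))
    print(dc0_matches(0.0, "io"))
```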

MADbench2 links

Code tarball

Command-line arguments

Environment variables

Component functions

Example IO mode spreadsheet



MADbench Papers


HIPC 2004

ICPP 2005