An Asynchronous Execution Modeling and Simulation Framework
Rambutan is a performance modeling and analysis tool for understanding the behavior of asynchronous, task-based execution models. It consists of a deeply instrumented runtime that collects statistics while a task-based application executes across distributed-memory machines. The tool tracks application task execution, communication costs, and runtime overheads such as task creation and deletion, queue management, dependency satisfaction (possibly remote), remote data transfers, and termination detection.
We have implemented a parameterized synthetic noise model to help us understand the performance impact of hardware error correction and OS interrupts on large distributed application runs. Previous research has shown that hardware error correction and OS noise can contribute significantly to delays in large parallel jobs. Our model allows a tunable level of noise to be randomly injected into the program execution so we can observe the sensitivity of application performance to varying levels (both frequency and duration) of noise under different execution model configurations (different task granularity, bulk-synchronous, static task DAG, dynamic task DAG).
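To make the two noise parameters concrete, here is a minimal sketch of how tunable noise might be injected into a bulk-synchronous run. It is not Rambutan's implementation: the function names, the Poisson arrival model, and the fixed per-event duration are all assumptions chosen for illustration.

```python
import random


def noisy_task_time(work, freq, duration, rng):
    """Return the time of one task after injecting noise (hypothetical model).

    Assumption: noise events arrive as a Poisson process with rate `freq`
    (events per second of useful work), and each event stalls the task
    for a fixed `duration` seconds.
    """
    if freq <= 0:
        return work
    delay = 0.0
    t = rng.expovariate(freq)  # time of the first noise event
    while t < work:
        delay += duration
        t += rng.expovariate(freq)  # time of the next noise event
    return work + delay


def simulate_bsp(num_ranks, num_steps, work, freq, duration, seed=0):
    """Total time of a bulk-synchronous run with a barrier after every step.

    Each step costs as much as its slowest rank, so a single delayed rank
    delays every rank.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_steps):
        total += max(
            noisy_task_time(work, freq, duration, rng) for _ in range(num_ranks)
        )
    return total
```

Sweeping `freq` and `duration` while holding `work` fixed, and comparing against the noise-free baseline `num_steps * work`, gives a rough sensitivity curve for the bulk-synchronous case. Other execution model configurations would need a scheduler model in place of the per-step `max`.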
We are also investigating how the sensitivity of algorithms to injected noise varies with the degree of irregularity in the task graph structure itself. For example, stencil computations over uniform Cartesian grids have a very regular task graph structure, while sparse matrix calculations can have a very irregular task graph structure with more inherent "slack" that can be tolerated in the task schedule.
(Image: Dongarra et al.)
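One way to quantify the "slack" mentioned above is a critical-path calculation over the task graph: a task's slack is how long it can be delayed without lengthening the overall schedule. The sketch below is an illustration under simplifying assumptions (unlimited cores, known task durations, no communication cost); `task_slack` and its input layout are hypothetical and not part of Rambutan.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def task_slack(durations, deps):
    """Compute per-task slack in a task DAG.

    durations: {task: execution time}
    deps:      {task: [predecessor tasks]} (tasks with no entry have none)
    """
    graph = {t: deps.get(t, []) for t in durations}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: earliest start is when the last predecessor finishes.
    earliest = {}
    for t in order:
        earliest[t] = max(
            (earliest[p] + durations[p] for p in graph[t]), default=0.0
        )
    makespan = max(earliest[t] + durations[t] for t in order)

    # Build successor lists for the backward pass.
    succs = {t: [] for t in durations}
    for t, preds in graph.items():
        for p in preds:
            succs[p].append(t)

    # Backward pass: latest start that still meets the makespan.
    latest = {}
    for t in reversed(order):
        latest_finish = min((latest[s] for s in succs[t]), default=makespan)
        latest[t] = latest_finish - durations[t]

    return {t: latest[t] - earliest[t] for t in order}
```

Tasks on the critical path have zero slack, so any noise that hits them delays the whole run. Tasks with positive slack can absorb delays up to that amount, which is why graphs with more slack would be expected to tolerate more noise.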
We have implemented three main benchmarks that span the range from very regular to very irregular: stencil, sparse Cholesky factorization, and unbalanced tree search (UTS). These benchmarks represent a range of potential task-based applications in terms of both task graph structure and arithmetic intensity, and they have different sensitivities to choices in the execution model, such as task granularity and scheduling policy.
We are investigating the impact of two-level parallelism in task-based execution models. By allowing the tasks that compose the task graph to be themselves parallel, we can help mitigate the scheduling overhead problem associated with creating a sufficient number of tasks to saturate all of the cores available on modern and future hardware platforms. We are also investigating a task-based runtime's ability to dynamically schedule tasks on heterogeneous architectures to take advantage of features such as accelerators and/or multi-level memory. Given sufficient metadata about the tasks in a given application, a task-based runtime is well-suited to handling the staging of data across disparate memories and the scheduling of tasks on heterogeneous compute resources.
Under the hood, Rambutan uses the GASNet PGAS library for remote synchronization and communication. It combines active messages and RDMA operations to satisfy task and data dependencies.
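The dependency-satisfaction pattern can be pictured as a counter per task that a message handler decrements. The sketch below is a single-process illustration only: it makes no GASNet calls, and the `Task` and `Scheduler` classes and the `on_dependency_satisfied` handler are hypothetical stand-ins, with the handler playing the role an active-message handler might play on the task's owning node.

```python
from collections import deque


class Task:
    def __init__(self, name, num_deps):
        self.name = name
        self.pending = num_deps  # dependencies not yet satisfied
        self.successors = []     # tasks that wait on this one


class Scheduler:
    def __init__(self):
        self.ready = deque()
        self.log = []

    def add(self, task):
        # Tasks with no dependencies are runnable immediately.
        if task.pending == 0:
            self.ready.append(task)

    def on_dependency_satisfied(self, task):
        # Stand-in for a handler triggered by a (possibly remote)
        # completion notification.
        task.pending -= 1
        if task.pending == 0:
            self.ready.append(task)

    def run(self):
        # Execute ready tasks in FIFO order, notifying successors.
        while self.ready:
            task = self.ready.popleft()
            self.log.append(task.name)
            for succ in task.successors:
                self.on_dependency_satisfied(succ)
        return self.log
```

In a distributed setting, the loop over `task.successors` would send a notification to each successor's owner rather than call the handler directly, and any data the successor needs would be transferred separately.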