Enabling Task-Based Performance Models for AMR
To model the behavior of AMR solvers that execute asynchronously, we have developed a tool that builds a skeleton task dependency graph for a variety of AMR algorithms. The generated graph carries critical performance information, such as compute-time estimates and the volume of communication traffic each task requires. It exposes the true data dependencies among the constituent tasks and removes the false dependencies that bulk-synchronous programming, as commonly practiced with MPI, often introduces. For example, a rank that owns multiple boxes might wait unnecessarily for *all* of its boxes to complete an iteration before any box proceeds to the next, even if an individual box has all of its data dependencies satisfied early.
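A minimal Python sketch of the false dependency described above follows. It rests on simplifying assumptions that are ours, not the tool's: every box task starts as soon as its inputs are ready (unlimited concurrency within a rank), communication is free, and a box's iteration depends only on its own and its neighbors' results from the previous iteration. The function `finish_times` and its arguments are hypothetical names chosen for illustration.

```python
from typing import Dict, List


def finish_times(times: List[List[float]],
                 neighbors: Dict[int, List[int]],
                 owner: Dict[int, int],
                 bulk_synchronous: bool) -> List[List[float]]:
    """Return finish[k][b], the time at which box b completes iteration k.

    times[k][b] is the (hypothetical) compute time of box b in iteration k.
    A box's iteration-k task truly depends on the iteration k-1 results of
    itself and its neighbors (e.g. a ghost-cell exchange). With
    bulk_synchronous=True, a box also waits until every box on its rank has
    finished iteration k-1: the false dependency a task graph removes.
    Assumes unlimited concurrency and zero communication cost.
    """
    n_boxes = len(times[0])
    finish: List[List[float]] = []
    prev = [0.0] * n_boxes  # every box is ready at t = 0
    for t_k in times:
        # Latest previous-iteration finish on each rank (the per-rank barrier).
        rank_done: Dict[int, float] = {}
        for b in range(n_boxes):
            rank_done[owner[b]] = max(rank_done.get(owner[b], 0.0), prev[b])
        cur = []
        for b in range(n_boxes):
            # True dependencies: this box and its neighbors, one iteration back.
            ready = max(prev[d] for d in [b, *neighbors[b]])
            if bulk_synchronous:
                ready = max(ready, rank_done[owner[b]])
            cur.append(ready + t_k[b])
        finish.append(cur)
        prev = cur
    return finish


# Boxes 0 and 1 live on rank 0 but are not neighbors; box 1 neighbors box 2 on rank 1.
times = [[4.0, 1.0, 1.0],   # iteration 0: box 0 is slow
         [1.0, 4.0, 1.0]]   # iteration 1: box 1 is slow
neighbors = {0: [], 1: [2], 2: [1]}
owner = {0: 0, 1: 0, 2: 1}

task = finish_times(times, neighbors, owner, bulk_synchronous=False)
bsp = finish_times(times, neighbors, owner, bulk_synchronous=True)
print(max(task[-1]), max(bsp[-1]))  # 5.0 8.0
```

In this example box 1 could start its second iteration at time 1, once its only neighbor (box 2) is done, and finish at time 5. The per-rank barrier instead holds it until box 0, which it does not depend on, finishes its slow first iteration at time 4, so the run ends at time 8 rather than 5.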
Our tool interfaces with the SST/macro network simulator to model the performance of AMR algorithms on future supercomputer architectures. This coupling enables comparisons across task execution models, data placement strategies, and machine architecture parameters such as the balance between compute and bandwidth performance. We are also investigating performance trade-offs among AMR algorithm choices by producing skeletonized task dependency graphs and communication traces for AMR and multigrid (MG) algorithm variants.
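The kinds of comparisons described above can be approximated, far more crudely than with a network simulator, by an analytic critical-path estimate over a task graph annotated with compute estimates and message sizes. The sketch below is not SST/macro and does not use its API. It assumes a simple latency-plus-bandwidth cost for each message between tasks on different ranks, unlimited concurrency, and no contention. The names `critical_path`, `flop_rate`, `bandwidth`, and `latency` are hypothetical, and the code requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


def critical_path(compute: Dict[str, float],
                  edges: Dict[Tuple[str, str], float],
                  placement: Dict[str, int],
                  flop_rate: float, bandwidth: float, latency: float) -> float:
    """Earliest completion time of an annotated task graph.

    compute[v]    -- estimated work of task v (hypothetical units)
    edges[(u, v)] -- bytes that task u sends to task v
    placement[v]  -- rank that task v is mapped to
    Messages between tasks on the same rank are assumed free; otherwise a
    message costs latency + bytes / bandwidth. No contention is modeled.
    """
    preds: Dict[str, List[str]] = {v: [] for v in compute}
    for (u, v) in edges:
        preds[v].append(u)

    finish: Dict[str, float] = {}
    # static_order() yields each task after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        start = 0.0
        for u in preds[v]:
            comm = 0.0
            if placement[u] != placement[v]:
                comm = latency + edges[(u, v)] / bandwidth
            start = max(start, finish[u] + comm)
        finish[v] = start + compute[v] / flop_rate
    return max(finish.values())


# A small diamond-shaped graph: a -> {b, c} -> d.
compute = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0}
edges = {("a", "b"): 8.0, ("a", "c"): 8.0, ("b", "d"): 8.0, ("c", "d"): 8.0}
spread = {"a": 0, "b": 1, "c": 0, "d": 0}  # task b placed on a second rank
packed = {"a": 0, "b": 0, "c": 0, "d": 0}  # every task on one rank

print(critical_path(compute, edges, spread, 1.0, 4.0, 0.5))   # 9.0
print(critical_path(compute, edges, spread, 1.0, 16.0, 0.5))  # 6.0
print(critical_path(compute, edges, packed, 1.0, 4.0, 0.5))   # 4.0
```

Raising bandwidth from 4 to 16 units shortens the estimated critical path from 9 to 6, and placing every task on one rank removes the messages altogether. A simulator such as SST/macro can additionally capture effects this model ignores, such as contention.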