X-Tune: Auto-tuning for Exascale
Automatic Performance Tuning (or Auto-tuning) has emerged as an effective means of providing performance portability from one architecture to the next. Rather than hoping a compiler can deliver optimal performance on ever more novel multicore architectures, or worse, hand-tuning each code manually, auto-tuned kernels and applications tune themselves for the target CPU, network, and programming model.
Our work is one component of a larger DOE X-Stack2 project (X-Tune), a collaboration between the University of Utah, Lawrence Berkeley Lab, the University of Southern California, and Argonne National Lab. Building on the algorithmic and pathfinding work of the CACHE institute in conjunction with the CHiLL/ROSE auto-tuning framework, we at LBL are researching and developing tools that automatically implement code transformations that minimize vertical (i.e., from DRAM) data movement and aggregate horizontal (i.e., MPI) data movement. To that end, we are leveraging the CHiLL/ROSE compiler to automatically transform and auto-tune numerical methods including Multigrid, the Spectral Element Method, and block eigensolvers like LOBPCG.
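As a rough illustration of the auto-tuning idea described above, the following sketch runs one stencil sweep under several loop-blocking factors, times each variant on the current machine, and keeps the fastest. It is a hand-written empirical search in plain Python, not the CHiLL/ROSE workflow: the names `smooth_blocked`, `autotune`, and `make_grid`, the 5-point averaging stencil, and the candidate block sizes are all assumptions made for the example. In interpreted Python the timing differences say little about cache behavior; the point is only the structure of the search.

```python
import time


def make_grid(n):
    """Build an n x n test grid with a simple repeating pattern (hypothetical data)."""
    return [[float((i * n + j) % 7) for j in range(n)] for i in range(n)]


def smooth_blocked(grid, block):
    """One 5-point averaging sweep over the interior, traversed in block x block tiles.

    Every tile reads from the unmodified input grid, so the result does not
    depend on the block size; only the traversal order changes.
    """
    n = len(grid)
    out = [row[:] for row in grid]
    for ii in range(1, n - 1, block):
        for jj in range(1, n - 1, block):
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, n - 1)):
                    out[i][j] = 0.2 * (
                        grid[i][j]
                        + grid[i - 1][j]
                        + grid[i + 1][j]
                        + grid[i][j - 1]
                        + grid[i][j + 1]
                    )
    return out


def autotune(kernel, grid, candidates, repeats=3):
    """Time kernel(grid, block) for each candidate and return (best_block, timings)."""
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(grid, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best  # keep the fastest of the repeated runs
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    grid = make_grid(64)
    best_block, timings = autotune(smooth_blocked, grid, [4, 8, 16, 32, 64])
    print("fastest block size on this machine:", best_block)
```

A compiler-based auto-tuner would generate the candidate variants automatically and search a much larger transformation space; here the variants differ only in one tile size so that the measure-and-select loop stays visible.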
- Mary Hall, principal investigator (Utah)
- Samuel Williams, institutional lead (LBNL)
- Paul Hovland, institutional lead (ANL)
- Jacqueline Chame, institutional lead (USC/ISI)
- HPGMG-FV (a scalable compact benchmark developed under the ExaCT project for understanding the challenges of Geometric Multigrid on petascale and exascale systems built from multicore processors and manycore accelerators). X-Tune leverages this code for compiler research.
- miniGMG (a compact geometric multigrid benchmark developed under the CACHE project for optimization, architecture, and algorithmic research at small scale). X-Tune leverages this code for compiler research.
Exascale Research Conference Materials
- Quad Chart
Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Phillip Colella, Mary Hall, "Compiler-Based Code Generation and Autotuning for Geometric Multigrid on GPU-Accelerated Supercomputers", Parallel Computing (PARCO), April 2017, doi: 10.1016/j.parco.2017.04.002
Protonu Basu, Samuel Williams, Brian Van Straalen, Mary Hall, Leonid Oliker, Phillip Colella, "Compiler-Directed Transformation for Higher-Order Stencils", International Parallel and Distributed Processing Symposium (IPDPS), May 2015
- Download File: ipdps15CHiLL.pdf (pdf: 1.8 MB)
Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Mary Hall, "Converting Stencils to Accumulations for Communication-Avoiding Optimization in Geometric Multigrid", Workshop on Stencil Computations (WOSC), October 2014
- Download File: wosc14chill.pdf (pdf: 973 KB)
Protonu Basu, Anand Venkat, Mary Hall, Samuel Williams, Brian Van Straalen, Leonid Oliker, "Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid", 20th International Conference on High Performance Computing (HiPC), December 2013, pp. 452-461
- Download File: hipc13chill.pdf (pdf: 989 KB)
Protonu Basu, Anand Venkat, Mary Hall, Samuel Williams, Brian Van Straalen, Leonid Oliker, "Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid", Workshop on Stencil Computations (WOSC), 2013
Samuel Williams, "X-TUNE", X-Stack PI Meeting, December 2015
- Download File: XStackPI2015XTuneSWWilliams.pdf (pdf: 5.9 MB)
Samuel Williams, Dhiraj D. Kalamkar, Amik Singh, Anand M. Deshpande, Brian Van Straalen, Mikhail Smelyanskiy, Ann Almgren, Pradeep Dubey, John Shalf, Leonid Oliker, "Implementation and Optimization of miniGMG - a Compact Geometric Multigrid Benchmark"
- Download File: miniGMGLBNL-6676E.pdf (pdf: 906 KB)