Auto-tuning Graphs, Sparse Linear Algebra, and Particle-in-Cell Codes for Exascale (X-Stack)
Automatic Performance Tuning (or Auto-tuning) has emerged as an effective means of providing performance portability from one architecture to the next. Rather than hoping a compiler can deliver optimal performance on ever more novel multicore architectures, or worse manually hand tune, auto-tuned kernels and applications can tune themselves on the target CPU, network, and programming model.
Building on our previous advances in this field, this project aims to address auto-tuning’s two principal limitations: an interface ill-suited to the forthcoming ubiquitous hybrid SPMD programming model; and its scope limited to fixed-function numerical routines. We believe an abstraction-layer is essential in hiding the complexities of hybrid communication (possibly cache-coherent shared memory + MPI), hierarchical parallelism (processes + threads + tasks), and heterogeneous architectures (multicore + manycore + accelerators) from application scientists and domain experts. Additionally, we believe that auto-tuning can be applied to many emerging numerical and non-numerical fields as of yet untouched. To that end, as part of this X-Stack project, we are developing auto-tuned components for concurrent operations and discrete data structures. In addition to enhancing performance on existing numerical routines, we believe these components provide a stepping stone towards performance on a number of emerging fields including graph analytics.
Researchers
- Samuel Williams, principal investigator (LBNL)
- Stéphane Ethier, institutional lead (PPPL)
- John Gilbert, institutional lead (UCSB)
- Adam Lugowski
- Kamesh Madduri (Penn State, formerly LBNL)
Software
- Libraries and runtimes for Data Locality and Data Synchronization
- Combinatorial BLAS (CombBLAS)
- Knowledge Discovery Toolkit (KDT)
Exascale Research Conference Materials
- Handout
- Quad Chart
- Highlights
- Poster
Publications
2017
Bei Wang, Stephane Ethier, William Tang, Khaled Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, "Modern Gyrokinetic Particle-in-cell Simulation of Fusion Plasmas on Top Supercomputers", International Journal of High-Performance Computing Applications (IJHPCA), May 2017, doi: https://doi.org/10.1177/1094342017712059
2016
William Tang, Bei Wang, Stephane Ethier, Grzegorz Kwasniewski, Torsten Hoefler, Khaled Z. Ibrahim4, Kamesh Madduri, Samuel Williams, Leonid Oliker, Carlos Rosales-Fernandez, Tim Williams, "Extreme Scale Plasma Turbulence Simulations on Top Supercomputers Worldwide", Supercomputing, November 2016,
- Download File: sc16-gtcp-submit.pdf (pdf: 971 KB)
2014
Adam Lugowski, Shoaib Kamil, Aydın Buluç, Samuel Williams, Erika Duriakova, Leonid Oliker, Armando Fox, John R. Gilbert,, "Parallel processing of filtered queries in attributed semantic graphs", Journal of Parallel and Distributed Computing (JPDC), September 2014, doi: 10.1016/j.jpdc.2014.08.010
2013
Bei Wang, Stephane Ethier, William Tang, Timothy Williams, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, "Kinetic Turbulence Simulations at Extreme Scale on Leadership-Class Systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2013, doi: 10.1145/2503210.2503258
- Download File: sc13gtc.pdf (pdf: 1.3 MB)
A. Buluç, K. Madduri, "Graph partitioning for scalable distributed graph computations", AMS Contemporary Mathematics, Graph Partitioning and Graph Clustering (Proc. 10th DIMACS Implementation Challenge), 2013,
- Download File: DIMACSprocfinal.pdf (pdf: 2.1 MB)
Khaled Z Ibrahim, Kamesh Madduri, Samuel Williams, Bei Wang, Stephane Ethier, Leonid Oliker, "Analysis and optimization of gyrokinetic toroidal simulations on homogeneous and heterogeneous platforms", International Journal of High Performance Computing Applications (IJHPCA), July 2013, doi: 10.1177/1094342013492446
E. Solomonik, A. Buluç, J. Demmel, "Minimizing communication in all-pairs shortest paths", International Parallel and Distributed Processing Symposium (IPDPS), 2013,
- Download File: 25dapspipdps13.pdf (pdf: 256 KB)
Aydın Buluç, Erika Duriakova, Armando Fox, John Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams, "High-Productivity and High-Performance Analysis of Filtered Semantic Graphs", International Parallel and Distributed Processing Symposium (IPDPS), 2013, doi: 10.1145/2370816.2370897
- Download File: ipdps13-kdtsejits.pdf (pdf: 398 KB)
2012
B. Wang, S. Ethier, W. Tang, K. Ibrahim, K. Madduri, S. Williams, "Advances in gyrokinetic particle in cell simulation for fusion plasmas to Extreme scale", Supercomputing (SC), 2012,
A. Buluç, A. Fox, J. R. Gilbert, S. Kamil, A. Lugowski, L. Oliker, S. Williams, "High-performance analysis of filtered semantic graphs", PACT '12 Proceedings of the 21st international conference on Parallel architectures and compilation techniques (extended abstract), 2012, doi: 10.1145/2370816.2370897
K. Kandalla, A. Buluç, H. Subramoni, K. Tomko, J. Vienne, L. Oliker, D. K. Panda, "Can network-offload based non-blocking neighborhood MPI collectives improve communication overheads of irregular graph algorithms?", International Workshop on Parallel Algorithms and Parallel Software (IWPAPS 2012), 2012,
A. Buluç, J. Gilbert, "Parallel sparse matrix-matrix multiplication and indexing: Implementation and experiments", SIAM Journal on Scientific Computing (SISC), 2012,
- Download File: spgemmsisc12.pdf (pdf: 1.2 MB)
A. Lugowski, D. Alber, A. Buluç, J. Gilbert, S. Reinhardt, Y. Teng, A. Waranis, "A flexible open-source toolbox for scalable complex graph analysis", SIAM Conference on Data Mining (SDM), 2012,
- Download File: kdt-final.pdf (pdf: 753 KB)
A. Lugowski, A. Buluç, J. Gilbert, S. Reinhardt, "Scalable complex graph analysis with the knowledge discovery toolbox", International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2012,
K Madduri, J Su, S Williams, L Oliker, S Ethier, K Yelick, "Optimization of parallel particle-to-grid interpolation on leading multicore platforms", IEEE Transactions on Parallel and Distributed Systems, January 1, 2012, 23:1915--1922, doi: 10.1109/TPDS.2012.28
2011
A. Buluç, K. Madduri, "Parallel breadth-first search on distributed memory systems", Supercomputing (SC), November 2011,
- Download File: sc11bfs.pdf (pdf: 786 KB)
A. Buluç, J. Gilbert, "The Combinatorial BLAS: Design, implementation, and applications", International Journal of High-Perormance Computing Applications (IJHPCA), 2011,
- Download File: combblas-r2.pdf (pdf: 288 KB)
Kamesh Madduri, Eun-Jin Im, Khaled Z. Ibrahim, Samuel Williams, Stephane Ethier, Leonid Oliker, "Gyrokinetic Particle-in-cell Optimization on Emerging Multi- and Manycore Platforms", Parallel Computing (PARCO), January 2011, 37:501 - 520, doi: 10.1016/j.parco.2011.02.001
- Download File: parco11-gtc.pdf (pdf: 2 MB)
Kamesh Madduri, Khaled Ibrahim, Samuel Williams, Eun-Jin Im, Stephane Ethier, John Shalf, Leonid Oliker, "Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), January 2011, 23, doi: 10.1145/2063384.2063415
- Download File: sc11-gtc.pdf (pdf: 1.3 MB)