Careers | Phone Book | A - Z Index

Auto-tuning Graphs, Sparse Linear Algebra, and Particle-in-Cell Codes for Exascale (X-Stack)

Automatic Performance Tuning (or Auto-tuning) has emerged as an effective means of providing performance portability from one architecture to the next. Rather than hoping a compiler can deliver optimal performance on ever more novel multicore architectures, or worse manually hand tune, auto-tuned kernels and applications can tune themselves on the target CPU, network, and programming model.

Building on our previous advances in this field, this project aims to address auto-tuning’s two principal limitations: an interface ill-suited to the forthcoming ubiquitous hybrid SPMD programming model; and its scope limited to fixed-function numerical routines.  We believe an abstraction-layer is essential in hiding the complexities of hybrid communication (possibly cache-coherent shared memory + MPI), hierarchical parallelism (processes + threads + tasks), and heterogeneous architectures (multicore + manycore + accelerators) from application scientists and domain experts. Additionally, we believe that auto-tuning can be applied to many emerging numerical and non-numerical fields as of yet untouched.  To that end, as part of this X-Stack project, we are developing auto-tuned components for concurrent operations and discrete data structures.  In addition to enhancing performance on existing numerical routines, we believe these components provide a stepping stone towards performance on a number of emerging fields including graph analytics.



  • Libraries and runtimes for Data Locality and Data Synchronization
  • Combinatorial BLAS (CombBLAS)
  • Knowledge Discovery Toolkit (KDT)

Exascale Research Conference Materials

  • Handout
  • Quad Chart
  • Highlights
  • Poster



Bei Wang, Stephane Ethier, William Tang, Khaled Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, "Modern Gyrokinetic Particle-in-cell Simulation of Fusion Plasmas on Top Supercomputers", (to appear) International Journal of High-Performance Computing Applications (IJHPCA), May 2017,


William Tang, Bei Wang, Stephane Ethier, Grzegorz Kwasniewski, Torsten Hoefler, Khaled Z. Ibrahim4, Kamesh Madduri, Samuel Williams, Leonid Oliker, Carlos Rosales-Fernandez, Tim Williams, "Extreme Scale Plasma Turbulence Simulations on Top Supercomputers Worldwide", Supercomputing, November 2016,


Adam Lugowski, Shoaib Kamil, Aydın Buluç, Samuel Williams, Erika Duriakova, Leonid Oliker, Armando Fox, John R. Gilbert,, "Parallel processing of filtered queries in attributed semantic graphs", Journal of Parallel and Distributed Computing (JPDC), September 2014, doi: 10.1016/j.jpdc.2014.08.010


Bei Wang, Stephane Ethier, William Tang, Timothy Williams, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, "Kinetic Turbulence Simulations at Extreme Scale on Leadership-Class Systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2013, doi: 10.1145/2503210.2503258

A. Buluç, K. Madduri, "Graph partitioning for scalable distributed graph computations", AMS Contemporary Mathematics, Graph Partitioning and Graph Clustering (Proc. 10th DIMACS Implementation Challenge), 2013,

Khaled Z Ibrahim, Kamesh Madduri, Samuel Williams, Bei Wang, Stephane Ethier, Leonid Oliker, "Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms", International Journal of High Performance Computing Applications (IJHPCA), July 2013, doi: 10.1177/1094342013492446

Aydın Buluç, Erika Duriakova, Armando Fox, John Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams, "High-Productivity and High-Performance Analysis of Filtered Semantic Graphs", International Parallel and Distributed Processing Symposium (IPDPS), 2013, doi: 10.1145/2370816.2370897

E. Solomonik, A. Buluç, J. Demmel, "Minimizing communication in all-pairs shortest paths", International Parallel and Distributed Processing Symposium (IPDPS), 2013,


B. Wang, S. Ethier, W. Tang, K. Ibrahim, K. Madduri, S. Williams, "Advances in gyrokinetic particle in cell simulation for fusion plasmas to Extreme scale", Supercomputing (SC), 2012,

K. Madduri, J. Su, S. Williams, L. Oliker, S. Ethier, K. Yelick, "Optimization of Parallel Particle-to-Grid Interpolation on Leading Multicore Platforms", Transactions on Parallel and Distributed Systems (TPDS), October 1, 2012, doi: 10.1109/TPDS.2012.28

A. Buluç, A. Fox, J. R. Gilbert, S. Kamil, A. Lugowski, L. Oliker, S. Williams, "High-performance analysis of filtered semantic graphs", PACT '12 Proceedings of the 21st international conference on Parallel architectures and compilation techniques (extended abstract), 2012, doi: 10.1145/2370816.2370897

K. Kandalla, A. Buluç, H. Subramoni, K. Tomko, J. Vienne, L. Oliker, D. K. Panda, "Can network-offload based non-blocking neighborhood MPI collectives improve communication overheads of irregular graph algorithms?", International Workshop on Parallel Algorithms and Parallel Software (IWPAPS 2012), 2012,

A. Buluç, J. Gilbert, "Parallel sparse matrix-matrix multiplication and indexing: Implementation and experiments", SIAM Journal on Scientific Computing (SISC), 2012,

A. Lugowski, D. Alber, A. Buluç, J. Gilbert, S. Reinhardt, Y. Teng, A. Waranis, "A flexible open-source toolbox for scalable complex graph analysis", SIAM Conference on Data Mining (SDM), 2012,

A. Lugowski, A. Buluç, J. Gilbert, S. Reinhardt, "Scalable complex graph analysis with the knowledge discovery toolbox", International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2012,


A. Buluç, K. Madduri, "Parallel breadth-first search on distributed memory systems", Supercomputing (SC), November 2011,

A. Buluç, J. Gilbert, "The Combinatorial BLAS: Design, implementation, and applications", International Journal of High-Perormance Computing Applications (IJHPCA), 2011,

A. Buluç, S. Williams, L. Oliker, J. Demmel, "Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication", International Parallel Distributed Processing Symposium (IPDPS), May 2011, doi: 10.1109/IPDPS.2011.73

Kamesh Madduri, Khaled Ibrahim, Samuel Williams, Eun-Jin Im, Stephane Ethier, John Shalf, Leonid Oliker, "Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), January 2011, 23, doi: 10.1145/2063384.2063415

Kamesh Madduri, Eun-Jin Im, Khaled Z. Ibrahim, Samuel Williams, Stephane Ethier, Leonid Oliker, "Gyrokinetic Particle-in-cell Optimization on Emerging Multi- and Manycore Platforms", Parallel Computing (PARCO), January 2011, 37:501 - 520, doi: 10.1016/j.parco.2011.02.001