ECP PROTEAS-TUNE

PROTEAS-TUNE is a multi-institutional software technology project within the Exascale Computing Project (ECP), spanning compilers, code generation, autotuning, and profiling. Under PROTEAS-TUNE, LBL has formed a close collaboration with the University of Utah focused on developing the Brick Library to effect scalable, performance-portable computations on structured grids.

Research Topics

Exploiting data locality is essential to attaining high performance in many applications that operate on structured grids (stencils, matrices, tensors, FFTs). Traditionally, long cache lines transparently deliver spatial locality to computations that exhibit reuse in the unit-stride dimension. Unfortunately, many computations require exploiting data locality in multiple dimensions. Whereas traditional compiler techniques use loop tiling to effect data locality, the Brick Library transforms the data structure so that multi-dimensional data locality can be exploited as spatial locality. For example, a 256^3 array of doubles can be transformed into a 64^3 array of 4^3 "bricks" of doubles. Each 4^3 brick holds 64 doubles, or 512 bytes of contiguous data. Thus, within a brick, unit steps in the i, j, and k dimensions correspond to strides of 1, 4, and 16 doubles, respectively. Operations that require data from neighboring bricks must locate the relevant brick and extract the relevant data. Overall, this technique has opened a number of research opportunities, including:

Code generation technologies that hide the complexity of inter-brick accesses from users,
Autotuning to select the optimal brick dimensions and code generation techniques,
Assessing the performance portability of bricks across multiple GPU and CPU platforms,
Extending the bricks technology to a wide range of application domains,
Exploring the use of bricks to improve the strong-scaling performance of distributed memory applications, and
Exploring the use of bricks to effect model parallelism in AI training.
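The brick layout described earlier in this section can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the Brick Library's API: the class `BrickLayout`, its methods, the i-fastest ordering of bricks and of elements within a brick, the 27-entry neighbor (adjacency) table, and the periodic boundaries are all hypothetical choices made for illustration, and a Python list stands in for real storage. It shows how a global point (i, j, k) maps to a brick and an offset, why unit steps in i, j, and k become strides of 1, 4, and 16 elements inside a 4^3 brick (64 doubles, or 512 bytes), and how an access that falls outside the current brick is redirected to a neighboring brick.

```python
from itertools import product


class BrickLayout:
    """Hypothetical sketch of a bricked 3D grid (not the Brick Library API).

    An n^3 grid is split into (n/b)^3 bricks of b^3 elements each. Assumptions
    made for illustration: bricks are numbered i-fastest, elements inside a
    brick are stored i-fastest (so unit steps in i, j, k are strides of 1, b,
    and b*b), and the grid is periodic when looking up neighbor bricks.
    """

    def __init__(self, n, b):
        assert n % b == 0, "brick edge must divide the grid size"
        self.n, self.b, self.nb = n, b, n // b
        self.brick_size = b ** 3
        # Adjacency table: adj[brick][(di, dj, dk)] is the brick number of the
        # neighbor offset by (di, dj, dk) bricks, for all 27 offsets in {-1, 0, 1}^3.
        self.adj = [
            {d: self._brick_id(*self._shift((bi, bj, bk), d))
             for d in product((-1, 0, 1), repeat=3)}
            for bk in range(self.nb)
            for bj in range(self.nb)
            for bi in range(self.nb)
        ]

    def _brick_id(self, bi, bj, bk):
        # Brick coordinates -> brick number (bi varies fastest).
        return bi + self.nb * (bj + self.nb * bk)

    def _shift(self, coords, d):
        # Move brick coordinates by d, wrapping around periodically.
        return tuple((c + dc) % self.nb for c, dc in zip(coords, d))

    def _offset(self, li, lj, lk):
        # Position inside a brick; Python's % maps -1 to b - 1, and so on.
        b = self.b
        return (li % b) + b * ((lj % b) + b * (lk % b))

    def locate(self, i, j, k):
        """Global point (i, j, k) -> (brick number, offset inside that brick)."""
        b = self.b
        return self._brick_id(i // b, j // b, k // b), self._offset(i, j, k)

    def to_bricks(self, f):
        """Copy f(i, j, k) for every grid point into flat bricked storage."""
        data = [0.0] * (self.n ** 3)
        for i, j, k in product(range(self.n), repeat=3):
            brick, offset = self.locate(i, j, k)
            data[brick * self.brick_size + offset] = f(i, j, k)
        return data

    def get(self, data, brick, li, lj, lk):
        """Read the point at local coordinates (li, lj, lk) of `brick`.

        Coordinates may lie anywhere in [-b, 2b); points outside the brick are
        found by looking up the neighboring brick in the adjacency table.
        """
        b = self.b
        d = (li // b, lj // b, lk // b)  # -1, 0, or 1 along each axis
        nbr = self.adj[brick][d]
        return data[nbr * self.brick_size + self._offset(li, lj, lk)]


if __name__ == "__main__":
    layout = BrickLayout(n=8, b=4)  # 2^3 bricks of 4^3 elements

    def f(i, j, k):
        return i + 10 * j + 100 * k

    data = layout.to_bricks(f)
    # Unit steps in i, j, k inside brick 0 move the offset by 1, 4, and 16.
    print([layout.locate(*p)[1] for p in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]])
    # One step in -i from the corner of brick 0 lands in a neighboring brick
    # (wrapped periodically), i.e. global point (7, 0, 0).
    print(layout.get(data, 0, -1, 0, 0))  # prints 7
```

Running the example prints `[1, 4, 16]` and then `7`. Resolving out-of-brick accesses through an explicit neighbor table is one possible design choice; because every such access follows the same pattern, it is the kind of bookkeeping a code generator could emit on the user's behalf.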

LBL Researchers

Publications


2024

Oscar Antepara, Samuel Williams, Hans Johansen, Mary Hall, "High-Performance, Scalable Geometric Multigrid via Fine-Grain Data Blocking for GPUs", Performance, Portability & Productivity in HPC (P3HPC), November 10, 2024.

Download File: P3HPC24_bricks_mg_final.pdf (pdf: 358 KB)

Mahesh Lakshminarasimhan, Oscar Antepara, Tuowen Zhao, Benjamin Sepanski, Protonu Basu, Hans Johansen, Mary Hall, Samuel Williams, "Bricks: A high-performance portability layer for computations on block-structured grids", The International Journal of High Performance Computing Applications (IJHPCA), August 19, 2024, doi: 10.1177/10943420241268288

Mahesh Lakshminarasimhan, Mary Hall, Samuel Williams, Oscar Antepara, "BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs", Proceedings of the 53rd International Conference on Parallel Processing (ICPP), August 12, 2024.

Download File: ICPP24_BrickDL_final-v2.pdf (pdf: 1.7 MB)

2023

Oscar Antepara, Hans Johansen, Samuel Williams, Tuowen Zhao, Samantha Hirsch, Priya Goyal, Mary Hall, "Performance portability evaluation of blocked stencil computations on GPUs", International Workshop on Performance, Portability & Productivity in HPC (P3HPC), November 2023.

Download File: P3HPC23_bricks_final-v4.pdf (pdf: 684 KB)

2022

Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams, "Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations", Workshop on Memory Centric High Performance Computing (MCHPC), November 2022.

Download File: MCHPC22_final.pdf (pdf: 401 KB)

2021

Tuowen Zhao, Mary Hall, Hans Johansen, Samuel Williams, "Improving Communication by Optimizing On-Node Data Movement with Data Layout", ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2021.

Download File: PPoPP-Bricks-MPI-final.pdf (pdf: 864 KB)

2019

Tuowen Zhao, Mary Hall, Samuel Williams, Hans Johansen, "Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs", Supercomputing (SC), November 2019.

Download File: SC19-VectorScatter-final.pdf (pdf: 1019 KB)

2018

Tuowen Zhao, Samuel Williams, Mary Hall, Hans Johansen, "Delivering Performance Portable Stencil Computations on CPUs and GPUs Using Bricks", International Workshop on Performance, Portability and Productivity in HPC (P3HPC), November 2018.

Download File: p3hpc-bricks-final.pdf (pdf: 1.3 MB)

Tuowen Zhao, Mary Hall, Protonu Basu, Samuel Williams, Hans Johansen, "SIMD code generation for stencils on brick decompositions", Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2018.