ECP PROTEAS-TUNE
PROTEAS-TUNE is a multi-institutional ECP software technology project spanning compilers, code generation, auto-tuning, and profiling. Under PROTEAS-TUNE, LBL has formed a tight collaboration with the University of Utah focused on developing the Brick Library to effect scalable, performance-portable computations on structured grids.
Research Topics
The exploitation of data locality is essential to attaining high performance in many applications that operate on structured grids (stencils, matrices, tensors, FFTs). Traditionally, long cache lines transparently deliver spatial locality to computations that exhibit reuse in the unit-stride dimension. Unfortunately, many computations require the exploitation of data locality in multiple dimensions. Whereas traditional compiler techniques leverage loop tiling to effect data locality, the Brick Library transforms the data structure itself so that multi-dimensional data locality can be exploited via spatial locality. In essence, a 3D 256^3 array of doubles can be transformed into a 64^3 array of 4^3 "Bricks" of doubles. Each 4^3 Brick holds 64 doubles, or 512 bytes of contiguous data. Thus, within a brick, stepping in the i-, j-, or k-dimension corresponds to a stride of 1, 4, or 16 doubles, respectively. Operations that require data from neighboring bricks must first locate the relevant brick and then extract the relevant data. Overall, this technique has produced a number of research opportunities, including:
- Code generation technologies that hide the complexity of inter-brick accesses from users,
- Autotuning the optimal brick dimensions and code generation techniques,
- Assessing the performance portability of bricks across multiple GPU and CPU platforms,
- Extending the bricks technology to a wide range of application domains,
- Exploring the use of bricks to improve the strong-scaling performance of distributed memory applications, and
- Exploring the use of bricks to effect model parallelism in AI training.
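The index arithmetic behind the 256^3-to-bricks transformation described above can be illustrated with a minimal Python sketch. This is not the Brick Library's API: the names `brick_address` and `flat_index` are hypothetical, and the sketch assumes bricks are stored in simple lexicographic order, which a real implementation need not do. It only shows how a grid coordinate splits into a brick identifier and an offset within that brick, and why in-brick strides are 1, 4, and 16 doubles.

```python
# Illustrative sketch only; names and brick ordering are assumptions,
# not the Brick Library's actual interface.
N = 256                 # grid points per dimension (from the text)
B = 4                   # brick edge length (from the text)
NB = N // B             # bricks per dimension: 64
BRICK_ELEMS = B ** 3    # 64 doubles per brick, i.e. 512 bytes


def brick_address(i, j, k):
    """Map grid coordinates (i, j, k) to (brick id, offset within brick).

    i is taken to be the unit-stride dimension. Bricks are assumed to be
    laid out in lexicographic order (an assumption made for this sketch).
    """
    bi, li = divmod(i, B)   # brick coordinate and local coordinate in i
    bj, lj = divmod(j, B)
    bk, lk = divmod(k, B)
    brick_id = (bk * NB + bj) * NB + bi
    offset = (lk * B + lj) * B + li   # strides of 1, 4, 16 inside a brick
    return brick_id, offset


def flat_index(i, j, k):
    """Position of element (i, j, k), in doubles, within the bricked storage."""
    brick_id, offset = brick_address(i, j, k)
    return brick_id * BRICK_ELEMS + offset
```

Within a brick, `flat_index(1, 0, 0)`, `flat_index(0, 1, 0)`, and `flat_index(0, 0, 1)` differ from `flat_index(0, 0, 0)` by 1, 4, and 16 respectively. Stepping from `i = 3` to `i = 4` crosses a brick boundary, so the brick identifier changes and the access must be resolved in the neighboring brick rather than by a fixed stride.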
LBL Researchers
- Samuel Williams
- Hans Johansen
- Oscar Antepara
Publications
Oscar Antepara
2024
Oscar Antepara, Samuel Williams, Hans Johansen, Mary Hall, "High-Performance, Scalable Geometric Multigrid via Fine-Grain Data Blocking for GPUs", Performance, Portability & Productivity in HPC (P3HPC), November 10, 2024,
Mahesh Lakshminarasimhan, Oscar Antepara, Tuowen Zhao, Benjamin Sepanski, Protonu Basu, Hans Johansen, Mary Hall, Samuel Williams, "Bricks: A high-performance portability layer for computations on block-structured grids", The International Journal of High Performance Computing Applications (IJHPCA), August 19, 2024, doi: 10.1177/1094342024126828
Mahesh Lakshminarasimhan, Mary Hall, Samuel Williams, Oscar Antepara, "BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs", Proceedings of the 53rd International Conference on Parallel Processing (ICPP), August 12, 2024,
- Download File: ICPP24_BrickDL_final-v2.pdf (pdf: 1.7 MB)
2023
Oscar Antepara, Hans Johansen, Samuel Williams, Tuowen Zhao, Samantha Hirsch, Priya Goyal, Mary Hall, "Performance portability evaluation of blocked stencil computations on GPUs", International Workshop on Performance, Portability & Productivity in HPC (P3HPC), November 2023,
- Download File: P3HPC23_bricks_final-v4.pdf (pdf: 684 KB)
Protonu Basu
2018
Tuowen Zhao, Mary Hall, Protonu Basu, Samuel Williams, Hans Johansen, "SIMD code generation for stencils on brick decompositions", Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2018,
Hans Johansen
2024
Oscar Antepara, Samuel Williams, Hans Johansen, Mary Hall, "High-Performance, Scalable Geometric Multigrid via Fine-Grain Data Blocking for GPUs", Performance, Portability & Productivity in HPC (P3HPC), November 10, 2024,
Mahesh Lakshminarasimhan, Oscar Antepara, Tuowen Zhao, Benjamin Sepanski, Protonu Basu, Hans Johansen, Mary Hall, Samuel Williams, "Bricks: A high-performance portability layer for computations on block-structured grids", The International Journal of High Performance Computing Applications (IJHPCA), August 19, 2024, doi: 10.1177/1094342024126828
2023
Oscar Antepara, Hans Johansen, Samuel Williams, Tuowen Zhao, Samantha Hirsch, Priya Goyal, Mary Hall, "Performance portability evaluation of blocked stencil computations on GPUs", International Workshop on Performance, Portability & Productivity in HPC (P3HPC), November 2023,
- Download File: P3HPC23_bricks_final-v4.pdf (pdf: 684 KB)
2022
Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams, "Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations", MCHPC, November 2022,
- Download File: MCHPC22_final.pdf (pdf: 401 KB)
2021
Tuowen Zhao, Mary Hall, Hans Johansen, Samuel Williams, "Improving Communication by Optimizing On-Node Data Movement with Data Layout", PPoPP, February 2021,
- Download File: PPoPP-Bricks-MPI-final.pdf (pdf: 864 KB)
2019
Tuowen Zhao, Mary Hall, Samuel Williams, Hans Johansen, "Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs", Supercomputing (SC), November 2019,
- Download File: SC19-VectorScatter-final.pdf (pdf: 1019 KB)
2018
Tuowen Zhao, Samuel Williams, Mary Hall, Hans Johansen, "Delivering Performance Portable Stencil Computations on CPUs and GPUs Using Bricks", International Workshop on Performance, Portability and Productivity in HPC (P3HPC), November 2018,
- Download File: p3hpc-bricks-final.pdf (pdf: 1.3 MB)
Tuowen Zhao, Mary Hall, Protonu Basu, Samuel Williams, Hans Johansen, "SIMD code generation for stencils on brick decompositions", Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2018,
Samuel W. Williams
2024
Oscar Antepara, Samuel Williams, Hans Johansen, Mary Hall, "High-Performance, Scalable Geometric Multigrid via Fine-Grain Data Blocking for GPUs", Performance, Portability & Productivity in HPC (P3HPC), November 10, 2024,
Mahesh Lakshminarasimhan, Oscar Antepara, Tuowen Zhao, Benjamin Sepanski, Protonu Basu, Hans Johansen, Mary Hall, Samuel Williams, "Bricks: A high-performance portability layer for computations on block-structured grids", The International Journal of High Performance Computing Applications (IJHPCA), August 19, 2024, doi: 10.1177/1094342024126828
Mahesh Lakshminarasimhan, Mary Hall, Samuel Williams, Oscar Antepara, "BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs", Proceedings of the 53rd International Conference on Parallel Processing (ICPP), August 12, 2024,
- Download File: ICPP24_BrickDL_final-v2.pdf (pdf: 1.7 MB)
2023
Oscar Antepara, Hans Johansen, Samuel Williams, Tuowen Zhao, Samantha Hirsch, Priya Goyal, Mary Hall, "Performance portability evaluation of blocked stencil computations on GPUs", International Workshop on Performance, Portability & Productivity in HPC (P3HPC), November 2023,
- Download File: P3HPC23_bricks_final-v4.pdf (pdf: 684 KB)
2022
Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams, "Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations", MCHPC, November 2022,
- Download File: MCHPC22_final.pdf (pdf: 401 KB)
2021
Tuowen Zhao, Mary Hall, Hans Johansen, Samuel Williams, "Improving Communication by Optimizing On-Node Data Movement with Data Layout", PPoPP, February 2021,
- Download File: PPoPP-Bricks-MPI-final.pdf (pdf: 864 KB)
2019
Tuowen Zhao, Mary Hall, Samuel Williams, Hans Johansen, "Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs", Supercomputing (SC), November 2019,
- Download File: SC19-VectorScatter-final.pdf (pdf: 1019 KB)
2018
Tuowen Zhao, Samuel Williams, Mary Hall, Hans Johansen, "Delivering Performance Portable Stencil Computations on CPUs and GPUs Using Bricks", International Workshop on Performance, Portability and Productivity in HPC (P3HPC), November 2018,
- Download File: p3hpc-bricks-final.pdf (pdf: 1.3 MB)