PROTEAS-TUNE is a multi-institutional ECP software technology project spanning compilers, code generation, autotuning, and profiling. Under PROTEAS-TUNE, LBL has formed a close collaboration with the University of Utah focused on developing the Brick Library to effect scalable, performance-portable computations on structured grids.
Exploiting data locality is essential to attaining high performance in many applications that operate on structured grids (stencils, matrices, tensors, FFTs). Traditionally, long cache lines transparently deliver spatial locality to computations that exhibit reuse in the unit-stride dimension. Unfortunately, many computations must exploit data locality in multiple dimensions. Whereas traditional compiler techniques use loop tiling to effect data locality, the Brick Library transforms the data structure itself so that multi-dimensional data locality can be exploited as spatial locality.
In essence, a 3D 256^3 array of doubles can be transformed into a 64^3 array of 4^3 "Bricks" of doubles. Each 4^3 brick holds 64 doubles, i.e., 512 bytes of contiguous data. Thus, within a brick, a unit step in the i, j, or k dimension corresponds to a stride of 1, 4, or 16 doubles, respectively. Operations that require data from neighboring bricks must locate the relevant brick and extract the relevant data. Overall, this technique has opened a number of research opportunities, including:
- Code generation technologies that hide the complexity of inter-brick accesses from users,
- Autotuning brick dimensions and code generation techniques,
- Assessing the performance portability of bricks across multiple GPU and CPU platforms,
- Extending the bricks technology to a wide range of application domains,
- Exploring the use of bricks to improve the strong-scaling performance of distributed memory applications, and
- Exploring the use of bricks to effect model parallelism in AI training.
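The brick layout arithmetic described earlier (a 256^3 grid split into 64^3 bricks of 4^3 doubles, with in-brick strides of 1, 4, and 16) can be sketched in a few lines of Python. This is a hypothetical illustration, not the Brick Library's API: the function names `brick_offset` and `to_bricks` are invented here, and the sketch assumes i is the unit-stride dimension and that bricks are stored in simple lexicographic order. The library's actual brick ordering and neighbor lookup may differ.

```python
# Hypothetical sketch of a bricked data layout (not the Brick Library's API).
# Assumptions: i is the unit-stride dimension, the grid is n^3 with n divisible
# by b, bricks are stored in lexicographic order (i fastest, then j, then k),
# and the b^3 elements of each brick are contiguous.

def brick_offset(i, j, k, n=256, b=4):
    """Return the flat index of element (i, j, k) in the bricked array."""
    nb = n // b                          # bricks per dimension (256 // 4 = 64)
    bi, bj, bk = i // b, j // b, k // b  # coordinates of the enclosing brick
    ii, jj, kk = i % b, j % b, k % b     # coordinates inside that brick
    brick_id = (bk * nb + bj) * nb + bi  # which brick, in lexicographic order
    within = (kk * b + jj) * b + ii      # strides of 1, b, b*b inside a brick
    return brick_id * b ** 3 + within


def to_bricks(grid, n, b):
    """Copy a flat array indexed as i + n*(j + n*k) into bricked order."""
    out = [0.0] * (n ** 3)
    for k in range(n):
        for j in range(n):
            for i in range(n):
                out[brick_offset(i, j, k, n, b)] = grid[i + n * (j + n * k)]
    return out


if __name__ == "__main__":
    n, b = 256, 4
    print((n // b) ** 3, "bricks of", b ** 3 * 8, "bytes")  # 262144 bricks of 512 bytes
    base = brick_offset(1, 1, 1)
    # Unit steps in i, j, k stay inside the brick: strides of 1, 4, 16 doubles.
    print(brick_offset(2, 1, 1) - base,
          brick_offset(1, 2, 1) - base,
          brick_offset(1, 1, 2) - base)                    # 1 4 16
    # Stepping from i=3 to i=4 crosses into the neighboring brick.
    print(brick_offset(4, 0, 0) - brick_offset(3, 0, 0))   # 61

    # Small round-trip check on an 8^3 grid with 4^3 bricks.
    small = [float(x) for x in range(8 ** 3)]
    bricked = to_bricks(small, 8, 4)
    assert sorted(bricked) == small
```

With a fixed lexicographic ordering, locating a neighboring brick is pure arithmetic; an implementation could instead find neighbors through an index table, which would allow bricks to be stored in any order. That alternative is not shown in this sketch.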
Oscar Antepara, Hans Johansen, Samuel Williams, Tuowen Zhao, Samantha Hirsch, Priya Goyal, Mary Hall, "Performance portability evaluation of blocked stencil computations on GPUs", International Workshop on Performance, Portability & Productivity in HPC (P3HPC), November 2023.
Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams, "Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations", MCHPC, November 2022.
Tuowen Zhao, Mary Hall, Hans Johansen, Samuel Williams, "Improving Communication by Optimizing On-Node Data Movement with Data Layout", PPoPP, February 2021.
Tuowen Zhao, Mary Hall, Samuel Williams, Hans Johansen, "Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs", Supercomputing (SC), November 2019.