Previous Projects

Auto-tuning Graphs, Sparse Linear Algebra, and Particle-in-Cell Codes for Exascale (X-Stack)

This project aims to address auto-tuning’s two principal limitations: an interface ill-suited to the forthcoming ubiquitous hybrid SPMD programming model, and a scope limited to fixed-function numerical routines. Read More »

X-Tune: Auto-tuning for Exascale

Our work is one component of a larger DOE X-Stack2 project (X-Tune), a collaboration between the University of Utah, Lawrence Berkeley Lab, the University of Southern California, and Argonne National Lab. Building on the algorithmic and pathfinding work of the CACHE Institute in conjunction with the CHiLL/ROSE auto-tuning framework, we at LBL are researching and developing tools that automatically implement code transformations that minimize vertical (i.e., from DRAM) data movement and aggregate horizontal (i.e., MPI) data movement. To that end, we are leveraging the CHiLL/ROSE compiler to automatically transform and auto-tune numerical methods including Multigrid, the Spectral Element Method, and block eigensolvers like LOBPCG. Read More »

CACHE Joint Math-CS Institute

The CACHE Institute is focused on Communication Avoiding and Communication Hiding at Extreme Scales. The project is a collaboration between researchers at Lawrence Berkeley National Lab (LBNL), Argonne National Lab (ANL), the University of California at Berkeley (UCB), and Colorado State University (CSU). Read More »

Combustion Co-Design

Researchers in the Performance and Algorithms Research group work closely with researchers from the Computer Architecture Group and the Center for Computational Science and Engineering on co-designing algorithms, implementations, and architectures to maximize performance and energy efficiency in the context of combustion simulations. Read More »


miniGMG

miniGMG is a compact benchmark for understanding the performance challenges associated with geometric multigrid solvers found in applications built from AMR MG frameworks like CHOMBO or BoxLib when running on modern multi- and manycore-based supercomputers. It includes both productive reference examples and highly optimized implementations for CPUs and GPUs. It is sufficiently general that it has been used to evaluate a broad range of research topics including PGAS programming… Read More »

TORCH Testbed

TORCH (Testbed for Optimization Research) is a broad testbed of computational reference kernels in the context of high-performance computing. The testbed provides a detailed problem specification, input generation scheme, verification scheme, and a functional reference implementation. With these assets, computer scientists may research a wide range of areas including algorithms, performance optimization, programming models, languages, compilers, and hardware/software co-design by… Read More »

Application Performance Characterization Benchmarking (APEX)

The starting point of our Application Performance Characterization project (APEX) is the assumption that each application or algorithm can be characterized by several major performance factors that are specific to the application and independent of the computer architecture. A synthetic benchmark then combines these factors to simulate the application's behavior. Thus, the performance of the benchmark should be closely related to that of the corresponding application. Such… Read More »

ULTRA Evaluation

This work evaluates existing and emerging large-scale HEC architectures using a set of in-depth studies from full applications. The novel aspect of this research is the emphasis on full applications, run with real input data and at the scale desired by application scientists in the domain. These problems are much more complicated than those in traditional benchmarking suites such as the NAS Parallel Benchmarks or the LINPACK benchmark, and therefore reveal the kinds of performance issues… Read More »

LDRD: Enhancing the Effectiveness of Manycore Chip Technologies for High-End Computing

Jonathan Carter (PI). For the past 15 years, CPU performance has improved at an exponential pace, doubling approximately every 18 months with remarkable consistency. In order to maintain performance improvements within the conservative power envelope allowed by practical system design, the historical trend of increasing clock rates at an exponential pace has given way to a chip-scale multiprocessor (CMP) design strategy where the performance of individual CPU cores stays… Read More »