Previous Projects

SciDAC-3

Researchers from FTG are engaged in a number of activities in the Scientific Discovery through Advanced Computing (SciDAC) initiative. The SciDAC program was initiated in 2001 in order to develop the Scientific Computing Software and Hardware Infrastructure needed to advance scientific discovery using supercomputers. As supercomputers continuously evolve, direct engagement of computer scientists and applied mathematicians with the scientists of targeted application domains becomes ever more… Read More »

Auto-tuning

Automatic Performance Tuning or “Auto-tuning,” is an empirical, feedback-driven performance optimization technique designed to maximize performance across a wide variety of architectures without sacrificing portability or productivity. Over the years, auto-tuning has expanded from simple loop tiling and unroll-and-jam to encompass transformations to data structures, parallelization, and algorithmic parameters. Perhaps the genesis of auto-tuning was PHiPAC (Portable High Performance… Read More »

Auto-tuning Graphs, Sparse Linear Algebra, and Particle-in-Cell Codes for Exascale (X-Stack)

This project aims to address auto-tuning’s two principal limitations: an interface ill-suited to the forthcoming ubiquitous hybrid SPMD programming model; and its scope limited to ﬁxed-function numerical routines. Read More »

X-Tune: Auto-tuning for Exascale

Our work represents one component of a larger DOE X-Stack2 project (X-Tune) that represents a collaboration between the University of Utah, Lawrence Berkeley Lab, the University of Southern California, and Argonne National Lab. Building on the algorithmic and pathfinding work of the CACHE institute in conjunction with the CHiLL/ROSE auto-tuning framework, we at LBL are researching and developing tools that automatically implement code transformations that minimize vertical (i.e. from DRAM) data movement and aggregate horizontal (i.e. MPI) data movement. To that end, we are leveraging the CHiLL/ROSE compiler to automatically transform and autotune numerical methods including Multigrid, the Spectral Element Method, and block eigensolvers like LOBPCG. Read More »

CACHE Joint Math-CS Institute

The CACHE Institute is focused on Communication Avoiding and Communication Hiding at Extreme Scales. The project is a collaboration between researchers at Lawrence Berkeley National Lab (LBNL), Argonne National Lab (ANL), the University of California at Berkeley (UCB), and Colorado State Univeristy (CSU). Read More »

Combustion Co-Design

Researchers of the Performance and Algorithms Research group are heavily involved with Researchers from the Computer Architecture Group and the Center for Computational Science and Engineering on Co-Designing algorithms, implementation, and architecture to maximize performance and energy efficiency in the context of combustion simulations. Read More »

miniGMG

miniGMG is a compact benchmark for understanding the performance challenges associated with geometric multigrid solvers found in applications built from AMR MG frameworks like CHOMBO or BoxLib when running on modern multi- and manycore-based supercomputers. It includes both productive reference examples as well as highly-optimized implementations for CPUs and GPUs. It is sufficiently general that it has been used to evaluate a broad range of research topics including PGAS programming… Read More »

TORCH Testbed

TORCH (Testbed for Optimization Research)TORCH is a broad testbed of computational reference kernels in the context of high-performance computing. The testbed provides a detailed problem specification, input generation scheme, verification scheme, and a functional reference implementation. With these assets, computer scientists may research a wide range of areas including algorithms, performance optimization, programming models, languages, compilers, and hardware/software co-design by… Read More »

Application Performance Characterization Benchmarking (APEX)

The starting point of our Application Performance Characterization project (Apex) is the assumption that each application or algorithm can be characterized by several major performance factors that are specific to the application and independent of the computer architecture. A synthetic benchmark then combines these factors together to simulate the application's behavior. Thus, the performance of the benchmark should be closely related to that of the corresponding application. Such… Read More »