Samuel Williams

Senior Scientist

SWWilliams@lbl.gov

Phone: 510-486-5936

LBNL Office: 59-3040 (CRT)

One Cyclotron Rd.

MS:59R4104

Berkeley, CA 94720-8142 US

Biographical Sketch

Sam Williams is a senior scientist in the Performance and Algorithms Research Group at the Lawrence Berkeley National Laboratory (LBNL). His research interests include high-performance computing, performance modeling, machine learning, computer architecture, and hardware/software co-design.

Dr. Williams received his Ph.D. and master's degrees in Computer Science from the University of California at Berkeley (UCB). His doctoral research focused on multicore architectures and automated performance tuning under the guidance of David Patterson. To that end, Dr. Williams created the Roofline Model to enable developers, computer scientists, computer architects, and applied mathematicians to quickly and visually assess performance bottlenecks on multicore, manycore, and GPU-accelerated systems.

Previously, as a graduate student, Sam worked in several Parallel Computing Laboratory (ParLab) research groups, including BeBOP, Architecture, and the Berkeley View. His master's research was funded through the IRAM project, under which he implemented the RTL for the integer and floating-point datapaths, verified the simulators and all RTL, floorplanned the entire VIRAM1 chip, and performed all necessary place-and-route (PnR) work. Sam received bachelor's degrees in Electrical Engineering (computer specialization), Mathematics (applied), and Physics from Southern Methodist University (SMU), graduating summa cum laude. While an undergraduate, he spent five semesters as a paid intern at Cyrix Corporation, where he worked on RTL and gate-level verification, PnR, and silicon debug.

Current Research

Roofline Model (A performance model for throughput computing)
AMCR-NERSC Collaboration on Architectural Evaluation and Performance/Energy Modeling
- Power and Energy Benchmarking, Modeling, Analysis, Attribution, and Optimization
- Performance Modeling and Analysis of Deep Neural Networks for Machine Learning for Science
- Benchmarking FPGAs for HPC
- Performance Modeling of Disaggregated Memory Architectures
- Performance Modeling and Analysis of Disaggregated (GPU) Accelerated Architectures
- Performance Modeling and Analysis of Scientific Workflows
PROTEAS-TUNE (a compiler-based approach to auto-tuning) ECP Project
- Development of novel data structure transformations for scalable, performance-portable computations on GPU-accelerated systems.
RAPIDS SciDAC4 CS/Data Institute
Data-Driven Surrogate/Reduced Order Modeling for simulation of fusion devices
EFIT-AI SciDAC4 FES Partnership
- Performance-Portable GPU acceleration of Tokamak plasma equilibrium reconstruction codes
CTTS (Center for Tokamak Transient Simulations) SciDAC4 FES Partnership
- Optimization of GPU-accelerated (direct solver) preconditioners for fusion simulations
ISEP (Integrated Simulation of Energetic Particles) SciDAC4 FES Partnership
- Development of surrogate models to accelerate/replace fusion simulations
- Optimization of particle-in-cell simulations on GPU-accelerated HPC systems
HPGMG-FV (High Performance Geometric Multigrid)
- scalable compact benchmark for understanding the challenges of Geometric Multigrid on petascale and exascale systems built from multicore processors, manycore processors, and accelerators
- used to evaluate alternatives to HPL for the Top500 rankings.
- source code

Previous Research

ECP Hardware Evaluation Project
AMReX (Adaptive Mesh Refinement for Exascale) ECP CoDesign Center
Exascale Combustion Co-Design Center (ExaCT)
SUPER SciDAC-3 Institute, the Roofline Model, and four SciDAC-3 application partnerships
X-Tune (a compiler-based approach to auto-tuning)
X-Stack - Auto-tuning sparse linear algebra, Graph Analytics, and Particle-in-Cell codes
miniGMG was the predecessor to HPGMG and was designed primarily for understanding geometric MG challenges at small-scale
Joint Math-CS institute on Communication-Avoiding Algorithms (CACHE)
Ultra-scale Evaluation and Optimization
Chemistry SciDAC-e
Testbed for Optimization Research (TORCH) LDRD
LDRD on Green Flash/Wave
LDRD on Multicore Optimization and Auto-tuning

Books

Performance Tuning of Scientific Applications, edited by: David H. Bailey, Robert F. Lucas, Samuel W. Williams, CRC Press, 2010, ISBN: 978-1-4398156-9-4.

Honors and Awards

Best Short Paper, Performance Modeling, Benchmarking, and Simulation (PMBS), 2024
Best Paper, Workshop on Accelerator Programming and Directives (WACCPD), 2023
Best Paper, Performance Modeling, Benchmarking, and Simulation (PMBS), 2021
Best Paper, Performance Modeling, Benchmarking, and Simulation (PMBS), 2020
Best Paper, International Symposium on Benchmarking, Measuring, and Optimizing (Bench), 2019
Best Paper, Performance Modeling, Benchmarking, and Simulation (PMBS), 2019
Best Paper, Cray Users Group (CUG), 2009
Best Paper, Applications Track, International Parallel and Distributed Processing Symposium (IPDPS) 2008
2nd Place, Student Design Competition, Design Automation Conference, 2004.
Phi Beta Kappa, 1999
Eta Kappa Nu, 1996
Tau Beta Pi, 1995
Sigma Pi Sigma, 1998
Robert Stewart Hyer Society (highest academic honor at SMU), 1996
Outstanding Senior in SMU's School of Engineering and Applied Science, 1999
SMU's Charles J. Pipes Award for Outstanding Performance in Mathematics, 1998
University of California Microelectronics Fellowship, 1999-2000
Robert Stewart Hyer Scholarship (outstanding physics student), 1996
J. Lindsay Embrey Scholarship, 1994-1999
SMU University Scholarship, 1994-1999

Journal Articles

Zhe Bai, Xishuo Wei, William Tang, Leonid Oliker, Zhihong Lin, Samuel Williams, "Transfer Learning Nonlinear Plasma Dynamic Transitions in Low Dimensional Embeddings via Deep Neural Networks", Machine Learning: Science and Technology, April 8, 2025, doi: 10.1088/2632-2153/adca83

Mustafa Mutiur Rahman, Zhe Bai, Jacob Robert King, Carl R. Sovinec, Xishuo Wei, Samuel Williams, Yang Liu, "Sparsified time-dependent Fourier neural operators for fusion simulations", Phys. Plasmas, December 4, 2024, 31:12, doi: 10.1063/5.0232503

Xuan Jiang, Raja Sengupta, James Demmel, Samuel Williams, "Large scale multi-GPU based parallel traffic simulation for accelerated traffic assignment and propagation", Transportation Research Part C: Emerging Technologies, December 2024, 169:104873, doi: 10.1016/j.trc.2024.104873

Mahesh Lakshminarasimhan, Oscar Antepara, Tuowen Zhao, Benjamin Sepanski, Protonu Basu, Hans Johansen, Mary Hall, Samuel Williams, "Bricks: A high-performance portability layer for computations on block-structured grids", The International Journal of High Performance Computing Applications (IJHPCA), August 19, 2024, doi: 10.1177/10943420241268288

Nan Ding, Pieter Maris, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, LeAnn Lindsey, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, Samuel Williams, "Evaluating the potential of disaggregated memory systems for HPC applications", Concurrency and Computation: Practice and Experience (CCPE), May 2024, doi: 10.1002/cpe.8147

Marco Siracusa, Emanuele Del Sozzo, Marco Rabozzi, Lorenzo Di Tucci, Samuel Williams, Donatella Sciuto, Marco Domenico Santambrogio, "A Comprehensive Methodology to Optimize FPGA Designs via the Roofline Model", Transactions on Computers (TC), September 2021, doi: 10.1109/TC.2021.3111761

Tan Nguyen, Colin MacLean, Marco Siracusa, Douglas Doerfler, Nicholas J. Wright, Samuel Williams, "FPGA‐based HPC accelerators: An evaluation on performance and energy efficiency", CCPE, August 22, 2021, doi: 10.1002/cpe.6570

Nan Ding, Muaaz Awan, Samuel Williams, "Instruction Roofline: An insightful visual performance model for GPUs", CCPE, August 4, 2021, doi: 10.1002/cpe.6591

Charlene Yang, Yunsong Wang, Thorsten Kurth, Steven Farrell, Samuel Williams, "Hierarchical Roofline Performance Analysis for Deep Learning Applications", Intelligent Computing, LNNS, July 15, 2021, doi: 10.1007/978-3-030-80126-7

Charlene Yang, Thorsten Kurth, Samuel Williams, "Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC-9 Perlmutter system", Concurrency and Computation: Practice and Experience (CCPE), August 2019, doi: 10.1002/cpe.5547

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

Wenjing Ma, Yulong Ao, Chao Yang, Samuel Williams, "Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight", Cluster Computing, May 2019, doi: 10.1007/s10586-019-02938-w

Jack Deslippe, Doug Doerfler, Brandon Cook, Tareq Malas, Samuel Williams, Sudip Dosanjh, "Optimizing science applications for the Cori, Knights Landing, System at NERSC", Advances in Parallel Computing, New Frontiers in High Performance Computing and Big Data, August 2017, 30, doi: 10.3233/978-1-61499-816-7-235

Bei Wang, Stephane Ethier, William Tang, Khaled Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, "Modern Gyrokinetic Particle-in-cell Simulation of Fusion Plasmas on Top Supercomputers", International Journal of High-Performance Computing Applications (IJHPCA), May 2017, doi: 10.1177/1094342017712059

Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Phillip Colella, Mary Hall, "Compiler-Based Code Generation and Autotuning for Geometric Multigrid on GPU-Accelerated Supercomputers", Parallel Computing (PARCO), April 2017, doi: 10.1016/j.parco.2017.04.002

Khaled Z. Ibrahim, Evgeny Epifanovsky, Samuel Williams, Anna I. Krylov, "Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends", Journal of Parallel and Distributed Computing (JPDC), February 2017, doi: 10.1016/j.jpdc.2017.02.010

Ariful Azad, Grey Ballard, Aydin Buluc, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, Samuel Williams, "Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication", SIAM Journal on Scientific Computing, 38(6), C624–C651, November 2016, doi: 10.1137/15M104253X

Download File: SISC-SpGEMM.pdf (pdf: 1.5 MB)

Nicholas Chaimov, Khaled Z. Ibrahim, Samuel Williams, Costin Iancu, "Reaching Bandwidth Saturation Using Transparent Injection Parallelization", International Journal of High Performance Computing Applications (IJHPCA), November 2016, doi: 10.1177/1094342016672720

Hasan Metin Aktulga, Md. Afibuzzaman, Samuel Williams, Aydın Buluc, Meiyue Shao, Chao Yang, Esmond G. Ng, Pieter Maris, James P. Vary, "A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations", IEEE Transactions on Parallel and Distributed Systems (TPDS), November 2016, doi: 10.1109/TPDS.2016.2630699

Download File: ieeetpds-mfdn-lobpcg-rev.pdf (pdf: 889 KB)

Pieter Ghysels, Xiaoye S. Li, François-Henry Rouet, Samuel Williams, Artem Napov, "An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling", SIAM J. Sci. Comput. 38-5, pp. S358-S384, October 2016, doi: 10.1137/15M1010117

J. R. Jones, F.-H. Rouet, K. V. Lawler, E. Vecharynski, K. Z. Ibrahim, S. Williams, B. Abeln, C. Yang, C. W. McCurdy, D. J. Haxton, X. S. Li, T. N. Rescigno, "An efficient basis set representation for calculating electrons in molecules", Journal of Molecular Physics, 2016, doi: 10.1080/00268976.2016.1176262

The method of McCurdy, Baertschy, and Rescigno, J. Phys. B, 37, R137 (2004) is generalized to obtain a straightforward, surprisingly accurate, and scalable numerical representation for calculating the electronic wave functions of molecules. It uses a basis set of product sinc functions arrayed on a Cartesian grid, and yields 1 kcal/mol precision for valence transition energies with a grid resolution of approximately 0.1 bohr. The Coulomb matrix elements are replaced with matrix elements obtained from the kinetic energy operator. A resolution-of-the-identity approximation renders the primitive one- and two-electron matrix elements diagonal; in other words, the Coulomb operator is local with respect to the grid indices. The calculation of contracted two-electron matrix elements among orbitals requires only O(N log(N)) multiplication operations, not O(N^4), where N is the number of basis functions; N = n^3 on cubic grids. The representation not only is numerically expedient, but also produces energies and properties superior to those calculated variationally. Absolute energies, absorption cross sections, transition energies, and ionization potentials are reported for one- (He^+, H_2^+), two- (H_2, He), ten- (CH_4) and 56-electron (C_8H_8) systems.

Nicholas Chaimov, Khaled Ibrahim, Samuel Williams, Costin Iancu, "Exploiting Communication Concurrency on High Performance Computing Systems", IJHPCA, April 17, 2015,

Download File: thorserv2.pdf (pdf: 1.7 MB)

D Unat, C Chan, W Zhang, S Williams, J Bachan, J Bell, J Shalf, "ExaSAT: An exascale co-design tool for performance modeling", International Journal of High Performance Computing Applications, January 2015, 29:209--232, doi: 10.1177/1094342014568690

Download File: International-Journal-of-High-Performance-Computing-Applications-2015-Unat-209-32.pdf (pdf: 4.3 MB)

Adam Lugowski, Shoaib Kamil, Aydın Buluç, Samuel Williams, Erika Duriakova, Leonid Oliker, Armando Fox, John R. Gilbert, "Parallel processing of filtered queries in attributed semantic graphs", Journal of Parallel and Distributed Computing (JPDC), September 2014, doi: 10.1016/j.jpdc.2014.08.010

Khaled Z Ibrahim, Kamesh Madduri, Samuel Williams, Bei Wang, Stephane Ethier, Leonid Oliker, "Analysis and optimization of gyrokinetic toroidal simulations on homogeneous and heterogeneous platforms", International Journal of High Performance Computing Applications (IJHPCA), July 2013, doi: 10.1177/1094342013492446

K Madduri, J Su, S Williams, L Oliker, S Ethier, K Yelick, "Optimization of parallel particle-to-grid interpolation on leading multicore platforms", IEEE Transactions on Parallel and Distributed Systems, January 1, 2012, 23:1915--1922, doi: 10.1109/TPDS.2012.28

Kamesh Madduri, Eun-Jin Im, Khaled Z. Ibrahim, Samuel Williams, Stephane Ethier, Leonid Oliker, "Gyrokinetic Particle-in-cell Optimization on Emerging Multi- and Manycore Platforms", Parallel Computing (PARCO), January 2011, 37:501 - 520, doi: 10.1016/j.parco.2011.02.001

Download File: parco11-gtc.pdf (pdf: 2 MB)

S. Zhou, D. Duffy, T. Clune, M. Suarez, S. Williams, M. Halem, "The Impact of IBM Cell Technology on the Programming Paradigm in the Context of Computer Systems for Climate and Weather Models", Concurrency and Computation: Practice and Experience (CCPE), August 2009, doi: 10.1002/cpe.1482

S. Williams, A. Waterman, D. Patterson, "Roofline: an insightful visual performance model for multicore architectures", Communications of the ACM (CACM), April 2009, doi: 10.1145/1498765.1498785

K Datta, S Kamil, S Williams, L Oliker, J Shalf, K Yelick, "Optimization and performance modeling of stencil computations on modern microprocessors", SIAM Review, 2009, 51:129--159, doi: 10.1137/070693199

Download File: sirev09-stencil.pdf (pdf: 2.8 MB)

S Williams, J Carter, L Oliker, J Shalf, K Yelick, "Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms", Journal of Parallel and Distributed Computing, 2009, 69:762--777, doi: 10.1016/j.jpdc.2009.04.002

Download File: jpdc09-lbmhd.pdf (pdf: 1.1 MB)

S. Williams, K. Datta, J. Carter, L. Oliker, J. Shalf, K. Yelick, D. Bailey, "PERI: Auto-tuning Memory Intensive Kernels for Multicore", SciDAC PI Meeting, Journal of Physics: Conference Series, 125 012038, July 2008, doi: 10.1088/1742-6596/125/1/012038

Download File: jpconf8125012089.pdf (pdf: 874 KB)

D. Bailey, J. Chame, C. Chen, J. Dongarra, M. Hall, J. Hollingsworth, P. Hovland, S. Moore, K. Seymour, J. Shin, A. Tiwari, S. Williams, H. You, "PERI Auto-tuning", SciDAC PI Meeting, Journal of Physics: Conference Series, 125 012001, 2008,

Download File: jpconf8125012038.pdf (pdf: 1.2 MB)

S Williams, J Shalf, L Oliker, S Kamil, P Husbands, K Yelick, "Scientific computing kernels on the cell processor", International Journal of Parallel Programming, January 2007, 35:263--298, doi: 10.1007/s10766-007-0034-5

Download File: ijpp07-cell.pdf (pdf: 1000 KB)

S Williams, L Oliker, R Vuduc, J Shalf, K Yelick, J Demmel, "Optimization of sparse matrix-vector multiplication on emerging multicore platforms", Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC 07, 2007, doi: 10.1145/1362622.1362674

Download File: parco08-spmv.pdf (pdf: 1.5 MB)

C. Kozyrakis, D. Judd, J. Gebis, S. Williams, D. Patterson, K. Yelick, "Hardware/Compiler Co-development for an Embedded Media Processor", Proceedings of the IEEE, 2001, doi: 10.1109/5.964446

Conference Papers

Oscar Antepara, Zhengji Zhao, Brian Austin, Nan Ding, Leonid Oliker, Nicholas J. Wright, Samuel Williams, "Benchmark-driven Models for Energy Analysis and Attribution of GPU-accelerated Supercomputing", Supercomputing (SC), November 2025,

Nan Ding, Oscar Antepara, Zhengji Zhao, Brian Austin, Leonid Oliker, Nicholas J. Wright, Samuel Williams, "Maximizing Power-Constrained Supercomputing Throughput", ISC'25, June 11, 2025,

Download File: ISC25_GPU_Power_Cap.pdf (pdf: 5.2 MB)

Nan Ding, Brian Austin, Yang Liu, Neil Mehta, Steven Farrell, Johannes P. Blaschke, Leonid Oliker, Hai Ah Nam, Nicholas J. Wright, Samuel Williams, "A Workflow Roofline Model for End-to-End Workflow Performance Analysis", Supercomputing (SC), November 17, 2024,

Download File: Workflow_roofline-6.pdf (pdf: 1.2 MB)

Shashank Subramanian, Ermal Rrapaj, Peter Harrington, Smeet Chheda, Steven Farrell, Brian Austin, Samuel Williams, Nicholas Wright, Wahid Bhimji, "Comprehensive Performance Modeling and System Design Insights for Foundation Models", Performance Modeling, Benchmarking, and Simulation (PMBS), November 2024,

Download File: PMBS24_ModelingTransformerTraining_final.pdf (pdf: 736 KB)

Brian Austin, Dhruva Kulkarni, Brandon Cook, Samuel Williams, Nicholas J. Wright, "System-Wide Roofline Profiling - a Case Study on NERSC’s Perlmutter Supercomputer", (BEST SHORT PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2024,

Download File: PMBS24_DCGM_final.pdf (pdf: 319 KB)

Oscar Antepara, Samuel Williams, Hans Johansen, Mary Hall, "High-Performance, Scalable Geometric Multigrid via Fine-Grain Data Blocking for GPUs", Performance, Portability & Productivity in HPC (P3HPC), November 10, 2024,

Download File: P3HPC24_bricks_mg_final.pdf (pdf: 358 KB)

Oscar Antepara, Samuel Williams, Max Carlson, Jerry Watkins, "Performance Portable Optimizations of an Ice-sheet Modeling Code on GPU-supercomputers", Performance, Portability & Productivity in HPC (P3HPC), November 2024,

Download File: P3HPC24_IceSheet_final-v2.pdf (pdf: 1.4 MB)

Sterling Smith, Zichuan Anthony Xing, Torrin Bechtel, Severin Denk, Earl DeShazer, Orso Meneghini, Tom Neiser, Laurie Stephey, Oscar Antepara, Christopher Mitchell Clark, Eli Dart, Pengfei Ding, Sean Flanagan, Raffi Nazikian, David Schissel, Christine Simpson, Nicholas Tyler, Thomas D. Uram, Samuel Williams, "Expediting Higher Fidelity Plasma State Reconstructions for the DIII-D National Fusion Facility Using Leadership Class Computing Resources", Extreme-Scale Experiment-in-the-Loop Computing (XLOOP), November 2024,

Mahesh Lakshminarasimhan, Mary Hall, Samuel Williams, Oscar Antepara, "BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs", Proceedings of the 53rd International Conference on Parallel Processing (ICPP), August 12, 2024,

Download File: ICPP24_BrickDL_final-v2.pdf (pdf: 1.7 MB)

Oscar Antepara, Hans Johansen, Samuel Williams, Tuowen Zhao, Samantha Hirsch, Priya Goyal, Mary Hall, "Performance portability evaluation of blocked stencil computations on GPUs", International Workshop on Performance, Portability & Productivity in HPC (P3HPC), November 2023,

Download File: P3HPC23_bricks_final-v4.pdf (pdf: 684 KB)

Oscar Antepara, Samuel Williams, Scott Kruger, Torrin Bechtel, Joseph McClenaghan, Lang Lao, "Performance-Portable GPU Acceleration of the EFIT Tokamak Plasma Equilibrium Reconstruction Code", (BEST PAPER), Workshop on Accelerator Programming and Directives (WACCPD), November 2023,

Download File: WACCPD23_EFIT_final.pdf (pdf: 697 KB)

Yang Liu, Nan Ding, Piyush Sao, Samuel Williams, Xiaoye Sherry Li, "Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters", Supercomputing (SC), November 2023,

Download File: SC23_3DSpTRSV_final.pdf (pdf: 2.9 MB)

Nan Ding, Muhammad Haseeb, Taylor Groves, Samuel Williams, "Evaluating the Performance of One-sided Communication on CPUs and GPUs", 2023 International Workshop on Performance, Portability & Productivity in HPC, November 12, 2023,

Download File: OneSided_MPI_P3HPC_.pdf (pdf: 2.5 MB)

Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, "Methodology for Evaluating the Potential of Disaggregated Memory Systems", RESDIS, https://resdis.github.io/ws/2022/sc/, November 18, 2022,

Download File: Methodology-for-Evaluating-the-Potential-of-Disaggregated-Memory-Systems.pdf (pdf: 5.1 MB)

Taylor Groves, Chris Daley, Rahulkumar Gayatri, Hai Ah Nam, Nan Ding, Lenny Oliker, Nicholas J. Wright, Samuel Williams, "A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures", PMBS, November 2022,

Download File: PMBS22_GPU_final.pdf (pdf: 719 KB)

Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams, "Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations", MCHPC, November 2022,

Download File: MCHPC22_final.pdf (pdf: 401 KB)

Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", (BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021,

Download File: pmbs21-DL-final.pdf (pdf: 632 KB)

Nan Ding, Yang Liu, Samuel Williams, Xiaoye S. Li, "A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), July 19, 2021,

Download File: Multi-GPU-SpTRSV-ACDA21-.pdf (pdf: 897 KB)

Douglas Doerfler, Farzad Fatollahi-Fard, Colin MacLean, Tan Nguyen, Samuel Williams, Nicholas J. Wright, Marco Siracusa, "Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs", International Workshop on OpenCL (iWOCL), April 2021, doi: 10.1145/3456669.3456671

Tuowen Zhao, Mary Hall, Hans Johansen, Samuel Williams, "Improving Communication by Optimizing On-Node Data Movement with Data Layout", PPoPP, February 2021,

Download File: PPoPP-Bricks-MPI-final.pdf (pdf: 864 KB)

Anastasiia Butko, George Michelogiannakis, Samuel Williams, Costin Iancu, David Donofrio, John Shalf, Jonathan Carter, Irfan Siddiqi, "Understanding Quantum Control Processor Capabilities and Limitations through Circuit Characterization", IEEE Conference on Rebooting Computing (ICRC), December 2020,

Download File: ICRC20-QUASAR-final.pdf (pdf: 1.1 MB)

Tan Nguyen, Samuel Williams, Marco Siracusa, Colin MacLean, Douglas Doerfler, Nicholas J. Wright, "The Performance and Energy Efficiency Potential of FPGAs in Scientific Computing", (BEST PAPER) Performance Modeling, Benchmarking, and Simulation of High Performance Computer Systems (PMBS), November 2020,

Download File: PMBS20-FPGA-final.pdf (pdf: 2.9 MB)

Yunsong Wang, Charlene Yang, Steven Farrell, Yan Zhang, Thorsten Kurth, Samuel Williams, "Time-Based Roofline for Deep Learning Performance Analysis", Deep Learning on Supercomputing (DLonSC), November 2020,

Download File: DLonSC20-TimeRoofline-final.pdf (pdf: 534 KB)

Marco Siracusa, Marco Rabozzi, Emanuele Del Sozzo, Lorenzo Di Tucci, Samuel Williams, Marco D. Santambrogio, "A CAD-based methodology to optimize HLS code via the Roofline model", International Conference on Computer Aided Design (ICCAD), November 2020, doi: 10.1145/3400302.3415730

Christopher Daley, Hadia Ahmed, Samuel Williams, Nicholas Wright, "A case study of porting HPGMG from CUDA to OpenMP target offload", The International Workshop on OpenMP (IWOMP), September 2020,

Download File: p24-daley.pdf (pdf: 272 KB)

Jonathan R Madsen, Muaaz G Awan, Hugo Brunie, Jack Deslippe, Rahul Gayatri, Leonid Oliker, Yunsong Wang, Charlene Yang, Samuel Williams, "TiMemory: Modular Performance Analysis for HPC", International Supercomputing Conference (ISC), June 2020, doi: 10.1007/978-3-030-50743-5_22

Nan Ding, Samuel Williams, Yang Liu, Xiaoye S. Li, "Leveraging One-Sided Communication for Sparse Triangular Solvers", 2020 SIAM Conference on Parallel Processing for Scientific Computing, February 14, 2020,

Download File: One-side-SPTRS-SIAM-PP20-.pdf (pdf: 2.9 MB)

T Groves, B Brock, Y Chen, KZ Ibrahim, L Oliker, NJ Wright, S Williams, K Yelick, "Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches", Proceedings of PMBS 2020: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis, January 2020, 126--137, doi: 10.1109/PMBS51919.2020.00016

Download File: PMBS20-NVSHMEM-final.pdf (pdf: 659 KB)

Tuowen Zhao, Mary Hall, Samuel Williams, Hans Johansen, "Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs", Supercomputing (SC), November 2019,

Download File: SC19-VectorScatter-final.pdf (pdf: 1019 KB)

Nan Ding, Samuel Williams, "An Instruction Roofline Model for GPUs", Performance Modeling, Benchmarking, and Simulation (PMBS), BEST PAPER AWARD, November 18, 2019,

Download File: InstructionRooflineModel-PMBS19-.pdf (pdf: 970 KB)

Khaled Ibrahim, Samuel Williams, Leonid Oliker, "Performance Analysis of GPU Programming Models using the Roofline Scaling Trajectories", International Symposium on Benchmarking, Measuring and Optimizing (Bench), BEST PAPER AWARD, November 2019,

Charlene Yang, Thorsten Kurth, Samuel Williams, "Hierarchical Roofline Analysis for GPUs: Accelerating Performance Optimization for the NERSC-9 Perlmutter System", Cray User Group (CUG), May 2019,

Download File: cug19-roofline-final.pdf (pdf: 493 KB)

Tuowen Zhao, Samuel Williams, Mary Hall, Hans Johansen, "Delivering Performance Portable Stencil Computations on CPUs and GPUs Using Bricks", International Workshop on Performance, Portability and Productivity in HPC (P3HPC), November 2018,

Download File: p3hpc-bricks-final.pdf (pdf: 1.3 MB)

Charlene Yang, Rahulkumar Gayatri, Thorsten Kurth, Protonu Basu, Zahra Ronaghi, Adetokunbo Adedoyin, Brian Friesen, Brandon Cook, Douglas Doerfler, Leonid Oliker, Jack Deslippe, Samuel Williams, "An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability", International Workshop on Performance, Portability and Productivity in HPC (P3HPC), November 2018,

Download File: p3hpc-roofline-final.pdf (pdf: 372 KB)

Hongzhang Shan, Samuel Williams, Calvin W. Johnson, "Improving MPI Reduction Performance for Manycore Architectures with OpenMP and Data Compression", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2018,

Download File: pmbs18-reduce-final.pdf (pdf: 572 KB)

Khaled Ibrahim, Samuel Williams, Leonid Oliker, "Roofline Scaling Trajectories: A Method for Parallel Application and Architectural Performance Analysis", HPCS Special Session on High Performance Computing Benchmarking and Optimization (HPBench), July 2018,

Download File: hpbench18-roofline.pdf (pdf: 2.4 MB)

Tuomas Koskela, Zakhar Matveev, Charlene Yang, Adetokunbo Adedoyin, Roman Belenov, Philippe Thierry, Zhengji Zhao, Rahulkumar Gayatri, Hongzhang Shan, Leonid Oliker, Jack Deslippe, Ron Green, and Samuel Williams, "A Novel Multi-Level Integrated Roofline Model Approach for Performance Characterization", ISC, June 2018,

Download File: ISC18-RooflineAdvisor-final.pdf (pdf: 966 KB)

Charlene Yang, Brian Friesen, Thorsten Kurth, Brandon Cook, Samuel Williams, "Toward Automated Application Profiling on Cray Systems", Cray User Group (CUG), May 2018,

Download File: CUG18-profiling.pdf (pdf: 184 KB)

Tuowen Zhao, Mary Hall, Protonu Basu, Samuel Williams, Hans Johansen, "SIMD code generation for stencils on brick decompositions", Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2018,

Philip C. Roth, Hongzhang Shan, David Riegner, Nikolas Antolin, Sarat Sreepathi, Leonid Oliker, Samuel Williams, Shirley Moore, Wolfgang Windl, "Performance Analysis and Optimization of the RAMPAGE Metal Alloy Potential Generation Software", SIGPLAN International Workshop on Software Engineering for Parallel Systems (SEPS), October 2017,

Hongzhang Shan, Samuel Williams, Calvin Johnson, Kenneth McElvain, "A Locality-based Threading Algorithm for the Configuration-Interaction Method", Parallel and Distributed Scientific and Engineering Computing (PDSEC), June 2017,

Download File: pdsec17-bigstick.pdf (pdf: 715 KB)

Bryce Adelstein Lelbach, Hans Johansen, Samuel Williams, "Simultaneously Solving Swarms of Small Sparse Systems on SIMD Silicon", Parallel and Distributed Scientific and Engineering Computing (PDSEC), June 2017,

Brandon Cook, Thorsten Kurth, Brian Austin, Samuel Williams, Jack Deslippe, "Performance Variability on Xeon Phi", Intel Xeon Phi Users Group (IXPUG), June 2017,

Thorsten Kurth, William Arndt, Taylor Barnes, Brandon Cook, Jack Deslippe, Doug Doerfler, Brian Friesen, Yun (Helen) He, Tuomas Koskela, Mathieu Lobet, Tareq Malas, Leonid Oliker, Andrey Ovsyannikov, Samuel Williams, Woo-Sun Yang, and Zhengji Zhao, "Analyzing Performance of Selected NESAP Applications on the Cori HPC System", Intel Xeon Phi Users Group (IXPUG), June 2017,

Download File: ixpug17-nesap.pdf (pdf: 395 KB)

Nathan Zhang, Michael Driscoll, Armando Fox, Charles Markley, Samuel Williams, Protonu Basu, "Snowflake: A Lightweight Portable Stencil DSL", High-level Parallel Programming Models and Supportive Environments (HIPS), May 2017,

Download File: hips17-snowflake.pdf (pdf: 475 KB)

William Tang, Bei Wang, Stephane Ethier, Grzegorz Kwasniewski, Torsten Hoefler, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, Carlos Rosales-Fernandez, Tim Williams, "Extreme Scale Plasma Turbulence Simulations on Top Supercomputers Worldwide", Supercomputing, November 2016,

Download File: sc16-gtcp-submit.pdf (pdf: 971 KB)

Taylor Barnes, Brandon Cook, Jack Deslippe, Douglas Doerfler, Brian Friesen, Yun (Helen) He, Thorsten Kurth, Tuomas Koskela, Mathieu Lobet, Tareq Malas, Leonid Oliker, Andrey Ovsyannikov, Abhinav Sarje, Jean-Luc Vay, Henri Vincenti, Samuel Williams, Pierre Carrier, Nathan Wichmann, Marcus Wagner, Paul Kent, Christopher Kerr, John Dennis, "Evaluating and Optimizing the NERSC Workload on Knights Landing", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2016,

Download File: PMBS16-KNL.pdf (pdf: 789 KB)

Zhaoyi Meng, Alice Koniges, Yun (Helen) He, Samuel Williams, Thorsten Kurth, Brandon Cook, Jack Deslippe, and Andrea L. Bertozzi, "OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms", 12th International Workshop on OpenMP (iWOMP), October 2016, doi: 10.1007/978-3-319-45550-1_2

Douglas Doerfler, Jack Deslippe, Samuel Williams, Leonid Oliker, Brandon Cook, Thorsten Kurth, Mathieu Lobet, Tareq Malas, Jean-Luc Vay, and Henri Vincenti, "Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor", Intel Xeon Phi User Group Workshop (IXPUG), June 2016,

Download File: ixpug16-roofline.pdf (pdf: 575 KB)

Abhinav Sarje, Douglas W. Jacobsen, Samuel W. Williams, Todd Ringler, Leonid Oliker, "Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers", Cray User Group (CUG), London, UK, May 2016,

H Shan, S Williams, Y Zheng, W Zhang, B Wang, S Ethier, Z Zhao, "Experiences of Applying One-Sided Communication to Nearest-Neighbor Communication", Proceedings of PAW 2016: 1st PGAS Applications Workshop (PAW), January 2016, 17--24, doi: 10.1109/PAW.2016.008

Download File: PAW16-stencil.pdf (pdf: 601 KB)

Hongzhang Shan, Kenneth McElvain, Calvin Johnson, Samuel Williams, W. Erich Ormand, "Parallel Implementation and Performance Optimization of the Configuration-Interaction Method", Supercomputing (SC), November 2015, doi: 10.1145/2807591.2807618

Download File: sc15-bigstick.pdf (pdf: 864 KB)

Alex Druinsky, Pieter Ghysels, Xiaoye S. Li, Osni Marques, Samuel Williams, Andrew Barker, Delyan Kalchev, Panayot Vassilevski, "Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Multicore and Manycore Architectures", International Conference on Parallel Processing and Applied Mathematics (PPAM), September 6, 2015, doi: 10.1007/978-3-319-32149-3_12

Hongzhang Shan, Samuel Williams, Yili Zheng, Amir Kamil, Katherine Yelick, "Implementing High-Performance Geometric Multigrid Solver with Naturally Grained Messages", 9th International Conference on Partitioned Global Address Space Programming Models (PGAS), September 2015, 38--46, doi: 10.1109/PGAS.2015.12

Download File: pgas15-hpgmg.pdf (pdf: 803 KB)

Abhinav Sarje, Sukhyun Song, Douglas Jacobsen, Kevin Huck, Jeffrey Hollingsworth, Allen Malony, Samuel Williams, and Leonid Oliker, "Parallel Performance Optimizations on Unstructured Mesh-Based Simulations", Procedia Computer Science, 1877-0509, June 2015, 51:2016-2025, doi: 10.1016/j.procs.2015.05.466

This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

Protonu Basu, Samuel Williams, Brian Van Straalen, Mary Hall, Leonid Oliker, Phillip Colella, "Compiler-Directed Transformation for Higher-Order Stencils", International Parallel and Distributed Processing Symposium (IPDPS), May 2015,

Download File: ipdps15CHiLL.pdf (pdf: 1.8 MB)

Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker, "Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture", Programming Models and Applications for Multicores and Manycores (PMAM), February 2015,

Download File: pmam15nwchem.pdf (pdf: 1.1 MB)

Costin Iancu, Nicholas Chaimov, Khaled Z. Ibrahim, Samuel Williams, "Exploiting Communication Concurrency on High Performance Computing Systems", Programming Models and Applications for Multicores and Manycores (PMAM), February 2015,

Download File: pmam15-servers.pdf (pdf: 1.2 MB)

Khaled Z. Ibrahim, Samuel W. Williams, Evgeny Epifanovsky, Anna I. Krylov, "Analysis and Tuning of Libtensor Framework on Multicore Architectures", High Performance Computing Conference (HIPC), December 2014,

Download File: HIPC14-libtensor.pdf (pdf: 277 KB)

Yu Jung Lo, Samuel Williams, Brian Van Straalen, Terry J. Ligocki, Matthew J. Cordery, Leonid Oliker, Mary W. Hall, "Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2014, doi: 10.1007/978-3-319-17248-4_7

Download File: PMBS14-Roofline.pdf (pdf: 340 KB)

Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Mary Hall, "Converting Stencils to Accumulations for Communication-Avoiding Optimization in Geometric Multigrid", Workshop on Stencil Computations (WOSC), October 2014,

Download File: wosc14chill.pdf (pdf: 973 KB)

Hongzhang Shan, Amir Kamil, Samuel Williams, Yili Zheng, Katherine Yelick, "Evaluation of PGAS Communication Paradigms with Geometric Multigrid", Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models (PGAS), October 2014, doi: 10.1145/2676870.2676874

Download File: PGAS14-miniGMG.pdf (pdf: 1.2 MB)

Partitioned Global Address Space (PGAS) languages and one-sided communication enable application developers to select the communication paradigm that balances the performance needs of applications with the productivity desires of programmers. In this paper, we evaluate three different one-sided communication paradigms in the context of geometric multigrid using the miniGMG benchmark. Although miniGMG's static, regular, and predictable communication does not exploit the ultimate potential of PGAS models, multigrid solvers appear in many contemporary applications and represent one of the most important communication patterns. We use UPC++, a PGAS extension of C++, as the vehicle for our evaluation, though our work is applicable to any of the existing PGAS languages and models. We compare performance with the highly tuned MPI baseline, and the results indicate that the most promising approach towards achieving performance and ease of programming is to use high-level abstractions, such as the multidimensional arrays provided by UPC++, that hide data aggregation and messaging in the runtime library.

George Michelogiannakis, Alexander Williams, Samuel Williams, John Shalf, "Collective Memory Transfers for Multi-Core Chips", International Conference on Supercomputing (ICS), June 2014, doi: 10.1145/2597652.2597654

Download File: cms2.pdf (pdf: 613 KB)

H. M. Aktulga, A. Buluc, S. Williams, C. Yang, "Optimizing Sparse Matrix-Multiple Vector Multiplication for Nuclear Configuration Interaction Calculations", International Parallel and Distributed Processing Symposium (IPDPS 2014), May 2014, doi: 10.1109/IPDPS.2014.125

Download File: ipdps14mfdnfinal.pdf (pdf: 631 KB)

Samuel Williams, Mike Lijewski, Ann Almgren, Brian Van Straalen, Erin Carson, Nicholas Knight, James Demmel, "s-step Krylov subspace methods as bottom solvers for geometric multigrid", Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, January 2014, 1149--1158, doi: 10.1109/IPDPS.2014.119

Download File: ipdps14cabicgstabfinal.pdf (pdf: 943 KB)
Download File: ipdps14CABiCGStabtalk.pdf (pdf: 944 KB)

Protonu Basu, Anand Venkat, Mary Hall, Samuel Williams, Brian Van Straalen, Leonid Oliker, "Compiler generation and autotuning of communication-avoiding operators for geometric multigrid", 20th International Conference on High Performance Computing (HiPC), December 2013, 452--461,

Download File: hipc13chill.pdf (pdf: 989 KB)

Bei Wang, Stephane Ethier, William Tang, Timothy Williams, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, "Kinetic Turbulence Simulations at Extreme Scale on Leadership-Class Systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2013, doi: 10.1145/2503210.2503258

Download File: sc13gtc.pdf (pdf: 1.3 MB)

P. Basu, A. Venkat, M. Hall, S. Williams, B. Van Straalen, L. Oliker, "Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid", Workshop on Stencil Computations (WOSC), 2013,

Christopher D. Krieger, Michelle Mills Strout, Catherine Olschanowsky, Andrew Stone, Stephen Guzik, Xinfeng Gao, Carlo Bertolli, Paul H.J. Kelly, Gihan Mudalige, Brian Van Straalen, Sam Williams, "Loop chaining: A programming abstraction for balancing locality and parallelism", Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International, May 2013, 375--384, doi: 10.1109/IPDPSW.2013.68

Aydın Buluç, Erika Duriakova, Armando Fox, John Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams, "High-Productivity and High-Performance Analysis of Filtered Semantic Graphs", International Parallel and Distributed Processing Symposium (IPDPS), 2013, doi: 10.1145/2370816.2370897

Download File: ipdps13-kdtsejits.pdf (pdf: 398 KB)

S. Williams, D. Kalamkar, A. Singh, A. Deshpande, B. Van Straalen, M. Smelyanskiy, A. Almgren, P. Dubey, J. Shalf, L. Oliker, "Optimization of Geometric Multigrid for Emerging Multi- and Manycore Processors", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2012, doi: 10.1109/SC.2012.85

Download File: sc12-mg.pdf (pdf: 808 KB)
Download File: sc12mgtalk.pdf (pdf: 1.9 MB)

P. Narayanan, A. Koniges, L. Oliker, R. Preissl, S. Williams, N. Wright, M. Umansky, X. Xu, S. Ethier, W. Wang, J. Candy, J. Cary, "Performance Characterization for Fusion Co-design Applications", Cray Users Group (CUG), May 2011,

Download File: cug11-fusion.pdf (pdf: 377 KB)

Aydın Buluç, Samuel Williams, Leonid Oliker, James Demmel, "Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication", IPDPS, IEEE, 2011, doi: 10.1109/IPDPS.2011.73

Download File: ipdps2011.pdf (pdf: 770 KB)

Kamesh Madduri, Khaled Ibrahim, Samuel Williams, Eun-Jin Im, Stephane Ethier, John Shalf, Leonid Oliker, "Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), January 2011, 23, doi: 10.1145/2063384.2063415

Download File: sc11-gtc.pdf (pdf: 1.3 MB)

Samuel Williams, Leonid Oliker, Jonathan Carter, John Shalf, "Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), New York, NY, USA, ACM, January 2011, 55, doi: 10.1145/2063384.2063458

Download File: sc11-lbmhd.pdf (pdf: 666 KB)
Download File: sc11lbmhdtalk.pdf (pdf: 1.4 MB)

Jens Krueger, David Donofrio, John Shalf, Marghoob Mohiyuddin, Samuel Williams, Leonid Oliker, Franz-Josef Pfreund, "Hardware/software co-design for energy-efficient seismic modeling", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), January 2011, 73, doi: 10.1145/2063384.2063482

Download File: sc11-greenwave.pdf (pdf: 614 KB)

E. Strohmaier, S. Williams, A. Kaiser, K. Madduri, K. Ibrahim, D. Bailey, J. Demmel, "A Kernel Testbed for Parallel Architecture, Language, and Performance Research", International Conference of Numerical Analysis and Applied Mathematics (ICNAAM), June 1, 2010, doi: 10.1063/1.3497950

A. Kaiser, S. Williams, K. Madduri, K. Ibrahim, D. Bailey, J. Demmel, E. Strohmaier, "A Principled Kernel Testbed for Hardware/Software Co-Design Research", Proceedings of the 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar), 2010,

Download File: hotpar10-dwarfs.pdf (pdf: 128 KB)

Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams, "An auto-tuning framework for parallel multicore stencil computations", International Parallel & Distributed Processing Symposium (IPDPS), January 1, 2010, 1-12, doi: 10.1109/IPDPS.2010.5470421

Download File: ipdps10-ast.pdf (pdf: 789 KB)

A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros, R. Vuduc, "Optimizing and Tuning the Fast Multipole Method for State-of-the-Art Multicore Architectures", International Parallel & Distributed Processing Symposium (IPDPS), 2010, doi: 10.1109/IPDPS.2010.5470415

Download File: ipdps10-fmm.pdf (pdf: 671 KB)

Shoaib Kamil, Cy Chan, Samuel Williams, Leonid Oliker, John Shalf, Mark Howison, E. Wes Bethel, Prabhat, "A Generalized Framework for Auto-tuning Stencil Computations", BEST PAPER AWARD - Cray User Group Conference (CUG), Atlanta, GA, May 4, 2009, LBNL 2078E,

Download File: cug09-autotune.pdf (pdf: 354 KB)

Best Paper Award

S. Williams, J. Carter, L. Oliker, J. Shalf, K. Yelick, "Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4", Proceedings of the Cray User Group (CUG), Atlanta, GA, 2009,

Download File: cug09-lbmhd.pdf (pdf: 443 KB)

K Madduri, S Williams, S Ethier, L Oliker, J Shalf, E Strohmaier, K Yelick, "Memory-efficient optimization of gyrokinetic particle-to-grid interpolation for multicore processors", Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC 09, January 2009, doi: 10.1145/1654059.1654108

Download File: sc09-gtc.pdf (pdf: 3 MB)

K. Datta, S. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, K. Yelick, "Auto-Tuning the 27-point Stencil for Multicore", Proceedings of Fourth International Workshop on Automatic Performance Tuning (iWAPT2009), January 2009,

Download File: iwapt09-27pt.pdf (pdf: 465 KB)

J Gebis, L Oliker, J Shalf, S Williams, K Yelick, "Improving memory subsystem performance using ViVA: Virtual vector architecture", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009, 5455 LNC:146--158, doi: 10.1007/978-3-642-00454-4_16

Download File: arcs09-viva.pdf (pdf: 448 KB)

Marghoob Mohiyuddin, Mark Murphy, Leonid Oliker, John Shalf, John Wawrzynek, Samuel Williams, "A design methodology for domain-optimized power-efficient supercomputing", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2009, doi: 10.1145/1654059.1654072

Download File: sc09-cotuning.pdf (pdf: 912 KB)

K Datta, M Murphy, V Volkov, S Williams, J Carter, L Oliker, D Patterson, J Shalf, K Yelick, "Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures", 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008, January 2008, doi: 10.1109/SC.2008.5222004

Download File: sc08-stencil.pdf (pdf: 598 KB)

S Williams, J Carter, L Oliker, J Shalf, K Yelick, "Lattice Boltzmann simulation optimization on leading multicore platforms", IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM, 2008, doi: 10.1109/IPDPS.2008.4536295

Download File: ipdps08-lbmhd.pdf (pdf: 560 KB)

Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, James Demmel, "Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2007, doi: 10.1145/1362622.1362674

Download File: sc07-spmv.pdf (pdf: 438 KB)

S. Williams, J. Shalf, L. Oliker, P. Husbands, S. Kamil, K. Yelick, "The Potential of the Cell Processor for Scientific Computing", ACM International Conference on Computing Frontiers, 2006, doi: 10.1145/1128022.1128027

Download File: cf06-cell-potential.pdf (pdf: 213 KB)

S Kamil, K Datta, S Williams, L Oliker, J Shalf, K Yelick, "Implicit and explicit optimizations for stencil computations", Proceedings of the 2006 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, MSPC 2006, 2006, 51--60, doi: 10.1145/1178597.1178605

Download File: mspc06-stencil.pdf (pdf: 421 KB)

J. Gebis, S. Williams, D. Patterson, C. Kozyrakis, "VIRAM1: A Media-Oriented Vector Processor with Embedded DRAM", 41st Design Automation Student Design Contest (DAC), 2004,

Download File: dac04-iram.pdf (pdf: 216 KB)

Books

David H. Bailey, Robert F. Lucas, Samuel W. Williams, ed., Performance Tuning of Scientific Applications, (CRC Press: 2011)

Book Chapters

James Demmel, Samuel Williams, Katherine Yelick, "Automatic Performance Tuning (Autotuning)", The Berkeley Par Lab: Progress in the Parallel Computing Landscape, edited by David Patterson, Dennis Gannon, Michael Wrinn, (Microsoft Research: August 2013) Pages: 337-376

Samuel W. Williams, David H. Bailey, "Parallel Computer Architecture", Performance Tuning of Scientific Applications, edited by David H. Bailey, Robert F. Lucas, Samuel W. Williams, (CRC Press: 2010) Pages: 11-33

S. Williams, N. Bell, J. W. Choi, M. Garland, L. Oliker, R. Vuduc, "Sparse Matrix-Vector Multiplication on Multicore and Accelerators", chapter in Scientific Computing with Multicore and Accelerators, edited by Jack Dongarra, David A. Bader, Jakub Kurzak, (2010)

S. Williams, "The Roofline Model", chapter in Performance Tuning of Scientific Applications, edited by David H. Bailey, Robert F. Lucas, Samuel W. Williams, (CRC Press: 2010)

S Williams, K Datta, L Oliker, J Carter, J Shalf, K Yelick, "Auto-Tuning Memory-Intensive Kernels for Multicore", Chapman & Hall/CRC Computational Science, (CRC Press: 2010) Pages: 273--296 doi: 10.1201/b10509-14

K Datta, S Williams, V Volkov, J Carter, L Oliker, J Shalf, K Yelick, "Auto-tuning stencil computations on multicore and accelerators", Scientific Computing with Multicore and Accelerators, (2010) Pages: 219--254 doi: 10.1201/b10376

Presentation/Talks

Nan Ding, Muhammad Haseeb, Taylor Groves, Samuel Williams, Evaluating the Performance of One-sided Communication on CPUs and GPUs, 2023 International Workshop on Performance, Portability & Productivity in HPC, November 13, 2023,

Download File: ws_p3hpc112.pdf (pdf: 4.7 MB)

Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, February 8, 2023,

Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, May 2022,

Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, April 2021,

Download File: ECP21-Roofline-1-intro.pdf (pdf: 22 MB)

Samuel Williams, Roofline Analysis on NVIDIA GPUs, ECP Annual Meeting, April 2021,

Download File: ECP21-Roofline-2-NVIDIA.pdf (pdf: 14 MB)

Samuel Williams, Introduction to the Roofline Model, Supercomputing (SC), November 2020,

Download File: 2020.11.09-1005-tut108-Tutorial-Williams-Samuel.pdf (pdf: 25 MB)

Samuel Williams, The Roofline Model: A Bridge between Computer Science, Applied Math, and Computational Science, SciDAC Meeting, July 2020,

Download File: SciDAC20-Roofline-SWWilliams.pdf (pdf: 13 MB)

Samuel Williams, Introduction to the Roofline Model, NERSC NVIDIA Roofline Hackathon, July 2020,

Download File: NVIDIA-Roofline-intro.pdf (pdf: 33 MB)

Samuel Williams, Introduction to the Roofline Model, NERSC GPU For Science Workshop, July 2020,

Download File: GPU-For-Science-Roofline-SWWilliams.pdf (pdf: 9.6 MB)

Samuel Williams, Charlene Yang, Yunsong Wang, Roofline Performance Modeling for HPC and Deep Learning Applications, NVIDIA GPU Technology Conference (GTC), March 2020,

Download File: S21565-Roofline-1-Intro.pdf (pdf: 22 MB)

Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, February 2020,

Download File: ECP20-Roofline-1-intro.pdf (pdf: 24 MB)

Samuel Williams, Roofline on GPUs (Advanced Topics), ECP Annual Meeting, February 2020,

Download File: ECP20-Roofline-3-advanced-gpu.pdf (pdf: 18 MB)

Charlene Yang, Samuel Williams, Performance Analysis of GPU-Accelerated Applications using the Roofline Model, GPU Technology Conference (GTC), March 2019,

Download File: GTC19-Roofline.pdf (pdf: 73 MB)

Samuel Williams, Performance Modeling and Analysis, CS267 Lecture, University of California at Berkeley, February 14, 2019,

Download File: CS267-2019-Roofline-SWWilliams.pptx (pptx: 15 MB)
Download File: CS267-2019-Roofline-SWWilliams.pdf (pdf: 35 MB)

Samuel Williams, Introduction to the Roofline Model, Roofline Tutorial, ECP Annual Meeting, January 2019,

Download File: ECP19-Roofline-1-intro.pdf (pdf: 9.9 MB)

Samuel Williams, Roofline on CPU-based Systems, Roofline Tutorial, ECP Annual Meeting, January 2019,

Download File: ECP19-Roofline-3-cpu.pdf (pdf: 26 MB)

Samuel Williams, Introduction to the Roofline Model, Supercomputing, November 2018,

Download File: SC18-Roofline-1-intro.pdf (pdf: 18 MB)

Samuel Williams, Roofline on Manycore and Accelerated Systems, ModSim, August 2018,

Download File: ModSim18-SWWilliams.pdf (pdf: 15 MB)

Samuel Williams, Parallelism and Performance, MolSSI Summer School, August 2018,

Download File: MolSSI18-SWWilliams.pdf (pdf: 17 MB)

Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, February 8, 2018,

Download File: ECP18-Roofline-1-intro.pdf (pdf: 9.1 MB)

Samuel Williams, Advisor Hands-On: Stencil Example, ECP Annual Meeting, February 8, 2018,

Download File: ECP18-Roofline-6-stencil.pdf (pdf: 3.3 MB)

Samuel Williams, Performance Modeling and Analysis, CS267 lecture, University of California at Berkeley, January 30, 2018,

Download File: CS267-Roofline-SWWilliams.pdf (pdf: 18 MB)
Download File: CS267-Roofline-SWWilliams.pptx (pptx: 17 MB)

Samuel Williams, Introduction to the Roofline Model, Roofline Training, November 2017,

Download File: roofline-intro.pptx (pptx: 3.1 MB)
Download File: roofline-intro.pdf (pdf: 3.6 MB)

Mark Adams, Samuel Williams, HPGMG BoF - Introduction, HPGMG BoF, Supercomputing, November 2016,

Download File: SC16-HPGMG-BoF-Intro.pdf (pdf: 1020 KB)

Samuel Williams, HPGMG on the Knights Landing Processor, HPGMG BoF, Supercomputing, November 2016,

Download File: SC16-HPGMG-BoF-KNL.pdf (pdf: 958 KB)

Samuel Williams, HPGMG Benchmark, Top500 BoF, Supercomputing, November 2016,

Download File: SC16-Top500-BoF-HPGMG.pdf (pdf: 1003 KB)

Samuel Williams, Mark Adams, Brian Van Straalen, Performance Portability in Hybrid and Heterogeneous Multigrid Solvers, Copper Mountain, March 2016,

Download File: CU16SWWilliams.pptx (pptx: 1 MB)

S Williams, D Patterson, L Oliker, J Shalf, K Yelick, The roofline model: A pedagogical tool for program analysis and optimization, 2008 IEEE Hot Chips 20 Symposium, HCS 2008, 2016, doi: 10.1109/HOTCHIPS.2008.7476531

Download File: parlab08-roofline-talk.pdf (pdf: 4.2 MB)
Download File: parlab08-roofline-talk.ppt (ppt: 4.3 MB)

Samuel Williams, X-TUNE, X-Stack PI Meeting, December 2015,

Download File: XStackPI2015XTuneSWWilliams.pdf (pdf: 5.9 MB)

Samuel Williams, 4th Order HPGMG-FV Implementation, HPGMG BoF, Supercomputing, November 2015,

Download File: SC15HPGMGBoF4thOrder.pdf (pdf: 1.6 MB)

Samuel Williams, HPGMG-FV, FastForward2 Proxy App Presentation, December 2014,

Download File: HPGMG-FV-FF2-Proxy-App.pptx (pptx: 985 KB)
Download File: HPGMG-FV-FF2-Proxy-App.pdf (pdf: 1.9 MB)

Mark Adams, Samuel Williams, Jed Brown, HPGMG, Birds of a Feather (BoF), Supercomputing, November 2014,

Download File: SC14HPGMGBoF.pdf (pdf: 1.9 MB)

Samuel Williams, At Exascale, Will Bandwidth Be Free?, DOE ModSim Workshop, 2013,

Download File: modsim2013SWWilliams.pdf (pdf: 408 KB)

Samuel Williams, Optimization of Geometric Multigrid for Emerging Multi- and Manycore Processors, Supercomputing (SC), November 2012,

Download File: sc12-mg-talk.pdf (pdf: 1.9 MB)

S. Williams, et al., Extracting Ultra-Scale Lattice Boltzmann Performance via Hierarchical and Distributed Auto-Tuning, Supercomputing (SC), 2011,

Download File: sc11-lbmhd-talk.pptx (pptx: 933 KB)

S. Williams, et al., Stencil Computations on CPUs, Stanford Earth Sciences Algorithms and Architectures Initiative (SESAAI), 2011,

Download File: SESAAI11-stencilsonCPUs-talk.pptx (pptx: 2.3 MB)

S. Williams, et al., Performance Optimization of HPC Applications on Multi- and Manycore Processors, Workshop on Hybrid Technologies for NASA Applications, 4th International Conference on Space Mission Challenges for Information Technology, 2011,

Download File: smc11-lbnl-talk.pptx (pptx: 3 MB)

J. Demmel, K. Yelick, M. Anderson, G. Ballard, E. Carson, I. Dumitriu, L. Grigori, M. Hoemmen, O. Holtz, K. Keutzer, N. Knight, J. Langou, M. Mohiyuddin, O. Schwartz, E. Solomonik, S. Williams, Hua Xiang, Rethinking Algorithms for Future Architectures: Communication-Avoiding Algorithms, Hot Chips 23, 2011,

S. Williams, et al., Stencil Computations on CPUs, Society of Exploration Geophysicists High-Performance Computing Workshop (SEG), July 2011,

Download File: SEG11-stencilsonCPUs-talk.pptx (pptx: 1.5 MB)

S. Williams, et al., Lattice Boltzmann Hybrid Auto-tuning on High-End Computational Platforms, Workshop on Programming Environments for Emerging Parallel Systems (PEEPS), 2010,

Download File: peeps10-lbmhd-talk.pdf (pdf: 1.2 MB)
Download File: peeps10-lbmhd-talk.pptx (pptx: 1.3 MB)

S. Williams, et al., A Generalized Framework for Auto-tuning Stencil Computations, Cray User Group (CUG), 2009,

Download File: cug09-ast-talk.pdf (pdf: 835 KB)
Download File: cug09-ast-talk.pptx (pptx: 814 KB)

S. Williams, et al., Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4, Cray User Group (CUG), 2009,

Download File: cug09-hybridLBMHD-talk.pdf (pdf: 911 KB)
Download File: cug09-hybridLBMHD-talk.pptx (pptx: 981 KB)

Kamesh Madduri, Samuel Williams, Stephane Ethier, Leonid Oliker, John Shalf, Erich Strohmaier, Katherine A. Yelick, Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2009,

Download File: siampp10-gtc-talk.pdf (pdf: 2.7 MB)
Download File: siampp10-gtc-talk.pptx (pptx: 1.3 MB)

S. Williams, Auto-tuning Performance on Multicore Computers, Ph.D. Thesis Dissertation Talk, University of California at Berkeley, 2008,

Download File: SWWilliams-Thesis-Talk.pdf (pdf: 9.8 MB)
Download File: SWWilliams-Thesis-Talk.ppt (ppt: 5 MB)

S. Williams, et al., The Roofline Model: A Pedagogical Tool for Auto-tuning Kernels on Multicore Architectures, Hot Chips 20, August 10, 2008,

Download File: hotchips08-roofline-talk.pdf (pdf: 8 MB)

S. Williams, et al., A Vision for Integrating Performance Counters into the Roofline model, UPCRC PMU Workshop (Performance Counters), 2008,

Download File: pmu08-roofline-talk.pdf (pdf: 2.9 MB)
Download File: pmu08-roofline-talk.ppt (ppt: 2 MB)

S. Williams, et al., PERI: Auto-tuning Memory Intensive Kernels for Multicore, SciDAC PI Meeting, 2008,

Download File: scidac08-peri-talk.pdf (pdf: 9.5 MB)
Download File: scidac08-peri-talk.ppt (ppt: 5.5 MB)

S. Williams, J. Carter, L. Oliker, J. Shalf, K. Yelick, Lattice Boltzmann simulation optimization on leading multicore platforms, IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Pages: 1-14 2008,

Download File: ipdps08-lbmhd-talk.pdf (pdf: 10 MB)
Download File: ipdps08-lbmhd-talk.ppt (ppt: 2.6 MB)

S. Williams, et al., Autotuning Sparse and Structured Grid Kernels, Parlab Winter Retreat, 2008,

Download File: parlab08-spmvstructured-talk.pdf (pdf: 8.1 MB)
Download File: parlab08-spmvstructured-talk.ppt (ppt: 2.9 MB)

S. Williams, et al., Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms, DOE/DOD Workshop on Emerging High-Performance Architectures and Applications, 2007,

Download File: hpa07-spmv-talk.pdf (pdf: 7.9 MB)
Download File: hpa07-spmv-talk.ppt (ppt: 2.7 MB)

S. Williams, et al., Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms, Supercomputing (SC), 2007,

Download File: sc07-spmv-talk.pdf (pdf: 6.4 MB)
Download File: sc07-spmv-talk.ppt (ppt: 2.5 MB)

S. Williams, et al., Tuning Sparse Matrix Vector Multiplication for multi-core processors, Center for Scalable Application Development Software (CScADS), 2007,

Download File: cscads07-spmv-talk.pdf (pdf: 1.4 MB)
Download File: cscads07-spmv-talk.ppt (ppt: 754 KB)

S. Williams, et al., Tuning Sparse Matrix Vector Multiplication for multi-core SMPs, Parlab Seminar, 2007,

Download File: parlab07-spmv-talk.pdf (pdf: 1.2 MB)
Download File: parlab07-spmv-talk.ppt (ppt: 1.6 MB)

S. Williams, et al., Structured Grids and Sparse Matrix Vector Multiplication on the Cell Processor, Global Signal Processing Expo (GSPx), 2006,

Download File: gspx06-cell-talk.pdf (pdf: 4.8 MB)
Download File: gspx06-cell-talk.ppt (ppt: 844 KB)

S. Williams, et al., 3D Lattice Boltzmann Magneto-hydrodynamics (LBMHD3D), UTK Summit on Software and Algorithms for the Cell Processor, 2006,

Download File: utk06-lbmhd-talk.pdf (pdf: 3.7 MB)
Download File: utk06-lbmhd-talk.ppt (ppt: 784 KB)

S. Williams, et al., The Potential of the Cell Processor for Scientific Computing, LBL Scientific Computing Seminar, 2006,

Download File: lbl06-cell-talk.pdf (pdf: 4.8 MB)

Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine A. Yelick, The potential of the cell processor for scientific computing, Conf. Computing Frontiers, Pages: 9-20 2006,

Download File: edge06-handout.pdf (pdf: 270 KB)

S Williams, J Shalf, L Oliker, S Kamil, P Husbands, K Yelick, The potential of the cell processor for scientific computing, Proceedings of the 3rd Conference on Computing Frontiers 2006, CF 06, Pages: 9--20 2006, doi: 10.1145/1128022.1128027

Download File: transmeta06-cell-talk.ppt (ppt: 896 KB)

C. Kozyrakis, J. Gebis, D. Martin, S. Williams, I. Mavroidis, S. Pope, D. Jones, D. Patterson, K. Yelick, Vector IRAM: A media-oriented vector processor with embedded DRAM, Hot Chips 12, 2000,

Download File: hotchips00-viram-talk.pdf (pdf: 57 KB)

Reports

Esmond Ng, Katherine J. Evans, Peter Caldwell, Forrest M. Hoffman, Charles Jackson, Kerstin Van Dam, Ruby Leung, Daniel F. Martin, George Ostrouchov, Raymond Tuminaro, Paul Ullrich, Stefan Wild, Samuel Williams, "Advances in Cross-Cutting Ideas for Computational Climate Science (AXICCS)", January 2017, doi: 10.2172/1341564

Download File: AXICCS-Report.pdf (pdf: 4 MB)

Khaled Z. Ibrahim, Evgeny Epifanovsky, Samuel Williams, Anna I. Krylov, "Cross-scale Efficient Tensor Contractions for Coupled Cluster Computations Through Multiple Programming Model Backends (tech report version)", LBNL Technical Report, July 1, 2016, LBNL 1005853, doi: 10.2172/1274416

Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker, "Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture", LBNL Technical Report, October 2014, LBNL 6806E,

Download File: rpt83549.PDF (PDF: 615 KB)

Mark F. Adams, Jed Brown, John Shalf, Brian Van Straalen, Erich Strohmaier, Samuel Williams, "HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems", LBNL Technical Report, 2014, LBNL 6630E,

Download File: hpgmg.pdf (pdf: 183 KB)

Abhinav Sarje, Samuel Williams, David H. Bailey, "MPQC: Performance analysis and optimization", LBNL Technical Report, February 2013, LBNL 6076E,

Samuel Williams, Dhiraj D. Kalamkar, Amik Singh, Anand M. Deshpande, Brian Van Straalen, Mikhail Smelyanskiy, Ann Almgren, Pradeep Dubey, John Shalf, Leonid Oliker, "Implementation and Optimization of miniGMG - a Compact Geometric Multigrid Benchmark", December 2012, LBNL 6676E,

Download File: miniGMGLBNL-6676E.pdf (pdf: 906 KB)

J. Krueger, P. Micikevicius, S. Williams, "Optimization of Forward Wave Modeling on Contemporary HPC Architectures", LBNL Technical Report, 2012, LBNL 5751E,

A. Kaiser, S. Williams, K. Madduri, K. Ibrahim, D. Bailey, J. Demmel, E. Strohmaier, "TORCH Computational Reference Kernels: A Testbed for Computer Science Research", LBNL Technical Report, 2011, LBNL 4172E,

M. Christen, N. Keen, T. Ligocki, L. Oliker, J. Shalf, B. van Straalen, S. Williams, "Automatic Thread-Level Parallelization in the Chombo AMR Library", LBNL Technical Report, 2011, LBNL 5109E,

Samuel Webb Williams, Andrew Waterman, David A. Patterson, "Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures", EECS Tech Report UCB/EECS-2008-134, October 2008,

K. Asanovic, R. Bodik, B. Catanzaro, J. Gebis, P. Husbands, K. Keutzer, D. Patterson, W. Plishker, J. Shalf, S. Williams, K. Yelick, "The Landscape of Parallel Computing Research: A View from Berkeley", EECS Technical Report, December 2006,

S. Williams, J. Shalf, L. Oliker, P. Husbands, K. Yelick, "Dense and Sparse Matrix Operations on the Cell Processor", LBNL Technical Report, 2005,

S. Williams, "Verification of VIRAM1", Masters Report, University of California at Berkeley, 2003,

Thesis/Dissertations

Auto-tuning Performance on Multicore Computers, Samuel Williams, PhD, 2008,

Web Articles

"Accelerating Time-to-Solution for Computational Science and Engineering", J. Demmel, J. Dongarra, A. Fox, S. Williams, V. Volkov, K. Yelick, SciDAC Review, Number 15, December 2009,

Posters

Nan Ding, Samuel Williams, Sherry Li, Yang Liu, "Leveraging One-Sided Communication for Sparse Triangular Solvers", SciDAC19, July 18, 2019,

Download File: SciDAC19-Poster-SpTRSV-NanDing.pdf (pdf: 774 KB)

Samuel Williams, Charlene Yang, Khaled Ibrahim, Thorsten Kurth, Nan Ding, Jack Deslippe, Leonid Oliker, "Performance Analysis using the Roofline Model", SciDAC PI Meeting, July 2019,

Download File: SciDAC19-Poster-Roofline-SWWilliams.pdf (pdf: 4.9 MB)

Alex Druinsky, Brian Austin, Sherry Li, Osni Marques, Eric Roman, Samuel Williams, "A Roofline Performance Analysis of an Algebraic Multigrid Solver", Supercomputing (SC), November 2014,

B. Wang, S. Ethier, W. Tang, K. Ibrahim, K. Madduri, S. Williams, "Advances in gyrokinetic particle in cell simulation for fusion plasmas to Extreme scale", Supercomputing (SC), 2012,

A. Buluç, A. Fox, J. R. Gilbert, S. Kamil, A. Lugowski, L. Oliker, S. Williams, "High-performance analysis of filtered semantic graphs", PACT '12 Proceedings of the 21st international conference on Parallel architectures and compilation techniques (extended abstract), 2012, doi: 10.1145/2370816.2370897

A. Kaiser, S. Williams, K. Madduri, K. Ibrahim, D. Bailey, J. Demmel, E. Strohmaier, "A Principled Kernel Testbed for Hardware/Software Co-Design Research", Proceedings of the 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar), 2010,

Download File: hotpar10-dwarfs-poster.pdf (pdf: 679 KB)

S. Williams, et al., "Auto-tuning and the Roofline model", View From the Top: Craig Mundie (Ph.D. student poster session), 2008,

Download File: swwilliams-mundie-poster.pdf (pdf: 866 KB)

S. Williams, J. Carter, J. Demmel, L. Oliker, D. Patterson, J. Shalf, K. Yelick, R. Vuduc, "Autotuning Scientific Kernels on Multicore Systems", ASCR PI Meeting, 2008,

Download File: ascrpi08-autotuning-poster.pdf (pdf: 2.2 MB)

K. Datta, S. Williams, V. Volkov, M. Murphy, "Autotuning Structured Grid Kernels", ParLab Summer Retreat, 2008,

Download File: parlab08-stencillbmhd-poster.pdf (pdf: 3.6 MB)

S. Zhou, D. Duffy, T. Clune, M. Suarez, S. Williams, M. Halem, "Impacts of the IBM Cell Processor on Supporting Climate Models", International Supercomputing Conference (ISC), 2008,

S. Williams, et al., "The Roofline Model: A Pedagogical Tool for Program Analysis and Optimization", ParLab Summer Retreat, 2008,

Download File: parlab08-roofline-poster.pdf (pdf: 1.3 MB)

K. Datta, S. Williams, S. Kamil, "Autotuning Structured Grid Kernels", ParLab Winter Retreat, 2008,

Download File: parlab08-structured-poster.pdf (pdf: 1.8 MB)