Leonid Oliker

Lenny Oliker

Computer Senior Scientist, PAR Group Lead

Phone: +1 510 486 6625

Fax: +1 510 486 6900

Lenny Oliker is a senior scientist and group lead of the performance and algorithms group (PAR) in the computer science department. His research interests focus on performance optimization, evaluation, and modeling on leading high-end computing systems. Lenny has authored over 150 peer-reviewed publications, five of which received best paper awards, and is the deputy of the recently formed SciDAC4 RAPIDS institute for computer science and data. He is the executive director of the ECP Exabiome project, which aims to develop the world’s fastest genome assembly and analysis algorithms and parallel implementations. Other research activities include the Roofline methodology, which has been widely adopted as an effective performance modeling tool within the HPC community. Lenny is also interested in optimization and evaluation of scientific computations, and has participated in studies in the areas of fusion, genomics, climate, cosmology, and materials science.

Journal Articles

Zhe Bai, Xishuo Wei, William Tang, Leonid Oliker, Zhihong Lin, Samuel Williams, "Transfer Learning Nonlinear Plasma Dynamic Transitions in Low Dimensional Embeddings via Deep Neural Networks", Machine Learning: Science and Technology, April 8, 2025, doi: 10.1088/2632-2153/adca83

Nan Ding, Pieter Maris, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, LeAnn Lindsey, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, Samuel Williams, "Evaluating the potential of disaggregated memory systems for HPC applications", Concurrency and Computation, Practice and Experience (CCPE), May 2024, doi: https://doi.org/10.1002/cpe.8147

Muaaz G Awan, Jack Deslippe, Aydin Buluc, Oguz Selvitopi, Steven Hofmeyr, Leonid Oliker, Katherine Yelick, "ADEPT: a domain independent sequence alignment strategy for gpu architectures", BMC Bioinformatics, September 2020, 21, doi: 10.1186/s12859-020-03720-1

Steven Hofmeyr, Rob Egan, Evangelos Georganas, Alex C Copeland, Robert Riley, Alicia Clum, Emiley Eloe-Fadrosh, Simon Roux, Eugene Goltsman, Aydin Buluc, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, "Terabase-scale metagenome coassembly with MetaHipMer", Scientific Reports, June 1, 2020, 10, doi: https://doi.org/10.1038/s41598-020-67416-5

Download File: s41598-020-67416-5.pdf (pdf: 1.4 MB)

Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly) and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes, with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform coassembly and multiassembly, and assemble both abundant, high strain variation genomes, and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer’s scaling power. MetaHipMer is available for public use under an open source license and all datasets used in the paper are available for public download.

Katherine Yelick, Aydın Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, Leonid Oliker, "The parallelism motifs of genomic data analysis", Philosophical Transactions of The Royal Society A: Mathematical, Physical and Engineering Sciences, 2020,

Bei Wang, Stephane Ethier, William Tang, Khaled Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, "Modern Gyrokinetic Particle-in-cell Simulation of Fusion Plasmas on Top Supercomputers", International Journal of High-Performance Computing Applications (IJHPCA), May 2017, doi: https://doi.org/10.1177/1094342017712059

Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Phillip Colella, Mary Hall, "Compiler-Based Code Generation and Autotuning for Geometric Multigrid on GPU-Accelerated Supercomputers", Parallel Computing (PARCO), April 2017, doi: 10.1016/j.parco.2017.04.002

Aydin Buluc, John Gilbert, Leonid Oliker, "Special Issue: Graph Analysis for Scientific Discovery", Parallel Computing Journal Special Issue Editors, August 1, 2015,

J. Chapman, M. Mascher, A. Buluç, K. Barry, E. Georganas, A. Session, V. Strnadova, J. Jenkins, S. Sehgal, L. Oliker, J. Schmutz, K. Yelick, U. Scholz, R. Waugh, J. Poland, G. Muehlbauer, N. Stein, D. Rokhsar, "A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome", Genome Biology, 2015,

Adam Lugowski, Shoaib Kamil, Aydın Buluç, Samuel Williams, Erika Duriakova, Leonid Oliker, Armando Fox, John R. Gilbert, "Parallel processing of filtered queries in attributed semantic graphs", Journal of Parallel and Distributed Computing (JPDC), September 2014, doi: 10.1016/j.jpdc.2014.08.010

L. Oliker and R. Vuduc, "Introduction for Special Issue on Autotuning", International Journal of High Performance Computing Applications (IJHPCA), 2013,

Khaled Z Ibrahim, Kamesh Madduri, Samuel Williams, Bei Wang, Stephane Ethier, Leonid Oliker, "Analysis and optimization of gyrokinetic toroidal simulations on homogeneous and heterogeneous platforms", International Journal of High Performance Computing Applications (IJHPCA), July 2013, doi: 10.1177/1094342013492446

K Madduri, J Su, S Williams, L Oliker, S Ethier, K Yelick, "Optimization of parallel particle-to-grid interpolation on leading multicore platforms", IEEE Transactions on Parallel and Distributed Systems, January 1, 2012, 23:1915--1922, doi: 10.1109/TPDS.2012.28

M. Wehner, L. Oliker, J. Shalf, D. Donofrio, L. Drummond, et al., "Hardware/Software Co-design of Global Cloud System Resolving Models", Journal of Advances in Modeling Earth Systems (JAMES), 2011, 3, M1000:22, doi: 10.1029/2011MS000073

Download File: james11-climate.pdf (pdf: 1.7 MB)

"Emerging Programming Paradigms for Large-Scale Scientific Computing", Guest editors, Parallel Computing special issue, "Emerging Programming Paradigms for Large-Scale Scientific Computing", 2011,

Download File: parco11-guestintro.pdf (pdf: 82 KB)

Kamesh Madduri, Eun-Jin Im, Khaled Z. Ibrahim, Samuel Williams, Stephane Ethier, Leonid Oliker, "Gyrokinetic Particle-in-cell Optimization on Emerging Multi- and Manycore Platforms", Parallel Computing (PARCO), January 2011, 37:501 - 520, doi: 10.1016/j.parco.2011.02.001

Download File: parco11-gtc.pdf (pdf: 2 MB)

Shoaib Kamil, Leonid Oliker, Ali Pinar, John Shalf, "Communication Requirements and Interconnect Optimization for High-End Scientific Applications", IEEE Transactions on Parallel and Distributed Systems (TPDS), January 1, 2010, 21:188-202,

Download File: tpds09-hfast.pdf (pdf: 8.3 MB)

M. Wehner, L. Oliker, and J. Shalf, "Low Power Supercomputers", IEEE Spectrum, October 2009,

High-performance computing for such things as climate modeling is not going to advance at anything like the pace it has during the last two decades unless we apply fundamentally new ideas. Here we describe one possible approach. Rather than constructing supercomputers from the kinds of microprocessors found in fast desktop computers or servers, we propose adopting designs and design principles drawn, oddly enough, from the portable-electronics marketplace.

David Donofrio, Leonid Oliker, John Shalf, Michael F. Wehner, Chris Rowen, Jens Krueger, Shoaib Kamil, Marghoob Mohiyuddin, "Energy-Efficient Computing for Extreme-Scale Science", IEEE Computer, January 2009, 42:62-71, doi: 10.1109/MC.2009.35

S. Kamil, L. Oliker, A. Pinar, J. Shalf, "Communication Requirements and Interconnect Optimization for High-End Scientific Applications", IEEE Transactions on Parallel and Distributed Systems (TPDS), 2009,

Download File: TPDS09-comm.pdf (pdf: 8.3 MB)

K Datta, S Kamil, S Williams, L Oliker, J Shalf, K Yelick, "Optimization and performance modeling of stencil computations on modern microprocessors", SIAM Review, 2009, 51:129--159, doi: 10.1137/070693199

Download File: sirev09-stencil.pdf (pdf: 2.8 MB)

R. Biswas, J. Vetter, L. Oliker, "Revolutionary Technologies for Acceleration of Emerging Petascale Applications", Guest Editors, Parallel Computing Journal, 2009,

Download File: parco09-abstract.pdf (pdf: 49 KB)

S Williams, J Carter, L Oliker, J Shalf, K Yelick, "Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms", Journal of Parallel and Distributed Computing, 2009, 69:762--777, doi: 10.1016/j.jpdc.2009.04.002

Download File: jpdc09-lbmhd.pdf (pdf: 1.1 MB)

J. Borrill, L. Oliker, J. Shalf, H. Shan, A. Uselton, "HPC global file system performance analysis using a scientific-application derived benchmark", Parallel Computing, 2009, 35:358-373, doi: 10.1016/j.parco.2009.02.002

Download File: parco09-MADbench.pdf (pdf: 4.4 MB)

S. Williams, K. Datta, J. Carter, L. Oliker, J. Shalf, K. Yelick, D. Bailey, "PERI: Auto-tuning Memory Intensive Kernels for Multicore", SciDAC PI Meeting, Journal of Physics: Conference Series, 125 012038, July 2008, doi: 10.1088/1742-6596/125/1/012038

Download File: jpconf8125012089.pdf (pdf: 874 KB)

M. Wehner, L. Oliker, J. Shalf, "Performance Characterization of the World's Most Powerful Supercomputers", International Journal of High Performance Computing Applications (IJHPCA), April 2008,

Download File: IJHPCA08Abstract.pdf (pdf: 64 KB)

Michael F. Wehner, L. Oliker, John Shalf, "Towards Ultra-High Resolution Models of Climate and Weather", International Journal of High Performance Computing Applications (IJHPCA), January 2008, 22:149-165,

Download File: IJHPCA08-climate.pdf (pdf: 580 KB)

S. Ethier, W. M. Tang, R. Walkup, L. Oliker, "Large-Scale Gyrokinetic Particle Simulation of Microturbulence in Magnetically Confined Fusion Plasmas", IBM Journal of Research and Development, 2008,

Download File: IBMJRD08-fusion.pdf (pdf: 365 KB)

L. Oliker, A. Canning, J. Carter, J. Shalf, S. Ethier, "Scientific application performance on leading scalar and vector supercomputing platforms", International Journal of High Performance Computing Applications, 2008, 22:5-20, doi: 10.1177/1094342006085020

S Williams, J Shalf, L Oliker, S Kamil, P Husbands, K Yelick, "Scientific computing kernels on the cell processor", International Journal of Parallel Programming, January 2007, 35:263--298, doi: 10.1007/s10766-007-0034-5

Download File: ijpp07-cell.pdf (pdf: 1000 KB)

S Williams, L Oliker, R Vuduc, J Shalf, K Yelick, J Demmel, "Optimization of sparse matrix-vector multiplication on emerging multicore platforms", Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC 07, 2007, doi: 10.1145/1362622.1362674

Download File: parco08-spmv.pdf (pdf: 1.5 MB)

L. Oliker, J. Carter, M. Wehner, A. Canning, S. Ethier, A. Mirin, G. Bala, D. Parks, P. Worley, S. Kitawaki, Y. Tsuda, "Scientific Application Performance on Leading Scalar and Vector Supercomputing Platforms", International Journal of High Performance Computing Applications (IJHPCA), 2006,

Download File: IJHPCA06-eval.pdf (pdf: 7.2 MB)

H. Simon, W. Kramer, W. Saphir, J. Shalf, D. Bailey, L. Oliker, et al, "Science Driven System Architecture: A New Process for Leadership Class Computing", Journal of the Earth Simulator, 2005,

Download File: JES2-Simon.pdf (pdf: 110 KB)

L. Oliker, A. Canning, J. Carter, J. Shalf, H. Simon, S. Ethier, D. Parks, S. Kitawaki, Y. Tsuda, T. Sato, "Performance of Ultra-Scale Applications on Leading Vector and Scalar HPC Platforms", Journal of the Earth Simulator, January 2005, 3,

Download File: JES3-Oliker.pdf (pdf: 101 KB)

L. Oliker, A. Canning, J. Carter, J. Shalf, D. Skinner, S. Ethier, R. Biswas, J. Djomehri, R. Van Der Wijngaart, "Performance evaluation of the SX-6 vector architecture for scientific computations", Concurrency Computation Practice and Experience, January 2005, 17:69-93, doi: 10.1002/cpe.884

Download File: CCPE05-sx6.pdf (pdf: 1 MB)

R. Biswas, L. Oliker, H. Shan, "Parallel Computing Strategies for Irregular Algorithms", Annual Review of Scalable Computing, April 2003,

Download File: ARSCsubmit.pdf (pdf: 690 KB)

Hongzhang Shan, Jaswinder P. Singh, Leonid Oliker, Rupak Biswas, "Message Passing and Shared Address Space Parallelism on an SMP Cluster", Parallel Computing Journal, Volume 29, Issue 2, February 2003,

Download File: pc03-smp.pdf (pdf: 307 KB)

L. Oliker, X. Li, P. Husbands, R. Biswas, "Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations", SIAM Review Journal, 2002,

Download File: sirev02-sparse.pdf (pdf: 475 KB)

H. Shan, J. P. Singh, L. Oliker, R. Biswas, "A Comparison of Three Programming Models for Adaptive Applications on the Origin2000", Journal of Parallel and Distributed Computing (JPDC), January 1, 2002, doi: 10.1006/jpdc.2001.1777

L. Oliker, R. Biswas, S. Das, D. Harvey, "Parallel Dynamic Load Balancing Strategies for Adaptive Irregular Applications", DRAMA special issue of Applied Mathematical Modeling Journal, 2000,

Download File: drama00.ps.gz (gz: 479 KB)

L. Oliker, R. Biswas, "Parallelization of a Dynamic Unstructured Algorithm using Three Leading Programming Paradigms", IEEE Transactions on Parallel and Distributed Systems (TPDS), 2000,

Download File: tpds00-programmingparadigms.pdf (pdf: 1014 KB)

L. Oliker, R. Biswas and H. Gabow, "Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing", Parallel Computing Journal, Special Issue on Graph Partitioning, pp 1583-1608, 2000,

Download File: parco99-mesh.pdf (pdf: 284 KB)

R. Biswas, L. Oliker, "Experiments with Repartitioning and Load Balancing Adaptive Meshes", Grid Generation and Adaptive Algorithms, IMA Volumes in Mathematics and its Applications, Vol. 113, Springer-Verlag, pp.89-112, 1999,

Download File: ima99-loadbalancing.pdf (pdf: 284 KB)

L. Oliker, R. Biswas, "PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes (JPDC version)", Journal of Parallel and Distributed Computing (JPDC), 1998,

Download File: jpdc98-plum.pdf (pdf: 337 KB)

R. Strawn, L. Oliker, R. Biswas, "New Computational Methods for the Prediction and Analysis of Helicopter Noise", Journal of Aircraft, 34, pp. 665-672, 1997,

Download File: aercon96.ps.gz (gz: 1.3 MB)

S. Chatterjee, J. Gilbert, L. Oliker, R. Schreiber, and T. Sheffler, "Algorithms for Automatic Alignment of Arrays", Journal of Parallel and Distributed Computing (JPDC), July 1996,

Download File: jpdc96.ps.gz (gz: 89 KB)

Conference Papers

Oscar Antepara, Zhengji Zhao, Brian Austin, Nan Ding, Leonid Oliker, Nicholas J. Wright, Samuel Williams, "Benchmark-driven Models for Energy Analysis and Attribution of GPU-accelerated Supercomputing", Supercomputing (SC), November 2025,

Abdullah Alperen, Nan Ding, Khaled Z. Ibrahim, Pieter Maris, Leonid Oliker, Chao Yang, Hasan Metin Aktulga, "Optimizing Nuclear Configuration Interaction Calculations on GPUs: A Comparative Performance Study of Programming Models", https://isc.app.swapcard.com/event/isc-high-performance-2025/planning/UGxhbm5pbmdfMjU4OTMyNg==, June 12, 2025,

Download File: ISC25_MFDn_opt.pdf (pdf: 7.7 MB)

Nan Ding, Oscar Antepara, Zhengji Zhao, Brian Austin, Leonid Oliker, Nicholas J. Wright, Samuel Williams, "Maximizing Power-Constrained Supercomputing Throughput", ISC'25, June 11, 2025,

Download File: ISC25_GPU_Power_Cap.pdf (pdf: 5.2 MB)

Nan Ding, Brian Austin, Yang Liu, Neil Mehta, Steven Farrell, Johannes P. Blaschke, Leonid Oliker, Hai Ah Nam, Nicholas J. Wright, Samuel Williams, "A Workflow Roofline Model for End-to-End Workflow Performance Analysis", Supercomputing (SC), November 17, 2024,

Download File: Workflow_roofline-6.pdf (pdf: 1.2 MB)

Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, "Methodology for Evaluating the Potential of Disaggregated Memory Systems", RESDIS, https://resdis.github.io/ws/2022/sc/, November 18, 2022,

Download File: Methodology-for-Evaluating-the-Potential-of-Disaggregated-Memory-Systems.pdf (pdf: 5.1 MB)

Taylor Groves, Chris Daley, Rahulkumar Gayatri, Hai Ah Nam, Nan Ding, Lenny Oliker, Nicholas J. Wright, Samuel Williams, "A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures", PMBS, November 2022,

Download File: PMBS22_GPU_final.pdf (pdf: 719 KB)

K. Ibrahim, L. Oliker, "Preprocessing Pipeline Optimization for Scientific Deep-Learning Workloads", IPDPS 22, June 3, 2022,

Download File: SciML-optimization-12.pdf (pdf: 17 MB)

Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", (BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021,

Download File: pmbs21-DL-final.pdf (pdf: 632 KB)

Jonathan R Madsen, Muaaz G Awan, Hugo Brunie, Jack Deslippe, Rahul Gayatri, Leonid Oliker, Yunsong Wang, Charlene Yang, Samuel Williams, "TiMemory: Modular Performance Analysis for HPC", International Supercomputing Conference (ISC), June 2020, doi: 10.1007/978-3-030-50743-5_22

A Zeni, G Guidi, M Ellis, N Ding, MD Santambrogio, S Hofmeyr, A Buluc, L Oliker, K Yelick, "LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment", Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020, 2020, 462--471, doi: 10.1109/IPDPS47924.2020.00055

T Groves, B Brock, Y Chen, KZ Ibrahim, L Oliker, NJ Wright, S Williams, K Yelick, "Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches", Proceedings of PMBS 2020: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis, January 2020, 126--137, doi: 10.1109/PMBS51919.2020.00016

Download File: PMBS20-NVSHMEM-final.pdf (pdf: 659 KB)

G Guidi, O Selvitopi, M Ellis, L Oliker, K Yelick, A Buluc, "Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly", January 1, 2020,

Khaled Ibrahim, Samuel Williams, Leonid Oliker, "Performance Analysis of GPU Programming Models using the Roofline Scaling Trajectories", International Symposium on Benchmarking, Measuring and Optimizing (Bench), BEST PAPER AWARD, November 2019,

M Ellis, G Guidi, A Buluç, L Oliker, K Yelick, "DiBELLA: Distributed long read to long read alignment", ACM International Conference Proceeding Series, January 1, 2019, doi: 10.1145/3337821.3337919

Charlene Yang, Rahulkumar Gayatri, Thorsten Kurth, Protonu Basu, Zahra Ronaghi, Adedoyin Adetokunbo, Brian Friesen, Brandon Cook, Douglas Doerfler, Leonid Oliker, Jack Deslippe, Samuel Williams, "An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability", International Workshop on Performance, Portability and Productivity in HPC (P3HPC), November 2018,

Download File: p3hpc-roofline-final.pdf (pdf: 372 KB)

Khaled Ibrahim, Samuel Williams, Leonid Oliker, "Roofline Scaling Trajectories: A Method for Parallel Application and Architectural Performance Analysis", HPCS Special Session on High Performance Computing Benchmarking and Optimization (HPBench), July 2018,

Download File: hpbench18-roofline.pdf (pdf: 2.4 MB)

Tuomas Koskela, Zakhar Matveev, Charlene Yang, Adetokunbo Adedoyin, Roman Belenov, Philippe Thierry, Zhengji Zhao, Rahulkumar Gayatri, Hongzhang Shan, Leonid Oliker, Jack Deslippe, Ron Green, and Samuel Williams, "A Novel Multi-Level Integrated Roofline Model Approach for Performance Characterization", ISC, June 2018,

Download File: ISC18-RooflineAdvisor-final.pdf (pdf: 966 KB)

P Koanantakool, A Ali, A Azad, A Buluç, D Morozov, L Oliker, KA Yelick, S-Y Oh, "Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation.", Proceedings of Machine Learning Research, PMLR, 2018, 84:1376--1386,

Philip C. Roth, Hongzhang Shan, David Riegner, Nikolas Antolin, Sarat Sreepathi, Leonid Oliker, Samuel Williams, Shirley Moore, Wolfgang Windl, "Performance Analysis and Optimization of the RAMPAGE Metal Alloy Potential Generation Software", SIGPLAN International Workshop on Software Engineering for Parallel Systems (SEPS), October 2017,

Thorsten Kurth, William Arndt, Taylor Barnes, Brandon Cook, Jack Deslippe, Doug Doerfler, Brian Friesen, Yun (Helen) He, Tuomas Koskela, Mathieu Lobet, Tareq Malas, Leonid Oliker, Andrey Ovsyannikov, Samuel Williams, Woo-Sun Yang, and Zhengji Zhao, "Analyzing Performance of Selected NESAP Applications on the Cori HPC System", Intel Xeon Phi Users Group (IXPUG), June 2017,

Download File: ixpug17-nesap.pdf (pdf: 395 KB)

M Ellis, E Georganas, R Egan, S Hofmeyr, A Buluç, B Cook, L Oliker, K Yelick, "Performance characterization of de novo genome assembly on leading parallel systems", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, 10417 LN:79--91, doi: 10.1007/978-3-319-64203-1_6

E Georganas, M Ellis, R Egan, S Hofmeyr, A Buluç, B Cook, L Oliker, K Yelick, "MerBench: PGAS benchmarks for high performance genome assembly", Proceedings of PAW 2017: 2nd Annual PGAS Applications Workshop - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis, 2017, 2017-Jan:1--4, doi: 10.1145/3144779.3169109

William Tang, Bei Wang, Stephane Ethier, Grzegorz Kwasniewski, Torsten Hoefler, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, Carlos Rosales-Fernandez, Tim Williams, "Extreme Scale Plasma Turbulence Simulations on Top Supercomputers Worldwide", Supercomputing, November 2016,

Download File: sc16-gtcp-submit.pdf (pdf: 971 KB)

Taylor Barnes, Brandon Cook, Jack Deslippe, Douglas Doerfler, Brian Friesen, Yun (Helen) He, Thorsten Kurth, Tuomas Koskela, Mathieu Lobet, Tareq Malas, Leonid Oliker, Andrey Ovsyannikov, Abhinav Sarje, Jean-Luc Vay, Henri Vincenti, Samuel Williams, Pierre Carrier, Nathan Wichmann, Marcus Wagner, Paul Kent, Christopher Kerr, John Dennis, "Evaluating and Optimizing the NERSC Workload on Knights Landing", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2016,

Download File: PMBS16-KNL.pdf (pdf: 789 KB)

Veronika Strnadova-Neeley, Aydin Buluc, John R. Gilbert, Leonid Oliker, Weimin Ouyang, "LiRa: A New Likelihood-Based Similarity Score for Collaborative Filtering", August 30, 2016,

Douglas Doerfler, Jack Deslippe, Samuel Williams, Leonid Oliker, Brandon Cook, Thorsten Kurth, Mathieu Lobet, Tareq Malas, Jean-Luc Vay, and Henri Vincenti, "Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor", Intel Xeon Phi User Group Workshop (IXPUG), June 2016,

Download File: ixpug16-roofline.pdf (pdf: 575 KB)

Abhinav Sarje, Douglas W. Jacobsen, Samuel W. Williams, Todd Ringler, Leonid Oliker, "Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers", Cray User Group (CUG), London, UK, May 2016,

P Koanantakool, A Azad, A Buluc, D Morozov, SY Oh, L Oliker, K Yelick, "Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication", Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016, January 2016, 842--853, doi: 10.1109/IPDPS.2016.117

Veronika Strnadová-Neeley, Aydın Buluç, Jarrod Chapman, John R. Gilbert, Joseph Gonzalez, Leonid Oliker, "Efficient Data Reduction for Large-Scale Genetic Mapping", ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB), September 10, 2015,

Abhinav Sarje, Sukhyun Song, Douglas Jacobsen, Kevin Huck, Jeffrey Hollingsworth, Allen Malony, Samuel Williams, and Leonid Oliker, "Parallel Performance Optimizations on Unstructured Mesh-Based Simulations", Procedia Computer Science, 1877-0509, June 2015, 51:2016-2025, doi: 10.1016/j.procs.2015.05.466

This paper addresses two key parallelization challenges of the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

Protonu Basu, Samuel Williams, Brian Van Straalen, Mary Hall, Leonid Oliker, Phillip Colella, "Compiler-Directed Transformation for Higher-Order Stencils", International Parallel and Distributed Processing Symposium (IPDPS), May 2015,

Download File: ipdps15CHiLL.pdf (pdf: 1.8 MB)

Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, Katherine Yelick, "MerAligner: A Fully Parallel Sequence Aligner", IEEE 29th International Parallel and Distributed Processing Symposium (IPDPS), May 2015, 561--570, doi: 10.1109/IPDPS.2015.96

Aligning a set of query sequences to a set of target sequences is an important task in bioinformatics. In this work we present merAligner, a highly parallel sequence aligner that implements a seed-and-extend algorithm and employs parallelism in all of its components. MerAligner relies on a high performance distributed hash table (seed index) and uses the one-sided communication capabilities of Unified Parallel C to facilitate fine-grained parallelism. We leverage communication optimizations at the construction of the distributed hash table and software caching schemes to reduce communication during the aligning phase. Additionally, merAligner preprocesses the target sequences to extract properties enabling exact sequence matching with minimal communication. Finally, we efficiently parallelize the I/O intensive phases and implement an effective load balancing scheme. Results show that merAligner exhibits efficient scaling up to thousands of cores on a Cray XC30 supercomputer using real human and wheat genome data while significantly outperforming existing parallel alignment tools.

Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker, "Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture", Programming Models and Applications for Multicores and Manycores (PMAM), February 2015,

Download File: pmam15nwchem.pdf (pdf: 1.1 MB)

E Georganas, A Buluç, J Chapman, S Hofmeyr, C Aluru, R Egan, L Oliker, D Rokhsar, K Yelick, "HipMer: An extreme-scale de novo genome assembler", International Conference for High Performance Computing, Networking, Storage and Analysis, SC, January 1, 2015, 15-20-No, doi: 10.1145/2807591.2807664

Yu Jung Lo, Samuel Williams, Brian Van Straalen, Terry J. Ligocki, Matthew J. Cordery, Leonid Oliker, Mary W. Hall, "Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2014, doi: 10.1007/978-3-319-17248-4_7

Download File: PMBS14-Roofline.pdf (pdf: 340 KB)

Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, Katherine Yelick, "Parallel de Bruijn Graph Construction and Traversal for de Novo Genome Assembly", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), November 16, 2014, 437--448, doi: 10.1109/SC.2014.41

Download File: sc14genome.pdf (pdf: 719 KB)

Veronika Strnadova, Aydın Buluç, Joseph Gonzalez, Stefanie Jegelka, Jarrod Chapman, John Gilbert, Daniel Rokhsar, Leonid Oliker, "Efficient and accurate clustering for large-scale genetic mapping", IEEE International Conference on Bioinformatics and Biomedicine (BIBM'14), November 1, 2014,

Download File: bibm14.pdf (pdf: 764 KB)

Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Mary Hall, "Converting Stencils to Accumulations for Communication-Avoiding Optimization in Geometric Multigrid", Workshop on Stencil Computations (WOSC), October 2014,

Download File: wosc14chill.pdf (pdf: 973 KB)

W.A. de Jong, L. Lin, H. Shan, C. Yang and L. Oliker, "Towards modelling complex mesoscale molecular environments", International Conference on Computational and Mathematical Methods in Science and Engineering (CMMSE), 2014,

Protonu Basu, Anand Venkat, Mary Hall, Samuel Williams, Brian Van Straalen, Leonid Oliker, "Compiler generation and autotuning of communication-avoiding operators for geometric multigrid", 20th International Conference on High Performance Computing (HiPC), December 2013, 452--461,

Download File: hipc13chill.pdf (pdf: 989 KB)

Hongzhang Shan, Brian Austin, Wibe de Jong, Leonid Oliker, Nick Wright, Edoardo Apra, "Performance Tuning of Fock Matrix and Two Electron Integral Calculations for NWChem on Leading HPC Platforms", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2013, doi: 10.1007/978-3-319-10214-6_13

Bei Wang, Stephane Ethier, William Tang, Timothy Williams, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, "Kinetic Turbulence Simulations at Extreme Scale on Leadership-Class Systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2013, doi: 10.1145/2503210.2503258

Download File: sc13gtc.pdf (pdf: 1.3 MB)

P. Basu, A. Venkat, M. Hall, S. Williams, B. Van Straalen, L. Oliker, "Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid", Workshop on Stencil Computations (WOSC), 2013,

Aydın Buluç, Erika Duriakova, Armando Fox, John Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams, "High-Productivity and High-Performance Analysis of Filtered Semantic Graphs", International Parallel and Distributed Processing Symposium (IPDPS), 2013, doi: 10.1145/2370816.2370897

Download File: ipdps13-kdtsejits.pdf (pdf: 398 KB)

S. Williams, D. Kalamkar, A. Singh, A. Deshpande, B. Van Straalen, M. Smelyanskiy, A. Almgren, P. Dubey, J. Shalf, L. Oliker, "Optimization of Geometric Multigrid for Emerging Multi- and Manycore Processors", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2012, doi: 10.1109/SC.2012.85

Download File: sc12-mg.pdf (pdf: 808 KB)
Download File: sc12mgtalk.pdf (pdf: 1.9 MB)

K. Kandalla, A. Buluç, H. Subramoni, K. Tomko, J. Vienne, L. Oliker, D. K. Panda, "Can network-offload based non-blocking neighborhood MPI collectives improve communication overheads of irregular graph algorithms?", International Workshop on Parallel Algorithms and Parallel Software (IWPAPS 2012), 2012,

P. Narayanan, A. Koniges, L. Oliker, R. Preissl, S. Williams, N. Wright, M. Umansky, X. Xu, S. Ethier, W. Wang, J. Candy, J. Cary, "Performance Characterization for Fusion Co-design Applications", Cray Users Group (CUG), May 2011,

Download File: cug11-fusion.pdf (pdf: 377 KB)

Aydın Buluç, Samuel Williams, Leonid Oliker, James Demmel, "Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication", IPDPS, IEEE, 2011, doi: https://doi.org/10.1109/IPDPS.2011.73

Download File: ipdps2011.pdf (pdf: 770 KB)

Kamesh Madduri, Khaled Ibrahim, Samuel Williams, Eun-Jin Im, Stephane Ethier, John Shalf, Leonid Oliker, "Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), January 2011, 23, doi: 10.1145/2063384.2063415

Download File: sc11-gtc.pdf (pdf: 1.3 MB)

Samuel Williams, Oliker, Carter, John Shalf, "Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), New York, NY, USA, ACM, January 2011, 55, doi: 10.1145/2063384.2063458

Download File: sc11-lbmhd.pdf (pdf: 666 KB)
Download File: sc11lbmhdtalk.pdf (pdf: 1.4 MB)

Jens Krueger, David Donofrio, John Shalf, Marghoob Mohiyuddin, Samuel Williams, Leonid Oliker, Franz-Josef Pfreund, "Hardware/software co-design for energy-efficient seismic modeling", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), January 2011, 73, doi: 10.1145/2063384.2063482

Download File: sc11-greenwave.pdf (pdf: 614 KB)

R. Sudarsan, J. Borrill, C. Cantalupo, T. Kisner, K. Madduri, L. Oliker, Y. Zheng, H. Simon, "Cosmic microwave background map-making at the petascale and beyond", Proceedings of the International Conference on Supercomputing, 2011, 305-316, doi: 10.1145/1995896.1995944

Download File: ics11-madmap.pdf (pdf: 2.5 MB)

G. Hendry, J. Chan, S. Kamil, L. Oliker, J. Shalf, L. Carloni, K. Bergman, "Silicon Nanophotonic Network-On-Chip using TDM Arbitration", Hot Interconnects, August 2010,

Download File: hoti10-siphotonics.pdf (pdf: 552 KB)

S. Ethier, M. Adams, J. Carter, L. Oliker, "Petascale Parallelization of the Gyrokinetic Toroidal Code", VECPAR: High Performance Computing for Computational Science, June 2010,

Download File: vecpar10-gtc.pdf (pdf: 4.3 MB)

Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams, "An auto-tuning framework for parallel multicore stencil computations", International Parallel & Distributed Processing Symposium (IPDPS), January 1, 2010, 1-12, doi: 10.1109/IPDPS.2010.5470421

Download File: ipdps10-ast.pdf (pdf: 789 KB)

A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros, R. Vuduc, "Optimizing and Tuning the Fast Multipole Method for State-of-the-Art Multicore Architectures", International Parallel & Distributed Processing Symposium (IPDPS), 2010, doi: 10.1109/IPDPS.2010.5470415

Download File: ipdps10-fmm.pdf (pdf: 671 KB)

Andrew Uselton, Howison, J. Wright, Skinner, Keen, Shalf, L. Karavanic, Leonid Oliker, "Parallel I/O performance: From events to ensembles", International Parallel & Distributed Processing Symposium (IPDPS), 2010, 1-11,

Download File: ipdps10ipm.pdf (pdf: 1.7 MB)

J. Shalf, M. Wehner, L. Oliker, "The Challenge of Energy-Efficient HPC", SCIDAC Review, Fall, 2009,

Shoaib Kamil, Cy Chan, Samuel Williams, Leonid Oliker, John Shalf, Mark Howison, E. Wes Bethel, Prabhat, "A Generalized Framework for Auto-tuning Stencil Computations", BEST PAPER AWARD - Cray User Group Conference (CUG), Atlanta, GA, May 4, 2009, LBNL 2078E,

Download File: cug09-autotune.pdf (pdf: 354 KB)

Best Paper Award

S. Williams, J. Carter, L. Oliker, J. Shalf, K. Yelick, "Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4", Proceedings of the Cray User Group (CUG), Atlanta, GA, 2009,

Download File: cug09-lbmhd.pdf (pdf: 443 KB)

K Madduri, S Williams, S Ethier, L Oliker, J Shalf, E Strohmaier, K Yelick, "Memory-efficient optimization of gyrokinetic particle-to-grid interpolation for multicore processors", Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC 09, January 2009, doi: 10.1145/1654059.1654108

Download File: sc09-gtc.pdf (pdf: 3 MB)

K. Datta, S. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, K. Yelick, "Auto-Tuning the 27-point Stencil for Multicore", Proceedings of Fourth International Workshop on Automatic Performance Tuning (iWAPT2009), January 2009,

Download File: iwapt09-27pt.pdf (pdf: 465 KB)

J Gebis, L Oliker, J Shalf, S Williams, K Yelick, "Improving memory subsystem performance using ViVA: Virtual vector architecture", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009, 5455 LNC:146--158, doi: 10.1007/978-3-642-00454-4_16

Download File: arcs09-viva.pdf (pdf: 448 KB)

G. Hendry, S.A. Kamil, A. Biberman, J. Chan, B.G. Lee, M. Mohiyuddin, A. Jain, K. Bergman, L.P. Carloni, J. Kubiatowicz, L. Oliker, J. Shalf, "Analysis of Photonic Networks for Chip Multiprocessor Using Scientific Applications", International Symposium on Networks-on-Chip (NOCS), 2009,

Download File: nocs09-photonics.pdf (pdf: 1.2 MB)

Marghoob Mohiyuddin, Murphy, Oliker, Shalf, Wawrzynek, Samuel Williams, "A design methodology for domain-optimized power-efficient supercomputing", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2009, doi: 10.1145/1654059.1654072

Download File: sc09-cotuning.pdf (pdf: 912 KB)

K Datta, M Murphy, V Volkov, S Williams, J Carter, L Oliker, D Patterson, J Shalf, K Yelick, "Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures", 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008, January 2008, doi: 10.1109/SC.2008.5222004

Download File: sc08-stencil.pdf (pdf: 598 KB)

S Williams, J Carter, L Oliker, J Shalf, K Yelick, "Lattice Boltzmann simulation optimization on leading multicore platforms", IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM, 2008, doi: 10.1109/IPDPS.2008.4536295

Download File: ipdps08-lbmhd.pdf (pdf: 560 KB)

William T.C. Kramer, John M. Shalf, E. Wes Bethel, D. Agarwal, Michael Banda, John Hules, Juan C. Meza, Leonid Oliker, Horst Simon, David Skinner, Francesca Verdier, Howard Walter, Michael Wehner, and Katherine Yelick, "HPC in 2016: A View Point from NERSC", Proceedings of the Cray User Group Conference, Helsinki, Finland, 2008,

Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, James Demmel, "Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2007, doi: 10.1145/1362622.1362674

Download File: sc07-spmv.pdf (pdf: 438 KB)

J. Borrill, L. Oliker, J. Shalf, H. Shan, "Investigation Of Leading HPC I/O Performance Using A Scientific-Application Derived Benchmark", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2007,

Download File: SC07-MADbench2.pdf (pdf: 581 KB)

Shoaib Kamil, Pinar, Gunter, Lijewski, Oliker, John Shalf, "Reconfigurable hybrid interconnection for static and dynamic scientific applications", Conf. Computing Frontiers, 2007, 183-194, LBNL 60060,

Download File: CF07.pdf (pdf: 9.5 MB)

L. Oliker, A. Canning, J. Carter, C. Iancu, M. Lijewski, S. Kamil, J. Shalf, H. Shan, E. Strohmaier, S. Ethier, T. Goodale, "Scientific application performance on candidate petascale platforms", Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM, 2007, doi: 10.1109/IPDPS.2007.370259

Download File: ipdps07-petascale.pdf (pdf: 4.4 MB)

J. Carter, L. Oliker, J. Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", Extended Version: Lecture Notes in Computer Science, 2007,

Download File: LNCS07.pdf (pdf: 445 KB)

S. Williams, J. Shalf, L. Oliker, P. Husbands, S. Kamil, K. Yelick, "The Potential of the Cell Processor for Scientific Computing", ACM International Conference on Computing Frontiers, 2006, doi: 10.1145/1128022.1128027

Download File: cf06-cell-potential.pdf (pdf: 213 KB)

Michael Welcome, Charles Rendleman, Leonid Oliker, Rupak Biswas, "Performance Characteristics of an Adaptive Mesh Refinement Calculation on Scalar and Vector Platforms", ACM International Conference on Computing Frontiers, Italy, May 2006, LBNL 59238, doi: 10.1145/1128022.1128074

Download File: CF06-amr.pdf (pdf: 1.4 MB)

Adaptive mesh refinement (AMR) is a powerful technique that reduces the resources necessary to solve otherwise intractable problems in computational science. The AMR strategy solves the problem on a relatively coarse grid and dynamically refines it in regions requiring higher resolution. However, AMR codes tend to be far more complicated than their uniform-grid counterparts because of the software infrastructure needed to dynamically manage the hierarchical grid framework. Despite this complexity, it is generally believed that future multi-scale applications will increasingly rely on adaptive methods to study problems at unprecedented scale and resolution. Recently, a new generation of parallel-vector architectures has become available that promises extremely high sustained performance for a wide range of applications and forms the foundation of many leadership-class computing systems worldwide. It is therefore imperative to understand the tradeoffs between conventional scalar and parallel-vector platforms for solving AMR-based calculations. In this paper, we examine the HyperCLaw AMR framework to compare and contrast performance on the Cray X1E, IBM Power3 and Power5, and SGI Altix. To the best of our knowledge, this is the first work that investigates and characterizes the performance of an AMR calculation on modern parallel-vector systems.

S Kamil, K Datta, S Williams, L Oliker, J Shalf, K Yelick, "Implicit and explicit optimizations for stencil computations", Proceedings of the 2006 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, MSPC 2006, 2006, 51--60, doi: 10.1145/1178597.1178605

Download File: mspc06-stencil.pdf (pdf: 421 KB)

J. Carter, L. Oliker, J. Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", VECPAR, 2006,

Download File: vecpar06-vector.pdf (pdf: 410 KB)

J. Carter, L. Oliker, "Performance Evaluation of Lattice-Boltzmann Magnetohydrodynamics Simulations on Modern Parallel Vector Systems", Proceedings of the 2nd Teraflop Workshop. Lecture Notes in Computer Science (LNCS), Stuttgart, Germany, January 1, 2006,

Download File: LNCSHLRS06.pdf (pdf: 546 KB)

Jonathan Carter, Oliker, John Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", VECPAR, Springer Berlin/Heidelberg, 2006, 4395:490-503,

Download File: LNCS07-vector.pdf (pdf: 445 KB)

J. Carter, L. Oliker, J. Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", High Performance Computing for Computational Science, 2006,

Download File: vecpar06-vector.pdf (pdf: 410 KB)

Highest Ranked Conference Paper

L. Oliker, J. Carter, M. Wehner, A. Canning, S. Ethier, A. Mirin, G. Bala, D. Parks, P. Worley, S. Kitawaki, Y. Tsuda, "Leading computational methods on scalar and vector HEC platforms", Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC 05, 2005, doi: 10.1109/SC.2005.41

Download File: SC05eval.pdf (pdf: 8.4 MB)

John Shalf, Kamil, Oliker, David Skinner, "Analyzing Ultra-Scale Application Communication Requirements for a Reconfigurable Hybrid Interconnect", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2005, 17,

Download File: sc05-communication.pdf (pdf: 6.9 MB)

J. Carter, M. Soe, L. Oliker, Y. Tsuda, G. Vahala, L. Vahala, A. Macnab, "Magnetohydrodynamic Turbulence Simulations on the Earth Simulator Using the Lattice Boltzmann Method", International Conference for High Performance Computing, Networking, Storage and Analysis (SC) - Gordon Bell Finalist, Washington, DC, USA, IEEE Computer Society, 2005,

Download File: SC05-LBMHD-GordonBell.pdf (pdf: 794 KB)

S. Kamil, J. Shalf, L. Oliker, D. Skinner, "Understanding Ultra-Scale Application Communication Requirements", IEEE International Symposium on Workload Characterization (IISWC), 2005,

Download File: IISWC05-communication.pdf (pdf: 4.3 MB)

S Kamil, P Husbands, L Oliker, J Shalf, K Yelick, "Impact of modern memory subsystems on cache optimizations for stencil computations", Proceedings of the 3rd 2005 ACM SIGPLAN Workshop on Memory Systems Performance, MSP 2005, 2005, 36--43, doi: 10.1145/1111583.1111589

Download File: msp05-stencil.pdf (pdf: 902 KB)

J. Borrill, J. Carter, L. Oliker, D. Skinner, R. Biswas, "Integrated performance monitoring of a cosmology application on leading HEC platforms", Proceedings of the International Conference on Parallel Processing, 2005, 2005:119-128, doi: 10.1109/ICPP.2005.47

Download File: icpp05-ipm.pdf (pdf: 2.9 MB)

L. Oliker, R. Biswas, J. Borrill, A. Canning, J. Carter, M.J. Djomehri, H. Shan, D. Skinner, "A performance evaluation of the Cray X1 for scientific applications", Lecture Notes in Computer Science, 2005, 3402:51-65,

Horst Simon, William Kramer, William Saphir, John Shalf, David Bailey, Leonid Oliker, Michael Banda, C. William McCurdy, John Hules, Andrew Canning, Marc Day, Philip Colella, David Serafini, Michael Wehner, Peter Nugent, "Science-Driven System Architecture: A New Process for Leadership Class Computing", Journal of the Earth Simulator, Volume 2, 2005, LBNL 56545,

Download File: JES-SDSA.pdf (pdf: 110 KB)

J. Carter, J. Borrill, L. Oliker, "Performance characteristics of a cosmology package on leading HPC architectures", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Berlin/Heidelberg, 2004, 3296:176-188,

Download File: HIPC04final.pdf (pdf: 210 KB)

H. Shan, L. Oliker, R. Biswas, W. Smith, "Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration", International Conference on Advanced Computing and Communication: ADCOM, 2004,

Download File: ADCOM04final.pdf (pdf: 195 KB)

L. Oliker, A. Canning, J. Carter, J. Shalf, S. Ethier, "Scientific Computations on Modern Parallel Vector Systems", Proceedings of the ACM/IEEE SC 2004 Conference: Bridging Communities, 2004, doi: 10.1109/SC.2004.54

Download File: SC04-vector.pdf (pdf: 1.9 MB)

L. Oliker, J. Borrill, A. Canning, J. Carter, H. Shan, D. Skinner, R. Biswas, J. Djomehri, "A Performance Evaluation of the Cray X1 for Scientific Applications", VECPAR'04: 6th International Meeting on High Performance Computing for Computational Science, 2004,

Download File: vecpar04-x1.pdf (pdf: 224 KB)

G Griem, L Oliker, J Shalf, K Yelick, "Identifying performance bottlenecks on modern microarchitectures using an adaptable probe", Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM), 2004, 18:3505--3512,

Download File: pmeo2004.pdf (pdf: 419 KB)

H. Shan, E. Strohmaier, L. Oliker, "Optimizing Performance of Superscalar Codes for a Single Cray X1 MSP", Proceedings of the 46th Cray User Group Conference:CUG, 2004,

P. A. Agarwal, R. A. Alexander, E. Apra, S. Balay, A. S. Bland, J. Colgan, E. F. D’Azevedo, J. J. Dongarra, T. H. Dunigan, Jr., M. R. Fahey, R. A. Fahey, A. Geist, M. Gordon, R. J. Harrison, D. Kaushik, M. Krishnakumar, P. Luszczek, A. Mezzacappa, J. A. Nichols, J. Nieplocha, L. Oliker, T. Packwood, M.S. Pindzola, T. C. Schulthess, J. S. Vetter, J. B. White, III, T. L. Windus, P. H. Worley, T. Zacharia, "Cray X1 Evaluation Status Report", Proceedings of the 46th Cray User Group Conference:CUG, 2004,

L. Oliker, G. Griem, "Transitive Closure on the Imagine Stream Processor", Fifth Workshop on Media and Stream Processors (MSP5), 2003,

Download File: msp52003final.pdf (pdf: 117 KB)

L. Oliker, A. Canning, J. Carter, J. Shalf, D. Skinner, S. Ethier, R. Biswas, J. Djomehri, R. Van Der Wijngaart, "Evaluation of cache-based superscalar and cacheless vector architectures for scientific computations", Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003, 2003, doi: 10.1145/1048935.1050213

Download File: SC03-SX6.pdf (pdf: 1 MB)

H. Shan, L. Oliker, R. Biswas, "Job Superscheduler Architecture and Performance in Computational Grid Environments", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2003,

Download File: SC03-gridsched.pdf (pdf: 113 KB)

S. Chatterji, J. Duell, M. Narayanan, L. Oliker, "Performance Evaluation of Two Emerging Media Processors: VIRAM and Imagine", Workshop on Parallel and Distributed Image Processing, Video Processing, and Multimedia (PDIVM), 2003,

Download File: PDIVMfinal.pdf (pdf: 612 KB)

BR Gaeke, P Husbands, XS Li, L Oliker, KA Yelick, R Biswas, "Memory-intensive benchmarks: IRAM vs. cache-based machines", Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002, 2002, 290--296, doi: 10.1109/IPDPS.2002.1015506

Download File: ipdps02-iram.pdf (pdf: 91 KB)

H. Shan, J. Singh, L. Oliker, R. Biswas, "Message Passing vs. Shared Address Space on a Cluster of SMPs", International Parallel & Distributed Processing Symposium (IPDPS), 2001,

Download File: ipdps01.pdf (pdf: 194 KB)

L. Oliker, X. Li, P. Husbands, R. Biswas, "Ordering Schemes for Sparse Matrices using Modern Programming Paradigms", The IASTED International Conference on Applied Informatics (AI), 2001,

Download File: ai01.pdf (pdf: 163 KB)

H. Shan, J. Singh, L. Oliker, R. Biswas, "A Comparison of Three Programming Models for Adaptive Applications on the Origin2000", International Conference for High Performance Computing, Networking, Storage and Analysis (SC) - BEST STUDENT PAPER AWARD, 2000,

Download File: sc00-programmingparadigms.pdf (pdf: 209 KB)

L. Oliker, A. Wong, W. Kramer, T. Kaltz, D. Bailey, "ESP: A System Utilization Benchmark", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2000,

Download File: sc00-esp.pdf (pdf: 49 KB)

L. Oliker, X. Li, G. Heber, R. Biswas, "Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms", 13th International Conference on Parallel and Distributed Computing Systems, 2000,

Download File: pdcs00-pcg.pdf (pdf: 167 KB)

L. Oliker, A. Wong, W. Kramer, T. Kaltz, D. Bailey, "System Utilization Benchmark on the Cray T3E and IBM SP", Fifth Workshop on Job Scheduling, 2000,

Download File: JSSPP00-esp.pdf (pdf: 55 KB)

L. Oliker, X. Li, G. Heber, R. Biswas, "Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems", Seventh International Workshop on solving Irregularly Structured Problems in Parallel, 2000,

Download File: irr00awk.pdf (pdf: 130 KB)

L. Oliker, R. Biswas, "Multithreaded Implementation of a Dynamic Irregular Application", 5th NASA Computational Aerosciences Workshop, 2000,

L. Oliker, R. Biswas, "Parallelization of a Dynamic Unstructured Application using Three Leading Paradigms", International Conference for High Performance Computing, Networking, Storage and Analysis (SC) - BEST PAPER AWARD, 1999,

Download File: sc99.pdf (pdf: 165 KB)
Download File: sc99.ppt (ppt: 451 KB)

R. Biswas, S.K. Das, D.J. Harvey, L. Oliker, "Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications", 13th International Parallel Processing Symposium, 1999,

Download File: ipps99-loadbalancing.pdf (pdf: 106 KB)

K. Schloegel, G. Karypis, V. Kumar, R. Biswas, L. Oliker, "A Performance Study of Diffusive vs. Remapped Load-Balancing Schemes", 11th International Conference on Parallel and Distributed Computer Systems, pp. 59-66, 1998,

Download File: icpdc98-loadbalancing.pdf (pdf: 172 KB)

L. Oliker, R. Biswas, H.N. Gabow, "Performance Analysis and Portability of the PLUM Load Balancing System", Euro-Par'98 Parallel Processing, Lecture Notes in Computer Science, Vol. 1470, Springer-Verlag, pp. 307-317, 1998,

Download File: europar98-plum.pdf (pdf: 183 KB)

L. Oliker, R. Biswas, "Dynamic Domain Decomposition for Large-Scale Adaptive Calculations", 10th International Conference on Domain Decomposition Methods, 1997,

L. Oliker, R. Biswas, "Load Balancing Unstructured Adaptive Grid Computations", 4th U.S. National Congress on Computational Mechanics, 1997,

R. Biswas, L. Oliker, "Load Balancing Sequences of Unstructured Adaptive Grids", 4th International Conference on High Performance Computing (HiPC), 1997,

Download File: hipc97-loadbalancing.pdf (pdf: 215 KB)

L. Oliker, R. Biswas, "Efficient Load Balancing and Data Remapping for Adaptive Grid Calculations", 9th ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1997,

Download File: spaa97-loadbalancing.pdf (pdf: 227 KB)

R. Biswas, L. Oliker, A. Sohn, "Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 1996,

Download File: sc96-unstructured.pdf (pdf: 197 KB)

L. Oliker, R. Biswas, R. Strawn, "Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2", Parallel Algorithms for Irregularly Structured Problems, Lecture Notes in Computer Science, Vol. 1117, Springer-Verlag, pp. 35-47, 1996,

Download File: irr96-unstructured.pdf (pdf: 155 KB)

L. Oliker, R. Biswas, R. Strawn, "Parallel Mesh Adaption with Global Load Balancing on the SP2", NASA Computational Aerosciences Workshop, 1996,

A.M. Wissink, A.S. Lyrintzis, R.C. Strawn, L. Oliker, R. Biswas, "Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers", 34th AIAA Aerospace Sciences Meeting, Paper 96-0153, 1996,

Book Chapters

E. Georganas, S. Hofmeyr, L. Oliker, R. Egan, D. Rokhsar, A. Buluc, K. Yelick, "Extreme-scale de novo genome assembly", Exascale Scientific Applications: Scalability and Performance Portability, edited by T.P. Straatsma, K. B. Antypas, T. J. Williams, ( November 13, 2017) doi: 10.1201/b21930

S. Williams, N. Bell, J. W. Choi, M. Garland, L. Oliker, R. Vuduc, "Sparse Matrix-Vector Multiplication on Multicore and Accelerators", chapter in Scientific Computing with Multicore and Accelerators, edited by Jack Dongarra, David A. Bader, Jakub Kurzak, ( 2010)

L. Oliker, J. Carter, V. Beckner, J. Bell, H. Wasserman, M. Adams, S. Ethier, E. Schnetter, "Large-Scale Numerical Simulations on High-End Computational Platforms", Chapman & Hall/CRC Computational Science, edited by D. H. Bailey, R. F. Lucas, S. W. Williams, (CRC Press: 2010) Pages: 123

S Williams, K Datta, L Oliker, J Carter, J Shalf, K Yelick, "Auto-Tuning Memory-Intensive Kernels for Multicore", Chapman & Hall/CRC Computational Science, (CRC Press: 2010) Pages: 273--296 doi: 10.1201/b10509-14

K Datta, S Williams, V Volkov, J Carter, L Oliker, J Shalf, K Yelick, "Auto-tuning stencil computations on multicore and accelerators", Scientific Computing with Multicore and Accelerators, ( 2010) Pages: 219--254 doi: 10.1201/b10376

John Shalf, Donofrio, Rowen, Oliker, Michael F. Wehner, "Green Flash: Climate Machine (LBNL)", Encyclopedia of Parallel Computing, (Springer: 2010) Pages: 809-819

Green Flash is a research project focused on an application-driven manycore chip design that leverages commodity-embedded circuit designs and hardware/software codesign processes to create a highly programmable and energy-efficient HPC design. The project demonstrates how a multidisciplinary hardware/software codesign process that facilitates close interactions between applications scientists, computer scientists, and hardware engineers can be used to develop a system tailored for the requirements of scientific computing.

L. Oliker, A. Canning, J. Carter, C. Iancu, M. Lijewski, S. Kamil, J. Shalf, H. Shan, E. Strohmaier, S. Ethier, T. Goodale, "Performance Characteristics of Potential Petascale Scientific Applications", Petascale Computing: Algorithms and Applications. Chapman & Hall/CRC Computational Science Series (Hardcover), edited by David A. Bader, ( 2007)

Chapter

L. Oliker, R. Biswas, R. Van der Wijngaart, D. Bailey, A. Snavely, "Performance Evaluation and Modeling of Ultra-Scale Systems", Parallel Processing for Scientific Computing, edited by Michael A. Heroux, Padma Raghavan, and Horst D. Simon, (SIAM: 2007) doi: 10.1137/1.9780898718133.ch5

Download File: SIAMPPchapter06.pdf (pdf: 158 KB)

J. Shalf, L. Oliker, M. Lijewski, S. Kamil, J. Carter, A. Canning, S. Ethier, "Performance Characteristics of Potential Petascale Scientific Applications", Chapman & Hall/CRC Computational Science, (CRC Press: 2007) Pages: 1

Download File: CactusGRB.pdf (pdf: 712 KB)

Book Chapter

Presentation/Talks

Kamesh Madduri, Williams, Ethier, Oliker, Shalf, Strohmaier, Katherine A. Yelick, Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2009,

Download File: siampp10-gtc-talk.pdf (pdf: 2.7 MB)
Download File: siampp10-gtc-talk.pptx (pptx: 1.3 MB)

S. Williams, et al., The Roofline Model: A Pedagogical Tool for Auto-tuning Kernels on Multicore Architectures, Hot Chips 20, August 10, 2008,

Download File: hotchips08-roofline-talk.pdf (pdf: 8 MB)

M. Wehner, L. Oliker, J. Shalf, Ultra-Efficient Exascale Scientific Computing, 2008,

Download File: ASCAC08-final.ppt (ppt: 5.4 MB)

L. Oliker, J. Shalf, M. Wehner, Climate Modeling at the Petaflop Scale using Semi-Custom Computing, SIAM Conference on Computational Science and Engineering, 2007,

John Shalf, Shoaib Kamil, David Skinner, Leonid Oliker, Interconnect Requirements for HPC Applications, 2007,

Download File: IPMfinalBrocade.ppt (ppt: 13 MB)

Leonid Oliker, Julian Borrill, Hongzhang Shan, John Shalf, Investigation Of Leading HPC I/O Performance Using A Scientific-Application Derived Benchmark, 2007,

Download File: SC07-MadBench-talk.ppt (ppt: 2.7 MB)

L. Oliker, J. Carter, Leading Computational Methods on the Earth Simulator, SIAM Conference on Parallel Processing for Scientific Computing, 2006,

L. Oliker, J. Carter, Evaluation of Vector Architectures for Scientific Codes, SIAM Conference on Parallel Processing for Scientific Computing, 2004,

L. Oliker, M. Wehner, D. Parks, W.S. Wang, High Resolution Atmospheric General Circulation Model Simulations on Vector and Cache-based Architectures, SIAM Conference on Parallel Processing for Scientific Computing, 2004,

H. Shan, J. Singh, L. Oliker, R. Biswas, Design Strategies for Irregularly Adapting Parallel Applications, SIAM Conference on Parallel Processing, 2001,

Download File: siampp01abstacta.pdf (pdf: 1.6 MB)

L. Oliker, R. Biswas, P. Husbands, X. Li, Ordering Sparse Matrices for Cache-Based Systems, SIAM Conference on Parallel Processing, 2001,

Download File: siampp01abstactb.pdf (pdf: 2.1 MB)

L. Oliker, R. Biswas, Multithreading for Dynamic Irregular Applications, First SIAM Conference on Computational Science and Engineering, 2000,

R. Biswas, L. Oliker, Load Balancing Unstructured Adaptive Grids for CFD Problems, 8th SIAM Conference on Parallel Processing for Scientific Computing, 1997,

Download File: siam97-loadbalancing.pdf (pdf: 182 KB)

Reports

Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker, "Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture", LBNL Technical Report, October 2014, LBNL 6806E,

Download File: rpt83549.PDF (PDF: 615 KB)

Samuel Williams, Dhiraj D. Kalamkar, Amik Singh, Anand M. Deshpande, Brian Van Straalen, Mikhail Smelyanskiy, Ann Almgren, Pradeep Dubey, John Shalf, Leonid Oliker, "Implementation and Optimization of miniGMG - a Compact Geometric Multigrid Benchmark", December 2012, LBNL 6676E,

Download File: miniGMGLBNL-6676E.pdf (pdf: 906 KB)

M. Christen, N. Keen, T. Ligocki, L. Oliker, J. Shalf, B. van Straalen, S. Williams, "Automatic Thread-Level Parallelization in the Chombo AMR Library", LBNL Technical Report, 2011, LBNL 5109E,

W. Kramer, J. Carter, D. Skinner, L. Oliker, P. Husbands, P. Hargrove, J. Shalf, O. Marques, E. Ng, A. Drummond, K. Yelick, "Software Roadmap to Plug and Play Petaflop/s", 2006,

S. Williams, J. Shalf, L. Oliker, P. Husbands, K. Yelick, "Dense and Sparse Matrix Operations on the Cell Processor", LBNL Technical Report, 2005,

Simon, H., Kramer, W., Saphir, W., Shalf, J., Bailey, D., Oliker, L., Banda, M., McCurdy, C.W., Hules, J., Canning, A., Day, M., Colella, P., Serafini, D., Wehner, M., Nugent, P., "National Facility for Advanced Computational Science: A Sustainable Path to Scientific Discovery", April 2004, LBNL 5500,

Download File: PUB-5500.pdf (pdf: 1.8 MB)

Thesis/Dissertations

PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes, L. Oliker, 1998,

Download File: oliker-thesis.pdf (pdf: 722 KB)

Posters

Samuel Williams, Charlene Yang, Khaled Ibrahim, Thorsten Kurth, Nan Ding, Jack Deslippe, Leonid Oliker, "Performance Analysis using the Roofline Model", SciDAC PI Meeting, July 2019,

Download File: SciDAC19-Poster-Roofline-SWWilliams.pdf (pdf: 4.9 MB)

A. Buluç, A. Fox, J. R. Gilbert, S. Kamil, A. Lugowski, L. Oliker, S. Williams, "High-performance analysis of filtered semantic graphs", PACT '12 Proceedings of the 21st international conference on Parallel architectures and compilation techniques (extended abstract), 2012, doi: 10.1145/2370816.2370897

S. Williams, J. Carter, J. Demmel, L. Oliker, D. Patterson, J. Shalf, K. Yelick, R. Vuduc, "Autotuning Scientific Kernels on Multicore Systems", ASCR PI Meeting, 2008,

Download File: ascrpi08-autotuning-poster.pdf (pdf: 2.2 MB)