Performance and Algorithms Research

Leonid Oliker

Lenny Oliker
Computer Senior Scientist, PAR Group Lead
Phone: +1 510 486 6625
Fax: +1 510 486 6900

Lenny Oliker is a senior scientist and group lead of the performance and algorithms group (PAR) in the computer science department. His research interests focus on performance optimization, evaluation, and modeling on leading high-end computing systems. Lenny has published over 150 peer-reviewed publications, including five best paper awards, and is the deputy of the recently formed SciDAC4 RAPIDS institute for computer science and data. He is the executive director of the ECP Exabiome project, which aims to develop the world’s fastest genome assembly and analysis algorithms and parallel implementations. Other research activities include the Roofline methodology, which has been widely adopted as an effective performance modeling tool within the HPC community. Lenny is also interested in optimization and evaluation of scientific computations, and has participated in studies in the areas of fusion, genomics, climate, cosmology, and materials science.

Journal Articles

Muaaz G Awan, Jack Deslippe, Aydin Buluc, Oguz Selvitopi, Steven Hofmeyr, Leonid Oliker, Katherine Yelick, "ADEPT: a domain independent sequence alignment strategy for gpu architectures", BMC Bioinformatics, September 2020, 21, doi: 10.1186/s12859-020-03720-1

Steven Hofmeyr, Rob Egan, Evangelos Georganas, Alex C Copeland, Robert Riley, Alicia Clum, Emiley Eloe-Fadrosh, Simon Roux, Eugene Goltsman, Aydin Buluc, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, "Terabase-scale metagenome coassembly with MetaHipMer", Scientific Reports, June 1, 2020, 10, doi: https://doi.org/10.1038/s41598-020-67416-5

Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly) and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes, with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform coassembly and multiassembly, and assemble both abundant, high strain variation genomes, and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer’s scaling power. MetaHipMer is available for public use under an open source license and all datasets used in the paper are available for public download.

Katherine Yelick, Aydın Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, Leonid Oliker, "The parallelism motifs of genomic data analysis", Philosophical Transactions of The Royal Society A: Mathematical, Physical and Engineering Sciences, 2020,

Bei Wang, Stephane Ethier, William Tang, Khaled Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, "Modern Gyrokinetic Particle-in-cell Simulation of Fusion Plasmas on Top Supercomputers", International Journal of High-Performance Computing Applications (IJHPCA), May 2017, doi: https://doi.org/10.1177/1094342017712059

Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Phillip Colella, Mary Hall, "Compiler-Based Code Generation and Autotuning for Geometric Multigrid on GPU-Accelerated Supercomputers", Parallel Computing (PARCO), April 2017, doi: 10.1016/j.parco.2017.04.002

Aydin Buluc, John Gilbert, Leonid Oliker, "Special Issue: Graph Analysis for Scientific Discovery", Parallel Computing Journal Special Issue Editors, August 1, 2015,

J. Chapman, M. Mascher, A. Buluç, K. Barry, E. Georganas, A. Session, V. Strnadova, J. Jenkins, S. Sehgal, L. Oliker, J Schmutz, K. Yelick, U. Scholz, R. Waugh, J. Poland, G. Muehlbauer, N. Stein, D. Rokhsar, "A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome", Genome biology, 2015,

Adam Lugowski, Shoaib Kamil, Aydın Buluç, Samuel Williams, Erika Duriakova, Leonid Oliker, Armando Fox, John R. Gilbert, "Parallel processing of filtered queries in attributed semantic graphs", Journal of Parallel and Distributed Computing (JPDC), September 2014, doi: 10.1016/j.jpdc.2014.08.010

L. Oliker and R. Vuduc, "Introduction for Special Issue on Autotuning", International Journal of High Performance Computing Applications (IJHPCA), 2013,

Khaled Z Ibrahim, Kamesh Madduri, Samuel Williams, Bei Wang, Stephane Ethier, Leonid Oliker, "Analysis and optimization of gyrokinetic toroidal simulations on homogeneous and heterogeneous platforms", International Journal of High Performance Computing Applications (IJHPCA), July 2013, doi: 10.1177/1094342013492446

K Madduri, J Su, S Williams, L Oliker, S Ethier, K Yelick, "Optimization of parallel particle-to-grid interpolation on leading multicore platforms", IEEE Transactions on Parallel and Distributed Systems, January 1, 2012, 23:1915--1922, doi: 10.1109/TPDS.2012.28

M. Wehner, L. Oliker, J. Shalf, D. Donofrio, L. Drummond, et al., "Hardware/Software Co-design of Global Cloud System Resolving Models", Journal of Advances in Modeling Earth Systems (JAMES), 2011, 3, M1000:22, doi: 10.1029/2011MS000073

"Emerging Programming Paradigms for Large-Scale Scientific Computing", Guest editors, Parallel Computing special issue, 2011,

Kamesh Madduri, Eun-Jin Im, Khaled Z. Ibrahim, Samuel Williams, Stephane Ethier, Leonid Oliker, "Gyrokinetic Particle-in-cell Optimization on Emerging Multi- and Manycore Platforms", Parallel Computing (PARCO), January 2011, 37:501 - 520, doi: 10.1016/j.parco.2011.02.001

Shoaib Kamil, L. Oliker, A. Pinar, John Shalf, "Communication Requirements and Interconnect Optimization for High-End Scientific Applications", IEEE Transactions on Parallel and Distributed Systems (TPDS), January 1, 2010, 21:188-202,

M. Wehner, L. Oliker, and J. Shalf, "Low Power Supercomputers", IEEE Spectrum, October 2009,

High-performance computing for such things as climate modeling is not going to advance at anything like the pace it has during the last two decades unless we apply fundamentally new ideas. Here we describe one possible approach. Rather than constructing supercomputers from the kinds of microprocessors found in fast desktop computers or servers, we propose adopting designs and design principles drawn, oddly enough, from the portable-electronics marketplace.

David Donofrio, Oliker, Shalf, F. Wehner, Rowen, Krueger, Kamil, Marghoob Mohiyuddin, "Energy-Efficient Computing for Extreme-Scale Science", IEEE Computer, January 2009, 42:62-71, doi: 10.1109/MC.2009.35

S. Kamil, L. Oliker, A. Pinar, J. Shalf, "Communication Requirements and Interconnect Optimization for High-End Scientific Applications", IEEE Transactions on Parallel and Distributed Systems (TPDS), 2009,

K Datta, S Kamil, S Williams, L Oliker, J Shalf, K Yelick, "Optimization and performance modeling of stencil computations on modern microprocessors", SIAM Review, 2009, 51:129--159, doi: 10.1137/070693199

R. Biswas, J. Vetter, L. Oliker, "Revolutionary Technologies for Acceleration of Emerging Petascale Applications", Guest Editors, Parallel Computing Journal, 2009,

S Williams, J Carter, L Oliker, J Shalf, K Yelick, "Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms", Journal of Parallel and Distributed Computing, 2009, 69:762--777, doi: 10.1016/j.jpdc.2009.04.002

J. Borrill, L. Oliker, J. Shalf, H. Shan, A. Uselton, "HPC global file system performance analysis using a scientific-application derived benchmark", Parallel Computing, 2009, 35:358-373, doi: 10.1016/j.parco.2009.02.002

S. Williams, K. Datta, J. Carter, L. Oliker, J. Shalf, K. Yelick, D. Bailey, "PERI: Auto-tuning Memory Intensive Kernels for Multicore", SciDAC PI Meeting, Journal of Physics: Conference Series, 125 012038, July 2008, doi: 10.1088/1742-6596/125/1/012038

M. Wehner, L. Oliker, J. Shalf, "Performance Characterization of the World's Most Powerful Supercomputers", International Journal of High Performance Computing Applications (IJHPCA), April 2008,

Michael F. Wehner, L. Oliker, John Shalf, "Towards Ultra-High Resolution Models of Climate and Weather", International Journal of High Performance Computing Applications (IJHPCA), January 2008, 22:149-165,

S. Ethier, W. M. Tang, R. Walkup, L. Oliker, "Large-Scale Gyrokinetic Particle Simulation of Microturbulence in Magnetically Confined Fusion Plasmas", IBM Journal of Research and Development, 2008,

L. Oliker, A. Canning, J. Carter, J. Shalf, S. Ethier, "Scientific application performance on leading scalar and vector supercomputing platforms", International Journal of High Performance Computing Applications, 2008, 22:5-20, doi: 10.1177/1094342006085020

S Williams, J Shalf, L Oliker, S Kamil, P Husbands, K Yelick, "Scientific computing kernels on the cell processor", International Journal of Parallel Programming, January 2007, 35:263--298, doi: 10.1007/s10766-007-0034-5

S Williams, L Oliker, R Vuduc, J Shalf, K Yelick, J Demmel, "Optimization of sparse matrix-vector multiplication on emerging multicore platforms", Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC 07, 2007, doi: 10.1145/1362622.1362674

L. Oliker, J. Carter, M. Wehner, A. Canning, S. Ethier, A. Mirin, G. Bala, D. Parks, P. Worley, S. Kitawaki, Y. Tsuda, "Scientific Application Performance on Leading Scalar and Vector Supercomputing Platforms", International Journal of High Performance Computing Applications (IJHPCA), 2006,

H. Simon, W. Kramer, W. Saphir, J. Shalf, D. Bailey, L. Oliker, et al, "Science Driven System Architecture: A New Process for Leadership Class Computing", Journal of the Earth Simulator, 2005,

L. Oliker, A. Canning, J. Carter, J. Shalf, H. Simon, S. Ethier, D. Parks, S. Kitawaki, Y. Tsuda, T. Sato, "Performance of Ultra-Scale Applications on Leading Vector and Scalar HPC Platforms", Journal of the Earth Simulator, January 2005, 3,

L. Oliker, A. Canning, J. Carter, J. Shalf, D. Skinner, S. Ethier, R. Biswas, J. Djomehri, R. Van Der Wijngaart, "Performance evaluation of the SX-6 vector architecture for scientific computations", Concurrency Computation Practice and Experience, January 2005, 17:69-93, doi: 10.1002/cpe.884

R. Biswas, L. Oliker, H. Shan, "Parallel Computing Strategies for Irregular Algorithms", Annual Review of Scalable Computing, April 2003,

Hongzhang Shan, Jaswinder P. Singh, Leonid Oliker, Rupak Biswas, "Message Passing and Shared Address Space Parallelism on an SMP Cluster", Parallel Computing Journal, Volume 29, Issue 2, February 2003,

L. Oliker, X. Li, P. Husbands, R. Biswas, "Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations", SIAM Review Journal, 2002,

H. Shan, J. P. Singh, L. Oliker, R. Biswas, "A Comparison of Three Programming Models for Adaptive Applications on the Origin2000", Journal of Parallel and Distributed Computing (JPDC), January 1, 2002, doi: 10.1006/jpdc.2001.1777

L. Oliker, R. Biswas, S. Das, D. Harvey, "Parallel Dynamic Load Balancing Strategies for Adaptive Irregular Applications", DRAMA special issue of Applied Mathematical Modeling Journal, 2000,

L. Oliker, R. Biswas, "Parallelization of a Dynamic Unstructured Algorithm using Three Leading Programming Paradigms", IEEE Transactions on Parallel and Distributed Systems (TPDS), 2000,

L. Oliker, R. Biswas and H. Gabow, "Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing", Parallel Computing Journal, Special Issue on Graph Partitioning, pp 1583-1608, 2000,

R. Biswas, L. Oliker, "Experiments with Repartitioning and Load Balancing Adaptive Meshes", Grid Generation and Adaptive Algorithms, IMA Volumes in Mathematics and its Applications, Vol. 113, Springer-Verlag, pp.89-112, 1999,

L. Oliker, R. Biswas, "PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes (JPDC version)", Journal of Parallel and Distributed Computing (JPDC), 1998,

R. Strawn, L. Oliker, R. Biswas, "New Computational Methods for the Prediction and Analysis of Helicopter Noise", Journal of Aircraft, 34, pp. 665-672, 1997,

S. Chatterjee, J. Gilbert, L. Oliker, R. Schreiber, and T. Sheffler, "Algorithms for Automatic Alignment of Arrays", Journal of Parallel and Distributed Computing (JPDC), July 1996,

Conference Papers

Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, "Methodology for Evaluating the Potential of Disaggregated Memory Systems", RESDIS, https://resdis.github.io/ws/2022/sc/, November 18, 2022,

Taylor Groves, Chris Daley, Rahulkumar Gayatri, Hai Ah Nam, Nan Ding, Lenny Oliker, Nicholas J. Wright, Samuel Williams, "A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures", PMBS, November 2022,

K. Ibrahim, L. Oliker, "Preprocessing Pipeline Optimization for Scientific Deep-Learning Workloads", IPDPS 22, June 3, 2022,

Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", (BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021,

Jonathan R Madsen, Muaaz G Awan, Hugo Brunie, Jack Deslippe, Rahul Gayatri, Leonid Oliker, Yunsong Wang, Charlene Yang, Samuel Williams, "TiMemory: Modular Performance Analysis for HPC", International Supercomputing Conference (ISC), June 2020, doi: 10.1007/978-3-030-50743-5_22

A Zeni, G Guidi, M Ellis, N Ding, MD Santambrogio, S Hofmeyr, A Buluc, L Oliker, K Yelick, "LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment", Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020, 2020, 462--471, doi: 10.1109/IPDPS47924.2020.00055

T Groves, B Brock, Y Chen, KZ Ibrahim, L Oliker, NJ Wright, S Williams, K Yelick, "Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches", Proceedings of PMBS 2020: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis, January 2020, 126--137, doi: 10.1109/PMBS51919.2020.00016

G Guidi, O Selvitopi, M Ellis, L Oliker, K Yelick, A Buluc, "Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly", January 1, 2020,

Khaled Ibrahim, Samuel Williams, Leonid Oliker, "Performance Analysis of GPU Programming Models using the Roofline Scaling Trajectories", International Symposium on Benchmarking, Measuring and Optimizing (Bench), BEST PAPER AWARD, November 2019,

M Ellis, G Guidi, A Buluç, L Oliker, K Yelick, "DiBELLA: Distributed long read to long read alignment", ACM International Conference Proceeding Series, January 1, 2019, doi: 10.1145/3337821.3337919

Charlene Yang, Rahulkumar Gayatri, Thorsten Kurth, Protonu Basu, Zahra Ronaghi, Adedoyin Adetokunbo, Brian Friesen, Brandon Cook, Douglas Doerfler, Leonid Oliker, Jack Deslippe, Samuel Williams, "An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability", International Workshop on Performance, Portability and Productivity in HPC (P3HPC), November 2018,

Khaled Ibrahim, Samuel Williams, Leonid Oliker, "Roofline Scaling Trajectories: A Method for Parallel Application and Architectural Performance Analysis", HPCS Special Session on High Performance Computing Benchmarking and Optimization (HPBench), July 2018,

Tuomas Koskela, Zakhar Matveev, Charlene Yang, Adetokunbo Adedoyin, Roman Belenov, Philippe Thierry, Zhengji Zhao, Rahulkumar Gayatri, Hongzhang Shan, Leonid Oliker, Jack Deslippe, Ron Green, and Samuel Williams, "A Novel Multi-Level Integrated Roofline Model Approach for Performance Characterization", ISC, June 2018,

P Koanantakool, A Ali, A Azad, A Buluç, D Morozov, L Oliker, KA Yelick, S-Y Oh, "Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation", Proceedings of Machine Learning Research, PMLR, 2018, 84:1376--1386,

Philip C. Roth, Hongzhang Shan, David Riegner, Nikolas Antolin, Sarat Sreepathi, Leonid Oliker, Samuel Williams, Shirley Moore, Wolfgang Windl, "Performance Analysis and Optimization of the RAMPAGE Metal Alloy Potential Generation Software", SIGPLAN International Workshop on Software Engineering for Parallel Systems (SEPS), October 2017,

Thorsten Kurth, William Arndt, Taylor Barnes, Brandon Cook, Jack Deslippe, Doug Doerfler, Brian Friesen, Yun (Helen) He, Tuomas Koskela, Mathieu Lobet, Tareq Malas, Leonid Oliker, Andrey Ovsyannikov, Samuel Williams, Woo-Sun Yang, and Zhengji Zhao, "Analyzing Performance of Selected NESAP Applications on the Cori HPC System", Intel Xeon Phi Users Group (IXPUG), June 2017,

M Ellis, E Georganas, R Egan, S Hofmeyr, A Buluç, B Cook, L Oliker, K Yelick, "Performance characterization of de novo genome assembly on leading parallel systems", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, 10417 LN:79--91, doi: 10.1007/978-3-319-64203-1_6

E Georganas, M Ellis, R Egan, S Hofmeyr, A Buluç, B Cook, L Oliker, K Yelick, "MerBench: PGAS benchmarks for high performance genome assembly", Proceedings of PAW 2017: 2nd Annual PGAS Applications Workshop - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis, 2017, 2017-Jan:1--4, doi: 10.1145/3144779.3169109

William Tang, Bei Wang, Stephane Ethier, Grzegorz Kwasniewski, Torsten Hoefler, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, Carlos Rosales-Fernandez, Tim Williams, "Extreme Scale Plasma Turbulence Simulations on Top Supercomputers Worldwide", Supercomputing, November 2016,

Taylor Barnes, Brandon Cook, Jack Deslippe, Douglas Doerfler, Brian Friesen, Yun (Helen) He, Thorsten Kurth, Tuomas Koskela, Mathieu Lobet, Tareq Malas, Leonid Oliker, Andrey Ovsyannikov, Abhinav Sarje, Jean-Luc Vay, Henri Vincenti, Samuel Williams, Pierre Carrier, Nathan Wichmann, Marcus Wagner, Paul Kent, Christopher Kerr, John Dennis, "Evaluating and Optimizing the NERSC Workload on Knights Landing", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2016,

Veronika Strnadova-Neeley, Aydin Buluc, John R. Gilbert, Leonid Oliker, Weimin Ouyang, "LiRa: A New Likelihood-Based Similarity Score for Collaborative Filtering", August 30, 2016,

Douglas Doerfler, Jack Deslippe, Samuel Williams, Leonid Oliker, Brandon Cook, Thorsten Kurth, Mathieu Lobet, Tareq Malas, Jean-Luc Vay, and Henri Vincenti, "Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor", Intel Xeon Phi User Group Workshop (IXPUG), June 2016,

Abhinav Sarje, Douglas W. Jacobsen, Samuel W. Williams, Todd Ringler, Leonid Oliker, "Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers", Cray User Group (CUG), London, UK, May 2016,

P Koanantakool, A Azad, A Buluc, D Morozov, SY Oh, L Oliker, K Yelick, "Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication", Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016, January 2016, 842--853, doi: 10.1109/IPDPS.2016.117

Veronika Strnadová-Neeley, Aydın Buluç, Jarrod Chapman, John R. Gilbert, Joseph Gonzalez, Leonid Oliker, "Efficient Data Reduction for Large-Scale Genetic Mapping", ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB), September 10, 2015,

Abhinav Sarje, Sukhyun Song, Douglas Jacobsen, Kevin Huck, Jeffrey Hollingsworth, Allen Malony, Samuel Williams, and Leonid Oliker, "Parallel Performance Optimizations on Unstructured Mesh-Based Simulations", Procedia Computer Science, 1877-0509, June 2015, 51:2016-2025, doi: 10.1016/j.procs.2015.05.466

This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

Protonu Basu, Samuel Williams, Brian Van Straalen, Mary Hall, Leonid Oliker, Phillip Colella, "Compiler-Directed Transformation for Higher-Order Stencils", International Parallel and Distributed Processing Symposium (IPDPS), May 2015,

Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, Katherine Yelick, "MerAligner: A Fully Parallel Sequence Aligner", IEEE 29th International Parallel and Distributed Processing Symposium (IPDPS), May 2015, 561--570, doi: 10.1109/IPDPS.2015.96

Aligning a set of query sequences to a set of target sequences is an important task in bioinformatics. In this work we present merAligner, a highly parallel sequence aligner that implements a seed-and-extend algorithm and employs parallelism in all of its components. MerAligner relies on a high performance distributed hash table (seed index) and uses the one-sided communication capabilities of Unified Parallel C to facilitate fine-grained parallelism. We leverage communication optimizations at the construction of the distributed hash table and software caching schemes to reduce communication during the aligning phase. Additionally, merAligner preprocesses the target sequences to extract properties enabling exact sequence matching with minimal communication. Finally, we efficiently parallelize the I/O intensive phases and implement an effective load balancing scheme. Results show that merAligner exhibits efficient scaling up to thousands of cores on a Cray XC30 supercomputer using real human and wheat genome data while significantly outperforming existing parallel alignment tools.

Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker, "Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture", Programming Models and Applications for Multicores and Manycores (PMAM), February 2015,

E Georganas, A Buluç, J Chapman, S Hofmeyr, C Aluru, R Egan, L Oliker, D Rokhsar, K Yelick, "HipMer: An extreme-scale de novo genome assembler", International Conference for High Performance Computing, Networking, Storage and Analysis, SC, January 1, 2015, 15-20-No, doi: 10.1145/2807591.2807664

Yu Jung Lo, Samuel Williams, Brian Van Straalen, Terry J. Ligocki, Matthew J. Cordery, Leonid Oliker, Mary W. Hall, "Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2014, doi: 10.1007/978-3-319-17248-4_7

Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, Katherine Yelick, "Parallel de Bruijn Graph Construction and Traversal for de Novo Genome Assembly", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), November 16, 2014, 437--448, doi: 10.1109/SC.2014.41

Veronika Strnadova, Aydın Buluç, Joseph Gonzalez, Stefanie Jegelka, Jarrod Chapman, John Gilbert, Daniel Rokhsar, Leonid Oliker, "Efficient and accurate clustering for large-scale genetic mapping", IEEE International Conference on Bioinformatics and Biomedicine (BIBM'14), November 1, 2014,

Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Mary Hall, "Converting Stencils to Accumulations for Communication-Avoiding Optimization in Geometric Multigrid", Workshop on Stencil Computations (WOSC), October 2014,

W.A. de Jong, L. Lin, H. Shan, C. Yang and L. Oliker, "Towards modelling complex mesoscale molecular environments", International Conference on Computational and Mathematical Methods in Science and Engineering (CMMSE), 2014,

Protonu Basu, Anand Venkat, Mary Hall, Samuel Williams, Brian Van Straalen, Leonid Oliker, "Compiler generation and autotuning of communication-avoiding operators for geometric multigrid", 20th International Conference on High Performance Computing (HiPC), December 2013, 452--461,

Hongzhang Shan, Brian Austin, Wibe de Jong, Leonid Oliker, Nick Wright, Edoardo Apra, "Performance Tuning of Fock Matrix and Two Electron Integral Calculations for NWChem on Leading HPC Platforms", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2013, doi: 10.1007/978-3-319-10214-6_13

Bei Wang, Stephane Ethier, William Tang, Timothy Williams, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker, "Kinetic Turbulence Simulations at Extreme Scale on Leadership-Class Systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2013, doi: 10.1145/2503210.2503258

P. Basu, A. Venkat, M. Hall, S. Williams, B. Van Straalen, L. Oliker, "Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid", Workshop on Stencil Computations (WOSC), 2013,

Aydın Buluç, Erika Duriakova, Armando Fox, John Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, Samuel Williams, "High-Productivity and High-Performance Analysis of Filtered Semantic Graphs", International Parallel and Distributed Processing Symposium (IPDPS), 2013, doi: 10.1145/2370816.2370897

S. Williams, D. Kalamkar, A. Singh, A. Deshpande, B. Van Straalen, M. Smelyanskiy, A. Almgren, P. Dubey, J. Shalf, L. Oliker, "Optimization of Geometric Multigrid for Emerging Multi- and Manycore Processors", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2012, doi: 10.1109/SC.2012.85

K. Kandalla, A. Buluç, H. Subramoni, K. Tomko, J. Vienne, L. Oliker, D. K. Panda, "Can network-offload based non-blocking neighborhood MPI collectives improve communication overheads of irregular graph algorithms?", International Workshop on Parallel Algorithms and Parallel Software (IWPAPS 2012), 2012,

P. Narayanan, A. Koniges, L. Oliker, R. Preissl, S. Williams, N. Wright, M. Umansky, X. Xu, S. Ethier, W. Wang, J. Candy, J. Cary, "Performance Characterization for Fusion Co-design Applications", Cray Users Group (CUG), May 2011,

Aydın Buluç, Samuel Williams, Leonid Oliker, James Demmel, "Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication", IPDPS, IEEE, 2011, doi: https://doi.org/10.1109/IPDPS.2011.73

Kamesh Madduri, Khaled Ibrahim, Samuel Williams, Eun-Jin Im, Stephane Ethier, John Shalf, Leonid Oliker, "Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), January 2011, 23, doi: 10.1145/2063384.2063415

Samuel Williams, Oliker, Carter, John Shalf, "Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), New York, NY, USA, ACM, January 2011, 55, doi: 10.1145/2063384.2063458

Jens Krueger, David Donofrio, John Shalf, Marghoob Mohiyuddin, Samuel Williams, Leonid Oliker, Franz-Josef Pfreund, "Hardware/software co-design for energy-efficient seismic modeling", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), January 2011, 73, doi: 10.1145/2063384.2063482

R. Sudarsan, J. Borrill, C. Cantalupo, T. Kisner, K. Madduri, L. Oliker, Y. Zheng, H. Simon, "Cosmic microwave background map-making at the petascale and beyond", Proceedings of the International Conference on Supercomputing, 2011, 305-316, doi: 10.1145/1995896.1995944

G. Hendry, J. Chan, S. Kamil, L. Oliker, J. Shalf, L. Carloni, K. Bergman, "Silicon Nanophotonic Network-On-Chip using TDM Arbitration", Hot Interconnects, August 2010,

S. Ethier, M. Adams, J. Carter, L. Oliker, "Petascale Parallelization of the Gyrokinetic Toroidal Code", VECPAR: High Performance Computing for Computational Science, June 2010,

Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams, "An auto-tuning framework for parallel multicore stencil computations", International Parallel & Distributed Processing Symposium (IPDPS), January 1, 2010, 1-12, doi: 10.1109/IPDPS.2010.5470421

A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros, R. Vuduc, "Optimizing and Tuning the Fast Multipole Method for State-of-the-Art Multicore Architectures", International Parallel & Distributed Processing Symposium (IPDPS), 2010, doi: 10.1109/IPDPS.2010.5470415

Andrew Uselton, Howison, J. Wright, Skinner, Keen, Shalf, L. Karavanic, Leonid Oliker, "Parallel I/O performance: From events to ensembles", International Parallel & Distributed Processing Symposium (IPDPS), 2010, 1-11,

J. Shalf, M. Wehner, L. Oliker, "The Challenge of Energy-Efficient HPC", SCIDAC Review, Fall, 2009,

Shoaib Kamil, Cy Chan, Samuel Williams, Leonid Oliker, John Shalf, Mark Howison, E. Wes Bethel, Prabhat, "A Generalized Framework for Auto-tuning Stencil Computations", BEST PAPER AWARD - Cray User Group Conference (CUG), Atlanta, GA, May 4, 2009, LBNL 2078E,

S. Williams, J. Carter, L. Oliker, J. Shalf, K. Yelick, "Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4", Proceedings of the Cray User Group (CUG), Atlanta, GA, 2009,

K Madduri, S Williams, S Ethier, L Oliker, J Shalf, E Strohmaier, K Yelick, "Memory-efficient optimization of gyrokinetic particle-to-grid interpolation for multicore processors", Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC 09, January 2009, doi: 10.1145/1654059.1654108

K. Datta, S. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, K. Yelick, "Auto-Tuning the 27-point Stencil for Multicore", Proceedings of Fourth International Workshop on Automatic Performance Tuning (iWAPT2009), January 2009,

J Gebis, L Oliker, J Shalf, S Williams, K Yelick, "Improving memory subsystem performance using ViVA: Virtual vector architecture", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009, 5455 LNC:146--158, doi: 10.1007/978-3-642-00454-4_16

G. Hendry, S.A. Kamil, A. Biberman, J. Chan, B.G. Lee, M Mohiyuddin, A. Jain, K. Bergman, L.P. Carloni, J. Kubiatowicz, L. Oliker, J. Shalf, "Analysis of Photonic Networks for Chip Multiprocessor Using Scientific Applications", International Symposium on Networks-on-Chip (NOCS), 2009,

Marghoob Mohiyuddin, Murphy, Oliker, Shalf, Wawrzynek, Samuel Williams, "A design methodology for domain-optimized power-efficient supercomputing", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2009, doi: 10.1145/1654059.1654072

K Datta, M Murphy, V Volkov, S Williams, J Carter, L Oliker, D Patterson, J Shalf, K Yelick, "Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures", 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008, January 2008, doi: 10.1109/SC.2008.5222004

S Williams, J Carter, L Oliker, J Shalf, K Yelick, "Lattice Boltzmann simulation optimization on leading multicore platforms", IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM, 2008, doi: 10.1109/IPDPS.2008.4536295

William T.C. Kramer, John M. Shalf, E. Wes Bethel, D. Agarwal, Michael Banda, John Hules, Juan C. Meza, Leonid Oliker, Horst Simon, David Skinner, Francesca Verdier, Howard Walter, Michael Wehner, and Katherine Yelick, "HPC in 2016: A View Point from NERSC", Proceedings of the Cray User Group Conference, Helsinki, Finland, 2008,

Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, James Demmel, "Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2007, doi: 10.1145/1362622.1362674

J. Borrill, L. Oliker, J. Shalf, H. Shan, "Investigation Of Leading HPC I/O Performance Using A Scientific-Application Derived Benchmark", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2007,

Shoaib Kamil, Pinar, Gunter, Lijewski, Oliker, John Shalf, "Reconfigurable hybrid interconnection for static and dynamic scientific applications", Conf. Computing Frontiers, 2007, 183-194, LBNL 60060,

L. Oliker, A. Canning, J. Carter, C. Iancu, M. Lijewski, S. Kamil, J. Shalf, H. Shan, E. Strohmaier, S. Ethier, T. Goodale, "Scientific application performance on candidate petascale platforms", Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM, 2007, doi: 10.1109/IPDPS.2007.370259

J. Carter, L. Oliker, J. Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", Extended Version: Lecture Notes in Computer Science, 2007,

S. Williams, J. Shalf, L. Oliker, P. Husbands, S. Kamil, K. Yelick, "The Potential of the Cell Processor for Scientific Computing", ACM International Conference on Computing Frontiers, 2006, doi: 10.1145/1128022.1128027

Michael Welcome, Charles Rendleman, Leonid Oliker, Rupak Biswas, "Performance Characteristics of an Adaptive Mesh Refinement Calculation on Scalar and Vector Platforms", ACM International Conference on Computing Frontiers, Italy, May 2006, LBNL 59238, doi: 10.1145/1128022.1128074

Adaptive mesh refinement (AMR) is a powerful technique that reduces the resources necessary to solve otherwise intractable problems in computational science. The AMR strategy solves the problem on a relatively coarse grid, and dynamically refines it in regions requiring higher resolution. However, AMR codes tend to be far more complicated than their uniform grid counterparts due to the software infrastructure necessary to dynamically manage the hierarchical grid framework. Despite this complexity, it is generally believed that future multi-scale applications will increasingly rely on adaptive methods to study problems at unprecedented scale and resolution. Recently, a new generation of parallel-vector architectures have become available that promise to achieve extremely high sustained performance for a wide range of applications, and are the foundation of many leadership-class computing systems worldwide. It is therefore imperative to understand the tradeoffs between conventional scalar and parallel-vector platforms for solving AMR-based calculations. In this paper, we examine the HyperCLaw AMR framework to compare and contrast performance on the Cray X1E, IBM Power3 and Power5, and SGI Altix. To the best of our knowledge, this is the first work that investigates and characterizes the performance of an AMR calculation on modern parallel-vector systems.

S Kamil, K Datta, S Williams, L Oliker, J Shalf, K Yelick, "Implicit and explicit optimizations for stencil computations", Proceedings of the 2006 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, MSPC 2006, 2006, 51-60, doi: 10.1145/1178597.1178605

J. Carter, L. Oliker, J. Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", VECPAR, 2006,

J. Carter, L. Oliker, "Performance Evaluation of Lattice-Boltzmann Magnetohydrodynamics Simulations on Modern Parallel Vector Systems", Proceedings of the 2nd Teraflop Workshop. Lecture Notes in Computer Science (LNCS), Stuttgart, Germany, January 1, 2006,

Jonathan Carter, Leonid Oliker, John Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", VECPAR, Springer Berlin/Heidelberg, 2006, 4395:490-503,

J. Carter, L. Oliker, J. Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", High Performance Computing for Computational Science, 2006,

Highest Ranked Conference Paper

L. Oliker, J. Carter, M. Wehner, A. Canning, S. Ethier, A. Mirin, G. Bala, D. Parks, P. Worley, S. Kitawaki, Y. Tsuda, "Leading computational methods on scalar and vector HEC platforms", Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC 05, 2005, doi: 10.1109/SC.2005.41

John Shalf, Shoaib Kamil, Leonid Oliker, David Skinner, "Analyzing Ultra-Scale Application Communication Requirements for a Reconfigurable Hybrid Interconnect", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2005, 17,

J. Carter, M. Soe, L. Oliker, Y. Tsuda, G. Vahala, L. Vahala, A. Macnab, "Magnetohydrodynamic Turbulence Simulations on the Earth Simulator Using the Lattice Boltzmann Method", International Conference for High Performance Computing, Networking, Storage and Analysis (SC) - Gordon Bell Finalist, Washington, DC, USA, IEEE Computer Society, 2005,

S. Kamil, J. Shalf, L. Oliker, D. Skinner, "Understanding Ultra-Scale Application Communication Requirements", IEEE International Symposium on Workload Characterization (IISWC), 2005,

S Kamil, P Husbands, L Oliker, J Shalf, K Yelick, "Impact of modern memory subsystems on cache optimizations for stencil computations", Proceedings of the 3rd 2005 ACM SIGPLAN Workshop on Memory Systems Performance, MSP 2005, 2005, 36-43, doi: 10.1145/1111583.1111589

J. Borrill, J. Carter, L. Oliker, D. Skinner, R. Biswas, "Integrated performance monitoring of a cosmology application on leading HEC platforms", Proceedings of the International Conference on Parallel Processing, 2005, 2005:119-128, doi: 10.1109/ICPP.2005.47

L. Oliker, R. Biswas, J. Borrill, A. Canning, J. Carter, M.J. Djomehri, H. Shan, D. Skinner, "A performance evaluation of the Cray X1 for scientific applications", Lecture Notes in Computer Science, 2005, 3402:51-65,

Horst Simon, William Kramer, William Saphir, John Shalf, David Bailey, Leonid Oliker, Michael Banda, C. William McCurdy, John Hules, Andrew Canning, Marc Day, Philip Colella, David Serafini, Michael Wehner, Peter Nugent, "Science-Driven System Architecture: A New Process for Leadership Class Computing", Journal of the Earth Simulator, Volume 2, 2005, LBNL 56545,

J. Carter, J. Borrill, L. Oliker, "Performance characteristics of a cosmology package on leading HPC architectures", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Berlin/Heidelberg, 2004, 3296:176-188,

H. Shan, L. Oliker, R. Biswas, W. Smith, "Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration", International Conference on Advanced Computing and Communication: ADCOM, 2004,

L. Oliker, A. Canning, J. Carter, J. Shalf, S. Ethier, "Scientific Computations on Modern Parallel Vector Systems", Proceedings of the ACM/IEEE SC 2004 Conference: Bridging Communities, 2004, doi: 10.1109/SC.2004.54

L. Oliker, J. Borrill, A. Canning, J. Carter, H. Shan, D. Skinner, R. Biswas, J. Djomehri, "A Performance Evaluation of the Cray X1 for Scientific Applications", VECPAR'04: 6th International Meeting on High Performance Computing for Computational Science, 2004,

G Griem, L Oliker, J Shalf, K Yelick, "Identifying performance bottlenecks on modern microarchitectures using an adaptable probe", Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM), 2004, 18:3505-3512,

H. Shan, E. Strohmaier, L. Oliker, "Optimizing Performance of Superscalar Codes for a Single Cray X1 MSP", Proceedings of the 46th Cray User Group Conference:CUG, 2004,

P. A. Agarwal, R. A. Alexander, E. Apra, S. Balay, A. S. Bland, J. Colgan, E. F. D’Azevedo, J. J. Dongarra, T. H. Dunigan, Jr., M. R. Fahey, R. A. Fahey, A. Geist, M. Gordon, R. J. Harrison, D. Kaushik, M. Krishnakumar, P. Luszczek, A. Mezzacappa, J. A. Nichols, J. Nieplocha, L. Oliker, T. Packwood, M.S. Pindzola, T. C. Schulthess, J. S. Vetter, J. B. White, III, T. L. Windus, P. H. Worley, T. Zacharia, "Cray X1 Evaluation Status Report", Proceedings of the 46th Cray User Group Conference:CUG, 2004,

L. Oliker, G. Griem, "Transitive Closure on the Imagine Stream Processor", Fifth Workshop on Media and Stream Processors (MSP5), 2003,

L. Oliker, A. Canning, J. Carter, J. Shalf, D. Skinner, S. Ethier, R. Biswas, J. Djomehri, R. Van Der Wijngaart, "Evaluation of cache-based superscalar and cacheless vector architectures for scientific computations", Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003, 2003, doi: 10.1145/1048935.1050213

H. Shan, L. Oliker, R. Biswas, "Job Superscheduler Architecture and Performance in Computational Grid Environments", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2003,

S. Chatterji, J. Duell, M. Narayanan, L. Oliker, "Performance Evaluation of Two Emerging Media Processors: VIRAM and Imagine", Workshop on Parallel and Distributed Image Processing, Video Processing, and Multimedia (PDIVM), 2003,

BR Gaeke, P Husbands, XS Li, L Oliker, KA Yelick, R Biswas, "Memory-intensive benchmarks: IRAM vs. cache-based machines", Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002, 2002, 290-296, doi: 10.1109/IPDPS.2002.1015506

H. Shan, J. Singh, L. Oliker, R. Biswas, "Message Passing vs. Shared Address Space on a Cluster of SMPs", International Parallel & Distributed Processing Symposium (IPDPS), 2001,

L. Oliker, X. Li, P. Husbands, R. Biswas, "Ordering Schemes for Sparse Matrices using Modern Programming Paradigms", The IASTED International Conference on Applied Informatics (AI), 2001,

H. Shan, J. Singh, L. Oliker, R. Biswas, "A Comparison of Three Programming Models for Adaptive Applications on the Origin2000", International Conference for High Performance Computing, Networking, Storage and Analysis (SC) - BEST STUDENT PAPER AWARD, 2000,

L. Oliker, A. Wong, W. Kramer, T. Kaltz, D. Bailey, "ESP: A System Utilization Benchmark", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2000,

L. Oliker, X. Li, G. Heber, R. Biswas, "Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms", 13th International Conference on Parallel and Distributed Computing Systems, 2000,

L. Oliker, A. Wong, W. Kramer, T. Kaltz, D. Bailey, "System Utilization Benchmark on the Cray T3E and IBM SP", Fifth Workshop on Job Scheduling, 2000,

L. Oliker, X. Li, G. Heber, R. Biswas, "Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems", Seventh International Workshop on solving Irregularly Structured Problems in Parallel, 2000,

L. Oliker, R. Biswas, "Multithreaded Implementation of a Dynamic Irregular Application", 5th NASA Computational Aerosciences Workshop, 2000,

L. Oliker, R. Biswas, "Parallelization of a Dynamic Unstructured Application using Three Leading Paradigms", International Conference for High Performance Computing, Networking, Storage and Analysis (SC) - BEST PAPER AWARD, 1999,

R. Biswas, S.K. Das, D.J. Harvey, L. Oliker, "Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications", 13th International Parallel Processing Symposium, 1999,

K. Schloegel, G. Karypis, V. Kumar, R. Biswas, L. Oliker, "A Performance Study of Diffusive vs. Remapped Load-Balancing Schemes", 11th International Conference on Parallel and Distributed Computer Systems, pp. 59-66, 1998,

L. Oliker, R. Biswas, H.N. Gabow, "Performance Analysis and Portability of the PLUM Load Balancing System", Euro-Par'98 Parallel Processing, Lecture Notes in Computer Science, Vol. 1470, Springer-Verlag, pp. 307-317, 1998,

L. Oliker, R. Biswas, "Dynamic Domain Decomposition for Large-Scale Adaptive Calculations", 10th International Conference on Domain Decomposition Methods, 1997,

L. Oliker, R. Biswas, "Load Balancing Unstructured Adaptive Grid Computations", 4th U.S. National Congress on Computational Mechanics, 1997,

R. Biswas, L. Oliker, "Load Balancing Sequences of Unstructured Adaptive Grids", 4th International Conference on High Performance Computing (HiPC), 1997,

L. Oliker, R. Biswas, "Efficient Load Balancing and Data Remapping for Adaptive Grid Calculations", 9th ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1997,

R. Biswas, L. Oliker, A. Sohn, "Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 1996,

L. Oliker, R. Biswas, S. Strawn, "Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2", Parallel Algorithms for Irregularly Structured Problems, Lecture Notes in Computer Science, Vol. 1117, Springer-Verlag, pp. 35-47, 1996,

L. Oliker, R. Biswas, S. Strawn, "Parallel Mesh Adaption with Global Load Balancing on the SP2", NASA Computational Aerosciences Workshop, 1996,

A.M. Wissink, A.S. Lyrintzis, R.C. Strawn, L. Oliker, R. Biswas, "Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers", 34th AIAA Aerospace Sciences Meeting, Paper 96-0153, 1996,

Book Chapters

E. Georganas, S. Hofmeyr, L. Oliker, R. Egan, D. Rokhsar, A. Buluc, K. Yelick, "Extreme-scale de novo genome assembly", Exascale Scientific Applications: Scalability and Performance Portability, edited by T.P. Straatsma, K. B. Antypas, T. J. Williams, (November 13, 2017) doi: 10.1201/b21930

S. Williams, N. Bell, J. W. Choi, M. Garland, L. Oliker, R. Vuduc, "Sparse Matrix-Vector Multiplication on Multicore and Accelerators", chapter in Scientific Computing with Multicore and Accelerators, edited by Jack Dongarra, David A. Bader, Jakub Kurzak, (2010)

L. Oliker, J. Carter, V. Beckner, J. Bell, H. Wasserman, M. Adams, S. Ethier, E. Schnetter, "Large-Scale Numerical Simulations on High-End Computational Platforms", Chapman & Hall/CRC Computational Science, edited by D. H. Bailey, R. F. Lucas, S. W. Williams, (CRC Press: 2010) Pages: 123

S Williams, K Datta, L Oliker, J Carter, J Shalf, K Yelick, "Auto-Tuning Memory-Intensive Kernels for Multicore", Chapman & Hall/CRC Computational Science, (CRC Press: 2010) Pages: 273-296 doi: 10.1201/b10509-14

K Datta, S Williams, V Volkov, J Carter, L Oliker, J Shalf, K Yelick, "Auto-tuning stencil computations on multicore and accelerators", Scientific Computing with Multicore and Accelerators, (2010) Pages: 219-254 doi: 10.1201/b10376

John Shalf, Donofrio, Rowen, Oliker, Michael F. Wehner, "Green Flash: Climate Machine (LBNL)", Encyclopedia of Parallel Computing, (Springer: 2010) Pages: 809-819

Green Flash is a research project focused on an application-driven manycore chip design that leverages commodity-embedded circuit designs and hardware/software codesign processes to create a highly programmable and energy-efficient HPC design. The project demonstrates how a multidisciplinary hardware/software codesign process that facilitates close interactions between applications scientists, computer scientists, and hardware engineers can be used to develop a system tailored for the requirements of scientific computing.

L. Oliker, A. Canning, J. Carter, C. Iancu, M. Lijewski, S. Kamil, J. Shalf, H. Shan, E. Strohmaier, S. Ethier, T. Goodale, "Performance Characteristics of Potential Petascale Scientific Applications", Petascale Computing: Algorithms and Applications. Chapman & Hall/CRC Computational Science Series (Hardcover), edited by David A. Bader, (2007)

L. Oliker, R. Biswas, R. Van der Wijngaart, D. Bailey, A. Snavely, "Performance Evaluation and Modeling of Ultra-Scale Systems", Parallel Processing for Scientific Computing, edited by Michael A. Heroux, Padma Raghavan, and Horst D. Simon, (SIAM: 2007) doi: 10.1137/1.9780898718133.ch5

J. Shalf, L. Oliker, M. Lijewski, S. Kamil, J. Carter, A. Canning, S. Ethier, "Performance Characteristics of Potential Petascale Scientific Applications", Chapman & Hall/CRC Computational Science, (CRC Press: 2007) Pages: 1

Presentation/Talks

Kamesh Madduri, Samuel Williams, S. Ethier, Leonid Oliker, John Shalf, E. Strohmaier, Katherine A. Yelick, Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2009,

S. Williams, et al., The Roofline Model: A Pedagogical Tool for Auto-tuning Kernels on Multicore Architectures, Hot Chips 20, August 10, 2008,

M. Wehner, L. Oliker, J. Shalf, Ultra-Efficient Exascale Scientific Computing, 2008,

L. Oliker, J. Shalf, M. Wehner, Climate Modeling at the Petaflop Scale using Semi-Custom Computing, SIAM Conference on Computational Science and Engineering, 2007,

John Shalf, Shoaib Kamil, David Skinner, Leonid Oliker, Interconnect Requirements for HPC Applications, 2007,

Leonid Oliker, Julian Borrill, Hongzhang Shan, John Shalf, Investigation Of Leading HPC I/O Performance Using A Scientific-Application Derived Benchmark, 2007,

L. Oliker, J. Carter, Leading Computational Methods on the Earth Simulator, SIAM Conference on Parallel Processing for Scientific Computing, 2006,

L. Oliker, J. Carter, Evaluation of Vector Architectures for Scientific Codes, SIAM Conference on Parallel Processing for Scientific Computing, 2004,

L. Oliker, M. Wehner, D. Parks, W.S. Wang, High Resolution Atmospheric General Circulation Model Simulations on Vector and Cache-based Architectures, SIAM Conference on Parallel Processing for Scientific Computing, 2004,

H. Shan, J. Singh, L. Oliker, R. Biswas, Design Strategies for Irregularly Adapting Parallel Applications, SIAM Conference on Parallel Processing, 2001,

L. Oliker, R. Biswas, P. Husbands, X. Li, Ordering Sparse Matrices for Cache-Based Systems, SIAM Conference on Parallel Processing, 2001,

L. Oliker, R. Biswas, Multithreading for Dynamic Irregular Applications, First SIAM Conference on Computational Science and Engineering, 2000,

R. Biswas, L. Oliker, Load Balancing Unstructured Adaptive Grids for CFD Problems, 8th SIAM Conference on Parallel Processing for Scientific Computing, 1997,

Reports

Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker, "Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture", LBNL Technical Report, October 2014, LBNL 6806E,

Samuel Williams, Dhiraj D. Kalamkar, Amik Singh, Anand M. Deshpande, Brian Van Straalen, Mikhail Smelyanskiy, Ann Almgren, Pradeep Dubey, John Shalf, Leonid Oliker, "Implementation and Optimization of miniGMG - a Compact Geometric Multigrid Benchmark", December 2012, LBNL 6676E,

M. Christen, N. Keen, T. Ligocki, L. Oliker, J. Shalf, B. van Straalen, S. Williams, "Automatic Thread-Level Parallelization in the Chombo AMR Library", LBNL Technical Report, 2011, LBNL 5109E,

W. Kramer, J. Carter, D. Skinner, L. Oliker, P. Husbands, P. Hargrove, J. Shalf, O. Marques, E. Ng, A. Drummond, K. Yelick, "Software Roadmap to Plug and Play Petaflop/s", 2006,

S. Williams, J. Shalf, L. Oliker, P. Husbands, K. Yelick, "Dense and Sparse Matrix Operations on the Cell Processor", LBNL Technical Report, 2005,

Simon, H., Kramer, W., Saphir, W., Shalf, J., Bailey, D., Oliker, L., Banda, M., McCurdy, C.W., Hules, J., Canning, A., Day, M., Colella, P., Serafini, D., Wehner, M., Nugent, P., "National Facility for Advanced Computational Science: A Sustainable Path to Scientific Discovery", April 2004, LBNL 5500,

Thesis/Dissertations

PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes, L. Oliker, 1998,

Posters

Samuel Williams, Charlene Yang, Khaled Ibrahim, Thorsten Kurth, Nan Ding, Jack Deslippe, Leonid Oliker, "Performance Analysis using the Roofline Model", SciDAC PI Meeting, July 2019,

A. Buluç, A. Fox, J. R. Gilbert, S. Kamil, A. Lugowski, L. Oliker, S. Williams, "High-performance analysis of filtered semantic graphs", PACT '12 Proceedings of the 21st international conference on Parallel architectures and compilation techniques (extended abstract), 2012, doi: 10.1145/2370816.2370897

S. Williams, J. Carter, J. Demmel, L. Oliker, D. Patterson, J. Shalf, K. Yelick, R. Vuduc, "Autotuning Scientific Kernels on Multicore Systems", ASCR PI Meeting, 2008,