John Shalf

John M. Shalf

Department Head for Computer Science

JShalf@lbl.gov

Phone: +1 510 486 4508 | +1 510 316 9427

Fax: +1 510 486 4300

Current Projects

Continuing the Scaling of Digital Computing Post Moore’s Law
DFT Beyond Moore’s Law: Extreme Hardware Specialization for the Future of HPC. Demonstrates the performance potential of purpose-built architectures as a possible future for HPC applications in the absence of Moore’s Law
IARPA SuperTools
PINE: An Energy Efficient Flexibly Interconnected Photonic Data Center Architecture for Extreme Scalability
Project 38: A set of vendor-agnostic architectural explorations involving NSA, the DOE Office of Science, and NNSA

Journal Articles

Zhenguo Wu, Liang Yuan Dai, Asher Novick, Madeleine Glick, Ziyi Zhu, Sébastien Rumley, George Michelogiannakis, John Shalf, Keren Bergman, "Peta-Scale Embedded Photonics Architecture for Distributed Deep Learning Applications", IEEE Journal of Lightwave Technology, May 2023,

George Michelogiannakis, Benjamin Klenk, Brandon Cook, Min Yee Teh, Madeleine Glick, Larry Dennison, Keren Bergman, John Shalf, "A Case For Intra-Rack Resource Disaggregation in HPC", ACM Transactions on Architecture and Code Optimization, February 2022,

Georgios Tzimpragos, Jennifer Volk, Dilip Vasudevan, Nestan Tsiskaridze, George Michelogiannakis, Advait Madhavan, John Shalf, Timothy Sherwood, "Temporal Computing With Superconductors", IEEE Micro, March 2021, 41:71-79, doi: 10.1109/MM.2021.3066377

Madeleine Glick, Nathan C. Abrams, Qixiang Cheng, Min Yee Teh, Yu-Han Hung, Oscar Jimenez, Songtao Liu, Yoshitomo Okawachi, Xiang Meng, Leif Johansson, Manya Ghobadi, Larry Dennison, George Michelogiannakis, John Shalf, Alan Liu, John Bowers, Alex Gaeta, Michal Lipson, and Keren Bergman, "PINE: Photonic Integrated Networked Energy efficient datacenters (ENLITENED Program)", IEEE Journal of Optical Communications and Networking, 2020, 12:443-456,

Weiqun Zhang, Ann Almgren, Marcus Day, Tan Nguyen, John Shalf, Didem Unat, "BoxLib with Tiling: An AMR Software Framework", SIAM Journal on Scientific Computing, 2016,

George Michelogiannakis, Xiaoye S. Li, David H. Bailey, John Shalf, "Extending Summation Precision for Network Reduction Operations", Springer International Journal of Parallel Programming, December 2015, 43:6:1218-1243, doi: 10.1007/s10766-014-0326-5

D Unat, C Chan, W Zhang, S Williams, J Bachan, J Bell, J Shalf, "ExaSAT: An exascale co-design tool for performance modeling", International Journal of High Performance Computing Applications, January 2015, 29:209--232, doi: 10.1177/1094342014568690

Download File: International-Journal-of-High-Performance-Computing-Applications-2015-Unat-209-32.pdf (pdf: 4.3 MB)

M. Wehner, L. Oliker, J. Shalf, D. Donofrio, L. Drummond, et al., "Hardware/Software Co-design of Global Cloud System Resolving Models", Journal of Advances in Modeling Earth Systems (JAMES), 2011, 3, M1000:22, doi: 10.1029/2011MS000073

Download File: james11-climate.pdf (pdf: 1.7 MB)

Shoaib Kamil, Oliker, Pinar, John Shalf, "Communication Requirements and Interconnect Optimization for High-End Scientific Applications", IEEE Transactions on Parallel and Distributed Systems (TPDS), January 1, 2010, 21:188-202,

Download File: tpds09-hfast.pdf (pdf: 8.3 MB)

M. Wehner, L. Oliker, and J. Shalf, "Low Power Supercomputers", IEEE Spectrum, October 2009,

High-performance computing for such things as climate modeling is not going to advance at anything like the pace it has during the last two decades unless we apply fundamentally new ideas. Here we describe one possible approach. Rather than constructing supercomputers from the kinds of microprocessors found in fast desktop computers or servers, we propose adopting designs and design principles drawn, oddly enough, from the portable-electronics marketplace.

David Donofrio, Oliker, Shalf, F. Wehner, Rowen, Krueger, Kamil, Marghoob Mohiyuddin, "Energy-Efficient Computing for Extreme-Scale Science", IEEE Computer, January 2009, 42:62-71, doi: 10.1109/MC.2009.35

S. Kamil, L. Oliker, A. Pinar, J. Shalf, "Communication Requirements and Interconnect Optimization for High-End Scientific Applications", IEEE Transactions on Parallel and Distributed Systems (TPDS), 2009,

Download File: TPDS09-comm.pdf (pdf: 8.3 MB)

K Datta, S Kamil, S Williams, L Oliker, J Shalf, K Yelick, "Optimization and performance modeling of stencil computations on modern microprocessors", SIAM Review, 2009, 51:129--159, doi: 10.1137/070693199

Download File: sirev09-stencil.pdf (pdf: 2.8 MB)

S Williams, J Carter, L Oliker, J Shalf, K Yelick, "Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms", Journal of Parallel and Distributed Computing, 2009, 69:762--777, doi: 10.1016/j.jpdc.2009.04.002

Download File: jpdc09-lbmhd.pdf (pdf: 1.1 MB)

J. Borrill, L. Oliker, J. Shalf, H. Shan, A. Uselton, "HPC global file system performance analysis using a scientific-application derived benchmark", Parallel Computing, 2009, 35:358-373, doi: 10.1016/j.parco.2009.02.002

Download File: parco09-MADbench.pdf (pdf: 4.4 MB)

John Shalf and Jason Hick (Arie Shoshani and Doron Rotem, eds.), "Storage Technology Fundamentals", Scientific Data Management: Challenges, Technology, and Deployment, Chapman & Hall/CRC, 2009,

Download File: iobook08.pdf (pdf: 1.2 MB)

S. Williams, K. Datta, J. Carter, L. Oliker, J. Shalf, K. Yelick, D. Bailey, "PERI: Auto-tuning Memory Intensive Kernels for Multicore", SciDAC PI Meeting, Journal of Physics: Conference Series, 125 012038, July 2008, doi: 10.1088/1742-6596/125/1/012038

Download File: jpconf8125012089.pdf (pdf: 874 KB)

M. Wehner, L. Oliker, J. Shalf, "Performance Characterization of the World's Most Powerful Supercomputers", International Journal of High Performance Computing Applications (IJHPCA), April 2008,

Download File: IJHPCA08Abstract.pdf (pdf: 64 KB)

Michael F. Wehner, L. Oliker, John Shalf, "Towards Ultra-High Resolution Models of Climate and Weather", International Journal of High Performance Computing Applications (IJHPCA), January 2008, 22:149-165,

Download File: IJHPCA08-climate.pdf (pdf: 580 KB)

Shantenu Jha, Hartmut Kaiser, Andre Merzky, John Shalf, "SAGA - The Simple API for Grid Applications - Motivation, Design, and Implementation", Encyclopedia of Grid Technologies and Applications, Volume 1. Information Science Reference (www.info-sci-ref.com), 2008,

Download File: egc2007.pdf (pdf: 227 KB)

S Williams, J Shalf, L Oliker, S Kamil, P Husbands, K Yelick, "Scientific computing kernels on the cell processor", International Journal of Parallel Programming, January 2007, 35:263--298, doi: 10.1007/s10766-007-0034-5

Download File: ijpp07-cell.pdf (pdf: 1000 KB)

S Williams, L Oliker, R Vuduc, J Shalf, K Yelick, J Demmel, "Optimization of sparse matrix-vector multiplication on emerging multicore platforms", Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC 07, 2007, doi: 10.1145/1362622.1362674

Download File: parco08-spmv.pdf (pdf: 1.5 MB)

John Shalf, "The New Landscape of Parallel Computer Architecture", Journal of Physics: Conference Series, IOP Electronics Journals, 2007,

Download File: ShalfSciDAC2007JPCSfinal.pdf (pdf: 4.3 MB)

Tom Goodale, Shantenu Jha, Hartmut Kaiser, Thilo Kielmann, Pascal Kleijer, Gregor von Laszewski, Craig Lee, Andre Merzky, Hrabri Rajic, John Shalf, "SAGA: A Simple API for Grid Applications -- High-Level Application Programming on the Grid", Computational Methods in Science and Technology, Volume 12(1). Poznan, 2006, LBNL 59066,

Download File: SAGAJournal.pdf (pdf: 1 MB)

H. Simon, W. Kramer, W. Saphir, J. Shalf, D. Bailey, L. Oliker, et al, "Science Driven System Architecture: A New Process for Leadership Class Computing", Journal of the Earth Simulator, 2005,

Download File: JES2-Simon.pdf (pdf: 110 KB)

L. Oliker, A. Canning, J. Carter, J. Shalf, H. Simon, S. Ethier, D. Parks, S. Kitawaki, Y. Tsuda, T. Sato, "Performance of Ultra-Scale Applications on Leading Vector and Scalar HPC Platforms", Journal of the Earth Simulator, January 2005, 3,

Download File: JES3-Oliker.pdf (pdf: 101 KB)

L. Oliker, A. Canning, J. Carter, J. Shalf, D. Skinner, S. Ethier, R. Biswas, J. Djomehri, R. Van Der Wijngaart, "Performance evaluation of the SX-6 vector architecture for scientific computations", Concurrency Computation Practice and Experience, January 2005, 17:69-93, doi: 10.1002/cpe.884

Download File: CCPE05-sx6.pdf (pdf: 1 MB)

Conference Papers

Jie Li, George Michelogiannakis, Samuel Maloney, Brandon Cook, Estela Suarez, John Shalf, "Job Scheduling in High Performance Computing Systems with Disaggregated Memory Resources", IEEE International Conference on Cluster Computing (CLUSTER), September 2024, doi: 10.1109/CLUSTER59578.2024.00033

Jie Li, George Michelogiannakis, Brandon Cook, John Shalf, Yong Chen, "Scheduling and Allocation of Disaggregated Memory Resources in HPC Systems", IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, May 2024,

George Michelogiannakis, Yehia Arafa, Brandon Cook, Liang Yuan Dai, Abdel-Hameed Hameed Badawy, Madeleine Glick, Yuyang Wang, Keren Bergman, John Shalf, "Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics", IEEE International Conference on Cluster Computing (CLUSTER), November 2023,

George Michelogiannakis, Madeleine Glick, John Shalf, Keren Bergman, "Photonics as a means to implement intra-rack resource disaggregation", Proceedings Volume 12027, Metro and Data Center Optical Networks and Short-Reach Links V, March 2022, doi: 10.1117/12.2607317

Md Abdul M Faysal, Shaikh Arifuzzaman, Cy Chan, Maximilian Bremer, Doru Popovici, John Shalf, "HyPC-Map: A Hybrid Parallel Community Detection Algorithm Using Information-Theoretic Approach", HPEC, September 20, 2021,

George Michelogiannakis, Min Yee Teh, Madeleine Glick, John Shalf, Keren Bergman, "Maximizing the impact of emerging photonic switches at the system level", SPIE 11692, Optical Interconnects XXI, 116920Z, March 2021,

Anastasiia Butko, George Michelogiannakis, Samuel Williams, Costin Iancu, David Donofrio, John Shalf, Jonathan Carter, Irfan Siddiqi, "Understanding Quantum Control Processor Capabilities and Limitations through Circuit Characterization", IEEE Conference on Rebooting Computing (ICRC), December 2020,

Download File: ICRC20-QUASAR-final.pdf (pdf: 1.1 MB)

Min Yee Teh, Yu-Han Hung, George Michelogiannakis, Shijia Yan, Madeleine Glick, John Shalf, Keren Bergman, "TAGO: rethinking routing design in high performance reconfigurable networks", SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2020,

John Shalf, George Michelogiannakis, Brian Austin, Taylor Groves, Manya Ghobadi, Larry Dennison, Tom Gray, Yiwen Shen, Min Yee Teh, Madeleine Glick, and Keren Bergman, "Photonic Memory Disaggregation in Datacenters", OSA Advanced Photonics Congress (AP), July 2020,

Georgios Tzimpragos, Dilip Vasudevan, Nestan Tsiskaridze, George Michelogiannakis, Advait Madhavan, Jennifer Volk, John Shalf, Timothy Sherwood, "A Computational Temporal Logic for Superconducting Accelerators", ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020,

George Michelogiannakis, Yiwen Shen, Min Yee Teh, Xiang Meng, Benjamin Aivazi, Taylor Groves, John Shalf, Madeleine Glick, Manya Ghobadi, Larry Dennison, Keren Bergman, "Bandwidth Steering in HPC Using Silicon Nanophotonics", SC19: The International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2019,

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "Extending classical processors to support future large scale quantum accelerators", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "TIGER: topology-aware task assignment approach using ising machines", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

George Michelogiannakis, Jeremiah Wilke, Min Yee Teh, Madeleine Glick, John Shalf, Keren Bergman, "Challenges and opportunities in system-level evaluation of photonics", Proceedings Volume 10946, Metro and Data Center Optical Networks and Short-Reach Links II, February 2019, doi: 10.1117/12.2510443

D Vasudevan, G Michelogiannakis, D Donofrio, J Shalf, "PARADISE - Post-Moore Architecture and Accelerator Design Space Exploration Using Device Level Simulation and Experiments", 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, January 2019, doi: 10.1109/ispass.2019.00022

Anastasiia Butko, Albert Chen, David Donofrio, Farzad Fatollahi-Fard, John Shalf, "Open2C: Open-source Generator for Exploration of Coherent Cache Memory Subsystems", MEMSYS '18, New York, NY, USA, ACM, 2018, 311--317, doi: 10.1145/3240302.3270314

George Michelogiannakis, Benjamin Aivazi, Yiwen Shen, Larry Dennison, John Shalf, Keren Bergman, Madeleine Glick, "Architectural Opportunities and Challenges from Emerging Photonics in Future Systems", Photonics in Switching and Computing (PSC), September 2018,

Keren Bergman, John Shalf, George Michelogiannakis, Sebastien Rumley, Larry Dennison, Manya Ghobadi, "PINE: An Energy Efficient Flexibly Interconnected Photonic Data Center Architecture for Extreme Scalability", 31st annual conference of the IEEE Photonics Society, IEEE, June 2018,

George Michelogiannakis, John Shalf, "Last Level Collective Hardware Prefetching For Data-Parallel Applications", IEEE 24th International Conference on High Performance Computing, IEEE, December 2017,

Dilip Vasudevan, George Michelogiannakis, John Shalf, "CASPER - Configurable Design Space Exploration of Programmable Architectures for Machine Learning using Beyond Moore Devices", IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH), July 2017,

George Michelogiannakis, Khaled Z. Ibrahim, John Shalf, Jeremiah J. Wilke, Samuel Knight, Joseph P. Kenny, "APHiD: Hierarchical Task Placement to Enable a Tapered Fat Tree Topology for Lower Power and Cost in HPC Networks", 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, IEEE, May 2017, LBNL 1007126,

T Nguyen, D Unat, W Zhang, A Almgren, N Farooqi, J Shalf, "Perilla: Metadata-Based Optimizations of an Asynchronous Runtime for Adaptive Mesh Refinement", International Conference for High Performance Computing, Networking, Storage and Analysis, SC, January 1, 2017, 945--956, doi: 10.1109/SC.2016.80

MN Farooqi, D Unat, T Nguyen, W Zhang, A Almgren, J Shalf, "Nonintrusive AMR asynchrony for communication optimization", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), January 1, 2017, 10417 LN:682--694, doi: 10.1007/978-3-319-64203-1_49

D Vasudevan, A Butko, G Michelogiannakis, D Donofrio, J Shalf, "Towards an Integrated Strategy to Preserve Digital Computing Performance Scaling Using Emerging Technologies", Springer International Publishing, January 1, 2017, 115--123, doi: 10.1007/978-3-319-67630-2_10

With the decline and eventual end of historical rates of lithographic scaling, we arrive at a crossroad where synergistic and holistic decisions are required to preserve Moore's law technology scaling. Numerous emerging technologies aim to extend digital electronics scaling of performance, energy efficiency, and computational power/density, ranging from devices (transistors), memories, 3D integration capabilities, specialized architectures, photonics, and others. The wide range of technology options creates the need for an integrated strategy to understand the impact of these emerging technologies on future large-scale digital systems for diverse application requirements and optimization metrics. In this paper, we argue for a comprehensive methodology that spans the different levels of abstraction -- from materials, to devices, to complex digital systems and applications. Our approach integrates compact models of low-level characteristics of the emerging technologies to inform higher-level simulation models to evaluate their responsiveness to application requirements. The integrated framework can then automate the search for an optimal architecture using available emerging technologies to maximize a targeted optimization metric.

B. Bastem, D. Unat, W. Zhang, A. Almgren, J. Shalf, "Overlapping Data Transfers with Computation on GPU with Tiles", 2017 46th International Conference on Parallel Processing (ICPP), pp. 171-180, 2017,

George Michelogiannakis, Dave Donofrio, John Shalf, "Modeling of Novel Transistors, Manufacturing Technologies, and Architectures to Preserve Digital Computing Performance Scaling", 1st International Workshop on Post-Moore’s Era Supercomputing (PMES), November 2016,

Download File: pmes.pdf (pdf: 103 KB)

Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf, "OpenSoC Fabric: On-Chip Network Generator", ISPASS 2016: International Symposium on Performance Analysis of Systems and Software, IEEE, April 2016, 194-203, doi: 10.1109/ISPASS.2016.7482094

D Unat, T Nguyen, W Zhang, MN Farooqi, B Bastem, G Michelogiannakis, A Almgren, J Shalf, "TiDA: High-level programming abstractions for data locality management", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), January 2016, 9697:116--135, doi: 10.1007/978-3-319-41321-1_7

Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf, "OpenSoC Fabric: On-Chip Network Generator", Proceedings of the Workshop on Network on Chip Architectures, ACM, December 2014, 45-50, LBNL 1005675, doi: 10.1145/2685342.2685351

Download File: opensocnocarc.pdf (pdf: 1.1 MB)

J.A. Ang, R.F. Barrett, R.E. Benner, D. Burke, C. Chan, D. Donofrio, S.D. Hammond, K.S. Hemmert, S.M. Kelly, H. Le, V.J. Leung, D.R. Resnick, A.F. Rodrigues, J. Shalf, D. Stark, D. Unat, N.J. Wright, "Abstract Machine Models and Proxy Architectures for Exascale Computing", 2014 Hardware-Software Co-Design for High Performance Computing, November 17, 2014,

Download File: CALAbstractMachineModelsv1.1.pdf (pdf: 2.4 MB)

George Michelogiannakis, John Shalf, "Variable-Width Datapath for On-Chip Network Static Power Reduction", 8th International Symposium on Networks-on-Chip (NOCS), September 2014,

Download File: abn.pdf (pdf: 277 KB)

George Michelogiannakis, Alexander Williams, Samuel Williams, John Shalf, "Collective Memory Transfers for Multi-Core Chips", International Conference on Supercomputing (ICS), June 2014, doi: 10.1145/2597652.2597654

Download File: cms2.pdf (pdf: 613 KB)

M. Jung, E. H. Wilson III, W. Choi, J. Shalf, H. M. Aktulga, C. Yang, E. Saule, U. V. Catalyurek, M. Kandemir, "Exploring the Future of Out-of-core Computing with Compute-Local Non-Volatile Memory", International Conference for High Performance Computing, Networking, Storage and Analysis 2013 (SC13), NY, USA, ACM New York, November 17, 2013, doi: 10.1145/2503210.2503261

George Michelogiannakis, Xiaoye S. Li, David H. Bailey, John Shalf, "Extending Summation Precision for Network Reduction Operations", 25th International Symposium on Computer Architecture and High Performance Computing, IEEE Computer Society, October 2013,

Download File: sbac2013personal.pdf (pdf: 195 KB)

Double precision summation is at the core of numerous important algorithms such as Newton-Krylov methods and other operations involving inner products, but the effectiveness of summation is limited by the accumulation of rounding errors, which are an increasing problem with the scaling of modern HPC systems and data sets. To reduce the impact of precision loss, researchers have proposed increased- and arbitrary-precision libraries that provide reproducible error or even bounded error accumulation for large sums, but do not guarantee an exact result. Such libraries can also increase computation time significantly. We propose big integer (BigInt) expansions of double precision variables that enable arbitrarily large summations without error and provide exact and reproducible results. This is feasible with performance comparable to that of double-precision floating point summation, by the inclusion of simple and inexpensive logic into modern NICs to accelerate performance on large-scale systems.

Cy Chan, Didem Unat, Michael Lijewski, Weiqun Zhang, John Bell, John Shalf, "Software Design Space Exploration for Exascale Combustion Co-Design", International Supercomputing Conference (ISC), Leipzig, Germany, June 16, 2013,

Download File: isc13-exasat.pdf (pdf: 1.5 MB)

D Unat, CP Chan, W Zhang, J Bell, J Shalf, "Tiling as a Durable Abstraction for Parallelism and Data Locality", WOLFHPC 2013 - SC13 Workshop on Domain-Specific Languages and High-Level Frameworks for High-Performance Computing, 2013,

S. Williams, D. Kalamkar, A. Singh, A. Deshpande, B. Van Straalen, M. Smelyanskiy, A. Almgren, P. Dubey, J. Shalf, L. Oliker, "Optimization of Geometric Multigrid for Emerging Multi- and Manycore Processors", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2012, doi: 10.1109/SC.2012.85

Download File: sc12-mg.pdf (pdf: 808 KB)
Download File: sc12mgtalk.pdf (pdf: 1.9 MB)

Hongzhang Shan, Brian Austin, Nicholas Wright, Erich Strohmaier, John Shalf, Katherine Yelick, "Accelerating Applications at Scale Using One-Sided Communication", Santa Barbara, CA, The 6th Conference on Partitioned Global Address Programming Models, October 10, 2012,

Download File: ScaleUsingOneSided.pdf (pdf: 522 KB)

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney, "Experiences with 100G Network Applications", In Proceedings of the Fifth International Workshop on Data-intensive Distributed Computing, in conjunction with ACM High Performance Distributed Computing (HPDC) Conference, 2012, Delft, Netherlands, June 2012, LBNL 5603E, doi: 10.1145/2286996.2287004

Download File: didc-report.pdf (pdf: 11 MB)

100Gbps networking has finally arrived, and many research and educational institutions have begun to deploy 100Gbps routers and services. ESnet and Internet2 worked together to make 100Gbps networks available to researchers at the Supercomputing 2011 conference in Seattle, Washington. In this paper, we describe two of the first applications to take advantage of this network. We demonstrate a visualization application that enables remotely located scientists to gain insights from large datasets. We also demonstrate climate data movement and analysis over the 100Gbps network. We describe a number of application design issues and host tuning strategies necessary for enabling applications to scale to 100Gbps rates.

Keren Bergman, Gilbert Hendry, Paul Hargrove, John Shalf, Bruce Jacob, K. Scott Hemmert, Arun Rodrigues, David Resnick, "Let there be light!: the future of memory systems is photonics and 3D stacking", Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC'11), San Jose, California, June 5, 2011, 43-48, doi: 10.1145/1988915.1988926

Samuel Williams, Oliker, Carter, John Shalf, "Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), New York, NY, USA, ACM, January 2011, 55, doi: 10.1145/2063384.2063458

Download File: sc11-lbmhd.pdf (pdf: 666 KB)
Download File: sc11lbmhdtalk.pdf (pdf: 1.4 MB)

Jens Krueger, David Donofrio, John Shalf, Marghoob Mohiyuddin, Samuel Williams, Leonid Oliker, Franz-Josef Pfreund, "Hardware/software co-design for energy-efficient seismic modeling", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), January 2011, 73, doi: 10.1145/2063384.2063482

Download File: sc11-greenwave.pdf (pdf: 614 KB)

Kamesh Madduri, Khaled Ibrahim, Samuel Williams, Eun-Jin Im, Stephane Ethier, John Shalf, Leonid Oliker, "Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), January 2011, 23, doi: 10.1145/2063384.2063415

Download File: sc11-gtc.pdf (pdf: 1.3 MB)

H Shan, NJ Wright, J Shalf, K Yelick, M Wagner, N Wichmann, "A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI", PMBS 11 - Proceedings of the 2nd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, Co-located with SC 11, January 1, 2011, 13--14, doi: 10.1145/2088457.2088467

Download File: pmbs11.pdf (pdf: 497 KB)

Lavanya Ramakrishnan, Keith Jackson, Shane Canon, Shreyas Cholia, John Shalf, "Defining Future Platform Requirements for e-Science Cloud (Position paper)", ACM Symposium on Cloud Computing 2010 (ACM SOCC 2010), Indianapolis, Indiana, 2010,

Mark Howison, Quincey Koziol, David Knaak, John Mainzer, John Shalf, "Tuning HDF5 for Lustre File Systems", Proceedings of 2010 Workshop on Interfaces and Abstractions for Scientific Data Storage (IASDS10), Heraklion, Crete, Greece, September 2010, LBNL 4803E,

Download File: LBNL-4803E.pdf (pdf: 242 KB)

G. Hendry, J. Chan, S. Kamil, L. Oliker, J. Shalf, L. Carloni, K. Bergman, "Silicon Nanophotonic Network-On-Chip using TDM Arbitration", Hot Interconnects, August 2010,

Download File: hoti10-siphotonics.pdf (pdf: 552 KB)

J. A. Colmenares, S. Bird, H. Cook, P. Pearce, D. Zhu, J. Shalf, S. Hofmeyr, K. Asanovic, J. Kubiatowicz, "Resource Management in the Tessellation Manycore OS", 2nd Usenix Workshop on Hot Topics in Parallelism (HotPar), June 15, 2010,

Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, Samuel Williams, "An auto-tuning framework for parallel multicore stencil computations", International Parallel & Distributed Processing Symposium (IPDPS), January 1, 2010, 1-12, doi: 10.1109/IPDPS.2010.5470421

Download File: ipdps10-ast.pdf (pdf: 789 KB)

Andrew Uselton, Howison, J. Wright, Skinner, Keen, Shalf, L. Karavanic, Leonid Oliker, "Parallel I/O performance: From events to ensembles", International Parallel & Distributed Processing Symposium (IPDPS), 2010, 1-11,

Download File: ipdps10ipm.pdf (pdf: 1.7 MB)

KR Jackson, L Ramakrishnan, K Muriki, S Canon, S Cholia, J Shalf, HJ Wasserman, NJ Wright, "Performance analysis of high performance computing applications on the Amazon Web Services cloud", Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010, 2010, 159--168, doi: 10.1109/CloudCom.2010.69

J. Shalf, M. Wehner, L. Oliker, "The Challenge of Energy-Efficient HPC", SCIDAC Review, Fall, 2009,

Shoaib Kamil, Cy Chan, Samuel Williams, Leonid Oliker, John Shalf, Mark Howison, E. Wes Bethel, Prabhat, "A Generalized Framework for Auto-tuning Stencil Computations", BEST PAPER AWARD - Cray User Group Conference (CUG), Atlanta, GA, May 4, 2009, LBNL 2078E,

Download File: cug09-autotune.pdf (pdf: 354 KB)

Best Paper Award

S. Williams, J. Carter, L. Oliker, J. Shalf, K. Yelick, "Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4", Proceedings of the Cray User Group (CUG), Atlanta, GA, 2009,

Download File: cug09-lbmhd.pdf (pdf: 443 KB)

K Madduri, S Williams, S Ethier, L Oliker, J Shalf, E Strohmaier, K Yelick, "Memory-efficient optimization of gyrokinetic particle-to-grid interpolation for multicore processors", Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC 09, January 2009, doi: 10.1145/1654059.1654108

Download File: sc09-gtc.pdf (pdf: 3 MB)

K. Datta, S. Williams, V. Volkov, J. Carter, L. Oliker, J. Shalf, K. Yelick, "Auto-Tuning the 27-point Stencil for Multicore", Proceedings of Fourth International Workshop on Automatic Performance Tuning (iWAPT2009), January 2009,

Download File: iwapt09-27pt.pdf (pdf: 465 KB)

J Gebis, L Oliker, J Shalf, S Williams, K Yelick, "Improving memory subsystem performance using ViVA: Virtual vector architecture", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009, 5455 LNC:146--158, doi: 10.1007/978-3-642-00454-4_16

Download File: arcs09-viva.pdf (pdf: 448 KB)

G. Hendry, S.A. Kamil, A. Biberman, J. Chan, B.G. Lee, M. Mohiyuddin, A. Jain, K. Bergman, L.P. Carloni, J. Kubiatowicz, L. Oliker, J. Shalf, "Analysis of Photonic Networks for Chip Multiprocessor Using Scientific Applications", International Symposium on Networks-on-Chip (NOCS), 2009,

Download File: nocs09-photonics.pdf (pdf: 1.2 MB)

Marghoob Mohiyuddin, Murphy, Oliker, Shalf, Wawrzynek, Samuel Williams, "A design methodology for domain-optimized power-efficient supercomputing", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2009, doi: 10.1145/1654059.1654072

Download File: sc09-cotuning.pdf (pdf: 912 KB)

B.V. Straalen, J. Shalf, T. Ligocki, N. Keen, and W. Yang, "Scalability Challenges for Massively Parallel AMR Application", 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009,

Download File: ipdps09finalcertified.pdf (pdf: 366 KB)

Brian van Straalen, Shalf, J. Ligocki, Keen, Woo-Sun Yang, "Scalability challenges for massively parallel AMR applications", IPDPS, 2009, 1-12,

Download File: ipdps09submit.pdf (pdf: 529 KB)

K Datta, M Murphy, V Volkov, S Williams, J Carter, L Oliker, D Patterson, J Shalf, K Yelick, "Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures", 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008, January 2008, doi: 10.1109/SC.2008.5222004

Download File: sc08-stencil.pdf (pdf: 598 KB)

S Williams, J Carter, L Oliker, J Shalf, K Yelick, "Lattice Boltzmann simulation optimization on leading multicore platforms", IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM, 2008, doi: 10.1109/IPDPS.2008.4536295

Download File: ipdps08-lbmhd.pdf (pdf: 560 KB)

Shoaib Kamil, Shalf, Erich Strohmaier, "Power efficiency in high performance computing", IPDPS, 2008, 1-8,

Download File: powereffreportfull.pdf (pdf: 312 KB)

Hongzhang Shan, Antypas, John Shalf, "Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark", SC, 2008, 42,

Download File: sc08.ior.pdf (pdf: 290 KB)

William T.C. Kramer, John M. Shalf, E. Wes Bethel, D. Agarwal, Michael Banda, John Hules, Juan C. Meza, Leonid Oliker, Horst Simon, David Skinner, Francesca Verdier, Howard Walter, Michael Wehner, and Katherine Yelick, "HPC in 2016: A View Point from NERSC", Proceedings of the Cray User Group Conference, Helsinki, Finland, 2008,

Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, James Demmel, "Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2007, doi: 10.1145/1362622.1362674

Download File: sc07-spmv.pdf (pdf: 438 KB)

J. Borrill, L. Oliker, J. Shalf, H. Shan, "Investigation Of Leading HPC I/O Performance Using A Scientific-Application Derived Benchmark", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2007,

Download File: SC07-MADbench2.pdf (pdf: 581 KB)

Shoaib Kamil, Pinar, Gunter, Lijewski, Oliker, John Shalf, "Reconfigurable hybrid interconnection for static and dynamic scientific applications", Conf. Computing Frontiers, 2007, 183-194, LBNL 60060,

Download File: CF07.pdf (pdf: 9.5 MB)

L. Oliker, A. Canning, J. Carter, C. Iancu, M. Lijewski, S. Kamil, J. Shalf, H. Shan, E. Strohmaier, S. Ethier, T. Goodale, "Scientific application performance on candidate petascale platforms", Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM, 2007, doi: 10.1109/IPDPS.2007.370259

Download File: ipdps07-petascale.pdf (pdf: 4.4 MB)

J. Carter, L. Oliker, J. Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", Extended Version: Lecture Notes in Computer Science, 2007,

Download File: LNCS07.pdf (pdf: 445 KB)

J. Carter, Y. He, J. Shalf, H. Shan, E. Strohmaier, H. Wasserman, "The Performance Effect of Multi-core on Scientific Applications", Proceedings of Cray User Group, 2007, LBNL 62662,

Download File: CUG2007FINAL.pdf (pdf: 149 KB)

Hongzhang Shan and John Shalf, "Using IOR to Analyze the I/O performance for HPC Platforms", CUG.org, 2007, LBNL 62647,

Download File: cug07shan.pdf (pdf: 349 KB)

S. Williams, J. Shalf, L. Oliker, P. Husbands, S. Kamil, K. Yelick, "The Potential of the Cell Processor for Scientific Computing", ACM International Conference on Computing Frontiers, 2006, doi: 10.1145/1128022.1128027

Download File: cf06-cell-potential.pdf (pdf: 213 KB)

S Kamil, K Datta, S Williams, L Oliker, J Shalf, K Yelick, "Implicit and explicit optimizations for stencil computations", Proceedings of the 2006 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, MSPC 2006, 2006, 51--60, doi: 10.1145/1178597.1178605

Download File: mspc06-stencil.pdf (pdf: 421 KB)

J. Carter, L. Oliker, J. Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", VECPAR, 2006,

Download File: vecpar06-vector.pdf (pdf: 410 KB)

Jonathan Carter, Oliker, John Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", VECPAR, Springer Berlin/Heidelberg, 2006, 4395:490-503,

Download File: LNCS07-vector.pdf (pdf: 445 KB)

J. Carter, L. Oliker, J. Shalf, "Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems", High Performance Computing for Computational Science., 2006,

Download File: vecpar06-vector.pdf (pdf: 410 KB)

Highest Ranked Conference Paper

Luke Gosink, John Shalf, Kurt Stockinger, Wu, Wes Bethel, "HDF5-FastQuery: Accelerating Complex Queries on Datasets using Fast Bitmap Indices", SSDBM 2006, Vienna, Austria, July 2006, IEEE Computer Society Press., 2006, 149--158,

John Shalf, Kamil, Oliker, David Skinner, "Analyzing Ultra-Scale Application Communication Requirements for a Reconfigurable Hybrid Interconnect", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2005, 17,

Download File: sc05-communication.pdf (pdf: 6.9 MB)

S. Kamil, J. Shalf, L. Oliker, D. Skinner, "Understanding Ultra-Scale Application Communication Requirements", IEEE International Symposium on Workload Characterization (IISWC), 2005,

Download File: IISWC05-communication.pdf (pdf: 4.3 MB)

S Kamil, P Husbands, L Oliker, J Shalf, K Yelick, "Impact of modern memory subsystems on cache optimizations for stencil computations", Proceedings of the 3rd 2005 ACM SIGPLAN Workshop on Memory Systems Performance, MSP 2005, 2005, 36--43, doi: 10.1145/1111583.1111589

Download File: msp05-stencil.pdf (pdf: 902 KB)

Horst Simon, William Kramer, William Saphir, John Shalf, David Bailey, Leonid Oliker, Michael Banda, C. William McCurdy, John Hules, Andrew Canning, Marc Day, Philip Colella, David Serafini, Michael Wehner, Peter Nugent, "Science-Driven System Architecture: A New Process for Leadership Class Computing", Journal of the Earth Simulator, Volume 2., 2005, LBNL 56545,

Download File: JES-SDSA.pdf (pdf: 110 KB)

Kurt Stockinger, John Shalf, Kesheng Wu, E Wes Bethel, "Query-driven visualization of large data sets", VIS 05. IEEE Visualization, 2005., 2005, 167--174,

L. Oliker, A. Canning, J. Carter, J. Shalf, S. Ethier, "Scientific Computations on Modern Parallel Vector Systems", Proceedings of the ACM/IEEE SC 2004 Conference: Bridging Communities, 2004, doi: 10.1109/SC.2004.54

Download File: SC04-vector.pdf (pdf: 1.9 MB)

G Griem, L Oliker, J Shalf, K Yelick, "Identifying performance bottlenecks on modern microarchitectures using an adaptable probe", Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM), 2004, 18:3505--3512,

Download File: pmeo2004.pdf (pdf: 419 KB)

L. Oliker, A. Canning, J. Carter, J. Shalf, D. Skinner, S. Ethier, R. Biswas, J. Djomehri, R. Van Der Wijngaart, "Evaluation of cache-based superscalar and cacheless vector architectures for scientific computations", Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003, 2003, doi: 10.1145/1048935.1050213

Download File: SC03-SX6.pdf (pdf: 1 MB)

Gunther H. Weber, Martin Öhler, Oliver Kreylos, John Shalf, Wes Bethel, Bernd Hamann, Gerik Scheuermann, "Parallel Cell Projection Rendering of Adaptive Mesh Refinement Data", IEEE Symposium on Parallel and Large-Data Visualization and Graphics, 2003, 51-60,

Gunther H. Weber, Oliver Kreylos, Terry J. Ligocki, John Shalf, Hans Hagen, Bernd Hamann, Ken I. Joy, Kwan-Liu Ma, "High-quality Volume Rendering of Adaptive Mesh Refinement Data", VMV, 2001, 121-128,

Book Chapters

S Williams, K Datta, L Oliker, J Carter, J Shalf, K Yelick, "Auto-Tuning Memory-Intensive Kernels for Multicore", Chapman & Hall/CRC Computational Science, (CRC Press: 2010) Pages: 273-296 doi: 10.1201/b10509-14

K Datta, S Williams, V Volkov, J Carter, L Oliker, J Shalf, K Yelick, "Auto-tuning stencil computations on multicore and accelerators", Scientific Computing with Multicore and Accelerators, (2010) Pages: 219-254 doi: 10.1201/b10376

John Shalf, Donofrio, Rowen, Oliker, Michael F. Wehner, "Green Flash: Climate Machine (LBNL)", Encyclopedia of Parallel Computing, (Springer: 2010) Pages: 809-819

Green Flash is a research project focused on an application-driven manycore chip design that leverages commodity-embedded circuit designs and hardware/software codesign processes to create a highly programmable and energy-efficient HPC design. The project demonstrates how a multidisciplinary hardware/software codesign process that facilitates close interactions between applications scientists, computer scientists, and hardware engineers can be used to develop a system tailored for the requirements of scientific computing.

L. Oliker, A. Canning, J. Carter, C. Iancu, M. Lijewski, S. Kamil, J. Shalf, H. Shan, E. Strohmaier, S. Ethier, T. Goodale, "Performance Characteristics of Potential Petascale Scientific Applications", Petascale Computing: Algorithms and Applications. Chapman & Hall/CRC Computational Science Series (Hardcover), edited by David A. Bader, (2007)

Chapter

J. Shalf, L. Oliker, M. Lijewski, S. Kamil, J. Carter, A. Canning, S. Ethier, "Performance Characteristics of Potential Petascale Scientific Applications", Chapman & Hall/CRC Computational Science, (CRC Press: 2007) Pages: 1

Download File: CactusGRB.pdf (pdf: 712 KB)

Book Chapter

Presentation/Talks

George Michelogiannakis, John Shalf, Chiplets for HPC, OCP Summit, February 6, 2024,

Download File: georgem_hpc.pptx.pdf (pdf: 6.3 MB)

John Shalf, George Michelogiannakis, Heterogeneous Integration for HPC, OCP global summit, October 19, 2022,

Download File: chiplets_2022.pdf (pdf: 1.2 MB)

George Michelogiannakis, Madeleine Glick, John Shalf, Keren Bergman, Photonics as a Means to Implement Intra-rack Resource Disaggregation, SPIE photonics west, March 2022,

George Michelogiannakis, Min Yeh Teh, Madeleine Glick, John Shalf, Keren Bergman, Maximizing The Impact of Emerging Photonic Switches At The System Level, SPIE photonics west, March 2021,

Download File: photonics-west-2021.pdf (pdf: 770 KB)

George Michelogiannakis, John Shalf, Benjamin Aivazi, Yiwen Shen, Keren Bergman, Madeleine Glick, Larry Dennison, Architectural Opportunities and Challenges from Emerging Photonics in Future Systems, IEEE conference on Photonics in Switching and Computing (PSC), September 2018,

Download File: psc2018.pdf (pdf: 2.7 MB)

George Michelogiannakis, David Donofrio, John Shalf, Modeling of Novel Transistors, Manufacturing Technologies, and Architectures to Preserve Digital Computing Performance Scaling, Post-Moore's Era Supercomputing (PMES) Workshop, November 2016,

Download File: PMES-2016.pptx (pptx: 4.5 MB)

Didem Unat, George Michelogiannakis, John Shalf, The Role of Modeling in Locality Optimizations, Modeling and simulation workshop (MODSIM), August 2014,

Download File: modsim2014.pdf (pdf: 1.6 MB)

John Shalf, Erik Schnetter, Gabrielle Allen, Edward Seidel, Cactus and the Role of Frameworks in Complex Multiphysics HPC Applications, 2009,

John Shalf, Auto-Tuning: The Big Questions (Panel), 2009,

Download File: AutoTuningCSCADS09.pdf (pdf: 960 KB)

John Shalf, David Donofrio, Green Flash: Extreme Scale Computing on a Petascale Budget, 2009,

Download File: SalishanGreenFlashShalf.pdf (pdf: 12 MB)

John Shalf, Challenges of Energy Efficient Scientific Computing, 2009,

John Shalf, Harvey Wasserman, Breakthrough Computing in Petascale Applications and Petascale System Examples at NERSC, 2009,

Download File: HPC-UF09-NERSC.pdf (pdf: 9.3 MB)

John Shalf, Satoshi Matsuoka, IESP Power Efficiency Research Priorities, 2009,

Download File: PowerEfficiency-iesp-roadmap-v5b.ppt (ppt: 554 KB)

Kamesh Madduri, Williams, Ethier, Oliker, Shalf, Strohmaier, Katherine A. Yelick, Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2009,

Download File: siampp10-gtc-talk.pdf (pdf: 2.7 MB)
Download File: siampp10-gtc-talk.pptx (pptx: 1.3 MB)

S. Williams, et al., The Roofline Model: A Pedagogical Tool for Auto-tuning Kernels on Multicore Architectures, Hot Chips 20, August 10, 2008,

Download File: hotchips08-roofline-talk.pdf (pdf: 8 MB)

John Shalf, Hongzhang Shan, Katie Antypas, I/O Requirements for HPC Applications, 2008,

Download File: ShalfFAST08.pdf (pdf: 15 MB)

John Shalf, NERSC User IO Cases, 2008,

Download File: NERSCUserIOcases.ppt (ppt: 1.9 MB)

Antypas, K., Shalf, J., and Wasserman, H., Recent Workload Characterization Activities at NERSC, 2008,

Download File: NERSCWorkloadSantaFe.pdf (pdf: 8.2 MB)

John Shalf, Neuroinformatics Congress: Future Hardware Challenges for Scientific Computing, 2008,

Download File: ShalfNeuroinformaticsHardwareChallenges.pdf (pdf: 6.8 MB)

M. Wehner, L. Oliker, J. Shalf, Ultra-Efficient Exascale Scientific Computing, 2008,

Download File: ASCAC08-final.ppt (ppt: 5.4 MB)

L. Oliker, J. Shalf, M. Wehner, Climate Modeling at the Petaflop Scale using Semi-Custom Computing, SIAM Conference on Computational Science and Engineering, 2007,

John Shalf, Landscape of Computing Architecture: Introduction to the "Berkeley View", 2007,

John Shalf, About Memory Bandwidth and Multicore, 2007,

Download File: SOS11memShalf.pdf (pdf: 7 MB)

John Shalf, The Landscape of Parallel Computing Architecture., 2007,

Download File: ShalfSciDAC2007.pdf (pdf: 6.7 MB)

John Shalf, Overturning the Conventional Wisdom for the Multicore Era: Everything You Know is Wrong, 2007,

John Shalf, Hongzhang Shan, User Perspective on HPC I/O Requirements, 2007,

Download File: ShalfExascaleIO.ppt (ppt: 3.3 MB)

John Shalf, NERSC Workload Analysis, 2007,

Download File: ShalfNUG2006Workload.pdf (pdf: 3.7 MB)

John Shalf, NERSC Power Efficiency Analysis., 2007,

Download File: ShalfNUG2006Power.pdf (pdf: 5 MB)

John Shalf, Memory Subsystem Performance and QuadCore Predictions, 2007,

Download File: ShalfNUG2006QuadCore.pdf (pdf: 4.4 MB)

John Shalf, Shoaib Kamil, David Skinner, Leonid Oliker, Interconnect Requirements for HPC Applications, 2007,

Download File: IPMfinalBrocade.ppt (ppt: 13 MB)

John Shalf, Shoaib Kamil, David Bailey, Erich Strohmaier, Power Efficiency and the Top500, 2007,

Download File: Top500PowerNov14SC07.pdf (pdf: 3.8 MB)

John Shalf, Power, Cooling, and Energy Consumption for the Petascale and Beyond., 2007,

Download File: SC07PowerJSfinal.pdf (pdf: 7.1 MB)

John Shalf, Petascale Computing Application Challenges., 2007,

Download File: PetascaleAppsSC07v2.pdf (pdf: 11 MB)

Leonid Oliker, Julian Borrill, Hongzhang Shan, John Shalf, Investigation Of Leading HPC I/O Performance Using A Scientific-Application Derived Benchmark., 2007,

Download File: SC07-MadBench-talk.ppt (ppt: 2.7 MB)

Shoaib Kamil, John Shalf, Power Efficiency Metrics for the Top500, 2007,

Download File: ISCTop500Power.pdf (pdf: 473 KB)

John Shalf, David Bailey, Top500 Power Efficiency, 2006,

Download File: Top500PowerEff.pdf (pdf: 7.1 MB)

Reports

George Michelogiannakis, John Shalf, David Donofrio, John Bachan, "Continuing the Scaling of Digital Computing Post Moore’s Law", LBNL report, April 2016, LBNL 1005126,

Traditional CMOS technology scaling, which up until now has followed Moore's law, is coming to an end in the next decade. However, the DOE has come to depend on the rapid, predictable, and cheap scaling of computing performance to meet mission needs for scientific theory, large-scale experiments, and national security. Moving forward, performance scaling of digital computing will need to originate from energy and cost reductions that result from novel architectures, devices, manufacturing technologies, and programming models. The deeper issue presented by these changes is the threat to DOE’s mission, to the future economic growth of the U.S. computing industry, and to society as a whole. With the impending end of Moore’s law, it is imperative for the Office of Advanced Scientific Computing Research (ASCR) to develop a balanced research agenda to assess the viability of novel semiconductor technologies and navigate the ensuing challenges. This report identifies four areas and research directions for ASCR and how each can be used to preserve performance scaling of digital computing beyond exascale and after Moore's law ends.

Mark F. Adams, Jed Brown, John Shalf, Brian Van Straalen, Erich Strohmaier, Samuel Williams, "HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems", LBNL Technical Report, 2014, LBNL 6630E,

Download File: hpgmg.pdf (pdf: 183 KB)

Adrian Tate, Amir Kamil, Anshu Dubey, Armin Größlinger, Brad Chamberlain, Brice Goglin, Carter Edwards, Chris J. Newburn, David Padua, Didem Unat, Emmanuel Jeannot, Frank Hannig, Gysi Tobias, Hatem Ltaief, James Sexton, Jesus Labarta, John Shalf, Karl Fuerlinger, Kathryn O’Brien, Leonidas Linardakis, Maciej Besta, Marie-Christine Sawley, Mark Abraham, Mauro Bianco, Miquel Pericàs, Naoya Maruyama, Paul Kelly, Peter Messmer, Robert B. Ross, Romain Cledat, Satoshi Matsuoka, Thomas Schulthess, Torsten Hoefler, Vitus Leung, "Programming Abstractions for Data Locality", 2014 Workshop on Programming Abstractions for Data Locality, April 29, 2014, doi: 10.2172/1172915

The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component, but we are rapidly moving to an era in which computing is cheap and massively parallel while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm. Our applications must be evolved to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume all the processing elements are equidistant from each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and allow developers to describe how to decompose data and how to lay it out in memory.

Fortunately, there are many emerging concepts such as constructs for tiling, data layout, array views, task and thread affinity, and topology aware communication libraries for managing data locality. There is an opportunity to identify commonalities in strategy to enable us to combine the best of these concepts to develop a comprehensive approach to expressing and managing data locality on exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction, which includes techniques that range from template libraries all the way to completely new languages to achieve this goal.

Samuel Williams, Dhiraj D. Kalamkar, Amik Singh, Anand M. Deshpande, Brian Van Straalen, Mikhail Smelyanskiy, Ann Almgren, Pradeep Dubey, John Shalf, Leonid Oliker, "Implementation and Optimization of miniGMG - a Compact Geometric Multigrid Benchmark", December 2012, LBNL 6676E,

Download File: miniGMGLBNL-6676E.pdf (pdf: 906 KB)

M. Christen, N. Keen, T. Ligocki, L. Oliker, J. Shalf, B. van Straalen, S. Williams, "Automatic Thread-Level Parallelization in the Chombo AMR Library", LBNL Technical Report, 2011, LBNL 5109E,

S. Amarasinghe, D. Campbell, W. Carlson, A. Chien, W. Dally, E. Elnohazy, M. Hall, R. Harrison, W. Harrod, K. Hill, J. Hiller, S. Karp, C. Koelbel, D. Koester, P. Kogge, J. Levesque, D. Reed, V. Sarkar, R. Schreiber, M. Richards, A. Scarpelli, J. Shalf, A. Snavely, T. Sterling, "ExaScale Software Study: Software Challenges in Extreme Scale Systems", 2009,

Download File: ECSS-report-101909.pdf (pdf: 6.9 MB)

John Shalf, Thomas Sterling, "Operating Systems For Exascale Computing", 2009,

Download File: ExascaleOSIESP.pdf (pdf: 156 KB)

Gabrielle Allen (LSU/CCT), Gene Allen (MSC Inc.), Kenneth Alvin (SNL), Matt Drahzal (IBM), David Fisher (DoD-Mod), Robert Graybill (USC/ISI), Bob Lucas (USC/ISI), Tim Mattson (Intel), Hal Morgan (SNL), Erik Schnetter (LSU/CCT), Brian Schott (USC/ISI), Edward Seidel (LSU/CCT), John Shalf (LBNL/NERSC), Shawn Shamsian (MSC Inc.), David Skinner (LBNL/NERSC), Siu Tong (Engeneous) (2008), "Frameworks for Multiphysics Simulation: HPC Application Software Consortium Summit Concept Paper", 2008,

Download File: hpcascdraft20080314.pdf (pdf: 658 KB)

Antypas, K., Shalf, J., and Wasserman, H., "NERSC-6 Workload Analysis and Benchmark Selection Process", 2008, LBNL 1014E,

Download File: NERSCWorkload.pdf (pdf: 5 MB)

J. Levesque, J. Larkin, M. Foster, J. Glenski, G. Geissler, S. Whalen, B. Waldecker, J. Carter, D. Skinner, Y. He, H. Wasserman, J. Shalf, H. Shan, E. Strohmaier, "Understanding and Mitigating Multicore Performance Issues on the AMD Opteron Architecture", 2007, LBNL 62500,

Download File: LBNL-62500.v3.pdf (pdf: 2.4 MB)

Shoaib Kamil, John Shalf, "Measuring Power Efficiency of NERSC's Newest Flagship Machine", 2007,

Download File: powereffreportxt4.pdf (pdf: 312 KB)

K. Asanovic, R. Bodik, B. Catanzaro, J. Gebis, P. Husbands, K. Keutzer, D. Patterson, W. Plishker, J. Shalf, S. Williams, K. Yelick, "The Landscape of Parallel Computing Research: A View from Berkeley", EECS Technical Report, December 2006,

Hongzhang Shan, John Shalf, "Analysis of Parallel IO on Modern HPC Platforms", 2006,

Download File: IOR.doc (doc: 399 KB)

Analysis of the parallel IO requirements from a number of HPC applications, combined with microbenchmarks to aid in understanding their performance.

W. Kramer, J. Carter, D. Skinner, L. Oliker, P. Husbands, P. Hargrove, J. Shalf, O. Marques, E. Ng, A. Drummond, K. Yelick, "Software Roadmap to Plug and Play Petaflop/s", 2006,

S. Williams, J. Shalf, L. Oliker, P. Husbands, K. Yelick, "Dense and Sparse Matrix Operations on the Cell Processor", LBNL Technical Report, 2005,

Ryne, R., Abell, D., Adelmann, A., Amundson, J., Bohn, C., Cary, J., Colella, P., Dechow, D., Decyk, V., Dragt, A., Gerber, R., Habib, S., Higdon, D., Katsouleas, T., Ta, K.L., McCorquodale, P., Mihalcea, D., Mitchell, C., Mori, W., Mottershead, C.T., Neri, F., Pogorelov, I., Qiang, J., Samulyak, R., Serafini, D., Shalf, J., Siegerist, C., Spentzouris, P., Stoltz, P., Terzic, B., Venturini, M., Walstrom, P., "SciDAC Advances and Applications in Computational Beam Dynamics", June 2005, LBNL 58243,

Download File: LBNL-58243.pdf (pdf: 219 KB)

John Shalf, John Bell, Andrew Canning, Lin-Wang Wang, Juan Meza, Rob Ryne, Ji Qiang, Kathy Yelick, "Berkeley Petascale Applications", 2005,

Download File: BerkeleyPetascaleApps.doc (doc: 45 KB)

Simon, H., Kramer, W., Saphir, W., Shalf, J., Bailey, D., Oliker, L., Banda, M., McCurdy, C.W., Hules, J., Canning, A., Day, M., Colella, P., Serafini, D., Wehner, M., Nugent, P., "National Facility for Advanced Computational Science: A Sustainable Path to Scientific Discovery", April 2004, LBNL 5500,

Download File: PUB-5500.pdf (pdf: 1.8 MB)

Posters

David Donofrio, Leonid Oliker, John Shalf, Michael F. Wehner, Daniel Burke, John Wawrzynek, "Project Green Flash---Design and Emulate A Low-Power CPU for a New Climate-Modeling Supercomputer", Design Automation Conference (DAC47), 2010,

S. Williams, J. Carter, J. Demmel, L. Oliker, D. Patterson, J. Shalf, K. Yelick, R. Vuduc, "Autotuning Scientific Kernels on Multicore Systems", ASCR PI Meeting, 2008,

Download File: ascrpi08-autotuning-poster.pdf (pdf: 2.2 MB)

Others

Luke Gosink, John Shalf, Kurt Stockinger, Kesheng Wu, Wes Bethel, HDF5-FastQuery: Accelerating complex queries on HDF datasets using fast bitmap indices, 18th International Conference on Scientific and Statistical Database Management (SSDBM 06), Pages: 149--158 2006,

E. Wes Bethel, Scott Campbell, Eli Dart, John Shalf, Kurt Stockinger, Kesheng Wu, High Performance Visualization using Query-Driven and Analytics, 2006,

Kurt Stockinger, John Shalf, Wes Bethel, Kesheng Wu, DEX: Increasing the Capability of Scientific Data Analysis by Using Efficient Bitmap Indices to Accelerate Scientific Visualization, SSDBM, Pages: 35-44 2005,

E. Wes Bethel, Greg Abram, John Shalf, Randall Frank, Jim Ahrens, Steve Parker, N. Samatova, Mark Miller, Interoperability of Visualization Software and Data Models is NOT an Achievable Goal, IEEE Visualization, Pages: 607-610 2003,

T. J. Jankun-Kelly, Kreylos, Ma, Hamann, I. Joy, John Shalf, E. Wes Bethel, Deploying Web-Based Visual Exploration Tools on the Grid, IEEE Computer Graphics and Applications, Pages: 40-50 2003,