Recent Publications
Deborah Agarwal
2023
Hector G. Martin, Tijana Radivojevic, Jeremy Zucker, Kristofer Bouchard, Jess Sustarich, Sean Peisert, Dan Arnold, Nathan Hillson, Gyorgy Babnigg, Jose M. Marti, Christopher J. Mungall, Gregg T. Beckham, Lucas Waldburger, James Carothers, ShivShankar Sundaram, Deb Agarwal, Blake A. Simmons, Tyler Backman, Deepanwita Banerjee, Deepti Tanjore, Lavanya Ramakrishnan, Anup Singh, "Perspectives for Self-Driving Labs in Synthetic Biology", Current Opinion in Biotechnology, February 2023, doi: 10.1016/j.copbio.2022.102881
2022
MB Simmonds, WJ Riley, DA Agarwal, X Chen, S Cholia, R Crystal-Ornelas, ET Coon, D Dwivedi, VC Hendrix, M Huang, A Jan, Z Kakalia, J Kumar, CD Koven, L Li, M Melara, L Ramakrishnan, DM Ricciuto, AP Walker, W Zhi, Q Zhu, C Varadharajan, Guidelines for Publicly Archiving Terrestrial Model Data to Enhance Usability, Intercomparison, and Synthesis, Data Science Journal, 2022, doi: 10.5334/dsj-2022-003
C Varadharajan, VC Hendrix, DS Christianson, M Burrus, C Wong, SS Hubbard, DA Agarwal, BASIN-3D: A brokering framework to integrate diverse environmental data, Computers and Geosciences, 2022, doi: 10.1016/j.cageo.2021.105024
B Faybishenko, R Versteeg, G Pastorello, D Dwivedi, C Varadharajan, D Agarwal, Challenging problems of quality assurance and quality control (QA/QC) of meteorological time series data, Stochastic Environmental Research and Risk Assessment, 2022, pp. 1049-1062, doi: 10.1007/s00477-021-02106-w
F Molz, B Faybishenko, D Agarwal, A broad exploration of nonlinear dynamics in microbial systems motivated by chemostat experiments producing deterministic chaos, 2022,
2021
C Varadharajan, Z Kakalia, E Alper, EL Brodie, M Burrus, RWH Carroll, D Christianson, W Dong, V Hendrix, M Henderson, S Hubbard, D Johnson, R Versteeg, KH Williams, DA Agarwal, The Colorado East River Community Observatory Data Collection, Hydrological Processes 35(6), 2021, doi: 10.22541/au.161962485.54378235/v1
D. A. Agarwal, J. Damerow, C. Varadharajan, D. S. Christianson, G. Z. Pastorello, Y.-W. Cheah, L. Ramakrishnan, "Balancing the needs of consumers and producers for scientific data collections", Ecological Informatics, 2021, 62:101251, doi: 10.1016/j.ecoinf.2021.101251
JE Damerow, C Varadharajan, K Boye, EL Brodie, M Burrus, KD Chadwick, R Crystal-Ornelas, H Elbashandy, RJ Eloy Alves, KS Ely, AE Goldman, T Haberman, V Hendrix, Z Kakalia, KM Kemner, AB Kersting, N Merino, F O'Brien, Z Perzan, E Robles, P Sorensen, JC Stegen, RL Walls, P Weisenhorn, M Zavarin, D Agarwal, Sample identifiers and metadata to support data management and reuse in multidisciplinary ecosystem sciences, Data Science Journal, 2021, doi: 10.5334/dsj-2021-011
J Müller, B Faybishenko, D Agarwal, S Bailey, C Jiang, Y Ryu, C Tull, L Ramakrishnan, Assessing data change in scientific datasets, Concurrency and Computation: Practice and Experience, 2021, doi: 10.1002/cpe.6245
SL Brantley, T Wen, DA Agarwal, JG Catalano, PA Schroeder, K Lehnert, C Varadharajan, J Pett-Ridge, M Engle, AM Castronova, RP Hooper, X Ma, L Jin, K McHenry, E Aronson, AR Shaughnessy, LA Derry, J Richardson, J Bales, EM Pierce, The future low-temperature geochemical data-scape as envisioned by the U.S. geochemical community, Computers and Geosciences, 2021, doi: 10.1016/j.cageo.2021.104933
Venkatesh Akella
2022
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "SoK: Limitations of Confidential Computing via TEEs for High-Performance Compute Systems", Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), September 2022,
2021
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "Enabling Design Space Exploration for RISC-V Secure Compute Environments", Proceedings of the Fifth Workshop on Computer Architecture Research with RISC-V (CARRV), (co-located with ISCA 2021), June 17, 2021,
Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert, "Performance Analysis of Scientific Computing Workloads on General Purpose TEEs", Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), IEEE, May 2021, doi: 10.1109/IPDPS49936.2021.00115
Ayaz Akram
2022
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "SoK: Limitations of Confidential Computing via TEEs for High-Performance Compute Systems", Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), September 2022,
2021
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "Enabling Design Space Exploration for RISC-V Secure Compute Environments", Proceedings of the Fifth Workshop on Computer Architecture Research with RISC-V (CARRV), (co-located with ISCA 2021), June 17, 2021,
Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert, "Performance Analysis of Scientific Computing Workloads on General Purpose TEEs", Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), IEEE, May 2021, doi: 10.1109/IPDPS49936.2021.00115
Ann S. Almgren
2021
Jean Sexton, Zarija Lukic, Ann Almgren, Chris Daley, Brian Friesen, Andrew Myers, and Weiqun Zhang, "Nyx: A Massively Parallel AMR Code for Computational Cosmology", The Journal of Open Source Software, July 10, 2021,
Weiqun Zhang, Andrew Myers, Kevin Gott, Ann Almgren and John Bell, "AMReX: Block-Structured Adaptive Mesh Refinement for Multiphysics Applications", The International Journal of High Performance Computing Applications, June 12, 2021,
Jordan Musser, Ann S Almgren, William D Fullmer, Oscar Antepara, John B Bell, Johannes Blaschke, Kevin Gott, Andrew Myers, Roberto Porcu, Deepak Rangarajan, Michele Rosso, Weiqun Zhang, and Madhava Syamlal, "MFIX-Exa: A Path Towards Exascale CFD-DEM Simulations", The International Journal of High Performance Computing Applications, April 16, 2021,
Oluwamayowa O Amusat
2023
Mohammed A. Alhussaini, Zachary M. Binger, Bianca M. Souza-Chaves, Oluwamayowa O. Amusat, Jangho Park, Timothy V. Bartholomew, Dan Gunter, Andrea Achilli, "Analysis of backwash settings to maximize net water production in an engineering-scale ultrafiltration system for water reuse", Journal of Water Process Engineering, 2023, 53, doi: 10.1016/j.jwpe.2023.103761
2022
Oluwamayowa O. Amusat, Tim Bartholomew, Adam A. Atia, Cost optimization of desalination systems using WaterTAP incorporating detailed water chemistry models, 2022 INFORMS Annual Meeting, 2022,
2021
Dan Gunter, Oluwamayowa Amusat, Tim Bartholomew, Markus Drouven, "Santa Barbara Desalination Digital Twin Technical Report", LBNL Technical Report, 2021, LBNL-2001437,
Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", (P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, ACM, June 21, 2021, doi: 10.1145/3456287.3465478
Reproducible research is becoming essential for science, both to ensure transparency and to build trust. Reproducibility also provides the cornerstone for sharing methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding of the gaps, issues, and challenges in enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five case studies that highlight reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing the different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting what worked, what did not work, and what could have worked better in each case. Our experiences capture a wide range of scenarios and are applicable to a much broader audience aiming to integrate reproducibility into their everyday pipelines.
Matthew Avaylon
2022
M. Avaylon, R. Sadre, Z. Bai, T. Perciano, "Adaptable Deep Learning and Probabilistic Graphical Model System for Semantic Segmentation", Advances in Artificial Intelligence and Machine Learning, March 31, 2022, 2:288--302, doi: 10.54364/AAIML.2022.1119
Venkitesh Ayyar
2022
Venkitesh Ayyar, Robert Knop, Autumn Awbrey, Alexis Andersen, Peter Nugent, "Identifying Transient Candidates in the Dark Energy Survey Using Convolutional Neural Networks", Publications of the Astronomical Society of the Pacific, September 2022, 134:094501,
The ability to discover new transient candidates via image differencing without direct human intervention is an important task in observational astronomy. For this kind of image classification problem, machine learning techniques such as Convolutional Neural Networks (CNNs) have shown remarkable success. In this work, we present the results of automated transient candidate identification with CNNs on images from an extant data set of the Dark Energy Survey Supernova program, whose main focus was on using Type Ia supernovae for cosmology. By performing an architecture search of CNNs, we identify networks that efficiently select non-artifacts (e.g., supernovae, variable stars, AGN, etc.) from artifacts (image defects, mis-subtractions, etc.), achieving the efficiency of previous work performed with Random Forests, without the need to expend any effort on feature identification. The CNNs also help us identify a subset of mislabeled images. After relabeling the images in this subset, the resulting classification with CNNs is significantly better than previous results, lowering the false positive rate by 27% at a fixed missed-detection rate of 0.05.
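The CNN architectures discussed above are built from discrete 2D convolutions over image patches. As a point of reference only (this sketch is not taken from the paper; the function name and valid-mode choice are illustrative assumptions), a single convolution step can be written in plain Python:

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: the basic building block of a CNN layer.
    `image` and `kernel` are lists of lists of numbers; the output shrinks by
    the kernel size minus one in each dimension."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Multiply the kernel against the image patch anchored at (i, j).
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out
```

A real classifier stacks many such filters with nonlinearities and pooling; this shows only the arithmetic each filter performs.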
Ariful Azad
2021
Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad, "Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale", 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021, doi: 10.1109/IPDPS49936.2021.00018
John Bachan
2023
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
2022
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Maximilian Bremer, John Bachan, Cy Chan, Clint Dawson, "Adaptive total variation stable local timestepping for conservation laws", Journal of Computational Physics, April 21, 2022,
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
2021
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Maximilian Bremer, John Bachan, Cy Chan, and Clint Dawson, "Speculative Parallel Execution for Local Timestepping", 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, May 21, 2021,
Scott Baden
2023
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
2022
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
2021
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Zhe Bai
2022
Gregory Wallace, Zhe Bai, Robbie Sadre, Talita Perciano, Nicola Bertelli, Syun'ichi Shiraiwa, Wes Bethel, John Wright, "Towards fast and accurate predictions of radio frequency power deposition and current profile via data-driven modelling: applications to lower hybrid current drive", Journal of Plasma Physics, August 18, 2022, 88:895880401, doi: 10.1017/S0022377822000708
M. Avaylon, R. Sadre, Z. Bai, T. Perciano, "Adaptable Deep Learning and Probabilistic Graphical Model System for Semantic Segmentation", Advances in Artificial Intelligence and Machine Learning, March 31, 2022, 2:288--302, doi: 10.54364/AAIML.2022.1119
2021
Zhe Bai, Liqian Peng, "Non-intrusive nonlinear model reduction via machine learning approximations to low-dimensional operators", Advanced Modeling and Simulation in Engineering Sciences, 2021, 8:28, doi: 10.1186/s40323-021-00213-5
Meriam Gay Bautista
2023
Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Gay Bautista, George Michelogiannakis, "PaST-NoC: A Packet-Switched Superconducting Temporal NoC", IEEE Transactions on Applied Superconductivity, January 2023,
2022
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-Flux Shift Register for Race Logic and Its Applications", IEEE Transactions on Circuits and Systems I: Regular Papers, October 2022,
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, Kylie Huch, George Michelogiannakis, "Superconducting Digital DIT Butterfly Unit for Fast Fourier Transform Using Race Logic", 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), IEEE, June 2022, 441-445,
Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Darren Lyles, George Michelogiannakis, "Temporal and SFQ Pulse-Streams Encoding for Area-Efficient Superconducting Accelerators", 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), ACM, February 2022,
2021
Meriam Gay Bautista, Zhi Jackie Yao, Anastasiia Butko, Mariam Kiran, Mekena Metcalf, "Towards Automated Superconducting Circuit Calibration using Deep Reinforcement Learning", 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Tampa, FL, USA, IEEE, August 23, 2021, pp. 462-46, doi: 10.1109/ISVLSI51109.2021.00091
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-flux Shift Buffer for Race Logic", 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), August 2021,
George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Bautista, Dilip Vasudevan, Anastasiia Butko, "SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC", IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021,
John B. Bell
2023
I. Srivastava, D. R. Ladiges, A. Nonaka, A. L. Garcia, J. B. Bell, "Staggered Scheme for the Compressible Fluctuating Hydrodynamics of Multispecies Fluid Mixtures", Physical Review E, January 24, 2023, 107:015305, doi: 10.1103/PhysRevE.107.015305
2022
D. R. Ladiges, J. G. Wang, I. Srivastava, S. P. Carney, A. Nonaka, A. L. Garcia, A. Donev, J. B. Bell, "Modeling Electrokinetic Flows with the Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm", Physical Review E, November 19, 2022, 106:035104, doi: 10.1103/PhysRevE.106.035104
2021
Weiqun Zhang, Andrew Myers, Kevin Gott, Ann Almgren and John Bell, "AMReX: Block-Structured Adaptive Mesh Refinement for Multiphysics Applications", The International Journal of High Performance Computing Applications, June 12, 2021,
Jordan Musser, Ann S Almgren, William D Fullmer, Oscar Antepara, John B Bell, Johannes Blaschke, Kevin Gott, Andrew Myers, Roberto Porcu, Deepak Rangarajan, Michele Rosso, Weiqun Zhang, and Madhava Syamlal, "MFIX-Exa: A Path Towards Exascale CFD-DEM Simulations", The International Journal of High Performance Computing Applications, April 16, 2021,
Daniel R. Ladiges, Sean P. Carney, Andrew Nonaka, Katherine Klymko, Guy C. Moore, Alejandro L. Garcia, Sachin R. Natesh, Aleksandar Donev, John B. Bell, "A Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm for Modeling Electrolytes", Physical Review Fluids, April 1, 2021, 6(4):044309,
Julian Bellavita
2023
Julian Bellavita, Mathias Jacquelin, Esmond G. Ng, Dan Bonachea, Johnny Corbino, Paul H. Hargrove, "symPACK: A GPU-Capable Fan-Out Sparse Cholesky Solver", 2023 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM'23), ACM, November 13, 2023, doi: 10.25344/S49P45
Sparse symmetric positive definite systems of equations are ubiquitous in scientific workloads and applications. Parallel sparse Cholesky factorization is the method of choice for solving such linear systems. Therefore, the development of parallel sparse Cholesky codes that can efficiently run on today’s large-scale heterogeneous distributed-memory platforms is of vital importance. Modern supercomputers offer nodes that contain a mix of CPUs and GPUs. To fully utilize the computing power of these nodes, scientific codes must be adapted to offload expensive computations to GPUs.
We present symPACK, a GPU-capable parallel sparse Cholesky solver that uses one-sided communication primitives and remote procedure calls provided by the UPC++ library. We also utilize the UPC++ "memory kinds" feature to enable efficient communication of GPU-resident data. We show that on a number of large problems, symPACK outperforms comparable state-of-the-art GPU-capable Cholesky factorization codes by up to 14x on the NERSC Perlmutter supercomputer.
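symPACK parallelizes and specializes the Cholesky factorization for sparse symmetric positive definite matrices. For orientation only, here is a minimal sequential sketch of the underlying dense factorization A = L·Lᵀ (this is the textbook algorithm, not symPACK's sparse, GPU-offloaded fan-out variant; the function name is illustrative):

```python
def cholesky(A):
    """Dense Cholesky factorization of a symmetric positive definite matrix A
    (list of lists), returning lower-triangular L with A = L @ L.T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract squares of already-computed row entries.
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = s ** 0.5
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

Sparse solvers like symPACK avoid the O(n³) dense cost by operating only on the nonzero structure (plus fill-in) and distributing the column updates across processes and GPUs.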
2022
Julian Bellavita, Alex Sim (advisor), John Wu (advisor), "Predicting Scientific Dataset Popularity Using dCache Logs", The ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'22), ACM Student Research Competition (SRC), second-place winner, 2022,
The dCache installation is a storage management system that acts as a disk cache for high-energy physics (HEP) data. Storage space on dCache is limited relative to persistent storage devices; therefore, a heuristic is needed to determine what data should be kept in the cache. A good cache policy would keep frequently accessed data in the cache, but this requires knowledge of future dataset popularity. We present methods for forecasting the number of times a dataset stored on dCache will be accessed in the future. We present a deep neural network that can predict future dataset accesses accurately, reporting a final normalized loss of 4.6e-8. We also present a set of algorithms that can forecast future dataset accesses given an access sequence, including two novel algorithms, Backup Predictor and Last N Successors, that outperform other file prediction algorithms. These findings suggest that it is possible to anticipate dataset popularity in advance.
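The abstract names the Last N Successors algorithm without giving details. As a rough, hypothetical reading of the name only (the class, the `n` parameter, and the tie-breaking by frequency are all assumptions, not taken from the paper), a successor-based access predictor might look like:

```python
from collections import defaultdict, Counter

class LastNSuccessors:
    """Hypothetical sketch of a 'Last N Successors' style predictor: for each
    dataset, remember the last N datasets accessed immediately after it, and
    predict the most frequent of those as the next access."""

    def __init__(self, n=3):
        self.n = n
        self.successors = defaultdict(list)  # dataset -> its recent successors
        self.prev = None                     # most recently observed dataset

    def observe(self, dataset):
        """Record one access from the log's access sequence."""
        if self.prev is not None:
            succ = self.successors[self.prev]
            succ.append(dataset)
            if len(succ) > self.n:
                succ.pop(0)  # keep only the last N successors
        self.prev = dataset

    def predict(self, dataset):
        """Predict the next access after `dataset`, or None if unseen."""
        succ = self.successors.get(dataset)
        if not succ:
            return None
        return Counter(succ).most_common(1)[0][0]
```

A predictor of this shape could inform a cache policy by prefetching or pinning the datasets it expects to be accessed next.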
J. Bellavita, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, "Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534111
E. Wes Bethel
2022
Gregory Wallace, Zhe Bai, Robbie Sadre, Talita Perciano, Nicola Bertelli, Syun'ichi Shiraiwa, Wes Bethel, John Wright, "Towards fast and accurate predictions of radio frequency power deposition and current profile via data-driven modelling: applications to lower hybrid current drive", Journal of Plasma Physics, August 18, 2022, 88:895880401, doi: 10.1017/S0022377822000708
M. G. Amankwah, D. Camps, E. W. Bethel, R. Van Beeumen, T. Perciano, "Quantum pixel representations and compression for N-dimensional images", Nature Scientific Reports, May 11, 2022, 12:7712, doi: 10.1038/s41598-022-11024-y
S. Zhang, R. Sadre, B. A. Legg, H. Pyles, T. Perciano, E. W. Bethel, D. Baker, O. Rübel, J. J. D. Yoreo, "Rotational dynamics and transition mechanisms of surface-adsorbed proteins", Proceedings of the National Academy of Sciences, April 11, 2022, 119:e2020242119, doi: 10.1073/pnas.2020242119
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, Dave Pugmire, Silvio Rizzi, Thompson, Will Usher, Gunther H. Weber, Brad Whitlock, Wolf, Kesheng Wu, "Proximity Portability and In Transit, M-to-N Data Partitioning and Movement in SENSEI", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_20
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, David Camp, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, David Pugmire, Silvio Rizzi, Thompson, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "The SENSEI Generic In Situ Interface: Tool and Processing Portability at Scale", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_13
2021
E. W. Bethel, C. Heinemann, and T. Perciano, "Performance Tradeoffs in Shared-memory Platform Portable Implementations of a Stencil Kernel", Eurographics Symposium on Parallel Graphics and Visualization, June 14, 2021,
Jean Luca Bez
2023
Bin Dong, Jean Luca Bez, Suren Byna, "AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis.", In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’23), June 16, 2023,
Hammad Ather, Jean Luca Bez, Boyana Norris, Suren Byna, "Illuminating the I/O Optimization Path of Scientific Applications", High Performance Computing: 38th International Conference, ISC High Performance 2023, Hamburg, Germany, May 21–25, 2023, Proceedings, Hamburg, Germany, Springer-Verlag, May 21, 2023, 22–41, doi: 10.1007/978-3-031-32041-5_2
The existing parallel I/O stack is complex and difficult to tune due to the interdependencies among multiple factors that impact the performance of data movement between storage and compute systems. When performance is slower than expected, end-users, developers, and system administrators rely on I/O profiling and tracing information to pinpoint the root causes of inefficiencies. Despite having numerous tools that collect I/O metrics on production systems, it is not obvious where the I/O bottlenecks are (unless one is an I/O expert), what their root causes are, and what to do to solve them. Hence, there is a gap between the currently available metrics, the issues they represent, and the application of optimizations that would mitigate performance slowdowns. An I/O specialist often checks for common problems before diving into the specifics of each application and workload. Streamlining such analysis, investigation, and recommendations could close this gap without requiring a specialist to intervene in every case. In this paper, we propose a novel interactive, user-oriented visualization and analysis framework called Drishti. This framework helps users pinpoint various root causes of I/O performance problems and provides a set of actionable recommendations for improving performance based on the observed characteristics of an application. We evaluate the applicability and correctness of Drishti using four use cases from distinct science domains and demonstrate its value to end-users, developers, and system administrators when seeking to improve an application’s I/O performance.
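The mapping from observed I/O characteristics to actionable recommendations described above can be pictured as a small rule table. The following sketch is purely illustrative: the metric names, thresholds, and wording are assumptions made here, not Drishti's actual rules.

```python
def io_recommendations(metrics):
    """Map observed I/O characteristics to actionable hints.
    The metric keys and thresholds are illustrative assumptions,
    not the rules Drishti actually applies."""
    recs = []
    total_ops = metrics.get("total_ops", 0) or 1
    total_bytes = metrics.get("total_bytes", 0) or 1
    if metrics.get("small_ops", 0) / total_ops > 0.5:
        recs.append("Many small requests: consider collective buffering "
                    "or larger transfer sizes.")
    if metrics.get("rank0_bytes", 0) / total_bytes > 0.9:
        recs.append("I/O funneled through rank 0: consider parallel access "
                    "from all ranks (e.g. via MPI-IO or parallel HDF5).")
    if metrics.get("metadata_time", 0) > metrics.get("transfer_time", 0):
        recs.append("Metadata-bound workload: reduce file opens/stats and "
                    "aggregate data into fewer, larger files.")
    return recs

# Example: a run dominated by small, rank-0-funneled writes.
hints = io_recommendations({
    "total_ops": 1000, "small_ops": 800,
    "total_bytes": 10**9, "rank0_bytes": 10**9,
    "metadata_time": 2.0, "transfer_time": 5.0,
})
for h in hints:
    print("-", h)
```

In practice the inputs would come from profiling logs such as Darshan records, and each triggered rule would link back to the evidence that fired it.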
2022
Jean Luca Bez, Visualizing I/O bottlenecks with DXT Explorer 2.0, Analyzing Parallel I/O BoF, held in conjunction with SC22, 2022,
Jean Luca Bez, Hammad Ather, Suren Byna, "Drishti: Guiding End-Users in the I/O Optimization Journey", PDSW 2022, held in conjunction with SC22, 2022,
Jean Luca Bez, Where's the Bottleneck?, Berkeley Lab Research SLAM, October 7, 2022,
Jean Luca Bez, Suren Byna, April 2019 Darshan counters from the Cori supercomputer [Data set], Zenodo, 2022, doi: 10.5281/zenodo.6476501
Jean Luca Bez, Ahmad Maroof Karimi, Arnab K. Paul, Bing Xie, Suren Byna, Philip Carns, Sarp Oral, Feiyi Wang, Jesse Hanley, "Access Patterns and Performance Behaviors of Multi-layer Supercomputer I/O Subsystems under Production Load", 31st International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC '22), Association for Computing Machinery, June 27, 2022, 43–55, doi: 10.1145/3502181.3531461
Jean Luca Bez, Suren Byna, Understanding I/O Behavior with Interactive Darshan Log Analysis, Exascale Computing Project (ECP) Community Days BoF, 2022,
Jean Luca Bez, Towards Understanding I/O Behavior with Interactive Exploration, Berkeley Lab’s Computing Sciences Area 2022 Postdoc Symposium, 2022,
2021
André Ramos Carneiro, Jean Luca Bez, Carla Osthoff, Lucas Mello Schnorr, Philippe Olivier Alexandre Navaux, "HPC Data Storage at a Glance: The Santos Dumont Experience", IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), IEEE, November 26, 2021, 157-166, doi: 10.1109/SBAC-PAD53543.2021.00027
Jean Luca Bez, Visualizing Darshan Extended Traces, Analyzing Parallel I/O BoF, held in conjunction with SC21, 2021,
Jean Luca Bez, Houjun Tang, Bing Xie, David Williams-Young, Rob Latham, Rob Ross, Sarp Oral, Suren Byna, "I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis", 2021 IEEE/ACM Sixth International Parallel Data Systems Workshop (PDSW), January 1, 2021, 15-22, doi: 10.1109/PDSW54622.2021.00008
Tonglin Li, Suren Byna, Quincey Koziol, Houjun Tang, Jean Luca Bez, Qiao Kang, "h5bench: HDF5 I/O Kernel Suite for Exercising HPC I/O Patterns", Cray User Group (CUG) 2021, January 1, 2021,
Ludovico Bianchi
2021
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014
Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create "workflow snapshots" that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Michael Beach, Drew Paine, Lavanya Ramakrishnan, "Science Capsule - Capturing the Data Life Cycle", Journal of Open Source Software, 2021, 6:2484, doi: 10.21105/joss.02484
Dan Bonachea
2023
Julian Bellavita, Mathias Jacquelin, Esmond G. Ng, Dan Bonachea, Johnny Corbino, Paul H. Hargrove, "symPACK: A GPU-Capable Fan-Out Sparse Cholesky Solver", 2023 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM'23), ACM, November 13, 2023, doi: 10.25344/S49P45
Sparse symmetric positive definite systems of equations are ubiquitous in scientific workloads and applications. Parallel sparse Cholesky factorization is the method of choice for solving such linear systems. Therefore, the development of parallel sparse Cholesky codes that can efficiently run on today’s large-scale heterogeneous distributed-memory platforms is of vital importance. Modern supercomputers offer nodes that contain a mix of CPUs and GPUs. To fully utilize the computing power of these nodes, scientific codes must be adapted to offload expensive computations to GPUs.
We present symPACK, a GPU-capable parallel sparse Cholesky solver that uses one-sided communication primitives and remote procedure calls provided by the UPC++ library. We also utilize the UPC++ "memory kinds" feature to enable efficient communication of GPU-resident data. We show that on a number of large problems, symPACK outperforms comparable state-of-the-art GPU-capable Cholesky factorization codes by up to 14x on the NERSC Perlmutter supercomputer.
Michelle Mills Strout, Damian Rouson, Amir Kamil, Dan Bonachea, Jeremiah Corrado, Paul H. Hargrove, Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran, Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC23), November 12, 2023,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for these many uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models.
The tutorial is targeted for users with little-to-no parallel programming experience, but everyone is welcome. A partial differential equation example will be demonstrated in all three programming models. That example and others will be provided to attendees in a virtual environment. Attendees will be shown how to compile and run these programming examples, and the virtual environment will remain available to attendees throughout the conference, along with Slack-based interactive tech support.
Come join us to learn about some productive and performant parallel programming models!
Michelle Mills Strout, Damian Rouson, Amir Kamil, Dan Bonachea, Jeremiah Corrado, Paul H. Hargrove, Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran (CUF23), ECP/NERSC/OLCF Tutorial, July 2023,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for these many uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models.

This tutorial should be accessible to users with little-to-no parallel programming experience, and everyone is welcome. A partial differential equation example will be demonstrated in all three programming models along with performance and scaling results on big machines. That example and others will be provided in a cloud instance and Docker container. Attendees will be shown how to compile and run these programming examples, and provided opportunities to experiment with different parameters and code alternatives while being able to ask questions and share their own observations.

Come join us to learn about some productive and performant parallel programming models!
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 31, 2023, LBNL 2001516, doi: 10.25344/S46W2J
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
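The asynchrony model described above, in which every communication operation returns a future and continuations chain onto it, can be sketched in a few lines. This is a minimal, language-agnostic illustration of the futures/promises/continuation idea, not UPC++ code; the class and method names are invented here.

```python
class Future:
    """Minimal future with continuation callbacks, sketching the completion
    model described above: non-blocking operations return futures, and
    chained work runs once the high-latency dependency is satisfied."""

    def __init__(self):
        self._done = False
        self._value = None
        self._callbacks = []

    def then(self, fn):
        # Chain `fn` to run on this future's result; returns a new future.
        nxt = Future()
        def run(value):
            nxt.fulfill(fn(value))
        if self._done:
            run(self._value)             # dependency already satisfied
        else:
            self._callbacks.append(run)  # defer until completion
        return nxt

    def fulfill(self, value):
        self._done, self._value = True, value
        for cb in self._callbacks:
            cb(value)

# A "communication" operation completes later; the chained graph then runs.
op = Future()
result = op.then(lambda n: n * 2).then(lambda n: n + 1)
op.fulfill(20)        # high-latency dependency satisfied
print(result._value)  # 41
```

In UPC++ itself, futures are returned by operations such as one-sided puts/gets and RPCs, and the runtime fulfills them as communication completes; the graph-of-operations structure is the same as in this toy model.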
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Paul H. Hargrove, Dan Bonachea, Johnny Corbino, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'23)", Poster at Exascale Computing Project (ECP) Annual Meeting 2023, January 2023,
The Pagoda project is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. The first component is GASNet-EX, a portable, high-performance, global-address-space communication library. The second component is UPC++, a C++ template library. Together, these libraries enable agile, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
GASNet-EX is a portable, high-performance communications middleware library which leverages hardware support to implement Remote Memory Access (RMA) and Active Message communication primitives. GASNet-EX supports a broad ecosystem of alternative HPC programming models, including UPC++, Legion, Chapel and multiple implementations of UPC and Fortran Coarrays. GASNet-EX is implemented directly over the native APIs for networks of interest in HPC. The tight semantic match of GASNet-EX APIs to the client requirements and hardware capabilities often yields better performance than competing libraries.
UPC++ provides high-level productivity abstractions appropriate for Partitioned Global Address Space (PGAS) programming such as: remote memory access (RMA), remote procedure call (RPC), support for accelerators (e.g. GPUs), and mechanisms for aggressive asynchrony to hide communication costs. UPC++ implements communication using GASNet-EX, delivering high performance and portability from laptops to exascale supercomputers. HPC application software using UPC++ includes: MetaHipMer2 metagenome assembler, SIMCoV viral propagation simulation, NWChemEx TAMM, and graph computation kernels from ExaGraph.
2022
"Berkeley Lab’s Networking Middleware GASNet Turns 20: Now, GASNet-EX is Gearing Up for the Exascale Era", Linda Vu, HPCWire (Lawrence Berkeley National Laboratory CS Area Communications), December 7, 2022, doi: 10.25344/S4BP4G
GASNet Celebrates 20th Anniversary
For 20 years, Berkeley Lab’s GASNet has been fueling developers’ ability to tap the power of massively parallel supercomputers more effectively. The middleware was recently upgraded to support exascale scientific applications.
Katherine Rasmussen, Damian Rouson, Naje George, Dan Bonachea, Hussain Kadhem, Brian Friesen, "Agile Acceleration of LLVM Flang Support for Fortran 2018 Parallel Programming", Research Poster at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), November 2022, doi: 10.25344/S4CP4S
The LLVM Flang compiler ("Flang") is currently Fortran 95 compliant, and the frontend can parse Fortran 2018. However, Flang does not have a comprehensive 2018 test suite and does not fully implement the static semantics of the 2018 standard. We are investigating whether agile software development techniques, such as pair programming and test-driven development (TDD), can help Flang to rapidly progress to Fortran 2018 compliance. Because of the paramount importance of parallelism in high-performance computing, we are focusing on Fortran’s parallel features, commonly denoted “Coarray Fortran.” We are developing what we believe are the first exhaustive, open-source tests for the static semantics of Fortran 2018 parallel features, and contributing them to the LLVM project. A related effort involves writing runtime tests for parallel 2018 features and supporting those tests by developing a new parallel runtime library: the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine).
Paul H. Hargrove, Dan Bonachea, "GASNet-EX RMA Communication Performance on Recent Supercomputing Systems", 5th Annual Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM'22), November 2022, doi: 10.25344/S40C7D
Partitioned Global Address Space (PGAS) programming models, typified by systems such as Unified Parallel C (UPC) and Fortran coarrays, expose one-sided Remote Memory Access (RMA) communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.
GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in emerging exascale machines. The library is an evolution of the popular GASNet communication system, building upon 20 years of lessons learned. We present microbenchmark results which demonstrate the RMA performance of GASNet-EX is competitive with MPI implementations on four recent, high-impact, production HPC systems. These results are an update relative to previously published results on older systems. The networks measured here are representative of hardware currently used in six of the top ten fastest supercomputers in the world, and all of the exascale systems on the U.S. DOE road map.
Damian Rouson, Dan Bonachea, "Caffeine: CoArray Fortran Framework of Efficient Interfaces to Network Environments", Proceedings of the Eighth Annual Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC2022), Dallas, Texas, USA, IEEE, November 2022, doi: 10.25344/S4459B
This paper provides an introduction to the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine), a parallel runtime library built atop the GASNet-EX exascale networking library. Caffeine leverages several non-parallel Fortran features to write type- and rank-agnostic interfaces and corresponding procedure definitions that support parallel Fortran 2018 features, including communication, collective operations, and related services. One major goal is to develop a runtime library that can eventually be considered for adoption by LLVM Flang, enabling that compiler to support the parallel features of Fortran. The paper describes the motivations behind Caffeine's design and implementation decisions, details the current state of Caffeine's development, and previews future work. We explain how the design and implementation offer benefits related to software sustainability by lowering the barrier to user contributions, reducing complexity through the use of Fortran 2018 C-interoperability features, and delivering high performance through the use of a lightweight communication substrate.
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001480, doi: 10.25344/S4M59P
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Dan Bonachea, Paul H. Hargrove, An Introduction to GASNet-EX for Chapel Users, 9th Annual Chapel Implementers and Users Workshop (CHIUW 2022), June 10, 2022,
Have you ever typed "export CHPL_COMM=gasnet"? If you’ve used Chapel with multi-locale support on a system without "Cray" in the model name, then you’ve probably used GASNet. Did you ever wonder what GASNet is? What GASNet should mean to you? This talk aims to answer those questions and more. Chapel has system-specific implementations of multi-locale communication for Cray-branded systems including the Cray XC and HPE Cray EX lines. On other systems, Chapel communication uses the GASNet communication library embedded in third-party/gasnet. In this talk, that third-party will introduce itself to you in the first person.
Paul H. Hargrove, Dan Bonachea, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'22)", Poster at Exascale Computing Project (ECP) Annual Meeting 2022, May 5, 2022,
We present UPC++ and GASNet-EX, distributed libraries which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Procedure Call (RPC) and for Remote Memory Access (RMA) to host and GPU memories. The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001452, doi: 10.25344/S4530J
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
2021
Daniel Waters, Colin A. MacLean, Dan Bonachea, Paul H. Hargrove, "Demonstrating UPC++/Kokkos Interoperability in a Heat Conduction Simulation (Extended Abstract)", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S4630V
We describe the replacement of MPI with UPC++ in an existing Kokkos code that simulates heat conduction within a rectangular 3D object, as well as an analysis of the new code’s performance on CUDA accelerators. The key challenges were packing the halos in Kokkos data structures in a way that allowed for UPC++ remote memory access, and streamlining synchronization costs. Additional UPC++ abstractions used included global pointers, distributed objects, remote procedure calls, and futures. We also make use of the device allocator concept to facilitate data management in memory with unique properties, such as GPUs. Our results demonstrate that despite the algorithm’s good semantic match to message passing abstractions, straightforward modifications to use UPC++ communication deliver vastly improved performance and scalability in the common case. We find the one-sided UPC++ version written in a natural way exhibits good performance, whereas the message-passing version written in a straightforward way exhibits performance anomalies. We argue this represents a productivity benefit for one-sided communication models.
Amir Kamil, Dan Bonachea, "Optimization of Asynchronous Communication Operations through Eager Notifications", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S42C71
UPC++ is a C++ library implementing the Asynchronous Partitioned Global Address Space (APGAS) model. We propose an enhancement to the completion mechanisms of UPC++ used to synchronize communication operations that is designed to reduce overhead for on-node operations. Our enhancement permits eager delivery of completion notification in cases where the data transfer semantics of an operation happen to complete synchronously, for example due to the use of shared-memory bypass. This semantic relaxation allows removing significant overhead from the critical path of the implementation in such cases. We evaluate our results on three different representative systems using a combination of microbenchmarks and five variations of the HPCChallenge RandomAccess benchmark implemented in UPC++ and run on a single node to accentuate the impact of locality. We find that in RMA versions of the benchmark written in a straightforward manner (without manually optimizing for locality), the new eager notification mode can provide up to a 25% speedup when synchronizing with promises and up to a 13.5x speedup when synchronizing with conjoined futures. We also evaluate our results using a graph matching application written with UPC++ RMA communication, where we measure overall speedups of as much as 11% in single-node runs of the unmodified application code, due to our transparent enhancements.
Paul H. Hargrove, Dan Bonachea, Colin A. MacLean, Daniel Waters, "GASNet-EX Memory Kinds: Support for Device Memory in PGAS Programming Models", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21) Research Poster, November 2021, doi: 10.25344/S4P306
Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work includes two major components: UPC++ (a C++ template library) and GASNet-EX (a portable, high-performance communication library). This poster describes recent advances in GASNet-EX to efficiently implement Remote Memory Access (RMA) operations to and from memory on accelerator devices such as GPUs. Performance is illustrated via benchmark results from UPC++ and the Legion programming system, both using GASNet-EX as their communications library.
Katherine A. Yelick, Amir Kamil, Damian Rouson, Dan Bonachea, Paul H. Hargrove, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (SC21), Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), November 15, 2021,
UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. UPC++ offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between computation and asynchronous data movement. UPC++ supports simple/regular data structures as well as more elaborate distributed applications where communication is fine-grained and/or irregular. UPC++ provides a uniform abstraction for one-sided RMA between host and GPU/accelerator memories anywhere in the system. UPC++'s support for aggressive asynchrony enables applications to effectively overlap communication and reduce latency stalls, while the underlying GASNet-EX communication library delivers efficient low-overhead RMA/RPC on HPC networks.
This tutorial introduces UPC++, covering the memory and execution models and basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into application proxy examples. We examine a few UPC++ applications with irregular communication (metagenomic assembler and COVID-19 simulation) and describe how they utilize UPC++ to optimize communication performance.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001425, doi: 10.25344/S4XK53
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
Dan Bonachea, "UPC++ as_eager Working Group Draft, Revision 2020.6.2", Lawrence Berkeley National Laboratory Tech Report, August 9, 2021, LBNL 2001416, doi: 10.25344/S4FK5R
This draft proposes an extension for a new future-based completion variant that can be more effectively streamlined for RMA and atomic access operations that happen to be satisfied at runtime using purely node-local resources. Many such operations are most efficiently performed synchronously using load/store instructions on shared-memory mappings, where the actual access may only require a few CPU instructions. In such cases we believe it’s critical to minimize the overheads imposed by the UPC++ runtime and completion queues, in order to enable efficient operation on hierarchical node hardware using shared-memory bypass.
The new upcxx::{source,operation}_cx::as_eager_future() completion variant accomplishes this goal by relaxing the current restriction that future-returning access operations must return a non-ready future whose completion is deferred until a subsequent explicit invocation of user-level progress. This relaxation allows access operations that are completed synchronously to instead return a ready future, thereby avoiding most or all of the runtime costs associated with deferment of future completion and subsequent mandatory entry into the progress engine.
We additionally propose to make this new as_eager_future() completion variant the new default completion for communication operations that currently default to returning a future. This should encourage use of the streamlined variant, and may provide performance improvements to some codes without source changes. A mechanism is proposed to restore the legacy behavior on-demand for codes that might happen to rely on deferred completion for correctness.
Finally, we propose a new as_eager_promise() completion variant that extends analogous improvements to promise-based completion, and corresponding changes to the default behavior of as_promise().
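The eager-completion idea in this draft can be sketched in plain standard C++. This is a hypothetical analogy, not the UPC++ implementation: the `put` helper and its `local` flag are invented for illustration; the real mechanism is the `as_eager_future()` completion variant described above.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <future>

// Sketch of the eager fast path: if a "transfer" can be satisfied
// synchronously (e.g., via shared-memory bypass), return a future that is
// already ready, avoiding deferment and a later trip through the progress
// engine. Otherwise fall back to a genuinely asynchronous completion.
std::future<void> put(int* dst, const int* src, std::size_t n, bool local) {
    if (local) {
        std::memcpy(dst, src, n * sizeof(int));  // a few instructions, done now
        std::promise<void> p;
        p.set_value();            // the returned future is born ready
        return p.get_future();
    }
    // Slow path: asynchronous transfer, completed later.
    return std::async(std::launch::async,
                      [=] { std::memcpy(dst, src, n * sizeof(int)); });
}

bool completed_eagerly(std::future<void>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

The proposal's key relaxation is exactly this: a future-returning operation may hand back a ready future when the access completed synchronously, instead of always deferring completion to a later progress call.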
Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'21)", Poster at Exascale Computing Project (ECP) Annual Meeting 2021, April 2021,
We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC). The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2021.3.0", Lawrence Berkeley National Laboratory Tech Report, March 31, 2021, LBNL 2001388, doi: 10.25344/S4K881
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
Dan Bonachea, GASNet-EX: A High-Performance, Portable Communication Library for Exascale, Berkeley Lab – CS Seminar, March 10, 2021,
Partitioned Global Address Space (PGAS) models, pioneered by languages such as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.
GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building on 20 years of lessons learned. We describe several features and enhancements that have been introduced to address the needs of modern runtimes and exploit the hardware capabilities of emerging systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI implementations on current systems. GASNet-EX provides communication services that help to deliver speedups in HPC applications written using the UPC++ library, enabling new science on pre-exascale systems.
Julian Borrill
2021
Y Segawa, H Hirose, D Kaneko, M Hasegawa, S Adachi, P Ade, MAOA Faúndez, Y Akiba, K Arnold, J Avva, C Baccigalupi, D Barron, D Beck, S Beckman, F Bianchini, D Boettger, J Borrill, J Carron, S Chapman, K Cheung, Y Chinone, K Crowley, A Cukierman, T De Haan, M Dobbs, R Dunner, HE Bouhargani, T Elleflot, J Errard, G Fabbian, S Feeney, C Feng, T Fujino, N Galitzki, N Goeckner-Wald, J Groh, G Hall, N Halverson, T Hamada, M Hazumi, C Hill, L Howe, Y Inoue, J Ito, G Jaehnig, O Jeong, N Katayama, B Keating, R Keskitalo, S Kikuchi, T Kisner, N Krachmalnicoff, A Kusaka, AT Lee, D Leon, E Linder, LN Lowry, A Mangu, F Matsuda, Y Minami, J Montgomery, M Navaroli, H Nishino, J Peloton, ATP Pham, D Poletti, G Puglisi, C Raum, CL Reichardt, C Ross, M Silva-Feaver, P Siritanasak, R Stompor, A Suzuki, O Tajima, S Takakura, S Takatori, D Tanabe, GP Teply, C Tsai, C Verges, B Westbrook, Y Zhou, "Method for rapid performance validation of large TES bolometer array for POLARBEAR-2A using a coherent millimeter-wave source", AIP Conference Proceedings, 2021, 2319, doi: 10.1063/5.0038197
M Tristram, AJ Banday, KM Górski, R Keskitalo, CR Lawrence, KJ Andersen, RB Barreiro, J Borrill, HK Eriksen, R Fernandez-Cobos, TS Kisner, E Martínez-González, B Partridge, D Scott, TL Svalheim, H Thommesen, IK Wehus, "Planck constraints on the tensor-to-scalar ratio", Astronomy and Astrophysics, 2021, 647, doi: 10.1051/0004-6361/202039585
G Puglisi, R Keskitalo, T Kisner, JD Borrill, Simulating Calibration and Beam Systematics for a Future CMB Space Mission with the TOAST Package, Research Notes of the AAS, Pages: 137--137, 2021, doi: 10.3847/2515-5172/ac0823
N Aghanim, Y Akrami, M Ashdown, J Aumont, C Baccigalupi, M Ballardini, AJ Banday, RB Barreiro, N Bartolo, S Basak, R Battye, K Benabed, JP Bernard, M Bersanelli, P Bielewicz, JJ Bock, JR Bond, J Borrill, FR Bouchet, F Boulanger, M Bucher, C Burigana, RC Butler, E Calabrese, JF Cardoso, J Carron, A Challinor, HC Chiang, J Chluba, LPL Colombo, C Combet, D Contreras, BP Crill, F Cuttaia, P De Bernardis, G De Zotti, J Delabrouille, JM Delouis, E DI Valentino, JM DIego, O Doré, M Douspis, A Ducout, X Dupac, S Dusini, G Efstathiou, F Elsner, TA Enßlin, HK Eriksen, Y Fantaye, M Farhang, J Fergusson, R Fernandez-Cobos, F Finelli, F Forastieri, M Frailis, AA Fraisse, E Franceschi, A Frolov, S Galeotta, S Galli, K Ganga, RT Génova-Santos, M Gerbino, T Ghosh, J González-Nuevo, KM Górski, S Gratton, A Gruppuso, JE Gudmundsson, J Hamann, W Handley, FK Hansen, D Herranz, SR Hildebrandt, E Hivon, Z Huang, AH Jaffe, WC Jones, A Karakci, E Keihänen, R Keskitalo, K Kiiveri, J Kim, TS Kisner, L Knox, N Krachmalnicoff, M Kunz, H Kurki-Suonio, G Lagache, JM Lamarre, A Lasenby, M Lattanzi, CR Lawrence, M Le Jeune, P Lemos, J Lesgourgues, F Levrier, A Lewis, M Liguori, "Erratum: Planck 2018 results: VI. Cosmological parameters (Astronomy and Astrophysics (2020) 641 (A6) DOI: 10.1051/0004-6361/201833910)", Astronomy and Astrophysics, 2021, 652, doi: 10.1051/0004-6361/201833910e
Kristofer Bouchard
2021
Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin, Sean Peisert, W. Bradley Holtz, Anil Aswani, Dipankar Dwivedi, Haruko Wainwright, Ghanshyam Pilania, Benjamin Nachman, Babetta L. Marrone, Nicola Falco, Prabhat, Daniel Arnold, Alejandro Wolf-Yadlin, Sarah Powers, Sharlee Climer, Quinn Jackson, Ty Carlson, Michael Sohn, Petrus Zwart, Neeraj Kumar, Amy Justice, Claire Tomlin, Daniel Jacobson, Gos Micklem, Georgios V. Gkoutos, Peter J. Bickel, Jean-Baptiste Cazier, Juliane Müller, Bobbie-Jo Webb-Robertson, Rick Stevens, Mark Anderson, Ken Kreutz-Delgado, Michael W. Mahoney, James B. Brown, Learning from Learning Machines: a New Generation of AI Technology to Meet the Needs of Science, arXiv preprint arXiv:2111.13786, November 27, 2021,
Maximilian Bremer
2022
Maximilian Bremer, John Bachan, Cy Chan, Clint Dawson, "Adaptive total variation stable local timestepping for conservation laws", Journal of Computational Physics, April 21, 2022,
2021
Md Abdul M Faysal, Shaikh Arifuzzaman, Cy Chan, Maximilian Bremer, Doru Popovici, John Shalf, "HyPC-Map: A Hybrid Parallel Community Detection Algorithm Using Information-Theoretic Approach", HPEC, September 20, 2021,
Maximilian Bremer, John Bachan, Cy Chan, and Clint Dawson, "Speculative Parallel Execution for Local Timestepping", 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, May 21, 2021,
Benjamin Brock
2021
O Selvitopi, B Brock, I Nisa, A Tripathy, K Yelick, A Buluç, "Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication", Proceedings of the International Conference on Supercomputing, January 2021, 431--442, doi: 10.1145/3447818.3461472
Aydin Buluç
2021
Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad, "Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale", 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021, doi: 10.1109/IPDPS49936.2021.00018
Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç, "BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 2021, doi: 10.1101/464420
O Selvitopi, B Brock, I Nisa, A Tripathy, K Yelick, A Buluç, "Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication", Proceedings of the International Conference on Supercomputing, January 2021, 431--442, doi: 10.1145/3447818.3461472
G Guidi, M Ellis, A Buluç, K Yelick, D Culler, "10 years later: Cloud computing is closing the performance gap", ICPE 2021 - Companion of the ACM/SPEC International Conference on Performance Engineering, January 1, 2021, 41--48, doi: 10.1145/3447545.3451183
Anastasiia Butko
2021
Meriam Gay Bautista, Zhi Jackie Yao, Anastasiia Butko, Mariam Kiran, Mekena Metcalf, "Towards Automated Superconducting Circuit Calibration using Deep Reinforcement Learning", 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Tampa, FL, USA, IEEE, August 23, 2021, pp. 462-46, doi: 10.1109/ISVLSI51109.2021.00091
George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Bautista, Dilip Vasudevan, Anastasiia Butko, "SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC", IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021,
Surendra Byna
2023
Bin Dong, Jean Luca Bez, Suren Byna, "AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis", In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’23), June 16, 2023,
Hammad Ather, Jean Luca Bez, Boyana Norris, Suren Byna, "Illuminating the I/O Optimization Path of Scientific Applications", High Performance Computing: 38th International Conference, ISC High Performance 2023, Hamburg, Germany, May 21–25, 2023, Proceedings, Hamburg, Germany, Springer-Verlag, May 21, 2023, 22–41, doi: 10.1007/978-3-031-32041-5_2
The existing parallel I/O stack is complex and difficult to tune due to the interdependencies among multiple factors that impact the performance of data movement between storage and compute systems. When performance is slower than expected, end-users, developers, and system administrators rely on I/O profiling and tracing information to pinpoint the root causes of inefficiencies. Despite having numerous tools that collect I/O metrics on production systems, it is not obvious (unless one is an I/O expert) where the I/O bottlenecks are, what their root causes are, and what to do to solve them. Hence, there is a gap between the currently available metrics, the issues they represent, and the application of optimizations that would mitigate performance slowdowns. An I/O specialist often checks for common problems before diving into the specifics of each application and workload. Streamlining such analysis, investigation, and recommendations could close this gap without requiring a specialist to intervene in every case. In this paper, we propose a novel interactive, user-oriented visualization and analysis framework, called Drishti. This framework helps users to pinpoint various root causes of I/O performance problems and provides a set of actionable recommendations for improving performance based on the observed characteristics of an application. We evaluate the applicability and correctness of Drishti using four use cases from distinct science domains and demonstrate its value to end-users, developers, and system administrators when seeking to improve an application’s I/O performance.
S. Kim, A. Sim, K. Wu, S. Byna, Y. Son, H. Eom, "Design and Implementation of I/O Performance Prediction Scheme on HPC Systems through Large-scale Log Analysis", Journal of Big Data, 2023, 10(65), doi: 10.1186/s40537-023-00741-4
2022
Jean Luca Bez, Hammad Ather, Suren Byna, "Drishti: Guiding End-Users in the I/O Optimization Journey", PDSW 2022, held in conjunction with SC22, 2022,
Jean Luca Bez, Suren Byna, April 2019 Darshan counters from the Cori supercomputer [Data set], Zenodo, 2022, doi: 10.5281/zenodo.6476501
Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, "Design and implementation of dynamic I/O control scheme for large scale distributed file systems", Cluster Computing, 2022, 25(6):1--16, doi: 10.1007/s10586-022-03640-0
Jean Luca Bez, Ahmad Maroof Karimi, Arnab K. Paul, Bing Xie, Suren Byna, Philip Carns, Sarp Oral, Feiyi Wang, Jesse Hanley, "Access Patterns and Performance Behaviors of Multi-layer Supercomputer I/O Subsystems under Production Load", 31st International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC '22), Association for Computing Machinery, June 27, 2022, 43–55, doi: 10.1145/3502181.3531461
D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, W. Arndt, J. Blaschke, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, T. Lehman, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, L. Stephey, R. Thomas, G. Torok, "LBNL Superfacility Project Report", Lawrence Berkeley National Laboratory, 2022, doi: 10.48550/arXiv.2206.11992
Jean Luca Bez, Suren Byna, Understanding I/O Behavior with Interactive Darshan Log Analysis, Exascale Computing Project (ECP) Community Days BoF, 2022,
Houjun Tang, Quincey Koziol, John Ravi, and Suren Byna, "Transparent Asynchronous Parallel I/O using Background Threads", IEEE Transactions on Parallel and Distributed Systems, April 4, 2022, 33, doi: 10.1109/TPDS.2021.3090322
2021
Qiao Kang, Scot Breitenfeld, Kaiyuan Hou, Wei-keng Liao, Robert Ross, and Suren Byna, "Optimizing Performance of Parallel I/O Accesses to Non-contiguous Blocks in Multiple Array Variables", IEEE BigData 2021 conference, December 19, 2021,
J. Bang, C. Kim, K. Wu, A. Sim, S. Byna, H. Sung, H. Eom, "An In-Depth I/O Pattern Analysis in HPC Systems", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021, doi: 10.1109/HiPC53243.2021.00056
Wei Zhang, Suren Byna, Hyogi Sim, Sangkeun Lee, Sudharshan Vazhkudai, and Yong Chen, "Exploiting User Activeness for Data Retention in HPC Systems", International Conference for High Performance Computing, Networking, Storage and Analysis (SC '21), November 21, 2021, doi: 10.1145/3458817.3476201
Cong Xu, Suparna Bhattacharya, Martin Foltin, Suren Byna, and Paolo Faraboschi, "Data-Aware Storage Tiering for Deep Learning", 6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21, November 21, 2021,
Houjun Tang, Bing Xie, Suren Byna, Phillip Carns, Quincey Koziol, Sudarsun Kannan, Jay Lofstead, and Sarp Oral, "SCTuner: An Auto-tuner Addressing Dynamic I/O Needs on Supercomputer I/O Sub-systems", 6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21, November 21, 2021,
Bo Fang, Daoce Wang, Sian Jin, Quincey Koziol, Zhao Zhang, Qiang Guan, Suren Byna, Sriram Krishnamoorthy, and Dingwen Tao, "Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights", IEEE Cluster 2021, September 1, 2021,
Suren Byna, Houjun Tang, and Quincey Koziol, Automatic and Transparent Scientific Data Management with Object Abstractions, PASC 2021, in a Minisymposium on "Data Movement Orchestration on HPC Systems", July 31, 2021,
Bing Xie, Houjun Tang, Suren Byna, Jesse Hanley, Quincey Koziol, Tonglin Li, Sarp Oral, "Battle of the Defaults: Extracting Performance Characteristics of HDF5 under Production Load", CCGrid 2021, May 31, 2021,
Jean Luca Bez, Houjun Tang, Bing Xie, David Williams-Young, Rob Latham, Rob Ross, Sarp Oral, Suren Byna, "I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis", 2021 IEEE/ACM Sixth International Parallel Data Systems Workshop (PDSW), January 1, 2021, 15-22, doi: 10.1109/PDSW54622.2021.00008
Tonglin Li, Suren Byna, Quincey Koziol, Houjun Tang, Jean Luca Bez, Qiao Kang, "h5bench: HDF5 I/O Kernel Suite for Exercising HPC I/O Patterns", Cray User Group (CUG) 2021, January 1, 2021,
Paolo Calafiura
2022
Paolo Calafiura and others, Artificial Intelligence for High Energy Physics, edited by Paolo Calafiura, David Rousseau, Kazuhiro Terao, (World Scientific: March 1, 2022) doi: 10.1142/12200
John Wu, Ben Brown, Paolo Calafiura, Quincey Koziol, Dongeun Lee, Alex Sim, Devesh Tiwari, Support for In-Flight Data Analyses in Scientific Workflows, DOE ASCR Workshop on the Management and Storage of Scientific Data, 2022, doi: 10.2172/1843500
Alina Lazar, others, Accelerating the Inference of the Exa.TrkX Pipeline, 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded - Towards Sustainable, Diverse, Performant and Effective Scientific Computing, 2022,
Chun-Yi Wang, others, Reconstruction of Large Radius Tracks with the Exa.TrkX pipeline, 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded - Towards Sustainable, Diverse, Performant and Effective Scientific Computing, 2022,
Sunanda Banerjee, others, Detector and Beamline Simulation for Next-Generation High Energy Physics Experiments, 2022 Snowmass Summer Study, 2022,
Meghna Bhattacharya, others, Portability: A Necessary Approach for Future Scientific Software, 2022 Snowmass Summer Study, 2022,
Christopher D. Jones, Kyle Knoepfel, Paolo Calafiura, Charles Leggett, Vakhtang Tsulaia, Evolution of HEP Processing Frameworks, 2022 Snowmass Summer Study, 2022,
Savannah Thais, Paolo Calafiura, Grigorios Chachamis, Gage DeZoort, Javier Duarte, Sanmay Ganguly, Michael Kagan, Daniel Murnane, Mark S. Neubauer, Kazuhiro Terao, Graph Neural Networks in Particle Physics: Implementations, Innovations, and Challenges, 2022 Snowmass Summer Study, 2022,
2021
Xiangyang Ju, others, Performance of a geometric deep learning pipeline for HL-LHC particle tracking, Eur. Phys. J. C, Pages: 876, 2021, doi: 10.1140/epjc/s10052-021-09675-8
David Camp
2022
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, David Camp, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, David Pugmire, Silvio Rizzi, Thompson, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "The SENSEI Generic In Situ Interface: Tool and Processing Portability at Scale", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_13
Daan Camps
2022
M. G. Amankwah, D. Camps, E. W. Bethel, R. Van Beeumen, T. Perciano, "Quantum pixel representations and compression for N-dimensional images", Nature Scientific Reports, May 11, 2022, 12:7712, doi: 10.1038/s41598-022-11024-y
2021
Thijs Steel, Daan Camps, Karl Meerbergen, Raf Vandebril, "A Multishift, Multipole Rational QZ Method with Aggressive Early Deflation", SIAM Journal on Matrix Analysis and Applications, February 19, 2021, 42:753-774, doi: 10.1137/19M1249631
In the article “A Rational QZ Method” by D. Camps, K. Meerbergen, and R. Vandebril [SIAM J. Matrix Anal. Appl., 40 (2019), pp. 943--972], we introduced rational QZ (RQZ) methods. Our theoretical examinations revealed that the convergence of the RQZ method is governed by rational subspace iteration, thereby generalizing the classical QZ method, whose convergence relies on polynomial subspace iteration. Moreover, the RQZ method operates on a pencil more general than Hessenberg---upper triangular, namely, a Hessenberg pencil, which is a pencil consisting of two Hessenberg matrices. However, the RQZ method can only be made competitive to advanced QZ implementations by using crucial add-ons such as small bulge multishift sweeps, aggressive early deflation, and optimal packing. In this paper we develop these techniques for the RQZ method. In the numerical experiments we compare the results with state-of-the-art routines for the generalized eigenvalue problem and show that the presented method is competitive in terms of speed and accuracy.
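The contrast between the two convergence mechanisms named in this abstract can be summarized as follows (a compressed paraphrase with notation assumed for illustration; see the cited paper for the precise statement):

```latex
% Classical QZ applied to a pencil $(A,B)$ with shifts
% $\sigma_1,\dots,\sigma_m$ implicitly performs polynomial subspace
% iteration with
\[
  p(AB^{-1}), \qquad p(z) = \prod_{i=1}^{m} (z - \sigma_i),
\]
% while the RQZ method, which additionally selects poles
% $\rho_1,\dots,\rho_m$, performs rational subspace iteration with
\[
  r(AB^{-1}), \qquad r(z) = \prod_{i=1}^{m} \frac{z - \sigma_i}{z - \rho_i}.
\]
```

Well-chosen poles can accelerate convergence relative to the purely polynomial iteration, which is what makes the add-ons developed in the paper worthwhile.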
Cy Chan
2022
Maximilian Bremer, John Bachan, Cy Chan, Clint Dawson, "Adaptive total variation stable local timestepping for conservation laws", Journal of Computational Physics, April 21, 2022,
2021
Md Abdul M Faysal, Shaikh Arifuzzaman, Cy Chan, Maximilian Bremer, Doru Popovici, John Shalf, "HyPC-Map: A Hybrid Parallel Community Detection Algorithm Using Information-Theoretic Approach", HPEC, September 20, 2021,
Serges Love Teutu Talla, Isabelle Kemajou-Brown, Cy Chan, Bin Wang, "A Binary Multi-Subsystems Transportation Networks Estimation using Mobiliti Data", 2021 American Control Conference (ACC), May 25, 2021,
Maximilian Bremer, John Bachan, Cy Chan, and Clint Dawson, "Speculative Parallel Execution for Local Timestepping", 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, May 21, 2021,
You-Wei Cheah
2021
D. A. Agarwal, J. Damerow, C. Varadharajan, D. S. Christianson, G. Z. Pastorello, Y.-W. Cheah, L. Ramakrishnan, "Balancing the needs of consumers and producers for scientific data collections", Ecological Informatics, 2021, 62:101251, doi: 10.1016/j.ecoinf.2021.101251
Yize Chen
2022
Daniel Arnold, Sy-Toan Ngo, Ciaran Roberts, Yize Chen, Anna Scaglione, Sean Peisert, "Adam-based Augmented Random Search for Control Policies for Distributed Energy Resource Cyber Attack Mitigation", Proceedings of the 2022 American Control Conference (ACC), June 2022,
2021
Yize Chen, Yuanyuan Shi, Daniel Arnold, Sean Peisert, SAVER: Safe Learning-Based Controller for Real-Time Voltage Regulation, arXiv preprint arXiv:2111.15152, November 30, 2021,
Yize Chen, Daniel Arnold, Yuanyuan Shi, Sean Peisert, Understanding the Safety Requirements for Learning-based Power Systems Operations, arXiv preprint arXiv:2110.04983, October 11, 2021,
Ran Cheng
2023
Hao Li, Han Cai, Joseph Forman, Ran Cheng, et al., "Transport Properties of NbN Thin Films Patterned With a Focused Helium Ion Beam", IEEE Transactions on Applied Superconductivity, August 2023,
Ran Cheng, Christoph Kirst, Dilip Vasudevan, "Superconducting-Oscillatory Neural Network With Pixel Error Detection for Image Recognition", IEEE Transaction on Applied Superconductivity, August 2023, 33:1-7,
2021
Ran Cheng, Uday S. Goteti, Harrison Walker, Keith M. Krause, Luke Oeding, Michael C. Hamilton, "Toward Learning in Neuromorphic Circuits Based on Quantum Phase Slip Junctions", Frontiers in Neuroscience, November 8, 2021,
Ran Cheng, Uday S. Goteti, Michael C. Hamilton, "High-Speed and Low-Power Superconducting Neuromorphic Circuits Based on Quantum Phase-Slip Junctions", IEEE Transactions on Applied Superconductivity, August 2021,
Shreyas Cholia
2022
D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, W. Arndt, J. Blaschke, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, T. Lehman, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, L. Stephey, R. Thomas, G. Torok, "LBNL Superfacility Project Report", Lawrence Berkeley National Laboratory, 2022, doi: 10.48550/arXiv.2206.11992
MB Simmonds, WJ Riley, DA Agarwal, X Chen, S Cholia, R Crystal-Ornelas, ET Coon, D Dwivedi, VC Hendrix, M Huang, A Jan, Z Kakalia, J Kumar, CD Koven, L Li, M Melara, L Ramakrishnan, DM Ricciuto, AP Walker, W Zhi, Q Zhu, C Varadharajan, Guidelines for Publicly Archiving Terrestrial Model Data to Enhance Usability, Intercomparison, and Synthesis, Data Science Journal, 2022, doi: 10.5334/dsj-2022-003
Nicholas Choma
2021
Xiangyang Ju, et al., Performance of a geometric deep learning pipeline for HL-LHC particle tracking, Eur. Phys. J. C, Pages: 876, 2021, doi: 10.1140/epjc/s10052-021-09675-8
Danielle Svehla Christianson
2022
H Weierbach, AR Lima, JD Willard, VC Hendrix, DS Christianson, M Lubich, C Varadharajan, Stream Temperature Predictions for River Basin Management in the Pacific Northwest and Mid-Atlantic Regions Using Machine Learning, Water (Switzerland), 2022, doi: 10.3390/w14071032
C Varadharajan, AP Appling, B Arora, DS Christianson, VC Hendrix, V Kumar, AR Lima, J Müller, S Oliver, M Ombadi, T Perciano, JM Sadler, H Weierbach, JD Willard, Z Xu, J Zwart, "Can machine learning accelerate process understanding and decision-relevant predictions of river water quality?", Hydrological Processes, January 1, 2022, 36, doi: 10.1002/hyp.14565
C Varadharajan, VC Hendrix, DS Christianson, M Burrus, C Wong, SS Hubbard, DA Agarwal, BASIN-3D: A brokering framework to integrate diverse environmental data, Computers and Geosciences, 2022, doi: 10.1016/j.cageo.2021.105024
2021
C Varadharajan, Z Kakalia, E Alper, EL Brodie, M Burrus, RWH Carroll, D Christianson, W Dong, V Hendrix, M Henderson, S Hubbard, D Johnson, R Versteeg, KH Williams, DA Agarwal, The Colorado East River Community Observatory Data Collection, Hydrological Processes 35(6), 2021, doi: 10.22541/au.161962485.54378235/v1
D. A. Agarwal, J. Damerow, C. Varadharajan, D. S. Christianson, G. Z. Pastorello, Y.-W. Cheah, L. Ramakrishnan, "Balancing the needs of consumers and producers for scientific data collections", Ecological Informatics, 2021, 62:101251, doi: 10.1016/j.ecoinf.2021.101251
Chen-Nee Chuah
2021
Ammar Haydari, Michael Zhang, Chen-Nee Chuah, Jane Macfarlane, Sean Peisert, Adaptive Differential Privacy Mechanism for Aggregated Mobility Dataset, arXiv preprint arXiv:2112.08487, December 10, 2021,
Lisa Claus
2021
Yang Liu, Pieter Ghysels, Lisa Claus, Xiaoye Sherry Li, "Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations", SIAM J. Sci. Comput., June 22, 2021,
Johnny Corbino
2023
Julian Bellavita, Mathias Jacquelin, Esmond G. Ng, Dan Bonachea, Johnny Corbino, Paul H. Hargrove, "symPACK: A GPU-Capable Fan-Out Sparse Cholesky Solver", 2023 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM'23), ACM, November 13, 2023, doi: 10.25344/S49P45
Sparse symmetric positive definite systems of equations are ubiquitous in scientific workloads and applications. Parallel sparse Cholesky factorization is the method of choice for solving such linear systems. Therefore, the development of parallel sparse Cholesky codes that can efficiently run on today’s large-scale heterogeneous distributed-memory platforms is of vital importance. Modern supercomputers offer nodes that contain a mix of CPUs and GPUs. To fully utilize the computing power of these nodes, scientific codes must be adapted to offload expensive computations to GPUs.
We present symPACK, a GPU-capable parallel sparse Cholesky solver that uses one-sided communication primitives and remote procedure calls provided by the UPC++ library. We also utilize the UPC++ "memory kinds" feature to enable efficient communication of GPU-resident data. We show that on a number of large problems, symPACK outperforms comparable state-of-the-art GPU-capable Cholesky factorization codes by up to 14x on the NERSC Perlmutter supercomputer.
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Paul H. Hargrove, Dan Bonachea, Johnny Corbino, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'23)", Poster at Exascale Computing Project (ECP) Annual Meeting 2023, January 2023,
The Pagoda project is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. The first component is GASNet-EX, a portable, high-performance, global-address-space communication library. The second component is UPC++, a C++ template library. Together, these libraries enable agile, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
GASNet-EX is a portable, high-performance communications middleware library which leverages hardware support to implement Remote Memory Access (RMA) and Active Message communication primitives. GASNet-EX supports a broad ecosystem of alternative HPC programming models, including UPC++, Legion, Chapel and multiple implementations of UPC and Fortran Coarrays. GASNet-EX is implemented directly over the native APIs for networks of interest in HPC. The tight semantic match of GASNet-EX APIs to the client requirements and hardware capabilities often yields better performance than competing libraries.
UPC++ provides high-level productivity abstractions appropriate for Partitioned Global Address Space (PGAS) programming such as: remote memory access (RMA), remote procedure call (RPC), support for accelerators (e.g. GPUs), and mechanisms for aggressive asynchrony to hide communication costs. UPC++ implements communication using GASNet-EX, delivering high performance and portability from laptops to exascale supercomputers. HPC application software using UPC++ includes: MetaHipMer2 metagenome assembler, SIMCoV viral propagation simulation, NWChemEx TAMM, and graph computation kernels from ExaGraph.
2022
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Silvia Crivelli
2023
Nathan A. Kimbrel, Allison E. Ashley-Koch, Xue J. Qin, Jennifer H. Lindquist, Melanie E. Garrett, Michelle F. Dennis, Lauren P. Hair, Jennifer E. Huffman, Daniel A. Jacobson, Ravi K. Madduri, Jodie A. Trafton, Hilary Coon, Anna R. Docherty, Niamh Mullins, Douglas M. Ruderfer, Philip D. Harvey, Benjamin H. McMahon, David W. Oslin, Jean C. Beckham, Elizabeth R. Hauser, Michael A. Hauser, Million Veteran Program Suicide Exemplar Workgroup, International Suicide Genetics Consortium, Veterans Affairs Mid-Atlantic Mental Illness Research Education and Clinical Center Workgroup, Veterans Affairs Million Veteran Program, "Identification of Novel, Replicable Genetic Risk Loci for Suicidal Thoughts and Behaviors Among US Military Veterans", JAMA Psychiatry, February 1, 2023, 80:100-191, doi: 10.1001/jamapsychiatry.2022.3896
2022
Destinee Morrow, Rafael Zamora-Resendiz, Jean C Beckham, Nathan A Kimbrel, David W Oslin, Suzanne Tamang, Million Veteran Program Suicide Exemplar Workgroup, Silvia Crivelli, "A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes", Journal of Psychiatric Research, July 1, 2022, 151:328-338, doi: 10.1016/j.jpsychires.2022.04.009
James Demmel
2021
Y. Cho, J. W. Demmel, X. S. Li, Y. Liu, H. Luo, "Enhancing autotuning capability with a history database", IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 20, 2021,
- Download File: GPTuneHistoryDB.pdf (pdf: 390 KB)
H. Luo, J. W. Demmel, Y. Cho, X. S. Li, Y. Liu, "Non-smooth Bayesian optimization in tuning problems", arXiv preprint, September 21, 2021,
Y. Liu, W. M. Sid-Lakhdar, O. Marques, X. Zhu, C. Meng, J. W. Demmel, X. S. Li, "GPTune: multitask learning for autotuning exascale applications", PPoPP, February 17, 2021, doi: 10.1145/3437801.3441621
Nan Ding
2022
Taylor Groves, Chris Daley, Rahulkumar Gayatri, Hai Ah Nam, Nan Ding, Lenny Oliker, Nicholas J. Wright, Samuel Williams, "A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures", PMBS, November 2022,
- Download File: PMBS22_GPU_final.pdf (pdf: 719 KB)
Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, LeAnn Lindsey, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, Methodology for Evaluating the Potential of Disaggregated Memory Systems, https://resdis.github.io/ws/2022/sc/, November 18, 2022,
- Download File: RESDIS22_Disaggregated_memory_Nan.pdf (pdf: 3.8 MB)
Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, "Methodology for Evaluating the Potential of Disaggregated Memory Systems", RESDIS, https://resdis.github.io/ws/2022/sc/, November 18, 2022,
- Download File: Methodology-for-Evaluating-the-Potential-of-Disaggregated-Memory-Systems.pdf (pdf: 5.1 MB)
2021
Nan Ding, Muaaz Awan, Samuel Williams, "Instruction Roofline: An insightful visual performance model for GPUs", CCPE, August 4, 2021, doi: 10.1002/cpe.6591
Nan Ding, Samuel Williams, Yang Liu, Xiaoye S. Li, A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver, July 19, 2021,
- Download File: multiGPU_SpTRSV_ACDA21-v2.pdf (pdf: 3.7 MB)
Nan Ding, Yang Liu, Samuel Williams, Xiaoye S. Li, "A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), July 19, 2021,
- Download File: Multi-GPU-SpTRSV-ACDA21-.pdf (pdf: 897 KB)
MG Awan, S Hofmeyr, R Egan, N Ding, A Buluc, J Deslippe, L Oliker, K Yelick, "Accelerating Large Scale de novo Metagenome Assembly Using GPUs", International Conference for High Performance Computing, Networking, Storage and Analysis, SC, January 1, 2021, doi: 10.1145/3458817.3476212
Adrián Diéguez
2022
Adrián P. Diéguez, Margarita Amor, Ramón Doallo, Akira Nukada, Satoshi Matsuoka, "Efficient high-precision integer multiplication on the GPU", The International Journal of High Performance Computing Applications, March 2022, 36:356-369, doi: 10.1177/10943420221077964
Bin Dong
2023
Bin Dong, Jean Luca Bez, Suren Byna, "AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis.", In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’23), June 16, 2023,
- Download File: IODiagnose-final.pdf (pdf: 1.9 MB)
2022
Bin Dong, Alex Popescu, Veronica Rodriguez Tribaldos, Suren Byna, Jonathan Ajo-Franklin, Kesheng Wu, "Real-time and post-hoc compression for data from Distributed Acoustic Sensing", Computers & Geosciences, June 24, 2022, 105181,
- Download File: wu2022.bib (bib: 22 KB)
Jonathan Ajo-Franklin, Verónica Rodríguez Tribaldos, Avinash Nayak, Feng Cheng, Robert Mellors, Benxin Chi, Todd Wood, Michelle Robertson, Cody Rotermund, Eric Matzel, Dennise C. Templeton, Christina Morency, Kesheng Wu, Bin Dong, Patrick Dobson, "The Imperial Valley Dark Fiber Project: Toward Seismic Studies Using DAS and Telecom Infrastructure for Geothermal Applications", Seismological Research Letters, June 24, 2022,
Runzhou Han, Suren Byna, Houjun Tang, Bin Dong, and Mai Zheng, "PROV-IO: An I/O-Centric Provenance Framework for Scientific Data on HPC Systems", HPDC 2022, June 23, 2022,
John Wu, Bin Dong, Alex Sim, Automating Data Management Through Unified Runtime Systems, DOE ASCR Workshop on the Management and Storage of Scientific Data, 2022, doi: 10.2172/1843500
Bin Dong, Kesheng Wu, Suren Byna, User-Defined Tensor Data Analysis, SpringerBrief, January 1, 2022,
Vincent A. Dumont
2021
V. Dumont, C. Garner, A. Trivedi, C. Jones, V. Ganapati, J. Mueller, T. Perciano, M. Kiran, and M. Day, "HYPPO: A Surrogate-Based Multi-Level Parallelism Tool for Hyperparameter Optimization", 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), November 15, 2021,
Abdelrahman Elbashandy
2021
Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", (P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, ACM, June 21, 2021, doi: 10.1145/3456287.3465478
Reproducible research is becoming essential for science to ensure transparency and for building trust. Additionally, reproducibility provides the cornerstone for sharing of methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding about the gaps, issues, and challenges for enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting the things that worked, that did not work, and that could have worked better for each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience who aim to integrate reproducibility in their everyday pipelines.
Marquita Ellis
2021
Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç, "BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 2021, doi: 10.1101/464420
G Guidi, M Ellis, A Buluç, K Yelick, D Culler, "10 years later: Cloud computing is closing the performance gap", ICPE 2021 - Companion of the ACM/SPEC International Conference on Performance Engineering, January 1, 2021, 41-48, doi: 10.1145/3447545.3451183
Abdelilah Essiari
2021
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014
Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create `workflow snapshots' that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Michael Beach, Drew Paine, Lavanya Ramakrishnan, "Science Capsule - Capturing the Data Life Cycle", Journal of Open Source Software, 2021, 6:2484, doi: 10.21105/joss.02484
Farzad Fatollahi-Fard
2021
Douglas Doerfler, Farzad Fatollahi-Fard, Colin MacLean, Tan Nguyen, Samuel Williams, Nicholas J. Wright, Marco Siracusa, "Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs", International Workshop on OpenCL (iWOCL), April 2021, doi: 10.1145/3456669.3456671
Anne Felden
2022
Anne M. Felden, Daniel F. Martin, Esmond G. Ng, "SUHMO: an AMR SUbglacial Hydrology MOdel v1.0", Geosci. Model Dev. Discuss., July 27, 2022,
- Download File: gmd-2022-190.pdf (pdf: 5.5 MB)
2021
Anne M. Felden, Daniel F. Martin, Esmond G. Ng, SUHMO: An SUbglacial Hydrology MOdel based on the Chombo AMR framework, American Geophysical Union Fall Meeting, December 13, 2021,
Brian Friesen
2022
Katherine Rasmussen, Damian Rouson, Naje George, Dan Bonachea, Hussain Kadhem, Brian Friesen, "Agile Acceleration of LLVM Flang Support for Fortran 2018 Parallel Programming", Research Poster at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), November 2022, doi: 10.25344/S4CP4S
The LLVM Flang compiler ("Flang") is currently Fortran 95 compliant, and the frontend can parse Fortran 2018. However, Flang does not have a comprehensive 2018 test suite and does not fully implement the static semantics of the 2018 standard. We are investigating whether agile software development techniques, such as pair programming and test-driven development (TDD), can help Flang to rapidly progress to Fortran 2018 compliance. Because of the paramount importance of parallelism in high-performance computing, we are focusing on Fortran’s parallel features, commonly denoted “Coarray Fortran.” We are developing what we believe are the first exhaustive, open-source tests for the static semantics of Fortran 2018 parallel features, and contributing them to the LLVM project. A related effort involves writing runtime tests for parallel 2018 features and supporting those tests by developing a new parallel runtime library: the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine).
Naje George
2022
Katherine Rasmussen, Damian Rouson, Naje George, Dan Bonachea, Hussain Kadhem, Brian Friesen, "Agile Acceleration of LLVM Flang Support for Fortran 2018 Parallel Programming", Research Poster at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), November 2022, doi: 10.25344/S4CP4S
The LLVM Flang compiler ("Flang") is currently Fortran 95 compliant, and the frontend can parse Fortran 2018. However, Flang does not have a comprehensive 2018 test suite and does not fully implement the static semantics of the 2018 standard. We are investigating whether agile software development techniques, such as pair programming and test-driven development (TDD), can help Flang to rapidly progress to Fortran 2018 compliance. Because of the paramount importance of parallelism in high-performance computing, we are focusing on Fortran’s parallel features, commonly denoted “Coarray Fortran.” We are developing what we believe are the first exhaustive, open-source tests for the static semantics of Fortran 2018 parallel features, and contributing them to the LLVM project. A related effort involves writing runtime tests for parallel 2018 features and supporting those tests by developing a new parallel runtime library: the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine).
Devarshi Ghoshal
2021
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014
Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create `workflow snapshots' that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.
Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", (P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, ACM, June 21, 2021, doi: 10.1145/3456287.3465478
Reproducible research is becoming essential for science to ensure transparency and for building trust. Additionally, reproducibility provides the cornerstone for sharing of methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding about the gaps, issues, and challenges for enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting the things that worked, that did not work, and that could have worked better for each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience who aim to integrate reproducibility in their everyday pipelines.
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Michael Beach, Drew Paine, Lavanya Ramakrishnan, "Science Capsule - Capturing the Data Life Cycle", Journal of Open Source Software, 2021, 6:2484, doi: 10.21105/joss.02484
Pieter Ghysels
2022
M. Wang, Y. Liu, P. Ghysels, A. C. Yucel, "VoxImp: Impedance Extraction Simulator for Voxelized Structures", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, November 2, 2022, doi: 10.1109/TCAD.2022.3218768
X. Zhu, Y. Liu, P. Ghysels, D. Bindal, X. S. Li, "GPTuneBand: multi-task and multi-fidelity Bayesian optimization for autotuning large-scale high performance computing applications", SIAM PP, February 23, 2022,
- Download File: GPTuneBand.pdf (pdf: 1.4 MB)
2021
Yang Liu, Pieter Ghysels, Lisa Claus, Xiaoye Sherry Li, "Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations", SIAM J. Sci. Comput., June 22, 2021,
Yang Liu, Xin Xing, Han Guo, Eric Michielssen, Pieter Ghysels, Xiaoye Sherry Li, "Butterfly factorization via randomized matrix-vector multiplications", SIAM J. Sci. Comput., March 9, 2021,
Anna Giannakou
2021
Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert, "Performance Analysis of Scientific Computing Workloads on General Purpose TEEs", Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), IEEE, May 2021, doi: 10.1109/IPDPS49936.2021.00115
Patricia Gonzalez
2023
Kylie Huch, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Hyperdimensional Associative Memory Circuit for Scalable Machine Learning", IEEE Transactions on Applied Superconductivity, May 2023,
Patricia Gonzalez-Guerrero, Kylie Huch, Nirmalendu Patra, Thom Popovici, George Michelogiannakis, "An Area Efficient Superconducting Unary CNN Accelerator", IEEE 24th International Symposium on Quality Electronic Design (ISQED), IEEE, April 2023,
Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Gay Bautista, George Michelogiannakis, "PaST-NoC: A Packet-Switched Superconducting Temporal NoC", IEEE Transactions on Applied Superconductivity, January 2023,
2022
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-Flux Shift Register for Race Logic and Its Applications", IEEE Transactions on Circuits and Systems I: Regular Papers, October 2022,
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, Kylie Huch, George Michelogiannakis, "Superconducting Digital DIT Butterfly Unit for Fast Fourier Transform Using Race Logic", 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), IEEE, June 2022, 441-445,
Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Darren Lyles, George Michelogiannakis, "Temporal and SFQ Pulse-Streams Encoding for Area-Efficient Superconducting Accelerators", 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), ACM, February 2022,
- Download File: asplos2022.pdf (pdf: 1.9 MB)
2021
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-flux Shift Buffer for Race Logic", 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), August 2021,
George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Bautista, Dilip Vasudevan, Anastasiia Butko, "SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC", IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021,
Max Grossman
2023
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
2022
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26

John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
2021
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'21)", Poster at Exascale Computing Project (ECP) Annual Meeting 2021, April 2021,
We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC). The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
Junmin Gu
2022
Lipeng Wan, Axel Huebl, Junmin Gu, Franz Poeschel, Ana Gainaru, Ruonan Wang, Jieyang Chen, Xin Liang, Dmitry Ganyushin, Todd Munson, Ian Foster, Jean-Luc Vay, Norbert Podhorszki, Kesheng Wu, Scott Klasky, "Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization", IEEE Transactions on Parallel and Distributed Systems, 2022, 33:878-890, doi: 10.1109/TPDS.2021.3100784
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, Dave Pugmire, Silvio Rizzi, Thompson, Will Usher, Gunther H. Weber, Brad Whitlock, Wolf, Kesheng Wu, "Proximity Portability and In Transit, M-to-N Data Partitioning and Movement in SENSEI", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_20
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, David Camp, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, David Pugmire, Silvio Rizzi, Thompson, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "The SENSEI Generic In Situ Interface: Tool and Processing Portability at Scale", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_13
2021
Franz Poeschel, Juncheng E, William F. Godoy, Norbert Podhorszki, Scott Klasky, Greg Eisenhauer, Philip E. Davis, Lipeng Wan, Ana Gainaru, Junmin Gu, Fabian Koller, René Widera, Michael Bussmann, Axel Huebl, "Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2", Smoky Mountains Computational Sciences and Engineering Conference (SMC2021), 2021,
Giulia Guidi
2021
Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç, "BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 2021, doi: 10.1101/464420
G Guidi, M Ellis, A Buluç, K Yelick, D Culler, "10 years later: Cloud computing is closing the performance gap", ICPE 2021 - Companion of the ACM/SPEC International Conference on Performance Engineering, January 1, 2021, 41--48, doi: 10.1145/3447545.3451183
Daniel Gunter
2023
Mohammed A. Alhussaini, Zachary M. Binger, Bianca M. Souza-Chaves, Oluwamayowa O. Amusat, Jangho Park, Timothy V. Bartholomew, Dan Gunter, Andrea Achilli, "Analysis of backwash settings to maximize net water production in an engineering-scale ultrafiltration system for water reuse", Journal of Water Process Engineering, 2023, 53, doi: 10.1016/j.jwpe.2023.103761
2022
Andrew Adams, Emily K. Adams, Dan Gunter, Ryan Kiser, Mark Krenz, Sean Peisert, John Zage, "Roadmap for Securing Operational Technology in NSF Scientific Research", Trusted CI Report, November 16, 2022, doi: 10.5281/zenodo.7327987
Emily K. Adams, Daniel Gunter, Ryan Kiser, Mark Krenz, Sean Peisert, Susan Sons, John Zage, "Findings of the 2022 Trusted CI Study on the Security of Operational Technology in NSF Scientific Research", Trusted CI Report, July 15, 2022, doi: 10.5281/zenodo.6828675
2021
Dan Gunter, Oluwamayowa Amusat, Tim Bartholomew, Markus Drouven, "Santa Barbara Desalination Digital Twin Technical Report", LBNL Technical Report, 2021, LBNL-2001437,
Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", (P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, ACM, June 21, 2021, doi: 10.1145/3456287.3465478
Reproducible research is becoming essential for science to ensure transparency and for building trust. Additionally, reproducibility provides the cornerstone for sharing of methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding about the gaps, issues, and challenges for enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting the things that worked, that did not work, and that could have worked better for each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience who aim to integrate reproducibility in their everyday pipelines.
Ankur Kumar Gupta
2021
Ankur K. Gupta, Benjamin C. Gamoke, Krishnan Raghavachari, Interaction–Deletion: A Composite Energy Method for the Optimization of Molecular Systems Selectively Removing Specific Nonbonded Interactions, The Journal of Physical Chemistry A, Pages: 4668-4682 2021, doi: 10.1021/acs.jpca.1c02918
Paul H. Hargrove
2023
Julian Bellavita, Mathias Jacquelin, Esmond G. Ng, Dan Bonachea, Johnny Corbino, Paul H. Hargrove, "symPACK: A GPU-Capable Fan-Out Sparse Cholesky Solver", 2023 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM'23), ACM, November 13, 2023, doi: 10.25344/S49P45
Sparse symmetric positive definite systems of equations are ubiquitous in scientific workloads and applications. Parallel sparse Cholesky factorization is the method of choice for solving such linear systems. Therefore, the development of parallel sparse Cholesky codes that can efficiently run on today’s large-scale heterogeneous distributed-memory platforms is of vital importance. Modern supercomputers offer nodes that contain a mix of CPUs and GPUs. To fully utilize the computing power of these nodes, scientific codes must be adapted to offload expensive computations to GPUs.
We present symPACK, a GPU-capable parallel sparse Cholesky solver that uses one-sided communication primitives and remote procedure calls provided by the UPC++ library. We also utilize the UPC++ "memory kinds" feature to enable efficient communication of GPU-resident data. We show that on a number of large problems, symPACK outperforms comparable state-of-the-art GPU-capable Cholesky factorization codes by up to 14x on the NERSC Perlmutter supercomputer.
Michelle Mills Strout, Damian Rouson, Amir Kamil, Dan Bonachea, Jeremiah Corrado, Paul H. Hargrove, Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran, Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC23), November 12, 2023,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for these many uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models.
The tutorial is targeted for users with little-to-no parallel programming experience, but everyone is welcome. A partial differential equation example will be demonstrated in all three programming models. That example and others will be provided to attendees in a virtual environment. Attendees will be shown how to compile and run these programming examples, and the virtual environment will remain available to attendees throughout the conference, along with Slack-based interactive tech support.
Come join us to learn about some productive and performant parallel programming models!
Michelle Mills Strout, Damian Rouson, Amir Kamil, Dan Bonachea, Jeremiah Corrado, Paul H. Hargrove, Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran (CUF23), ECP/NERSC/OLCF Tutorial, July 2023,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for these many uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models. This tutorial should be accessible to users with little-to-no parallel programming experience, and everyone is welcome. A partial differential equation example will be demonstrated in all three programming models along with performance and scaling results on big machines. That example and others will be provided in a cloud instance and Docker container. Attendees will be shown how to compile and run these programming examples, and provided opportunities to experiment with different parameters and code alternatives while being able to ask questions and share their own observations. Come join us to learn about some productive and performant parallel programming models!
Paul H. Hargrove, PGAS Programming Models: My 20-year Perspective, Keynote for 10th Annual Chapel Implementers and Users Workshop (CHIUW 2023), June 2, 2023, doi: 10.25344/S4K59C
Paul H. Hargrove has been involved in the world of Partitioned Global Address Space (PGAS) programming models since 1999, before he knew such a thing existed. Early involvement in the GASNet communications library as used in implementations of UPC, Titanium and Co-array Fortran convinced Paul that one could have productivity and performance without sacrificing one for the other. Since then he has been among the apostates who work to overturn the belief that message-passing is the only (or best) way to program for High-Performance Computing (HPC). Paul has been fortunate to witness the history of the PGAS community through several rare opportunities, including interactions made possible by the wide adoption of GASNet and through operating a PGAS booth at the annual SC conferences from 2007 to 2017. In this talk, Paul will share some highlights of his experiences across 24 years of PGAS history. Among these is the DARPA High Productivity Computing Systems (HPCS) project which helped give birth to Chapel.
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
Paul H. Hargrove, Dan Bonachea, Johnny Corbino, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'23)", Poster at Exascale Computing Project (ECP) Annual Meeting 2023, January 2023,
The Pagoda project is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. The first component is GASNet-EX, a portable, high-performance, global-address-space communication library. The second component is UPC++, a C++ template library. Together, these libraries enable agile, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
GASNet-EX is a portable, high-performance communications middleware library which leverages hardware support to implement Remote Memory Access (RMA) and Active Message communication primitives. GASNet-EX supports a broad ecosystem of alternative HPC programming models, including UPC++, Legion, Chapel and multiple implementations of UPC and Fortran Coarrays. GASNet-EX is implemented directly over the native APIs for networks of interest in HPC. The tight semantic match of GASNet-EX APIs to the client requirements and hardware capabilities often yields better performance than competing libraries.
UPC++ provides high-level productivity abstractions appropriate for Partitioned Global Address Space (PGAS) programming such as: remote memory access (RMA), remote procedure call (RPC), support for accelerators (e.g. GPUs), and mechanisms for aggressive asynchrony to hide communication costs. UPC++ implements communication using GASNet-EX, delivering high performance and portability from laptops to exascale supercomputers. HPC application software using UPC++ includes: MetaHipMer2 metagenome assembler, SIMCoV viral propagation simulation, NWChemEx TAMM, and graph computation kernels from ExaGraph.
2022
"Berkeley Lab’s Networking Middleware GASNet Turns 20: Now, GASNet-EX is Gearing Up for the Exascale Era", Linda Vu, HPCWire (Lawrence Berkeley National Laboratory CS Area Communications), December 7, 2022, doi: 10.25344/S4BP4G
GASNet Celebrates 20th Anniversary
For 20 years, Berkeley Lab’s GASNet has been fueling developers’ ability to tap the power of massively parallel supercomputers more effectively. The middleware was recently upgraded to support exascale scientific applications.
Paul H. Hargrove, Dan Bonachea, "GASNet-EX RMA Communication Performance on Recent Supercomputing Systems", 5th Annual Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM'22), November 2022, doi: 10.25344/S40C7D
Partitioned Global Address Space (PGAS) programming models, typified by systems such as Unified Parallel C (UPC) and Fortran coarrays, expose one-sided Remote Memory Access (RMA) communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.
GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in emerging exascale machines. The library is an evolution of the popular GASNet communication system, building upon 20 years of lessons learned. We present microbenchmark results which demonstrate the RMA performance of GASNet-EX is competitive with MPI implementations on four recent, high-impact, production HPC systems. These results are an update relative to previously published results on older systems. The networks measured here are representative of hardware currently used in six of the top ten fastest supercomputers in the world, and all of the exascale systems on the U.S. DOE road map.
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
Dan Bonachea, Paul H. Hargrove, An Introduction to GASNet-EX for Chapel Users, 9th Annual Chapel Implementers and Users Workshop (CHIUW 2022), June 10, 2022,
Have you ever typed "export CHPL_COMM=gasnet"? If you’ve used Chapel with multi-locale support on a system without "Cray" in the model name, then you’ve probably used GASNet. Did you ever wonder what GASNet is? What GASNet should mean to you? This talk aims to answer those questions and more. Chapel has system-specific implementations of multi-locale communication for Cray-branded systems including the Cray XC and HPE Cray EX lines. On other systems, Chapel communication uses the GASNet communication library embedded in third-party/gasnet. In this talk, that third-party will introduce itself to you in the first person.
Paul H. Hargrove, Dan Bonachea, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'22)", Poster at Exascale Computing Project (ECP) Annual Meeting 2022, May 5, 2022,
We present UPC++ and GASNet-EX, distributed libraries which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Procedure Call (RPC) and for Remote Memory Access (RMA) to host and GPU memories. The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
2021
Daniel Waters, Colin A. MacLean, Dan Bonachea, Paul H. Hargrove, "Demonstrating UPC++/Kokkos Interoperability in a Heat Conduction Simulation (Extended Abstract)", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S4630V
We describe the replacement of MPI with UPC++ in an existing Kokkos code that simulates heat conduction within a rectangular 3D object, as well as an analysis of the new code’s performance on CUDA accelerators. The key challenges were packing the halos in Kokkos data structures in a way that allowed for UPC++ remote memory access, and streamlining synchronization costs. Additional UPC++ abstractions used included global pointers, distributed objects, remote procedure calls, and futures. We also make use of the device allocator concept to facilitate data management in memory with unique properties, such as GPUs. Our results demonstrate that despite the algorithm’s good semantic match to message passing abstractions, straightforward modifications to use UPC++ communication deliver vastly improved performance and scalability in the common case. We find the one-sided UPC++ version written in a natural way exhibits good performance, whereas the message-passing version written in a straightforward way exhibits performance anomalies. We argue this represents a productivity benefit for one-sided communication models.
Paul H. Hargrove, Dan Bonachea, Colin A. MacLean, Daniel Waters, "GASNet-EX Memory Kinds: Support for Device Memory in PGAS Programming Models", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21) Research Poster, November 2021, doi: 10.25344/S4P306
Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work includes two major components: UPC++ (a C++ template library) and GASNet-EX (a portable, high-performance communication library). This poster describes recent advances in GASNet-EX to efficiently implement Remote Memory Access (RMA) operations to and from memory on accelerator devices such as GPUs. Performance is illustrated via benchmark results from UPC++ and the Legion programming system, both using GASNet-EX as their communications library.
Katherine A. Yelick, Amir Kamil, Damian Rouson, Dan Bonachea, Paul H. Hargrove, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (SC21), Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), November 15, 2021,
UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. UPC++ offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between computation and asynchronous data movement. UPC++ supports simple/regular data structures as well as more elaborate distributed applications where communication is fine-grained and/or irregular. UPC++ provides a uniform abstraction for one-sided RMA between host and GPU/accelerator memories anywhere in the system. UPC++'s support for aggressive asynchrony enables applications to effectively overlap communication and reduce latency stalls, while the underlying GASNet-EX communication library delivers efficient low-overhead RMA/RPC on HPC networks.
This tutorial introduces UPC++, covering the memory and execution models and basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into application proxy examples. We examine a few UPC++ applications with irregular communication (metagenomic assembler and COVID-19 simulation) and describe how they utilize UPC++ to optimize communication performance.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
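The guide's two core facilities can be illustrated with a short sketch. The example below is illustrative rather than taken from the guide; it assumes a working UPC++ installation (compiled with the `upcxx` wrapper and launched with at least two processes via `upcxx-run`), and exercises a one-sided `rput` followed by an RPC readback:

```cpp
#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
  upcxx::init();
  // Rank 0 allocates an int in its shared segment; the broadcast lets
  // every rank hold a global pointer to that same location.
  upcxx::global_ptr<int> gp;
  if (upcxx::rank_me() == 0) gp = upcxx::new_<int>(0);
  gp = upcxx::broadcast(gp, 0).wait();

  // One-sided RMA: rank 1 writes into rank 0's memory with no
  // receive-side code on rank 0.
  if (upcxx::rank_n() > 1 && upcxx::rank_me() == 1)
    upcxx::rput(42, gp).wait();
  upcxx::barrier();

  // RPC: ship a lambda to rank 0, which dereferences the pointer locally.
  int v = upcxx::rpc(0,
            [](upcxx::global_ptr<int> p) { return *p.local(); }, gp).wait();
  if (upcxx::rank_me() == 0)
    std::cout << "value on rank 0: " << v << std::endl;
  upcxx::finalize();
}
```

Both `rput` and `rpc` return futures, so the `.wait()` calls here could instead be chained with `.then(...)` callbacks to overlap communication with other work, as the abstract describes.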
Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'21)", Poster at Exascale Computing Project (ECP) Annual Meeting 2021, April 2021,
We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC). The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
Dan Bonachea, GASNet-EX: A High-Performance, Portable Communication Library for Exascale, Berkeley Lab – CS Seminar, March 10, 2021,
Partitioned Global Address Space (PGAS) models, pioneered by languages such as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.
GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building on 20 years of lessons learned. We describe several features and enhancements that have been introduced to address the needs of modern runtimes and exploit the hardware capabilities of emerging systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI implementations on current systems. GASNet-EX provides communication services that help to deliver speedups in HPC applications written using the UPC++ library, enabling new science on pre-exascale systems.
Ammar Haydari
2021
Ammar Haydari, Michael Zhang, Chen-Nee Chuah, Jane Macfarlane, Sean Peisert, Adaptive Differential Privacy Mechanism for Aggregated Mobility Dataset, arXiv preprint arXiv:2112.08487, December 10, 2021,
Matthew Henderson
2021
C Varadharajan, Z Kakalia, E Alper, EL Brodie, M Burrus, RWH Carroll, D Christianson, W Dong, V Hendrix, M Henderson, S Hubbard, D Johnson, R Versteeg, KH Williams, DA Agarwal, The Colorado East River Community Observatory Data Collection, Hydrological Processes 35(6), 2021, doi: 10.22541/au.161962485.54378235/v1
Valerie Hendrix
2022
H Weierbach, AR Lima, JD Willard, VC Hendrix, DS Christianson, M Lubich, C Varadharajan, Stream Temperature Predictions for River Basin Management in the Pacific Northwest and Mid-Atlantic Regions Using Machine Learning, Water (Switzerland), 2022, doi: 10.3390/w14071032
C Varadharajan, AP Appling, B Arora, DS Christianson, VC Hendrix, V Kumar, AR Lima, J Müller, S Oliver, M Ombadi, T Perciano, JM Sadler, H Weierbach, JD Willard, Z Xu, J Zwart, "Can machine learning accelerate process understanding and decision-relevant predictions of river water quality?", Hydrological Processes, January 1, 2022, 36, doi: 10.1002/hyp.14565
MB Simmonds, WJ Riley, DA Agarwal, X Chen, S Cholia, R Crystal-Ornelas, ET Coon, D Dwivedi, VC Hendrix, M Huang, A Jan, Z Kakalia, J Kumar, CD Koven, L Li, M Melara, L Ramakrishnan, DM Ricciuto, AP Walker, W Zhi, Q Zhu, C Varadharajan, Guidelines for Publicly Archiving Terrestrial Model Data to Enhance Usability, Intercomparison, and Synthesis, Data Science Journal, 2022, doi: 10.5334/dsj-2022-003
C Varadharajan, VC Hendrix, DS Christianson, M Burrus, C Wong, SS Hubbard, DA Agarwal, BASIN-3D: A brokering framework to integrate diverse environmental data, Computers and Geosciences, 2022, doi: 10.1016/j.cageo.2021.105024
2021
C Varadharajan, Z Kakalia, E Alper, EL Brodie, M Burrus, RWH Carroll, D Christianson, W Dong, V Hendrix, M Henderson, S Hubbard, D Johnson, R Versteeg, KH Williams, DA Agarwal, The Colorado East River Community Observatory Data Collection, Hydrological Processes 35(6), 2021, doi: 10.22541/au.161962485.54378235/v1
JE Damerow, C Varadharajan, K Boye, EL Brodie, M Burrus, KD Chadwick, R Crystal-Ornelas, H Elbashandy, RJ Eloy Alves, KS Ely, AE Goldman, T Haberman, V Hendrix, Z Kakalia, KM Kemner, AB Kersting, N Merino, F O'Brien, Z Perzan, E Robles, P Sorensen, JC Stegen, RL Walls, P Weisenhorn, M Zavarin, D Agarwal, Sample identifiers and metadata to support data management and reuse in multidisciplinary ecosystem sciences, Data Science Journal, 2021, doi: 10.5334/dsj-2021-011
R Crystal-Ornelas, C Varadharajan, B Bond-Lamberty, K Boye, M Burrus, S Cholia, M Crow, J Damerow, R Devarakonda, KS Ely, A Goldman, S Heinz, V Hendrix, Z Kakalia, SC Pennington, E Robles, A Rogers, M Simmonds, T Velliquette, H Weierbach, P Weisenhorn, JN Welch, DA Agarwal, A Guide to Using GitHub for Developing and Versioning Data Standards and Reporting Formats, Earth and Space Science, 2021, doi: 10.1029/2021EA001797
Monica Hernandez
2023
"Arte inspirando la informática cuántica en el Advanced Quantum Testbed" [Art Inspiring Quantum Computing at the Advanced Quantum Testbed], Monica Hernandez, July 7, 2023,
"Art Inspiring a Quantum-Ready Vision at the Advanced Quantum Testbed", Monica Hernandez, July 7, 2023,
"Éxito reportado en la generación de operaciones cuánticas entrelazadas de dos cutrits con alta fidelidad" [Reported Success in Generating High-Fidelity Two-Qutrit Entangling Operations], Monica Hernandez, July 6, 2023,
"Success Generating Two-Qutrit Entangling Gates With High Fidelity", Monica Hernandez, July 6, 2023,
"Innovating quantum computers with fluxonium processors", Monica Hernandez, News release, April 11, 2023,
Monica Hernandez, "Quantum Systems Accelerator 2023 Impact Report", Impact Report, March 17, 2023,
2022
Monica Hernandez, Quantum Computing Workshop Brings Classical Control Systems Into Focus, News release, December 20, 2022,
"Jumpstarting the Future Quantum Workforce", Monica Hernandez, Feature, December 13, 2022,
"The Sparks That Ignited Curiosity: How Quantum Researchers Found Their Path", Monica Hernandez, Feature, October 14, 2022,
"La curiosidad por la informática cuántica: Cómo cinco científicos encontraron su especialización" [Curiosity About Quantum Computing: How Five Scientists Found Their Specialization], Monica Hernandez, Feature in Spanish, October 14, 2022,
"El Advanced Quantum Testbed en Berkeley Lab lidera avances científicos para la computación cuántica" [Berkeley Lab's Advanced Quantum Testbed Leads Scientific Advances for Quantum Computing], Monica Hernandez, Feature in Spanish, October 14, 2022,
"How Berkeley Lab’s Advanced Quantum Testbed Paves Breakthroughs for Quantum Computing", Monica Hernandez, Feature, October 14, 2022,
"How the Five National Quantum Information Science Research Centers Harness the Quantum Revolution", Hannah Adams, Pete Genzer, Monica Hernandez, Leah Hesla, Scott Jones, Elizabeth Rosenthal, Denise Yazak, August 26, 2022,
"QIS Innovation Across the Growing R&D Ecosystem", Monica Hernandez, Feature, August 25, 2022,
Monica Hernandez, Optimizing SWAP Networks for Quantum Computing, News release, August 4, 2022,
"QSA Scientists Participated in ‘QIS For Everyone’ Briefing", Monica Hernandez, Feature, July 13, 2022,
Monica Hernandez, Breakthrough in Quantum Universal Gate Sets: A High-Fidelity iToffoli Gate, News release, May 24, 2022,
"Inspiring High Schoolers to Learn Quantum Computing", Monica Hernandez, Feature, April 14, 2022,
"Meet QSA’s Early-Career Researchers Advancing the QIS Frontier", Monica Hernandez, Feature, April 14, 2022,
"AQT-Zurich Instruments Partnership Enables Groundbreaking Quantum Information Science", Monica Hernandez, Feature, April 14, 2022,
Monica Hernandez, "Advanced Quantum Testbed 2021 Progress Report", Progress Report, April 14, 2022,
Monica Hernandez, Joe Chew, Open Sourced Control Hardware for Quantum Computers, News release, February 24, 2022,
"QSA’s Science Breakthroughs in 2021", Monica Hernandez, Feature, February 17, 2022,
2021
"Advancing Quantum Engineering: A Must-Do for Quantum Computing", Monica Hernandez, Feature, December 20, 2021,
"How the Advanced Quantum Testbed Prepares the New Quantum Workforce", Monica Hernandez, Feature, December 14, 2021,
Monica Hernandez, Crucial Leap in Error Mitigation for Quantum Computers, News release, December 9, 2021,
Monica Hernandez, How a Novel Radio Frequency Control System Enhances Quantum Computers, News release, November 9, 2021,
"Rising Talent in Quantum Computing: Meet Early Career Researchers at QSA", Monica Hernandez, November 4, 2021,
"K-12 Career Talk: A Day in the Life of an AQT scientist", Monica Hernandez, Feature, October 22, 2021,
"El Advanced Quantum Testbed avanza tecnologías y talento para la computación cuántica" [The Advanced Quantum Testbed Advances Technologies and Talent for Quantum Computing], Monica Hernandez, Feature in Spanish, October 13, 2021,
"The Advanced Quantum Testbed Propels Quantum Information Technologies and Talent", Monica Hernandez, Feature, October 13, 2021,
"Why QSA Advances 2D Materials for Quantum Computing", Monica Hernandez, Feature, September 28, 2021,
Monica Hernandez, Raising the Bar in Error Characterization for Qutrit-Based Quantum Computing, News release, September 20, 2021,
"How the Quantum Systems Accelerator Set A Shared Direction in Electronic Controls for Quantum Computing", Monica Hernandez, Feature, August 20, 2021,
"Leading with Breakthrough Science at the Advanced Quantum Testbed User Program", Monica Hernandez, Feature, July 29, 2021,
"The Quantum Systems Accelerator Hosts First Industry Roundtable", Monica Hernandez, Feature, June 22, 2021,
"AQT Positions Itself as Hub for Quantum Computing Startups", Monica Hernandez, Feature, June 16, 2021,
Steven Hofmeyr
2023
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
2022
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
2021
Melanie E. Moses, Steven Hofmeyr, Judy L Cannon, Akil Andrews, Rebekah Gridley, Monica Hinga, Kirtus Leyba, Abigail Pribisova, Vanessa Surjadidjaja, Humayra Tasnim, Stephanie Forrest, "Spatially distributed infection increases viral load in a computational model of SARS-CoV-2 lung infection", PLOS Computational Biology, December 2021, 17(12), doi: 10.1371/journal.pcbi.1009735
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
MG Awan, S Hofmeyr, R Egan, N Ding, A Buluc, J Deslippe, L Oliker, K Yelick, "Accelerating Large Scale de novo Metagenome Assembly Using GPUs", International Conference for High Performance Computing, Networking, Storage and Analysis, SC, January 1, 2021, doi: 10.1145/3458817.3476212
Kylie Morgan Huch
2023
Kylie Huch, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Hyperdimensional Associative Memory Circuit for Scalable Machine Learning", IEEE Transactions on Applied Superconductivity, May 2023,
Patricia Gonzalez-Guerrero, Kylie Huch, Nirmalendu Patra, Thom Popovici, George Michelogiannakis, "An Area Efficient Superconducting Unary CNN Accelerator", IEEE 24th International Symposium on Quality Electronic Design (ISQED), IEEE, April 2023,
2022
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, Kylie Huch, George Michelogiannakis, "Superconducting Digital DIT Butterfly Unit for Fast Fourier Transform Using Race Logic", 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), IEEE, June 2022, 441-445,
Costin Iancu
2022
Mathias Weiden, Justin Kalloor, John Kubiatowicz, Ed Younis, Costin Iancu, "Wide Quantum Circuit Optimization with Topology Aware Synthesis", Third International Workshop on Quantum Computing Software, November 13, 2022,
Unitary synthesis is an optimization technique that can achieve optimal gate counts while mapping quantum circuits to restrictive qubit topologies. Synthesis algorithms are limited in scalability by their exponentially growing run times, so application to wide circuits requires partitioning them into smaller components. In this work, we explore methods to reduce the depth and multi-qubit gate count of wide, mapped quantum circuits using synthesis. We present TopAS, a topology-aware synthesis tool that preconditions quantum circuits before mapping. Partitioned subcircuits are optimized and fitted to sparse subtopologies to balance the opposing demands of synthesis and mapping algorithms. Compared to state-of-the-art wide-circuit synthesis algorithms, TopAS reduces depth on average by 35.2% and CNOT count by 11.5% for mesh topologies. Compared to the optimization and mapping algorithms of Qiskit and Tket, TopAS reduces CNOT counts by 30.3% and depth by 38.2% on average.
2021
Ed Younis, Koushik Sen, Katherine Yelick, Costin Iancu, QFAST: Quantum Synthesis Using a Hierarchical Continuous Circuit Space, Bulletin of the American Physical Society, March 2021,
We present QFAST, a quantum synthesis tool designed to produce short circuits and to scale well in practice. Our contributions are: 1) a novel representation of circuits able to encode placement and topology; 2) a hierarchical approach with an iterative refinement formulation that combines "coarse-grained" fast optimization during circuit structure search with a good, but slower, optimization stage only in the final circuit instantiation. When compared against state-of-the-art techniques, although not always optimal, QFAST can reduce circuits for "time-dependent evolution" algorithms, as used by domain scientists, by 60x in depth. On typical circuits, it provides 4x better depth reduction than the widely used Qiskit and UniversalQ compilers. We also show the composability and tunability of our formulation in terms of circuit depth and running time. For example, we show how to generate shorter circuits by plugging in the best available third party synthesis algorithm at a given hierarchy level. Composability enables portability across chip architectures, which is missing from similar approaches.
QFAST is integrated with Qiskit and available at github.com/bqskit.
Akel Hashim, Ravi Naik, Alexis Morvan, Jean-Loup Ville, Brad Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin O'Brien, Ian Hincks, Joel Wallman, Joseph V Emerson, David Ivan Santiago, Irfan Siddiqi, Scalable Quantum Computing on a Noisy Superconducting Quantum Processor via Randomized Compiling, Bulletin of the American Physical Society, 2021,
Coherent errors in quantum hardware severely limit the performance of quantum algorithms in an unpredictable manner, and mitigating their impact is necessary for realizing reliable, large-scale quantum computations. Randomized compiling achieves this goal by converting coherent errors into stochastic noise, dramatically reducing unpredictable errors in quantum algorithms and enabling accurate predictions of aggregate performance via cycle benchmarking estimates. In this work, we demonstrate significant performance gains under randomized compiling for both the four-qubit quantum Fourier transform algorithm and for random circuits of variable depth on a superconducting quantum processor. We also validate solution accuracy using experimentally measured error rates. Our results demonstrate that randomized compiling can be utilized to maximally leverage and predict the capabilities of modern-day noisy quantum processors, paving the way forward for scalable quantum computing.
Khaled Ibrahim
2022
K. Ibrahim, L. Oliker, "Preprocessing Pipeline Optimization for Scientific Deep-Learning Workloads", IPDPS 2022, June 3, 2022,
2021
Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", (BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021,
Khaled Ibrahim, Roofline on GPUs (advanced topics), ECP Annual Meeting, April 2021,
Mathias Jacquelin
2023
Julian Bellavita, Mathias Jacquelin, Esmond G. Ng, Dan Bonachea, Johnny Corbino, Paul H. Hargrove, "symPACK: A GPU-Capable Fan-Out Sparse Cholesky Solver", 2023 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM'23), ACM, November 13, 2023, doi: 10.25344/S49P45
Sparse symmetric positive definite systems of equations are ubiquitous in scientific workloads and applications. Parallel sparse Cholesky factorization is the method of choice for solving such linear systems. Therefore, the development of parallel sparse Cholesky codes that can efficiently run on today’s large-scale heterogeneous distributed-memory platforms is of vital importance. Modern supercomputers offer nodes that contain a mix of CPUs and GPUs. To fully utilize the computing power of these nodes, scientific codes must be adapted to offload expensive computations to GPUs.
We present symPACK, a GPU-capable parallel sparse Cholesky solver that uses one-sided communication primitives and remote procedure calls provided by the UPC++ library. We also utilize the UPC++ "memory kinds" feature to enable efficient communication of GPU-resident data. We show that on a number of large problems, symPACK outperforms comparable state-of-the-art GPU-capable Cholesky factorization codes by up to 14x on the NERSC Perlmutter supercomputer.
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
2022
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
2021
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
Revathi Jambunathan
2023
H. Klion, R. Jambunathan, M. E. Rowan, E. Yang, D. Willcox, J.-L. Vay, R. Lehe, A. Myers, A. Huebl, W. Zhang, "Particle-in-Cell Simulations of Relativistic Magnetic Reconnection with Advanced Maxwell Solver Algorithms", arXiv preprint, submitted to The Astrophysical Journal, April 20, 2023,
2022
Z. Yao, R. Jambunathan, Y. Zeng, and A. Nonaka, "A Massively Parallel Time-Domain Coupled Electrodynamics-Micromagnetics Solver", International Journal of High Performance Computing Applications, January 10, 2022, accepted,
Hans Johansen
2023
Will Thacher, Hans Johansen, Daniel Martin, "A high order Cartesian grid, finite volume method for elliptic interface problems", Journal of Computational Physics, October 15, 2023, 491, doi: 10.1016/j.jcp.2023.112351
2022
Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams, "Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations", MCHPC, November 2022,
2021
Tuowen Zhao, Mary Hall, Hans Johansen, Samuel Williams, "Improving Communication by Optimizing On-Node Data Movement with Data Layout", PPoPP, February 2021,
Xiangyang Ju
2022
Alina Lazar et al., Accelerating the Inference of the Exa.TrkX Pipeline, 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded - Towards Sustainable, Diverse, Performant and Effective Scientific Computing, 2022,
Chun-Yi Wang et al., Reconstruction of Large Radius Tracks with the Exa.TrkX pipeline, 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded - Towards Sustainable, Diverse, Performant and Effective Scientific Computing, 2022,
2021
Xiangyang Ju et al., Performance of a geometric deep learning pipeline for HL-LHC particle tracking, Eur. Phys. J. C, Pages: 876, 2021, doi: 10.1140/epjc/s10052-021-09675-8
Hussain Kadhem
2022
Katherine Rasmussen, Damian Rouson, Naje George, Dan Bonachea, Hussain Kadhem, Brian Friesen, "Agile Acceleration of LLVM Flang Support for Fortran 2018 Parallel Programming", Research Poster at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), November 2022, doi: 10.25344/S4CP4S
The LLVM Flang compiler ("Flang") is currently Fortran 95 compliant, and the frontend can parse Fortran 2018. However, Flang does not have a comprehensive 2018 test suite and does not fully implement the static semantics of the 2018 standard. We are investigating whether agile software development techniques, such as pair programming and test-driven development (TDD), can help Flang to rapidly progress to Fortran 2018 compliance. Because of the paramount importance of parallelism in high-performance computing, we are focusing on Fortran’s parallel features, commonly denoted “Coarray Fortran.” We are developing what we believe are the first exhaustive, open-source tests for the static semantics of Fortran 2018 parallel features, and contributing them to the LLVM project. A related effort involves writing runtime tests for parallel 2018 features and supporting those tests by developing a new parallel runtime library: the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine).
Amir Kamil
2023
Michelle Mills Strout, Damian Rouson, Amir Kamil, Dan Bonachea, Jeremiah Corrado, Paul H. Hargrove, Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran, Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC23), November 12, 2023,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for these many uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models.
The tutorial is targeted for users with little-to-no parallel programming experience, but everyone is welcome. A partial differential equation example will be demonstrated in all three programming models. That example and others will be provided to attendees in a virtual environment. Attendees will be shown how to compile and run these programming examples, and the virtual environment will remain available to attendees throughout the conference, along with Slack-based interactive tech support.
Come join us to learn about some productive and performant parallel programming models!
Michelle Mills Strout, Damian Rouson, Amir Kamil, Dan Bonachea, Jeremiah Corrado, Paul H. Hargrove, Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran (CUF23), ECP/NERSC/OLCF Tutorial, July 2023,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for these many uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models. This tutorial should be accessible to users with little-to-no parallel programming experience, and everyone is welcome. A partial differential equation example will be demonstrated in all three programming models along with performance and scaling results on big machines. That example and others will be provided in a cloud instance and Docker container. Attendees will be shown how to compile and run these programming examples, and provided opportunities to experiment with different parameters and code alternatives while being able to ask questions and share their own observations. Come join us to learn about some productive and performant parallel programming models!
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 31, 2023, LBNL 2001516, doi: 10.25344/S46W2J
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
Paul H. Hargrove, Dan Bonachea, Johnny Corbino, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'23)", Poster at Exascale Computing Project (ECP) Annual Meeting 2023, January 2023,
The Pagoda project is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. The first component is GASNet-EX, a portable, high-performance, global-address-space communication library. The second component is UPC++, a C++ template library. Together, these libraries enable agile, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
GASNet-EX is a portable, high-performance communications middleware library which leverages hardware support to implement Remote Memory Access (RMA) and Active Message communication primitives. GASNet-EX supports a broad ecosystem of alternative HPC programming models, including UPC++, Legion, Chapel and multiple implementations of UPC and Fortran Coarrays. GASNet-EX is implemented directly over the native APIs for networks of interest in HPC. The tight semantic match of GASNet-EX APIs to the client requirements and hardware capabilities often yields better performance than competing libraries.
UPC++ provides high-level productivity abstractions appropriate for Partitioned Global Address Space (PGAS) programming such as: remote memory access (RMA), remote procedure call (RPC), support for accelerators (e.g. GPUs), and mechanisms for aggressive asynchrony to hide communication costs. UPC++ implements communication using GASNet-EX, delivering high performance and portability from laptops to exascale supercomputers. HPC application software using UPC++ includes: MetaHipMer2 metagenome assembler, SIMCoV viral propagation simulation, NWChemEx TAMM, and graph computation kernels from ExaGraph.
2022
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001480, doi: 10.25344/S4M59P
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
Paul H. Hargrove, Dan Bonachea, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'22)", Poster at Exascale Computing Project (ECP) Annual Meeting 2022, May 5, 2022,
We present UPC++ and GASNet-EX, distributed libraries which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Procedure Call (RPC) and for Remote Memory Access (RMA) to host and GPU memories. The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001452, doi: 10.25344/S4530J
2021
Amir Kamil, Dan Bonachea, "Optimization of Asynchronous Communication Operations through Eager Notifications", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S42C71
UPC++ is a C++ library implementing the Asynchronous Partitioned Global Address Space (APGAS) model. We propose an enhancement to the completion mechanisms of UPC++ used to synchronize communication operations that is designed to reduce overhead for on-node operations. Our enhancement permits eager delivery of completion notification in cases where the data transfer semantics of an operation happen to complete synchronously, for example due to the use of shared-memory bypass. This semantic relaxation allows removing significant overhead from the critical path of the implementation in such cases. We evaluate our results on three different representative systems using a combination of microbenchmarks and five variations of the HPCChallenge RandomAccess benchmark implemented in UPC++ and run on a single node to accentuate the impact of locality. We find that in RMA versions of the benchmark written in a straightforward manner (without manually optimizing for locality), the new eager notification mode can provide up to a 25% speedup when synchronizing with promises and up to a 13.5x speedup when synchronizing with conjoined futures. We also evaluate our results using a graph matching application written with UPC++ RMA communication, where we measure overall speedups of as much as 11% in single-node runs of the unmodified application code, due to our transparent enhancements.
Katherine A. Yelick, Amir Kamil, Damian Rouson, Dan Bonachea, Paul H. Hargrove, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (SC21), Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), November 15, 2021,
UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. UPC++ offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between computation and asynchronous data movement. UPC++ supports simple/regular data structures as well as more elaborate distributed applications where communication is fine-grained and/or irregular. UPC++ provides a uniform abstraction for one-sided RMA between host and GPU/accelerator memories anywhere in the system. UPC++'s support for aggressive asynchrony enables applications to effectively overlap communication and reduce latency stalls, while the underlying GASNet-EX communication library delivers efficient low-overhead RMA/RPC on HPC networks.
This tutorial introduces UPC++, covering the memory and execution models and basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into application proxy examples. We examine a few UPC++ applications with irregular communication (metagenomic assembler and COVID-19 simulation) and describe how they utilize UPC++ to optimize communication performance.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001425, doi: 10.25344/S4XK53
Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'21)", Poster at Exascale Computing Project (ECP) Annual Meeting 2021, April 2021,
We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC). The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2021.3.0", Lawrence Berkeley National Laboratory Tech Report, March 31, 2021, LBNL 2001388, doi: 10.25344/S4K881
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
Qiao Kang
2021
Qiao Kang, Scot Breitenfeld, Kaiyuan Hou, Wei-keng Liao, Robert Ross, and Suren Byna, "Optimizing Performance of Parallel I/O Accesses to Non-contiguous Blocks in Multiple Array Variables", IEEE BigData 2021 conference, December 19, 2021,
Tonglin Li, Suren Byna, Quincey Koziol, Houjun Tang, Jean Luca Bez, Qiao Kang, "h5bench: HDF5 I/O Kernel Suite for Exercising HPC I/O Patterns", Cray User Group (CUG) 2021, January 1, 2021,
M. Ozan Karsavuran
2023
Nabil Abubaker, Orhun Caglayan, M. Ozan Karsavuran, Cevdet Aykanat, "Minimizing Staleness and Communication Overhead in Distributed SGD for Collaborative Filtering", IEEE Transactions on Computers, May 2023, doi: 10.1109/TC.2023.3275107
Nabil Abubaker, M. Ozan Karsavuran, Cevdet Aykanat, "Scaling Stratified Stochastic Gradient Descent for Distributed Matrix Completion", IEEE Transactions on Knowledge and Data Engineering, March 2023, doi: 10.1109/TKDE.2023.3253791
2022
Mestan Firat Celiktug, M. Ozan Karsavuran, Seher Acer, Cevdet Aykanat, "Simultaneous Computational and Data Load Balancing in Distributed-Memory Setting", SIAM Journal on Scientific Computing, November 2022, 44(6):C399-C424, doi: 10.1137/22M1485772
Nabil Abubaker, M. Ozan Karsavuran, Cevdet Aykanat, "Scalable Unsupervised ML: Latency Hiding in Distributed Sparse Tensor Decomposition", IEEE Transactions on Parallel and Distributed Systems, November 2022, 33(11):3028-3040, doi: 10.1109/TPDS.2021.3128827
2021
M. Ozan Karsavuran, Seher Acer, Cevdet Aykanat, Medium-Grain Partitioning for Sparse Tensor Decomposition, SIAM Conference on Computational Science and Engineering (CSE21), 2021,
M. Ozan Karsavuran, Seher Acer, Cevdet Aykanat, "Partitioning Models for General Medium-Grain Parallel Sparse Tensor Decomposition", IEEE Transactions on Parallel and Distributed Systems, January 2021, 32(1):147--159, doi: 10.1109/TPDS.2020.3012624
Reijo Keskitalo
2022
M Galloway, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei, BeyondPlanck III. Commander3, 2022,
M Galloway, M Reinecke, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei, BeyondPlanck VIII. Efficient Sidelobe Convolution and Correction through Spin Harmonics, 2022,
TL Svalheim, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, M Galloway, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei, A Zonca, BeyondPlanck X. Bandpass and beam leakage corrections, 2022,
D Herman, B Hensley, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, M Galloway, S Gerakakis, E Gjerløw, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei, BeyondPlanck XVI. Limits on Large-Scale Polarized Anomalous Microwave Emission from Planck LFI and WMAP, 2022,
KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, M Galloway, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, M Tomasi, DJ Watts, IK Wehus, A Zacchei, BeyondPlanck XIV. Intensity foreground sampling, degeneracies and priors, 2022,
LiteBIRD Collaboration, E Allys, K Arnold, J Aumont, R Aurlien, S Azzoni, C Baccigalupi, AJ Banday, R Banerji, RB Barreiro, N Bartolo, L Bautista, D Beck, S Beckman, M Bersanelli, F Boulanger, M Brilenkov, M Bucher, E Calabrese, P Campeti, A Carones, FJ Casas, A Catalano, V Chan, K Cheung, Y Chinone, SE Clark, F Columbro, G D Alessandro, PD Bernardis, TD Haan, EDL Hoz, MD Petris, SD Torre, P Diego-Palazuelos, T Dotani, JM Duval, T Elleflot, HK Eriksen, J Errard, T Essinger-Hileman, F Finelli, R Flauger, C Franceschet, U Fuskeland, M Galloway, K Ganga, M Gerbino, M Gervasi, RT Génova-Santos, T Ghigna, S Giardiello, E Gjerløw, J Grain, F Grupp, A Gruppuso, JE Gudmundsson, NW Halverson, P Hargrave, T Hasebe, M Hasegawa, M Hazumi, S Henrot-Versillé, B Hensley, LT Hergt, D Herman, E Hivon, RA Hlozek, AL Hornsby, Y Hoshino, J Hubmayr, K Ichiki, T Iida, H Imada, H Ishino, G Jaehnig, N Katayama, A Kato, R Keskitalo, T Kisner, Y Kobayashi, A Kogut, K Kohri, E Komatsu, K Komatsu, K Konishi, N Krachmalnicoff, CL Kuo, L Lamagna, M Lattanzi, AT Lee, C Leloup, F Levrier, E Linder, G Luzzi, J Macias-Perez, B Maffei, D Maino, S Mandelli, E Martínez-González, S Masi, M Massa, S Matarrese, FT Matsuda, T Matsumura, L Mele, M Migliaccio, Y Minami, A Moggi, J Montgomery, L Montier, G Morgante, B Mot, Y Nagano, T Nagasaki, R Nagata, R Nakano, T Namikawa, F Nati, P Natoli, S Nerval, F Noviello, K Odagiri, S Oguri, H Ohsaki, L Pagano, A Paiella, D Paoletti, A Passerini, G Patanchon, F Piacentini, M Piat, G Polenta, D Poletti, T Prouvé, G Puglisi, D Rambaud, C Raum, S Realini, M Reinecke, M Remazeilles, A Ritacco, G Roudil, JA Rubino-Martin, M Russell, H Sakurai, Y Sakurai, M Sasaki, D Scott, Y Sekimoto, K Shinozaki, M Shiraishi, P Shirron, G Signorelli, F Spinella, S Stever, R Stompor, S Sugiyama, RM Sullivan, A Suzuki, TL Svalheim, E Switzer, R Takaku, H Takakura, Y Takase, A Tartari, Y Terao, J Thermeau, H Thommesen, KL Thompson, M Tomasi, M Tominaga, M Tristram, M Tsuji, M Tsujimoto, L Vacher, P Vielva, N Vittorio, W Wang, K Watanuki, IK Wehus, J Weller, B Westbrook, J Wilms, EJ Wollack, J Yumoto, M Zannoni, Probing Cosmic Inflation with the LiteBIRD Cosmic Microwave Background Polarization Survey, 2022,
DJ Watts, M Galloway, HT Ihle, KJ Andersen, R Aurlien, R Banerji, A Basyrov, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, JR Eskilt, MK Foss, C Franceschet, U Fuskeland, S Galeotta, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, JB Jewell, A Karakci, E Keihänen, R Keskitalo, JGS Lunde, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, M San, NO Stutzer, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, IK Wehus, A Zacchei, From BeyondPlanck to Cosmoglobe: Preliminary WMAP Q-band analysis, 2022,
P Diego-Palazuelos, JR Eskilt, Y Minami, M Tristram, RM Sullivan, AJ Banday, RB Barreiro, HK Eriksen, KM Górski, R Keskitalo, E Komatsu, E Martínez-González, D Scott, P Vielva, IK Wehus, "Cosmic Birefringence from the Planck Data Release 4", Physical review letters, 2022, 128:091302, doi: 10.1103/physrevlett.128.091302
2021
Y Segawa, H Hirose, D Kaneko, M Hasegawa, S Adachi, P Ade, MAOA Faúndez, Y Akiba, K Arnold, J Avva, C Baccigalupi, D Barron, D Beck, S Beckman, F Bianchini, D Boettger, J Borrill, J Carron, S Chapman, K Cheung, Y Chinone, K Crowley, A Cukierman, T De Haan, M Dobbs, R Dunner, HE Bouhargani, T Elleflot, J Errard, G Fabbian, S Feeney, C Feng, T Fujino, N Galitzki, N Goeckner-Wald, J Groh, G Hall, N Halverson, T Hamada, M Hazumi, C Hill, L Howe, Y Inoue, J Ito, G Jaehnig, O Jeong, N Katayama, B Keating, R Keskitalo, S Kikuchi, T Kisner, N Krachmalnicoff, A Kusaka, AT Lee, D Leon, E Linder, LN Lowry, A Mangu, F Matsuda, Y Minami, J Montgomery, M Navaroli, H Nishino, J Peloton, ATP Pham, D Poletti, G Puglisi, C Raum, CL Reichardt, C Ross, M Silva-Feaver, P Siritanasak, R Stompor, A Suzuki, O Tajima, S Takakura, S Takatori, D Tanabe, GP Teply, C Tsai, C Verges, B Westbrook, Y Zhou, "Method for rapid performance validation of large TES bolometer array for POLARBEAR-2A using a coherent millimeter-wave source", AIP Conference Proceedings, 2021, 2319, doi: 10.1063/5.0038197
M Tristram, AJ Banday, KM Górski, R Keskitalo, CR Lawrence, KJ Andersen, RB Barreiro, J Borrill, HK Eriksen, R Fernandez-Cobos, TS Kisner, E Martínez-González, B Partridge, D Scott, TL Svalheim, H Thommesen, IK Wehus, "Planck constraints on the tensor-to-scalar ratio", Astronomy and Astrophysics, 2021, 647, doi: 10.1051/0004-6361/202039585
G Puglisi, R Keskitalo, T Kisner, JD Borrill, Simulating Calibration and Beam Systematics for a Future CMB Space Mission with the TOAST Package, Research Notes of the AAS, Pages: 137--137 2021, doi: 10.3847/2515-5172/ac0823
N Aghanim, Y Akrami, M Ashdown, J Aumont, C Baccigalupi, M Ballardini, AJ Banday, RB Barreiro, N Bartolo, S Basak, R Battye, K Benabed, JP Bernard, M Bersanelli, P Bielewicz, JJ Bock, JR Bond, J Borrill, FR Bouchet, F Boulanger, M Bucher, C Burigana, RC Butler, E Calabrese, JF Cardoso, J Carron, A Challinor, HC Chiang, J Chluba, LPL Colombo, C Combet, D Contreras, BP Crill, F Cuttaia, P De Bernardis, G De Zotti, J Delabrouille, JM Delouis, E DI Valentino, JM DIego, O Doré, M Douspis, A Ducout, X Dupac, S Dusini, G Efstathiou, F Elsner, TA Enßlin, HK Eriksen, Y Fantaye, M Farhang, J Fergusson, R Fernandez-Cobos, F Finelli, F Forastieri, M Frailis, AA Fraisse, E Franceschi, A Frolov, S Galeotta, S Galli, K Ganga, RT Génova-Santos, M Gerbino, T Ghosh, J González-Nuevo, KM Górski, S Gratton, A Gruppuso, JE Gudmundsson, J Hamann, W Handley, FK Hansen, D Herranz, SR Hildebrandt, E Hivon, Z Huang, AH Jaffe, WC Jones, A Karakci, E Keihänen, R Keskitalo, K Kiiveri, J Kim, TS Kisner, L Knox, N Krachmalnicoff, M Kunz, H Kurki-Suonio, G Lagache, JM Lamarre, A Lasenby, M Lattanzi, CR Lawrence, M Le Jeune, P Lemos, J Lesgourgues, F Levrier, A Lewis, M Liguori, "Erratum: Planck 2018 results: VI. Cosmological parameters (Astronomy and Astrophysics (2020) 641 (A6) DOI: 10.1051/0004-6361/201833910)", Astronomy and Astrophysics, 2021, 652, doi: 10.1051/0004-6361/201833910e
M Tristram, AJ Banday, KM Górski, R Keskitalo, CR Lawrence, KJ Andersen, RB Barreiro, J Borrill, LPL Colombo, HK Eriksen, R Fernandez-Cobos, TS Kisner, E Martínez-González, B Partridge, D Scott, TL Svalheim, IK Wehus, Improved limits on the tensor-to-scalar ratio using BICEP and Planck, 2021,
Mariam Kiran
2022
D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, W. Arndt, J. Blaschke, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, T. Lehman, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, L. Stephey, R. Thomas, G. Torok, "LBNL Superfacility Project Report", Lawrence Berkeley National Laboratory, 2022, doi: 10.48550/arXiv.2206.11992
Qiang Du, Dan Wang, Tong Zhou, Antonio Gilardi, Mariam Kiran, Bashir Mohammed, Derun Li, Russell Wilcox, "Experimental beam combining stabilization using machine learning trained while phases drift", Advanced Solid State Lasers 2022, Optica Publishing Group, June 1, 2022, 30:12639, doi: 10.1364/OE.450255
Sugeerth Murugesan, Mariam Kiran, Bernd Hamann, Gunther H. Weber, "Netostat: Analyzing Dynamic Flow Patterns in High-Speed Networks", Cluster Computing, 2022, doi: 10.1007/s10586-022-03543-0
2021
Shen Sheng, Mariam Kiran, Bashir Mohammed, "DynamicDeepFlow: An Approach for Identifying Changes in Network Traffic Flow Using Unsupervised Clustering", (BEST PAPER) 4th International Conference on Machine Learning for Networking (MLN'2021), December 6, 2021,
V. Dumont, C. Garner, A. Trivedi, C. Jones, V. Ganapati, J. Mueller, T. Perciano, M. Kiran, and M. Day, "HYPPO: A Surrogate-Based Multi-Level Parallelism Tool for Hyperparameter Optimization", 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), November 15, 2021,
Bashir Mohammed, Mariam Kiran, Bjoern Enders, "NetGraf: An End-to-End Learning Network Monitoring Service", 2021 IEEE Workshop on Innovating the Network for Data-Intensive Science (INDIS), November 15, 2021, doi: 10.1109/INDIS54524.2021.00007
B Mohammed, M Kiran, N Krishnaswamy, Kesheng Wu, "Predicting WAN Traffic Volumes using Fourier and Multivariate SARIMA Approach", International Journal of Big Data Intelligence, November 3, 2021, doi: 10.1504/IJBDI.2021.118742
M Kiran, B Mohammed, Q Du, D Wang, S Shen, R Wilcox, "Controlling Laser Beam Combining via an Active Reinforcement Learning Algorithm", Advanced Solid State Lasers 2021, Washington, DC United States, October 4, 2021,
Meriam Gay Bautista, Zhi Jackie Yao, Anastasiia Butko, Mariam Kiran, Mekena Metcalf, "Towards Automated Superconducting Circuit Calibration using Deep Reinforcement Learning", 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Tampa, FL, USA, IEEE, August 23, 2021, pp. 462-46, doi: 10.1109/ISVLSI51109.2021.00091
Ted Kisner
2022
LiteBIRD Collaboration, E Allys, K Arnold, J Aumont, R Aurlien, S Azzoni, C Baccigalupi, AJ Banday, R Banerji, RB Barreiro, N Bartolo, L Bautista, D Beck, S Beckman, M Bersanelli, F Boulanger, M Brilenkov, M Bucher, E Calabrese, P Campeti, A Carones, FJ Casas, A Catalano, V Chan, K Cheung, Y Chinone, SE Clark, F Columbro, G D Alessandro, PD Bernardis, TD Haan, EDL Hoz, MD Petris, SD Torre, P Diego-Palazuelos, T Dotani, JM Duval, T Elleflot, HK Eriksen, J Errard, T Essinger-Hileman, F Finelli, R Flauger, C Franceschet, U Fuskeland, M Galloway, K Ganga, M Gerbino, M Gervasi, RT Génova-Santos, T Ghigna, S Giardiello, E Gjerløw, J Grain, F Grupp, A Gruppuso, JE Gudmundsson, NW Halverson, P Hargrave, T Hasebe, M Hasegawa, M Hazumi, S Henrot-Versillé, B Hensley, LT Hergt, D Herman, E Hivon, RA Hlozek, AL Hornsby, Y Hoshino, J Hubmayr, K Ichiki, T Iida, H Imada, H Ishino, G Jaehnig, N Katayama, A Kato, R Keskitalo, T Kisner, Y Kobayashi, A Kogut, K Kohri, E Komatsu, K Komatsu, K Konishi, N Krachmalnicoff, CL Kuo, L Lamagna, M Lattanzi, AT Lee, C Leloup, F Levrier, E Linder, G Luzzi, J Macias-Perez, B Maffei, D Maino, S Mandelli, E Martínez-González, S Masi, M Massa, S Matarrese, FT Matsuda, T Matsumura, L Mele, M Migliaccio, Y Minami, A Moggi, J Montgomery, L Montier, G Morgante, B Mot, Y Nagano, T Nagasaki, R Nagata, R Nakano, T Namikawa, F Nati, P Natoli, S Nerval, F Noviello, K Odagiri, S Oguri, H Ohsaki, L Pagano, A Paiella, D Paoletti, A Passerini, G Patanchon, F Piacentini, M Piat, G Polenta, D Poletti, T Prouvé, G Puglisi, D Rambaud, C Raum, S Realini, M Reinecke, M Remazeilles, A Ritacco, G Roudil, JA Rubino-Martin, M Russell, H Sakurai, Y Sakurai, M Sasaki, D Scott, Y Sekimoto, K Shinozaki, M Shiraishi, P Shirron, G Signorelli, F Spinella, S Stever, R Stompor, S Sugiyama, RM Sullivan, A Suzuki, TL Svalheim, E Switzer, R Takaku, H Takakura, Y Takase, A Tartari, Y Terao, J Thermeau, H Thommesen, KL Thompson, M Tomasi, M Tominaga, M Tristram, M Tsuji, M Tsujimoto, L Vacher, P Vielva, N Vittorio, W Wang, K Watanuki, IK Wehus, J Weller, B Westbrook, J Wilms, EJ Wollack, J Yumoto, M Zannoni, Probing Cosmic Inflation with the LiteBIRD Cosmic Microwave Background Polarization Survey, 2022,
2021
Y Segawa, H Hirose, D Kaneko, M Hasegawa, S Adachi, P Ade, MAOA Faúndez, Y Akiba, K Arnold, J Avva, C Baccigalupi, D Barron, D Beck, S Beckman, F Bianchini, D Boettger, J Borrill, J Carron, S Chapman, K Cheung, Y Chinone, K Crowley, A Cukierman, T De Haan, M Dobbs, R Dunner, HE Bouhargani, T Elleflot, J Errard, G Fabbian, S Feeney, C Feng, T Fujino, N Galitzki, N Goeckner-Wald, J Groh, G Hall, N Halverson, T Hamada, M Hazumi, C Hill, L Howe, Y Inoue, J Ito, G Jaehnig, O Jeong, N Katayama, B Keating, R Keskitalo, S Kikuchi, T Kisner, N Krachmalnicoff, A Kusaka, AT Lee, D Leon, E Linder, LN Lowry, A Mangu, F Matsuda, Y Minami, J Montgomery, M Navaroli, H Nishino, J Peloton, ATP Pham, D Poletti, G Puglisi, C Raum, CL Reichardt, C Ross, M Silva-Feaver, P Siritanasak, R Stompor, A Suzuki, O Tajima, S Takakura, S Takatori, D Tanabe, GP Teply, C Tsai, C Verges, B Westbrook, Y Zhou, "Method for rapid performance validation of large TES bolometer array for POLARBEAR-2A using a coherent millimeter-wave source", AIP Conference Proceedings, 2021, 2319, doi: 10.1063/5.0038197
M Tristram, AJ Banday, KM Górski, R Keskitalo, CR Lawrence, KJ Andersen, RB Barreiro, J Borrill, HK Eriksen, R Fernandez-Cobos, TS Kisner, E Martínez-González, B Partridge, D Scott, TL Svalheim, H Thommesen, IK Wehus, "Planck constraints on the tensor-to-scalar ratio", Astronomy and Astrophysics, 2021, 647, doi: 10.1051/0004-6361/202039585
G Puglisi, R Keskitalo, T Kisner, JD Borrill, Simulating Calibration and Beam Systematics for a Future CMB Space Mission with the TOAST Package, Research Notes of the AAS, Pages: 137--137 2021, doi: 10.3847/2515-5172/ac0823
N Aghanim, Y Akrami, M Ashdown, J Aumont, C Baccigalupi, M Ballardini, AJ Banday, RB Barreiro, N Bartolo, S Basak, R Battye, K Benabed, JP Bernard, M Bersanelli, P Bielewicz, JJ Bock, JR Bond, J Borrill, FR Bouchet, F Boulanger, M Bucher, C Burigana, RC Butler, E Calabrese, JF Cardoso, J Carron, A Challinor, HC Chiang, J Chluba, LPL Colombo, C Combet, D Contreras, BP Crill, F Cuttaia, P De Bernardis, G De Zotti, J Delabrouille, JM Delouis, E DI Valentino, JM DIego, O Doré, M Douspis, A Ducout, X Dupac, S Dusini, G Efstathiou, F Elsner, TA Enßlin, HK Eriksen, Y Fantaye, M Farhang, J Fergusson, R Fernandez-Cobos, F Finelli, F Forastieri, M Frailis, AA Fraisse, E Franceschi, A Frolov, S Galeotta, S Galli, K Ganga, RT Génova-Santos, M Gerbino, T Ghosh, J González-Nuevo, KM Górski, S Gratton, A Gruppuso, JE Gudmundsson, J Hamann, W Handley, FK Hansen, D Herranz, SR Hildebrandt, E Hivon, Z Huang, AH Jaffe, WC Jones, A Karakci, E Keihänen, R Keskitalo, K Kiiveri, J Kim, TS Kisner, L Knox, N Krachmalnicoff, M Kunz, H Kurki-Suonio, G Lagache, JM Lamarre, A Lasenby, M Lattanzi, CR Lawrence, M Le Jeune, P Lemos, J Lesgourgues, F Levrier, A Lewis, M Liguori, "Erratum: Planck 2018 results: VI. Cosmological parameters (Astronomy and Astrophysics (2020) 641 (A6) DOI: 10.1051/0004-6361/201833910)", Astronomy and Astrophysics, 2021, 652, doi: 10.1051/0004-6361/201833910e
Hannah Klion
2023
H. Klion, R. Jambunathan, M. E. Rowan, E. Yang, D. Willcox, J.-L. Vay, R. Lehe, A. Myers, A. Huebl, W. Zhang, "Particle-in-Cell Simulations of Relativistic Magnetic Reconnection with Advanced Maxwell Solver Algorithms", arXiv preprint, submitted to The Astrophysical Journal, April 20, 2023,
2022
Hannah Klion, Alexander Tchekhovskoy, Daniel Kasen, Adithan Kathirgamaraju, Eliot Quataert, Rodrigo Fernandez, "The impact of r-process heating on the dynamics of neutron star merger accretion disc winds and their electromagnetic radiation", Monthly Notices of the RAS, 2022, 510:2968-2979, doi: 10.1093/mnras/stab3583
2021
Hannah Klion, Paul C. Duffell, Daniel Kasen, Eliot Quataert, "The effect of jet-ejecta interaction on the viewing angle dependence of kilonova light curves", Monthly Notices of the RAS, 2021, 502:865-875, doi: 10.1093/mnras/stab042
Katie Klymko
2021
Daniel R. Ladiges, Sean P. Carney, Andrew Nonaka, Katherine Klymko, Guy C. Moore, Alejandro L. Garcia, Sachin R. Natesh, Aleksandar Donev, John B. Bell, "A Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm for Modeling Electrolytes", Physical Review Fluids, April 1, 2021, 6(4):044309,
Rob Knop
2022
Melissa L. Graham, Robert A. Knop, Thomas Kennedy, Peter E. Nugent, Eric Bellm, Márcio Catelan, Avi Patel, Hayden Smotherman, Monika Soraisam, Steven Stetzler, Lauren N. Aldoroty, Autumn Awbrey, Karina Baeza-Villagra, Pedro H. Bernardinelli, Federica Bianco, Dillon Brout, Riley Clarke, William I. Clarkson, Thomas Collett, James R. A. Davenport, Shenming Fu, John E. Gizis, Ari Heinze, Lei Hu, Saurabh W. Jha, Mario Jurić, J. Bryce Kalmbach, Alex Kim, Chien-Hsiu Lee, Chris Lidman, Mark Magee, Clara E. Martínez-Vázquez, Thomas Matheson, Gautham Narayan, Antonella Palmese, Christopher A. Phillips, Markus Rabus, Armin Rest, Nicolás Rodríguez-Segovia, Rachel Street, A. Katherina Vivas, Lifan Wang, Nicholas Wolf, Jiawen Yang, "Deep drilling in the time domain with DECam: Survey characterization", Monthly Notices of the Royal Astronomical Society, November 2022,
Venkitesh Ayyar, Robert Knop, Autumn Awbrey, Alexis Andersen, Peter Nugent, "Identifying Transient Candidates in the Dark Energy Survey Using Convolutional Neural Networks", Publications of the Astronomical Society of the Pacific, September 2022, 134:094501,
The ability to discover new transient candidates via image differencing without direct human intervention is an important task in observational astronomy. For this kind of image classification problem, machine learning techniques such as Convolutional Neural Networks (CNNs) have shown remarkable success. In this work, we present the results of an automated transient candidate identification on images with CNNs for an extant data set from the Dark Energy Survey Supernova program, whose main focus was on using Type Ia supernovae for cosmology. By performing an architecture search of CNNs, we identify networks that efficiently select non-artifacts (e.g., supernovae, variable stars, AGN, etc.) from artifacts (image defects, mis-subtractions, etc.), achieving the efficiency of previous work performed with random forests, without the need to expend any effort in feature identification. The CNNs also help us identify a subset of mislabeled images. After relabeling the images in this subset, the resulting classification with CNNs is significantly better than previous results, lowering the false positive rate by 27% at a fixed missed detection rate of 0.05.
Gerwin Koolstra
2021
G Koolstra, N Stevenson, S Barzili, L Burns, K Siva, S Greenfield, W Livingston, A Hashim, RK Naik, JM Kreikebaum, KP O'Brien, DI Santiago, J Dressel, I Siddiqi, "Monitoring fast superconducting qubit dynamics using a neural network", Preprint, August 2021,
Élie Genois, Jonathan A. Gross, Agustin Di Paolo, Noah J. Stevenson, Gerwin Koolstra, Akel Hashim, Irfan Siddiqi, Alexandre Blais, "Quantum-tailored machine-learning characterization of a superconducting qubit", Preprint, June 24, 2021,
Daniel Ladiges
2023
I. Srivastava, D. R. Ladiges, A. Nonaka, A. L. Garcia, J. B. Bell, "Staggered Scheme for the Compressible Fluctuating Hydrodynamics of Multispecies Fluid Mixtures", Physical Review E, January 24, 2023, 107:015305, doi: 10.1103/PhysRevE.107.015305
2022
D. R. Ladiges, J. G. Wang, I. Srivastava, S. P. Carney, A. Nonaka, A. L. Garcia, A. Donev, J. B. Bell, "Modeling Electrokinetic Flows with the Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm", Physical Review E, November 19, 2022, 106:035104, doi: 10.1103/PhysRevE.106.035104
2021
Robin J Dolleman, Debadi Chakraborty, Daniel R Ladiges, Herre SJ van der Zant, John E Sader, Peter G Steeneken, "Squeeze-film effect on atomically thin resonators in the high-pressure limit", Submitted to Nano Letters, June 24, 2021,
Daniel R. Ladiges, Sean P. Carney, Andrew Nonaka, Katherine Klymko, Guy C. Moore, Alejandro L. Garcia, Sachin R. Natesh, Aleksandar Donev, John B. Bell, "A Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm for Modeling Electrolytes", Physical Review Fluids, April 1, 2021, 6(4):044309,
Kan-Heng Lee
2023
Ziqian Li, Tanay Roy, David Rodriguez Perez, Kan-Heng Lee, Eliot Kapit, David I. Schuster, "Autonomous error correction of a single logical qubit using two transmons", arXiv.org, 2023,
Charles Leggett
2022
Meghna Bhattacharya, et al., Portability: A Necessary Approach for Future Scientific Software, 2022 Snowmass Summer Study, 2022,
Christopher D. Jones, Kyle Knoepfel, Paolo Calafiura, Charles Leggett, Vakhtang Tsulaia, Evolution of HEP Processing Frameworks, 2022 Snowmass Summer Study, 2022,
Xiaoye Li
2022
X. Li, Y. Liu, P. Lin, P. Sao, "Newly released capabilities in distributed-memory SuperLU sparse direct solver", ACM Transactions on Mathematical Software, November 19, 2022,
Hengrui Luo, Younghyun Cho, James W. Demmel, Xiaoye S. Li, Yang Liu, "Hybrid models for mixed variables in Bayesian optimization", June 6, 2022,
X. Zhu, Y. Liu, P. Ghysels, D. Bindal, X. S. Li, "GPTuneBand: multi-task and multi-fidelity Bayesian optimization for autotuning large-scale high performance computing applications", SIAM PP, February 23, 2022,
2021
Y. Cho, J. W. Demmel, X. S. Li, Y. Liu, H. Luo, "Enhancing autotuning capability with a history database", IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 20, 2021,
H. Luo, J.W. Demmel, Y. Cho, X. S. Li, Y. Liu, "Non-smooth Bayesian optimization in tuning problems", arxiv-preprint, September 21, 2021,
Nan Ding, Yang Liu, Samuel Williams, Xiaoye S. Li, "A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), July 19, 2021,
Yang Liu, Pieter Ghysels, Lisa Claus, Xiaoye Sherry Li, "Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations", SIAM J. Sci. Comput., June 22, 2021,
Yang Liu, Xin Xing, Han Guo, Eric Michielssen, Pieter Ghysels, Xiaoye Sherry Li, "Butterfly factorization via randomized matrix-vector multiplications", SIAM J. Sci. Comput., March 9, 2021,
Y. Liu, W. M. Sid-Lakhdar, O. Marques, X. Zhu, C. Meng, J. W. Demmel, X. S. Li, "GPTune: multitask learning for autotuning exascale applications", PPoPP, February 17, 2021, doi: 10.1145/3437801.3441621
Terry J. Ligocki
2023
Tim Kneafsey, David Trebotich, Terry Ligocki, "Direct Numerical Simulation of Flow Through Nanoscale Shale Pores in a Mesoscale Sample", Album of Porous Media, edited by E.F. Médici, A.D. Otero, (Springer Cham: April 14, 2023) Pages: 87, doi: 10.1007/978-3-031-23800-0_69
David Trebotich, Terry Ligocki, "High Resolution Simulation of Fluid Flow in Press Felts Used in Paper Manufacturing", Album of Porous Media, edited by E.F. Médici, A.D. Otero, (Springer Cham: April 14, 2023) Pages: 132, doi: 10.1007/978-3-031-23800-0_109
Yang Liu
2022
X. Li, Y. Liu, P. Lin, P. Sao, "Newly released capabilities in distributed-memory SuperLU sparse direct solver", ACM Transactions on Mathematical Software, November 19, 2022,
M. Wang, Y. Liu, P. Ghysels, A. C. Yucel, "VoxImp: Impedance Extraction Simulator for Voxelized Structures", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, November 2, 2022, doi: 10.1109/TCAD.2022.3218768
Yang Liu, Jian Song, Robert Burridge, Jianliang Qian, "A Fast Butterfly-compressed Hadamard-Babich Integrator for High-Frequency Helmholtz Equations in Inhomogeneous Media with Arbitrary Sources", SIAM Multiscale Modeling and Simulation, October 6, 2022,
Hengrui Luo, Younghyun Cho, James W. Demmel, Xiaoye S. Li, Yang Liu, "Hybrid models for mixed variables in Bayesian optimization", June 6, 2022,
Yang Liu, "A comparative study of butterfly-enhanced direct integral and differential equation solvers for high-frequency electromagnetic analysis involving inhomogeneous dielectrics", May 29, 2022,
X. Zhu, Y. Liu, P. Ghysels, D. Bindal, X. S. Li, "GPTuneBand: multi-task and multi-fidelity Bayesian optimization for autotuning large-scale high performance computing applications", SIAM PP, February 23, 2022,
2021
Y. Cho, J. W. Demmel, X. S. Li, Y. Liu, H. Luo, "Enhancing autotuning capability with a history database", IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 20, 2021,
S. B. Sayed, Y. Liu, L. J. Gomez, A. C. Yucel, "A butterfly-accelerated volume integral equation solver for broad permittivity and large-scale electromagnetic analysis", arxiv-preprint, November 5, 2021,
H. Luo, J.W. Demmel, Y. Cho, X. S. Li, Y. Liu, "Non-smooth Bayesian optimization in tuning problems", arxiv-preprint, September 21, 2021,
Nan Ding, Yang Liu, Samuel Williams, Xiaoye S. Li, "A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), July 19, 2021,
Yang Liu, Pieter Ghysels, Lisa Claus, Xiaoye Sherry Li, "Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations", SIAM J. Sci. Comput., June 22, 2021,
Yang Liu, Xin Xing, Han Guo, Eric Michielssen, Pieter Ghysels, Xiaoye Sherry Li, "Butterfly factorization via randomized matrix-vector multiplications", SIAM J. Sci. Comput., March 9, 2021,
Y. Liu, W. M. Sid-Lakhdar, O. Marques, X. Zhu, C. Meng, J. W. Demmel, X. S. Li, "GPTune: multitask learning for autotuning exascale applications", PPoPP, February 17, 2021, doi: 10.1145/3437801.3441621
Burlen Loring
2022
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, James Kress, Patrick O’Leary, Dave Pugmire, Silvio Rizzi, David Thompson, Will Usher, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "Proximity Portability and In Transit, M-to-N Data Partitioning and Movement in SENSEI", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_20
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, David Camp, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, James Kress, Patrick O’Leary, David Pugmire, Silvio Rizzi, David Thompson, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "The SENSEI Generic In Situ Interface: Tool and Processing Portability at Scale", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_13
Jason Lowe-Power
2022
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "SoK: Limitations of Confidential Computing via TEEs for High-Performance Compute Systems", Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), September 2022,
2021
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "Enabling Design Space Exploration for RISC-V Secure Compute Environments", Proceedings of the Fifth Workshop on Computer Architecture Research with RISC-V (CARRV), (co-located with ISCA 2021), June 17, 2021,
Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert, "Performance Analysis of Scientific Computing Workloads on General Purpose TEEs", Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), IEEE, May 2021, doi: 10.1109/IPDPS49936.2021.00115
Zarija Lukic
2021
Jean Sexton, Zarija Lukic, Ann Almgren, Chris Daley, Brian Friesen, Andrew Myers, Weiqun Zhang, "Nyx: A Massively Parallel AMR Code for Computational Cosmology", The Journal of Open Source Software, July 10, 2021,
Timur Takhtaganov, Zarija Lukić, Juliane Mueller, Dmitriy Morozov, "Cosmic Inference: Constraining Parameters With Observations and Highly Limited Number of Simulations", Astrophysical Journal, 2021, 906:74, doi: 10.3847/1538-4357/abc8ed
Darren Lyles
2023
Kylie Huch, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Hyperdimensional Associative Memory Circuit for Scalable Machine Learning", IEEE Transactions on Applied Superconductivity, May 2023,
Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Gay Bautista, George Michelogiannakis, "PaST-NoC: A Packet-Switched Superconducting Temporal NoC", IEEE Transactions on Applied Superconductivity, January 2023,
2022
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-Flux Shift Register for Race Logic and Its Applications", IEEE Transactions on Circuits and Systems I: Regular Papers, October 2022,
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, Kylie Huch, George Michelogiannakis, "Superconducting Digital DIT Butterfly Unit for Fast Fourier Transform Using Race Logic", 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), IEEE, June 2022, 441-445,
Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Darren Lyles, George Michelogiannakis, "Temporal and SFQ Pulse-Streams Encoding for Area-Efficient Superconducting Accelerators", 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), ACM, February 2022,
2021
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-flux Shift Buffer for Race Logic", 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), August 2021,
George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Bautista, Dilip Vasudevan, Anastasiia Butko, "SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC", IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021,
Colin MacLean
2023
Paul H. Hargrove, Dan Bonachea, Johnny Corbino, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'23)", Poster at Exascale Computing Project (ECP) Annual Meeting 2023, January 2023,
The Pagoda project is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. The first component is GASNet-EX, a portable, high-performance, global-address-space communication library. The second component is UPC++, a C++ template library. Together, these libraries enable agile, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
GASNet-EX is a portable, high-performance communications middleware library which leverages hardware support to implement Remote Memory Access (RMA) and Active Message communication primitives. GASNet-EX supports a broad ecosystem of alternative HPC programming models, including UPC++, Legion, Chapel and multiple implementations of UPC and Fortran Coarrays. GASNet-EX is implemented directly over the native APIs for networks of interest in HPC. The tight semantic match of GASNet-EX APIs to the client requirements and hardware capabilities often yields better performance than competing libraries.
UPC++ provides high-level productivity abstractions appropriate for Partitioned Global Address Space (PGAS) programming such as: remote memory access (RMA), remote procedure call (RPC), support for accelerators (e.g. GPUs), and mechanisms for aggressive asynchrony to hide communication costs. UPC++ implements communication using GASNet-EX, delivering high performance and portability from laptops to exascale supercomputers. HPC application software using UPC++ includes: MetaHipMer2 metagenome assembler, SIMCoV viral propagation simulation, NWChemEx TAMM, and graph computation kernels from ExaGraph.
2022
Mateusz Pusz, Gašper Ažman, Bengt Gustafsson, Colin MacLean, Corentin Jabot, "Universal Template Parameters", ISO C++ Standard Mailing, September 2022,
This paper proposes a unified model for universal template parameters (UTPs) and dependent names, enabling more comprehensive and consistent template metaprogramming. Universal template parameters allow for a generic apply and other higher-order template metafunctions, including certain type traits.
Paul H. Hargrove, Dan Bonachea, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'22)", Poster at Exascale Computing Project (ECP) Annual Meeting 2022, May 5, 2022,
We present UPC++ and GASNet-EX, distributed libraries which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Procedure Call (RPC) and for Remote Memory Access (RMA) to host and GPU memories. The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
2021
Daniel Waters, Colin A. MacLean, Dan Bonachea, Paul H. Hargrove, "Demonstrating UPC++/Kokkos Interoperability in a Heat Conduction Simulation (Extended Abstract)", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S4630V
We describe the replacement of MPI with UPC++ in an existing Kokkos code that simulates heat conduction within a rectangular 3D object, as well as an analysis of the new code’s performance on CUDA accelerators. The key challenges were packing the halos in Kokkos data structures in a way that allowed for UPC++ remote memory access, and streamlining synchronization costs. Additional UPC++ abstractions used included global pointers, distributed objects, remote procedure calls, and futures. We also make use of the device allocator concept to facilitate data management in memory with unique properties, such as GPUs. Our results demonstrate that despite the algorithm’s good semantic match to message passing abstractions, straightforward modifications to use UPC++ communication deliver vastly improved performance and scalability in the common case. We find the one-sided UPC++ version written in a natural way exhibits good performance, whereas the message-passing version written in a straightforward way exhibits performance anomalies. We argue this represents a productivity benefit for one-sided communication models.
Paul H. Hargrove, Dan Bonachea, Colin A. MacLean, Daniel Waters, "GASNet-EX Memory Kinds: Support for Device Memory in PGAS Programming Models", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21) Research Poster, November 2021, doi: 10.25344/S4P306
Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work includes two major components: UPC++ (a C++ template library) and GASNet-EX (a portable, high-performance communication library). This poster describes recent advances in GASNet-EX to efficiently implement Remote Memory Access (RMA) operations to and from memory on accelerator devices such as GPUs. Performance is illustrated via benchmark results from UPC++ and the Legion programming system, both using GASNet-EX as their communications library.
Tan Nguyen, Colin MacLean, Marco Siracusa, Douglas Doerfler, Nicholas J. Wright, Samuel Williams, "FPGA‐based HPC accelerators: An evaluation on performance and energy efficiency", CCPE, August 22, 2021, doi: 10.1002/cpe.6570
Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'21)", Poster at Exascale Computing Project (ECP) Annual Meeting 2021, April 2021,
We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC). The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
Osni Marques
2021
Y. Liu, W. M. Sid-Lakhdar, O. Marques, X. Zhu, C. Meng, J. W. Demmel, X. S. Li, "GPTune: multitask learning for autotuning exascale applications", PPoPP, February 17, 2021, doi: 10.1145/3437801.3441621
Daniel F. Martin
2023
Will Thacher and Hans Johansen and Daniel Martin, "A high order Cartesian grid, finite volume method for elliptic interface problems", Journal of Computational Physics, October 15, 2023, 491, doi: 10.1016/j.jcp.2023.112351
S. Bevan, S. Cornford, L. Gilbert, I. Otosaka, D. Martin, T. Surawy-Stepney, "Amundsen Sea Embayment ice-sheet mass-loss predictions to 2050 calibrated using observations of velocity and elevation change", Journal of Glaciology, August 14, 2023, 1-11, doi: 10.1017/jog.2023.57
2022
Daniel Martin, Samuel Kachuck, Joanna Millstein, Brent Minchew, "Examining the Sensitivity of Ice Sheet Models to Updates in Rheology (n=4)", AGU Fall Meeting, December 15, 2022,
- Download File: AGU2022-1.pdf (pdf: 508 KB)
Anne M. Felden, Daniel F. Martin, Esmond G. Ng, "SUHMO: an AMR SUbglacial Hydrology MOdel v1.0", Geosci. Model Dev. Discuss., July 27, 2022,
- Download File: gmd-2022-190.pdf (pdf: 5.5 MB)
Samuel B. Kachuck, Morgan Whitcomb, Jeremy N. Bassis, Daniel F. Martin, Stephen F. Price, "Simulating ice-shelf extent using damage mechanics", Journal of Glaciology, March 7, 2022, 68(271):987-998, doi: 10.1017/jog.2022.12
2021
Samuel Benjamin Kachuck, Morgan Whitcomb, Jeremy N Bassis, Daniel F Martin, and Stephen F Price, "When are (simulations of) ice shelves stable? Stabilizing forces in fracture-permitting models", AGU Fall Meeting, December 16, 2021,
Daniel F. Martin, Stephen L. Cornford, Esmond G. Ng, Impact of Improved Bedrock Geometry and Basal Friction Relations on Antarctic Vulnerability to Regional Ice Shelf Collapse, American Geophysical Union Fall Meeting, December 15, 2021,
Courtney Shafer, Daniel F Martin and Esmond G Ng, "Comparing the Shallow-Shelf and L1L2 Approximations using BISICLES in the Context of MISMIP+ with Buttressing Effects", AGU Fall Meeting, December 13, 2021,
Anne M. Felden, Daniel F. Martin, Esmond G. Ng, SUHMO: An SUbglacial Hydrology MOdel based on the Chombo AMR framework, American Geophysical Union Fall Meeting, December 13, 2021,
Thomas M Evans, Andrew Siegel, Erik W Draeger, Jack Deslippe, Marianne M Francois, Timothy C Germann, William E Hart, Daniel F Martin, "A survey of software implementations used by application codes in the Exascale Computing Project", The International Journal of High Performance Computing Applications, June 25, 2021, doi: 10.1177/10943420211028940
- Download File: ijhpc-2021.pdf (pdf: 242 KB)
Tamsin L. Edwards, Sophie Nowicki, Ben Marzeion, Regine Hock, Heiko Goelzer, Hélène Seroussi, Nicolas C. Jourdain, Donald A. Slater, Fiona E. Turner, Christopher J. Smith, Christine M. McKenna, Erika Simon, Ayako Abe-Ouchi, Jonathan M. Gregory, Eric Larour, William H. Lipscomb, Antony J. Payne, Andrew Shepherd, Cécile Agosta, Patrick Alexander, Torsten Albrecht, Brian Anderson, Xylar Asay-Davis, Andy Aschwanden, Alice Barthel, Andrew Bliss, Reinhard Calov, Christopher Chambers, Nicolas Champollion, Youngmin Choi, Richard Cullather, Joshua Cuzzone, Christophe Dumas, Denis Felikson, Xavier Fettweis, Koji Fujita, Benjamin K. Galton-Fenzi, Rupert Gladstone, Nicholas R. Golledge, Ralf Greve, Tore Hattermann, Matthew J. Hoffman, Angelika Humbert, Matthias Huss, Philippe Huybrechts, Walter Immerzeel, Thomas Kleiner, Philip Kraaijenbrink, Sébastien Le clec’h, Victoria Lee, Gunter R. Leguy, Christopher M. Little, Daniel P. Lowry, Jan-Hendrik Malles, Daniel F. Martin, Fabien Maussion, Mathieu Morlighem, James F. O’Neill, Isabel Nias, Frank Pattyn, Tyler Pelle, Stephen F. Price, Aurélien Quiquet, Valentina Radić, Ronja Reese, David R. Rounce, Martin Rückamp, Akiko Sakai, Courtney Shafer, Nicole-Jeanne Schlegel, Sarah Shannon, Robin S. Smith, Fiammetta Straneo, Sainan Sun, Lev Tarasov, Luke D. Trusel, Jonas Van Breedam, Roderik van de Wal, Michiel van den Broeke, Ricarda Winkelmann, Harry Zekollari, Chen Zhao, Tong Zhang, Thomas Zwinger, "Projected land ice contributions to twenty-first-century sea level rise", Nature, May 5, 2021, 593:74-82, doi: 10.1038/s41586-021-03302-y
- Download File: Edwards-et-al-2021-Nature-preprint.pdf (pdf: 40 MB)
Mekena Metcalf
2021
Meriam Gay Bautista, Zhi Jackie Yao, Anastasiia Butko, Mariam Kiran, Mekena Metcalf, "Towards Automated Superconducting Circuit Calibration using Deep Reinforcement Learning", 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Tampa, FL, USA, IEEE, August 23, 2021, pp. 462-46, doi: 10.1109/ISVLSI51109.2021.00091
George Michelogiannakis
2023
Jie Li, George Michelogiannakis, Brandon Cook, Dulanya Cooray, Yong Chen, "Analyzing Resource Utilization in an HPC System: A Case Study of NERSC Perlmutter", ISC High Performance, Elsevier, May 2023,
George Michelogiannakis, Analyzing Resource Utilization in an HPC System: A Case Study of NERSC’s Perlmutter, ISC High Performance, May 2023,
- Download File: isc2023.pdf (pdf: 1.1 MB)
Zhenguo Wu, Liang Yuan Dai, Asher Novick, Madeleine Glick, Ziyi Zhu, Sébastien Rumley, George Michelogiannakis, John Shalf, Keren Bergman, "Peta-Scale Embedded Photonics Architecture for Distributed Deep Learning Applications", IEEE Journal of Lightwave Technology, May 2023,
Kylie Huch, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Hyperdimensional Associative Memory Circuit for Scalable Machine Learning", IEEE Transactions on Applied Superconductivity, May 2023,
Patricia Gonzalez-Guerrero, Kylie Huch, Nirmalendu Patra, Thom Popovici, George Michelogiannakis, "An Area Efficient Superconducting Unary CNN Accelerator", IEEE 24th International Symposium on Quality Electronic Design (ISQED), IEEE, April 2023,
Dilip Vasudevan, George Michelogiannakis, "Efficient Temporal Arithmetic Logic Design for Superconducting RSFQ Logic", IEEE Transactions on Applied Superconductivity, March 2023,
George Michelogiannakis, A Case for Intra-Rack Resource Disaggregation for HPC, HiPEAC conference 2023, January 17, 2023,
- Download File: disaggregation.pptx.pdf (pdf: 1.3 MB)
Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Gay Bautista, George Michelogiannakis, "PaST-NoC: A Packet-Switched Superconducting Temporal NoC", IEEE Transactions on Applied Superconductivity, January 2023,
2022
George Michelogiannakis, Intra-Rack Resource Disaggregation Using Emerging Photonics, OCP global summit, October 19, 2022,
- Download File: disaggregation_2022.pdf (pdf: 953 KB)
John Shalf, George Michelogiannakis, Heterogeneous Integration for HPC, OCP global summit, October 19, 2022,
- Download File: chiplets_2022.pdf (pdf: 1.2 MB)
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-Flux Shift Register for Race Logic and Its Applications", IEEE Transactions on Circuits and Systems I: Regular Papers, October 2022,
Alvin Oliver Glova, Yukai Yang, Yiyao Wan, Zhizhou Zhang, George Michelogiannakis, Jonathan Balkind, Timothy Sherwood, "Establishing Cooperative Computation with Hardware Embassies", IEEE International Symposium on Secure and Private Execution Environment Design, September 2022,
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, Kylie Huch, George Michelogiannakis, "Superconducting Digital DIT Butterfly Unit for Fast Fourier Transform Using Race Logic", 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), IEEE, June 2022, 441-445,
George Michelogiannakis, Madeleine Glick, John Shalf, Keren Bergman, Photonics as a Means to Implement Intra-rack Resource Disaggregation, SPIE photonics west, March 2022,
George Michelogiannakis, Madeleine Glick, John Shalf, Keren Bergman, "Photonics as a means to implement intra-rack resource disaggregation", Proceedings Volume 12027, Metro and Data Center Optical Networks and Short-Reach Links V, March 2022, doi: https://doi.org/10.1117/12.2607317
Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Darren Lyles, George Michelogiannakis, Temporal and SFQ Pulse-Streams Encoding for Area-Efficient Superconducting Accelerators, 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 2022,
- Download File: asplos2022-presentation.pdf (pdf: 1.7 MB)
Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Darren Lyles, George Michelogiannakis, "Temporal and SFQ Pulse-Streams Encoding for Area-Efficient Superconducting Accelerators", 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), ACM, February 2022,
- Download File: asplos2022.pdf (pdf: 1.9 MB)
George Michelogiannakis, Benjamin Klenk, Brandon Cook, Min Yee Teh, Madeleine Glick, Larry Dennison, Keren Bergman, John Shalf, "A Case For Intra-Rack Resource Disaggregation in HPC", ACM Transactions on Architecture and Code Optimization, February 2022,
2021
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-flux Shift Buffer for Race Logic", 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), August 2021,
George Michelogiannakis, SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC, IEEE International Parallel and Distributed Processing Symposium, May 2021,
- Download File: ipdps-2021-2.pptx (pptx: 1.7 MB)
George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Bautista, Dilip Vasudevan, Anastasiia Butko, "SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC", IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021,
Georgios Tzimpragos, Jennifer Volk, Dilip Vasudevan, Nestan Tsiskaridze, George Michelogiannakis, Advait Madhavan, John Shalf, Timothy Sherwood, "Temporal Computing With Superconductors", IEEE MIcro, March 2021, 41:71-79, doi: 10.1109/MM.2021.3066377
George Michelogiannakis, Min Yeh Teh, Madeleine Glick, John Shalf, Keren Bergman, Maximizing The Impact of Emerging Photonic Switches At The System Level, SPIE photonics west, March 2021,
- Download File: photonics-west-2021.pdf (pdf: 770 KB)
George Michelogiannakis, Min Yeh Teh, Madeleine Glick, John Shalf, Keren Bergman, "Maximizing the impact of emerging photonic switches at the system level", SPIE 11692, Optical Interconnects XXI, 116920Z, March 2021,
Michael Minion
2021
Pietro Benedusi, Michael L Minion, Rolf Krause, "An experimental comparison of a space-time multigrid method with PFASST for a reaction-diffusion problem", Computers & Mathematics with Applications, October 1, 2021,
- Download File: Benedusi-Minion-Krause.pdf (pdf: 372 KB)
Tommaso Buvoli, Michael Minion, "IMEX Runge-Kutta Parareal for Non-diffusive Equations", Springer Proceedings in Mathematics & Statistics, August 25, 2021,
Sebastian Götschel, Michael Minion, Daniel Ruprecht, Robert Speck, "Twelve Ways To Fool The Masses When Giving Parallel-In-Time Results", Springer Proceedings in Mathematics & Statistics, August 25, 2021,
- Download File: Twelve-Ways.pdf (pdf: 847 KB)
Bashir Mohammed
2022
Qiang Du, Dan Wang, Tong Zhou, Antonio Gilardi, Mariam Kiran, Bashir Mohammed, Derun Li, and Russell Wilcox, "Experimental beam combining stabilization using machine learning trained while phases drift", Advanced Solid State Lasers 2022, Optica Publishing Group, June 1, 2022, 30:12639, doi: 10.1364/OE.450255
2021
Shen Sheng, Mariam Kiran, Bashir Mohammed, "DynamicDeepFlow: An Approach for Identifying Changes in Network Traffic Flow Using Unsupervised Clustering", (BEST PAPER) 4th International Conference on Machine Learning for Networking (MLN'2021), December 6, 2021,
Bashir Mohammed, Mariam Kiran, Bjoern Enders, "NetGraf: An End-to-End Learning Network Monitoring Service", 2021 IEEE Workshop on Innovating the Network for Data-Intensive Science (INDIS), November 15, 2021, doi: 10.1109/INDIS54524.2021.00007
B Mohammed, M Kiran, N Krishnaswamy, Kesheng Wu, "Predicting WAN Traffic Volumes using Fourier and Multivariate SARIMA Approach", International Journal of Big Data Intelligence, November 3, 2021, doi: 10.1504/IJBDI.2021.118742
M Kiran, B Mohammed, Q Du, D Wang, S Shen, R Wilcox, "Controlling Laser Beam Combining via an Active Reinforcement Learning Algorithm", Advanced Solid State Lasers 2021, Washington, DC United States, October 4, 2021,
Destinee Morrow
2023
Nathan A. Kimbrel, Allison E. Ashley-Koch, Xue J. Qin, Jennifer H. Lindquist, Melanie E. Garrett, Michelle F. Dennis, Lauren P. Hair, Jennifer E. Huffman, Daniel A. Jacobson, Ravi K. Madduri, Jodie A. Trafton, Hilary Coon, Anna R. Docherty, Niamh Mullins, Douglas M. Ruderfer, Philip D. Harvey, Benjamin H. McMahon, David W. Oslin, Jean C. Beckham, Elizabeth R. Hauser, Michael A. Hauser, Million Veteran Program Suicide Exemplar Workgroup, International Suicide Genetics Consortium, Veterans Affairs Mid-Atlantic Mental Illness Research Education and Clinical Center Workgroup, Veterans Affairs Million Veteran Program, "Identification of Novel, Replicable Genetic Risk Loci for Suicidal Thoughts and Behaviors Among US Military Veterans", JAMA Psychiatry, February 1, 2023, 80:100-191, doi: 10.1001/jamapsychiatry.2022.3896
2022
Destinee Morrow, Rafael Zamora-Resendiz, Jean C Beckham, Nathan A Kimbrel, David W Oslin, Suzanne Tamang, Million Veteran Program Suicide Exemplar Workgroup, Silvia Crivelli, "A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes", Journal of Psychiatric Research, July 1, 2022, 151:328-338, doi: 10.1016/j.jpsychires.2022.04.009
Juliane Mueller
2022
C Varadharajan, AP Appling, B Arora, DS Christianson, VC Hendrix, V Kumar, AR Lima, J Müller, S Oliver, M Ombadi, T Perciano, JM Sadler, H Weierbach, JD Willard, Z Xu, J Zwart, "Can machine learning accelerate process understanding and decision-relevant predictions of river water quality?", Hydrological Processes, January 1, 2022, 36, doi: 10.1002/hyp.14565
2021
V. Dumont, C. Garner, A. Trivedi, C. Jones, V. Ganapati, J. Mueller, T. Perciano, M. Kiran, and M. Day, "HYPPO: A Surrogate-Based Multi-Level Parallelism Tool for Hyperparameter Optimization", 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), November 15, 2021,
J Müller, B Faybishenko, D Agarwal, S Bailey, C Jiang, Y Ryu, C Tull, L Ramakrishnan, Assessing data change in scientific datasets, Concurrency and Computation: Practice and Experience, 2021, doi: 10.1002/cpe.6245
Daniel Murnane
2022
Alina Lazar, et al., Accelerating the Inference of the Exa.TrkX Pipeline, 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded - Towards Sustainable, Diverse, Performant and Effective Scientific Computing, 2022,
Chun-Yi Wang, et al., Reconstruction of Large Radius Tracks with the Exa.TrkX pipeline, 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded - Towards Sustainable, Diverse, Performant and Effective Scientific Computing, 2022,
Savannah Thais, Paolo Calafiura, Grigorios Chachamis, Gage DeZoort, Javier Duarte, Sanmay Ganguly, Michael Kagan, Daniel Murnane, Mark S. Neubauer, Kazuhiro Terao, Graph Neural Networks in Particle Physics: Implementations, Innovations, and Challenges, 2022 Snowmass Summer Study, 2022,
2021
Xiangyang Ju, et al., "Performance of a geometric deep learning pipeline for HL-LHC particle tracking", Eur. Phys. J. C, Pages: 876, 2021, doi: 10.1140/epjc/s10052-021-09675-8
Andrew Myers
2023
H. Klion, R. Jambunathan, M. E. Rowan, E. Yang, D. Willcox, J.-L. Vay, R. Lehe, A. Myers, A. Huebl, W. Zhang, "Particle-in-Cell Simulations of Relativistic Magnetic Reconnection with Advanced Maxwell Solver Algorithms", arXiv preprint, submitted to The Astrophysical Journal, April 20, 2023,
2021
Andrew Myers, Ann Almgren, Diana Almorim, John Bell, Luca Fedeli, Lixin Ge, Kevin Gott, David Grote, Mark Hogan, Axel Huebl, Revathi Jambunathan, Remi Lehe, Cho Ng, Michael Rowan, Olga Shapoval, Maxence Thevenet, Jean-Luc Vay, Henri Vincenti, Eloise Yang, Neil Zaim, Weiqun Zhang, Yin Zhao, Edoardo Zoni, "Porting WarpX to GPU-accelerated platforms", Parallel Computing, December 1, 2021,
Jean Sexton, Zarija Lukic, Ann Almgren, Chris Daley, Brian Friesen, Andrew Myers, and Weiqun Zhang, "Nyx: A Massively Parallel AMR Code for Computational Cosmology", The Journal of Open Source Software, July 10, 2021,
Weiqun Zhang, Andrew Myers, Kevin Gott, Ann Almgren and John Bell, "AMReX: Block-Structured Adaptive Mesh Refinement for Multiphysics Applications", The International Journal of High Performance Computing Applications, June 12, 2021,
L. Fedeli, A. Sainte-Marie, N. Zaim, M. Thevenet, J. L. Vay, A. Myers, F. Quere, and H. Vincenti, "Probing strong-field QED with Doppler-boosted petawatt-class lasers", Physical Review Letters, May 10, 2021,
Sherwood Richers, Don E. Willcox, Nicole M. Ford, and Andrew Myers, "Particle-in-cell simulation of the neutrino fast flavor instability", Physical Review D, April 20, 2021,
Jordan Musser, Ann S Almgren, William D Fullmer, Oscar Antepara, John B Bell, Johannes Blaschke, Kevin Gott, Andrew Myers, Roberto Porcu, Deepak Rangarajan, Michele Rosso, Weiqun Zhang, and Madhava Syamlal, "MFIX:Exa: A Path Towards Exascale CFD-DEM Simulations", The International Journal of High Performance Computing Applications, April 16, 2021,
J-L Vay, Ann Almgren, LD Amorim, John Bell, L Fedeli, L Ge, K Gott, DP Grote, M Hogan, A Huebl, R Jambunathan, R Lehe, A Myers, C Ng, M Rowan, O Shapoval, M Thevenet, H Vincenti, E Yang, N Zaim, W Zhang, Y Zhao and E Zoni, "Modeling of a chain of three plasma accelerator stages with the WarpX electromagnetic PIC code on GPUs", Physics of Plasmas, February 9, 2021,
Ravi K. Naik
2022
Noah Goss, Alexis Morvan, Brian Marinelli, Bradley K Mitchell, Long B Nguyen, Ravi K Naik, Larry Chen, Christian Jünger, John Mark Kreikebaum, David I Santiago, et al., "High-fidelity qutrit entangling gates for superconducting circuits", Nature Communications, 2022, 13:7481, doi: 10.1038/s41467-022-34851-z
Akel Hashim, Rich Rines, Victory Omole, Ravi K. Naik, John Mark Kreikebaum, David I. Santiago, Frederic T. Chong, Irfan Siddiqi, Pranav Gokhale, "Optimized SWAP networks with equivalent circuit averaging for QAOA", Phys. Rev. Research, 2022, 033028, doi: 10.1103/PhysRevResearch.4.033028
- Download File: PhysRevResearch4033028.bib (bib: 602 bytes)
Srivatsan Chakram, Kevin He, Akash V. Dixit, Andrew E. Oriani, Ravi K. Naik, Nelson Leung, Hyeokshin Kwon, Wen-Long Ma, Liang Jiang, David I. Schuster, "Multimode photon blockade", Nature Physics, 2022, doi: 10.1038/s41567-022-01630-y
Yosep Kim, Alexis Morvan, Long B Nguyen, Ravi K Naik, Christian Jünger, Larry Chen, John Mark Kreikebaum, David I Santiago, Irfan Siddiqi, "High-fidelity three-qubit iToffoli gate for fixed-frequency superconducting qubits", Nature Physics, 2022, 1-6, doi: 10.1038/s41567-022-01590-3
2021
Akel Hashim, Ravi K. Naik, Alexis Morvan, Jean-Loup Ville, Bradley Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin P. O'Brien, Ian Hincks, Joel J. Wallman, Joseph Emerson, Irfan Siddiqi, "Randomized Compiling for Scalable Quantum Computing on a Noisy Superconducting Quantum Processor", Physical Review X, 2021, 11:041039, doi: 10.1103/PhysRevX.11.041039
Kenneth Rudinger, Craig W Hogle, Ravi K Naik, Akel Hashim, Daniel Lobser, David I Santiago, Matthew D Grace, Erik Nielsen, Timothy Proctor, Stefan Seritan, et al., "Experimental Characterization of Crosstalk Errors with Simultaneous Gate Set Tomography", PRX Quantum, 2021, 2:040338, doi: 10.1103/PRXQuantum.2.040338
Bradley K. Mitchell, Ravi K. Naik, Alexis Morvan, Akel Hashim, John Mark Kreikebaum, Brian Marinelli, Wim Lavrijsen, Kasra Nowrouzi, David I. Santiago, Irfan Siddiqi, "Hardware-Efficient Microwave-Activated Tunable Coupling between Superconducting Qubits", Physical Review Letters, 2021, 127:200502, doi: 10.1103/PhysRevLett.127.200502
Yilun Xu, Gang Huang, Jan Balewski, Ravi Naik, Alexis Morvan, Bradley Mitchell, Kasra Nowrouzi, David I. Santiago, Irfan Siddiqi, "QubiC: An Open-Source FPGA-Based Control and Measurement System for Superconducting Quantum Information Processors", IEEE Transactions on Quantum Engineering, 2021, 2:1-11, doi: 10.1109/TQE.2021.3116540
Srivatsan Chakram, Andrew E. Oriani, Ravi K. Naik, Akash V. Dixit, Kevin He, Ankur Agrawal, Hyeokshin Kwon, David I. Schuster, "Seamless High-Q Microwave Cavities for Multimode Circuit Quantum Electrodynamics", Physical Review Letters, 2021, 127:107701, doi: 10.1103/PhysRevLett.127.107701
G Koolstra, N Stevenson, S Barzili, L Burns, K Siva, S Greenfield, W Livingston, A Hashim, RK Naik, JM Kreikebaum, KP O'Brien, DI Santiago, J Dressel, I Siddiqi, "Monitoring fast superconducting qubit dynamics using a neural network", Preprint, August 2021,
Akel Hashim, Ravi Naik, Alexis Morvan, Jean-Loup Ville, Brad Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin O'Brien, Ian Hincks, Joel Wallman, Joseph V Emerson, David Ivan Santiago, Irfan Siddiqi, Scalable Quantum Computing on a Noisy Superconducting Quantum Processor via Randomized Compiling, Bulletin of the American Physical Society, 2021,
Coherent errors in quantum hardware severely limit the performance of quantum algorithms in an unpredictable manner, and mitigating their impact is necessary for realizing reliable, large-scale quantum computations. Randomized compiling achieves this goal by converting coherent errors into stochastic noise, dramatically reducing unpredictable errors in quantum algorithms and enabling accurate predictions of aggregate performance via cycle benchmarking estimates. In this work, we demonstrate significant performance gains under randomized compiling for both the four-qubit quantum Fourier transform algorithm and for random circuits of variable depth on a superconducting quantum processor. We also validate solution accuracy using experimentally-measured error rates. Our results demonstrate that randomized compiling can be utilized to maximally-leverage and predict the capabilities of modern-day noisy quantum processors, paving the way forward for scalable quantum computing.
Akash V Dixit, Srivatsan Chakram, Kevin He, Ankur Agrawal, Ravi K Naik, David I Schuster, Aaron Chou, "Searching for dark matter with a superconducting qubit", Physical Review Letters, 2021, 126:141302, doi: 10.1103/PhysRevLett.126.141302
Alexis Morvan, VV Ramasesh, MS Blok, JM Kreikebaum, K O’Brien, L Chen, BK Mitchell, RK Naik, DI Santiago, I Siddiqi, "Qutrit randomized benchmarking", Physical Review Letters, 2021, 126:210504, doi: 10.1103/PhysRevLett.126.210504
Esmond G. Ng
2023
Julian Bellavita, Mathias Jacquelin, Esmond G. Ng, Dan Bonachea, Johnny Corbino, Paul H. Hargrove, "symPACK: A GPU-Capable Fan-Out Sparse Cholesky Solver", 2023 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM'23), ACM, November 13, 2023, doi: 10.25344/S49P45
Sparse symmetric positive definite systems of equations are ubiquitous in scientific workloads and applications. Parallel sparse Cholesky factorization is the method of choice for solving such linear systems. Therefore, the development of parallel sparse Cholesky codes that can efficiently run on today’s large-scale heterogeneous distributed-memory platforms is of vital importance. Modern supercomputers offer nodes that contain a mix of CPUs and GPUs. To fully utilize the computing power of these nodes, scientific codes must be adapted to offload expensive computations to GPUs.
We present symPACK, a GPU-capable parallel sparse Cholesky solver that uses one-sided communication primitives and remote procedure calls provided by the UPC++ library. We also utilize the UPC++ "memory kinds" feature to enable efficient communication of GPU-resident data. We show that on a number of large problems, symPACK outperforms comparable state-of-the-art GPU-capable Cholesky factorization codes by up to 14x on the NERSC Perlmutter supercomputer.
2022
Anne M. Felden, Daniel F. Martin, Esmond G. Ng, "SUHMO: an AMR SUbglacial Hydrology MOdel v1.0", Geosci. Model Dev. Discuss., July 27, 2022,
- Download File: gmd-2022-190.pdf (pdf: 5.5 MB)
2021
Daniel F. Martin, Stephen L. Cornford, Esmond G. Ng, Impact of Improved Bedrock Geometry and Basal Friction Relations on Antarctic Vulnerability to Regional Ice Shelf Collapse, American Geophysical Union Fall Meeting, December 15, 2021,
Courtney Shafer, Daniel F Martin and Esmond G Ng, "Comparing the Shallow-Shelf and L1L2 Approximations using BISICLES in the Context of MISMIP+ with Buttressing Effects", AGU Fall Meeting, December 13, 2021,
Anne M. Felden, Daniel F. Martin, Esmond G. Ng, SUHMO: An SUbglacial Hydrology MOdel based on the Chombo AMR framework, American Geophysical Union Fall Meeting, December 13, 2021,
Tan Thanh Nhat Nguyen
2021
Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", (BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021,
- Download File: pmbs21-DL-final.pdf (pdf: 632 KB)
Tan Nguyen, Erich Strohmaier, John Shalf, "Facilitating CoDesign with Automatic Code Similarity Learning", 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), November 14, 2021,
Tan Nguyen, Colin MacLean, Marco Siracusa, Douglas Doerfler, Nicholas J. Wright, Samuel Williams, "FPGA‐based HPC accelerators: An evaluation on performance and energy efficiency", CCPE, August 22, 2021, doi: 10.1002/cpe.6570
Douglas Doerfler, Farzad Fatollahi-Fard, Colin MacLean, Tan Nguyen, Samuel Williams, Nicholas J. Wright, Marco Siracusa, "Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs", International Workshop on OpenCL (iWOCL), April 2021, doi: 10.1145/3456669.3456671
Andy Nonaka
2023
I. Srivastava, D. R. Ladiges, A. Nonaka, A. L. Garcia, J. B. Bell, "Staggered Scheme for the Compressible Fluctuating Hydrodynamics of Multispecies Fluid Mixtures", Physical Review E, January 24, 2023, 107:015305, doi: 10.1103/PhysRevE.107.015305
2022
D. R. Ladiges, J. G. Wang, I. Srivastava, S. P. Carney, A. Nonaka, A. L. Garcia, A. Donev, J. B. Bell, "Modeling Electrokinetic Flows with the Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm", Physical Review E, November 19, 2022, 106:035104, doi: 10.1103/PhysRevE.106.035104
Z. Yao, R. Jambunathan, Y. Zeng, and A. Nonaka, "A Massively Parallel Time-Domain Coupled Electrodynamics-Micromagnetics Solver", International Journal of High Performance Computing Applications, January 10, 2022, accepted,
2021
Daniel R. Ladiges, Sean P. Carney, Andrew Nonaka, Katherine Klymko, Guy C. Moore, Alejandro L. Garcia, Sachin R. Natesh, Aleksandar Donev, John B. Bell, "A Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm for Modeling Electrolytes", Physical Review Fluids, April 1, 2021, 6(4):044309,
Peter Nugent
2022
Melissa L. Graham, Robert A. Knop, Thomas Kennedy, Peter E. Nugent, Eric Bellm, Márcio Catelan, Avi Patel, Hayden Smotherman, Monika Soraisam, Steven Stetzler, Lauren N. Aldoroty, Autumn Awbrey, Karina Baeza-Villagra, Pedro H. Bernardinelli, Federica Bianco, Dillon Brout, Riley Clarke, William I. Clarkson, Thomas Collett, James R. A. Davenport, Shenming Fu, John E. Gizis, Ari Heinze, Lei Hu, Saurabh W. Jha, Mario Jurić, J. Bryce Kalmbach, Alex Kim, Chien-Hsiu Lee, Chris Lidman, Mark Magee, Clara E. Martínez-Vázquez, Thomas Matheson, Gautham Narayan, Antonella Palmese, Christopher A. Phillips, Markus Rabus, Armin Rest, Nicolás Rodríguez-Segovia, Rachel Street, A. Katherina Vivas, Lifan Wang, Nicholas Wolf, Jiawen Yang, "Deep drilling in the time domain with DECam: Survey characterization", Monthly Notices of the Royal Astronomical Society, November 2022,
Venkitesh Ayyar, Robert Knop, Autumn Awbrey, Alexis Andersen, Peter Nugent, "Identifying Transient Candidates in the Dark Energy Survey Using Convolutional Neural Networks", Publications of the Astronomical Society of the Pacific, September 2022, 134:094501,
The ability to discover new transient candidates via image differencing without direct human intervention is an important task in observational astronomy. For these kinds of image classification problems, machine learning techniques such as Convolutional Neural Networks (CNNs) have shown remarkable success. In this work, we present the results of an automated transient candidate identification on images with CNNs for an extant data set from the Dark Energy Survey Supernova program, whose main focus was on using Type Ia supernovae for cosmology. By performing an architecture search of CNNs, we identify networks that efficiently select non-artifacts (e.g., supernovae, variable stars, AGN, etc.) from artifacts (image defects, mis-subtractions, etc.), achieving the efficiency of previous work performed with random forests, without the need to expend any effort in feature identification. The CNNs also help us identify a subset of mislabeled images. Performing a relabeling of the images in this subset, the resulting classification with CNNs is significantly better than previous results, lowering the false positive rate by 27% at a fixed missed detection rate of 0.05.
K. Wang, S. Lee, J. Balewski, A. Sim, P. Nugent, A. Agrawal, A. Choudhary, K. Wu, W-K. Liao, "Using Multi-resolution Data to Accelerate Neural Network Training in Scientific Applications", 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2022), 2022, doi: 10.1109/CCGrid54584.2022.00050
2021
S. Lee, Q. Kang, K. Wang, J. Balewski, A. Sim, A. Agrawal, A. Choudhary, P. Nugent, K. Wu, W-K. Liao, "Asynchronous I/O Strategy for Large-Scale Deep Learning Applications", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021, doi: 10.1109/HiPC53243.2021.00046
Leonid Oliker
2022
Taylor Groves, Chris Daley, Rahulkumar Gayatri, Hai Ah Nam, Nan Ding, Lenny Oliker, Nicholas J. Wright, Samuel Williams, "A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures", PMBS, November 2022,
Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, "Methodology for Evaluating the Potential of Disaggregated Memory Systems", RESDIS, https://resdis.github.io/ws/2022/sc/, November 18, 2022,
K. Ibrahim, L. Oliker, "Preprocessing Pipeline Optimization for Scientific Deep-Learning Workloads", IPDPS 2022, June 3, 2022,
2021
Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", (BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021,
Drew Paine
2021
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014
Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create `workflow snapshots' that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.
Drew Paine, Sarah Poon, Lavanya Ramakrishnan, "Investigating User Experiences with Data Abstractions on High Performance Computing Systems", June 29, 2021, LBNL LBNL-2001374,
Scientific exploration generates expanding volumes of data that commonly require High Performance Computing (HPC) systems to facilitate research. HPC systems are complex ecosystems of hardware and software that frequently are not user friendly. The Usable Data Abstractions (UDA) project set out to build usable software for scientific workflows in HPC environments by undertaking multiple rounds of qualitative user research. Qualitative research investigates how individuals accomplish their work, and our interview-based study surfaced a variety of insights about the experiences of working in and with HPC ecosystems. This report examines multiple facets of the experiences of scientists and developers using and supporting HPC systems. We discuss how stakeholders grasp the design and configuration of these systems, the impacts of abstraction layers on their ability to successfully do work, and the varied perceptions of time that shape this work. Examining the adoption of the Cori HPC at NERSC, we explore the anticipations and lived experiences of users interacting with this system's novel storage feature, the Burst Buffer. We present lessons learned from across these insights to illustrate just some of the challenges HPC facilities and their stakeholders need to account for when procuring and supporting these essential scientific resources to ensure their usability and utility to a variety of scientific practices.
Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", (P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, ACM, June 21, 2021, doi: 10.1145/3456287.3465478
Reproducible research is becoming essential for science to ensure transparency and for building trust. Additionally, reproducibility provides the cornerstone for sharing of methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding about the gaps, issues, and challenges for enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting the things that worked, that did not work, and that could have worked better for each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience who aim to integrate reproducibility in their everyday pipelines.
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Michael Beach, Drew Paine, Lavanya Ramakrishnan, "Science Capsule - Capturing the Data Life Cycle", Journal of Open Source Software, 2021, 6:2484, doi: 10.21105/joss.02484
Marco Pritoni, Drew Paine, Gabriel Fierro, Cory Mosiman, Michael Poplawski, Joel Bender, Jessica Granderson, "Metadata Schemas and Ontologies for Building Energy Applications: A Critical Review and Use Case Analysis", Energies, April 6, 2021, doi: 10.3390/en14072024
Digital and intelligent buildings are critical to realizing efficient building energy operations and a smart grid. With the increasing digitalization of processes throughout the life cycle of buildings, data exchanged between stakeholders and between building systems have grown significantly. However, a lack of semantic interoperability between data in different systems is still prevalent and hinders the development of energy-oriented applications that can be reused across buildings, limiting the scalability of innovative solutions. Addressing this challenge, our review paper systematically reviews metadata schemas and ontologies that are at the foundation of semantic interoperability necessary to move toward improved building energy operations. The review finds 40 schemas that span different phases of the building life cycle, most of which cover commercial building operations and, in particular, control and monitoring systems. The paper’s deeper review and analysis of five popular schemas identify several gaps in their ability to fully facilitate the work of a building modeler attempting to support three use cases: energy audits, automated fault detection and diagnosis, and optimal control. Our findings demonstrate that building modelers focused on energy use cases will find it difficult, labor intensive, and costly to create, sustain, and use semantic models with existing ontologies. This underscores the significant work still to be done to enable interoperable, usable, and maintainable building models. We make three recommendations for future work by the building modeling and energy communities: a centralized repository with a search engine for relevant schemas, the development of more use cases, and better harmonization and standardization of schemas in collaboration with industry to facilitate their adoption by stakeholders addressing varied energy-focused use cases.
Gilberto Pastorello
2022
B Faybishenko, R Versteeg, G Pastorello, D Dwivedi, C Varadharajan, D Agarwal, Challenging problems of quality assurance and quality control (QA/QC) of meteorological time series data, Stochastic Environmental Research and Risk Assessment, Pages: 1049--1062 2022, doi: 10.1007/s00477-021-02106-w
2021
Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", (P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, ACM, June 21, 2021, doi: 10.1145/3456287.3465478
Reproducible research is becoming essential for science to ensure transparency and for building trust. Additionally, reproducibility provides the cornerstone for sharing of methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding about the gaps, issues, and challenges for enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting the things that worked, that did not work, and that could have worked better for each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience who aim to integrate reproducibility in their everyday pipelines.
D. A. Agarwal, J. Damerow, C. Varadharajan, D. S. Christianson, G. Z. Pastorello, Y.-W. Cheah, L. Ramakrishnan, "Balancing the needs of consumers and producers for scientific data collections", Ecological Informatics, 2021, 62:101251, doi: 10.1016/j.ecoinf.2021.101251
Nirmalendu Patra
2023
Patricia Gonzalez-Guerrero, Kylie Huch, Nirmalendu Patra, Thom Popovici, George Michelogiannakis, "An Area Efficient Superconducting Unary CNN Accelerator", IEEE 24th International Symposium on Quality Electronic Design (ISQED), IEEE, April 2023,
Sean Peisert
2023
Robert Currie, Sean Peisert, Anna Scaglione, Aram Shumavon, Nikhil Ravi, "Data Privacy for the Grid: Toward a Data Privacy Standard for Inverter-Based and Distributed Energy Resources", IEEE Power & Energy Magazine, October 1, 2023,
Sachin Kadam, Anna Scaglione, Nikhil Ravi, Sean Peisert, Brent Lunghino, Aram Shumavon, "Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets", Proceedings of the 2023 IEEE International Conference on Smart Applications, Communications and Networking (SmartNets), Istanbul, Turkey, July 25, 2023,
Raksha Ramakrishna, Anna Scaglione, Tong Wu, Nikhil Ravi, Sean Peisert, "Differential Privacy for Class-based Data: A Practical Gaussian Mechanism", IEEE Transactions on Information Forensics and Security, June 23, 2023, doi: 10.1109/TIFS.2023.3289128
Nikhil Ravi, Anna Scaglione, Julieta Giraldez, Parth Pradhan, Chuck Moran, Sean Peisert, "Solar Photovoltaic Systems Metadata Inference and Differentially Private Publication", arXiv preprint arXiv:2304.03749, April 7, 2023, doi: 10.48550/arXiv.2304.03749
Sean Peisert, "The First 20 Years of IEEE Security & Privacy [From the Editors]", IEEE Security & Privacy, April 1, 2023, 21(2):4-6, doi: 10.1109/MSEC.2023.3236420
George Cybenko, Carl Landwehr, Shari Lawrence Pfleeger, Sean Peisert, A 20th Anniversary Episode Chat With S&P Editors, IEEE Security & Privacy, Pages: 9-16 April 2023, doi: 10.1109/MSEC.2023.3239179
Hector G. Martin, Tijana Radivojevic, Jeremy Zucker, Kristofer Bouchard, Jess Sustarich, Sean Peisert, Dan Arnold, Nathan Hillson, Gyorgy Babnigg, Jose M. Marti, Christopher J. Mungall, Gregg T. Beckham, Lucas Waldburger, James Carothers, ShivShankar Sundaram, Deb Agarwal, Blake A. Simmons, Tyler Backman, Deepanwita Banerjee, Deepti Tanjore, Lavanya Ramakrishnan, Anup Singh, "Perspectives for Self-Driving Labs in Synthetic Biology", Current Opinion in Biotechnology, February 2023, doi: 10.1016/j.copbio.2022.102881
2022
Ammar Haydari, Chen-Nee Chuah, Michael Zhang, Jane Macfarlane, Sean Peisert, "Differentially Private Map Matching for Mobility Trajectories", Proceedings of the 2022 Annual Computer Security Applications Conference (ACSAC), Austin, TX, ACM, December 2022, doi: 10.1145/3564625.3567974
Andrew Adams, Emily K. Adams, Dan Gunter, Ryan Kiser, Mark Krenz, Sean Peisert, John Zage, "Roadmap for Securing Operational Technology in NSF Scientific Research", Trusted CI Report, November 16, 2022, doi: 10.5281/zenodo.7327987
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "SoK: Limitations of Confidential Computing via TEEs for High-Performance Compute Systems", Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), September 2022,
Yize Chen, Yuanyuan Shi, Daniel Arnold, Sean Peisert, "SAVER: Safe Learning-Based Controller for Real-Time Voltage Regulation", Proceedings of the 2022 IEEE Power Engineering Society (PES) General Meeting, Denver, CO, July 2022,
Emily K. Adams, Daniel Gunter, Ryan Kiser, Mark Krenz, Sean Peisert, Susan Sons, John Zage, "Findings of the 2022 Trusted CI Study on the Security of Operational Technology in NSF Scientific Research", Trusted CI Report, July 15, 2022, doi: 10.5281/zenodo.6828675
Daniel Arnold, Sy-Toan Ngo, Ciaran Roberts, Yize Chen, Anna Scaglione, Sean Peisert, "Adam-based Augmented Random Search for Control Policies for Distributed Energy Resource Cyber Attack Mitigation", Proceedings of the 2022 American Control Conference (ACC), June 2022,
Sean Peisert, Unsafe at Any Clock Speed: the Insecurity of Computer System Design, Implementation, and Operation [From the Editors], IEEE Security & Privacy, Pages: 4-9 January 2022, doi: 10.1109/MSEC.2021.3127086
2021
Andrew Adams, Kay Avila, Elisa Heymann, Mark Krenz, Jason R. Lee, Barton Miller, Sean Peisert, "Guide to Securing Scientific Software", Trusted CI Report, December 14, 2021, doi: 10.5281/zenodo.5777646
James R. Clavin, Yue Huang, Xin Wang, Pradeep M. Prakash, Sisi Duan, Jianwu Wang, Sean Peisert, "A Framework for Evaluating BFT", Proceedings of the IEEE International Conference on Parallel and Distributed Systems (ICPADS), IEEE, December 2021,
Ammar Haydari, Michael Zhang, Chen-Nee Chuah, Jane Macfarlane, Sean Peisert, Adaptive Differential Privacy Mechanism for Aggregated Mobility Dataset, arXiv preprint arXiv:2112.08487, December 10, 2021,
Yize Chen, Yuanyuan Shi, Daniel Arnold, Sean Peisert, SAVER: Safe Learning-Based Controller for Real-Time Voltage Regulation, arXiv preprint arXiv:2111.15152, November 30, 2021,
Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin, Sean Peisert, W. Bradley Holtz, Anil Aswani, Dipankar Dwivedi, Haruko Wainwright, Ghanshyam Pilania, Benjamin Nachman, Babetta L. Marrone, Nicola Falco, Prabhat, Daniel Arnold, Alejandro Wolf-Yadlin, Sarah Powers, Sharlee Climer, Quinn Jackson, Ty Carlson, Michael Sohn, Petrus Zwart, Neeraj Kumar, Amy Justice, Claire Tomlin, Daniel Jacobson, Gos Micklem, Georgios V. Gkoutos, Peter J. Bickel, Jean-Baptiste Cazier, Juliane Müller, Bobbie-Jo Webb-Robertson, Rick Stevens, Mark Anderson, Ken Kreutz-Delgado, Michael W. Mahoney, James B. Brown, Learning from Learning Machines: a New Generation of AI Technology to Meet the Needs of Science, arXiv preprint arXiv:2111.13786, November 27, 2021,
Sachin Kadam, Anna Scaglione, Nikhil Ravi, Sean Peisert, Brent Lunghino, Aram Shumavon, Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets, arXiv preprint arXiv:2111.11661, November 23, 2021,
Nikhil Ravi, Anna Scaglione, Sachin Kadam, Reinhard Gentz, Sean Peisert, Brent Lunghino, Emmanuel Levijarvi, Aram Shumavon, Differentially Private K-means Clustering Applied to Meter Data Analysis and Synthesis, arXiv preprint arXiv:2112.03801, November 23, 2021,
Nikhil Ravi, Anna Scaglione, Sean Peisert, Colored Noise Mechanism for Differentially Private Clustering, arXiv preprint arXiv:2111.07850, November 15, 2021,
Yize Chen, Daniel Arnold, Yuanyuan Shi, Sean Peisert, Understanding the Safety Requirements for Learning-based Power Systems Operations, arXiv preprint arXiv:2110.04983, October 11, 2021,
Andrew Adams, Kay Avila, Elisa Heymann, Mark Krenz, Jason R. Lee, Barton Miller, Sean Peisert, "The State of the Scientific Software World: Findings of the 2021 Trusted CI Software Assurance Annual Challenge Interviews", Trusted CI Report, September 29, 2021,
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "Enabling Design Space Exploration for RISC-V Secure Compute Environments", Proceedings of the Fifth Workshop on Computer Architecture Research with RISC-V (CARRV), (co-located with ISCA 2021), June 17, 2021,
Ciaran Roberts, Sy-Toan Ngo, Alexandre Milesi, Anna Scaglione, Sean Peisert, Daniel Arnold, "Deep Reinforcement Learning for Mitigating Cyber-Physical DER Voltage Unbalance Attacks", Proceedings of the 2021 American Control Conference (ACC), May 2021, doi: 10.23919/ACC50511.2021.9482815
Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert, "Performance Analysis of Scientific Computing Workloads on General Purpose TEEs", Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), IEEE, May 2021, doi: 10.1109/IPDPS49936.2021.00115
Sean Peisert, "Trustworthy Scientific Computing", Communications of the ACM (CACM), May 2021, doi: 10.1145/3457191
Fabio Massacci, Trent Jaeger, Sean Peisert, "SolarWinds and the Challenges of Patching: Can We Ever Stop Dancing With the Devil?", IEEE Security & Privacy, April 2021, 14-19, doi: 10.1109/MSEC.2021.3050433
Sean Peisert, Bruce Schneier, Hamed Okhravi, Fabio Massacci, Terry Benzel, Carl Landwehr, Mohammad Mannan, Jelena Mirkovic, Atul Prakash, James Bret Michael, "Perspectives on the SolarWinds Incident", IEEE Security & Privacy, April 2021, 7-13, doi: 10.1109/MSEC.2021.3051235
Sean Peisert, Reflections on the Past, Perspectives on the Future [From the Editors], IEEE Security & Privacy, January 2021, doi: 10.1109/MSEC.2020.3034670
Talita Perciano
2022
Gregory Wallace, Zhe Bai, Robbie Sadre, Talita Perciano, Nicola Bertelli, Syun'ichi Shiraiwa, Wes Bethel, John Wright, "Towards fast and accurate predictions of radio frequency power deposition and current profile via data-driven modelling: applications to lower hybrid current drive", Journal of Plasma Physics, August 18, 2022, 88:895880401, doi: 10.1017/S0022377822000708
M. G. Amankwah, D. Camps, E. W. Bethel, R. Van Beeumen, T. Perciano, "Quantum pixel representations and compression for N-dimensional images", Nature Scientific Reports, May 11, 2022, 12:7712, doi: 10.1038/s41598-022-11024-y
S. Zhang, R. Sadre, B. A. Legg, H. Pyles, T. Perciano, E. W. Bethel, D. Baker, O. Rübel, J. J. De Yoreo, "Rotational dynamics and transition mechanisms of surface-adsorbed proteins", Proceedings of the National Academy of Sciences, April 11, 2022, 119:e2020242119, doi: 10.1073/pnas.2020242119
M. Avaylon, R. Sadre, Z. Bai, T. Perciano, "Adaptable Deep Learning and Probabilistic Graphical Model System for Semantic Segmentation", Advances in Artificial Intelligence and Machine Learning, March 31, 2022, 2:288--302, doi: 10.54364/AAIML.2022.1119
C Varadharajan, AP Appling, B Arora, DS Christianson, VC Hendrix, V Kumar, AR Lima, J Müller, S Oliver, M Ombadi, T Perciano, JM Sadler, H Weierbach, JD Willard, Z Xu, J Zwart, "Can machine learning accelerate process understanding and decision-relevant predictions of river water quality?", Hydrological Processes, January 1, 2022, 36, doi: 10.1002/hyp.14565
2021
V. Dumont, C. Garner, A. Trivedi, C. Jones, V. Ganapati, J. Mueller, T. Perciano, M. Kiran, and M. Day, "HYPPO: A Surrogate-Based Multi-Level Parallelism Tool for Hyperparameter Optimization", 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), November 15, 2021,
E. W. Bethel, C. Heinemann, and T. Perciano, "Performance Tradeoffs in Shared-memory Platform Portable Implementations of a Stencil Kernel", Eurographics Symposium on Parallel Graphics and Visualization, June 14, 2021,
Sarah Poon
2021
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014
Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create `workflow snapshots' that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.
Drew Paine, Sarah Poon, Lavanya Ramakrishnan, "Investigating User Experiences with Data Abstractions on High Performance Computing Systems", June 29, 2021, LBNL LBNL-2001374,
Scientific exploration generates expanding volumes of data that commonly require High Performance Computing (HPC) systems to facilitate research. HPC systems are complex ecosystems of hardware and software that frequently are not user friendly. The Usable Data Abstractions (UDA) project set out to build usable software for scientific workflows in HPC environments by undertaking multiple rounds of qualitative user research. Qualitative research investigates how individuals accomplish their work, and our interview-based study surfaced a variety of insights about the experiences of working in and with HPC ecosystems. This report examines multiple facets of the experiences of scientists and developers using and supporting HPC systems. We discuss how stakeholders grasp the design and configuration of these systems, the impacts of abstraction layers on their ability to successfully do work, and the varied perceptions of time that shape this work. Examining the adoption of the Cori HPC at NERSC, we explore the anticipations and lived experiences of users interacting with this system's novel storage feature, the Burst Buffer. We present lessons learned from across these insights to illustrate just some of the challenges HPC facilities and their stakeholders need to account for when procuring and supporting these essential scientific resources to ensure their usability and utility to a variety of scientific practices.
Doru Thom Popovici
2023
Patricia Gonzalez-Guerrero, Kylie Huch, Nirmalendu Patra, Thom Popovici, George Michelogiannakis, "An Area Efficient Superconducting Unary CNN Accelerator", IEEE 24th International Symposium on Quality Electronic Design (ISQED), IEEE, April 2023,
2021
Md Abdul M Faysal, Shaikh Arifuzzaman, Cy Chan, Maximilian Bremer, Doru Popovici, John Shalf, "HyPC-Map: A Hybrid Parallel Community Detection Algorithm Using Information-Theoretic Approach", HPEC, September 20, 2021,
Raksha Ramakrishna
2023
Raksha Ramakrishna, Anna Scaglione, Tong Wu, Nikhil Ravi, Sean Peisert, "Differential Privacy for Class-based Data: A Practical Gaussian Mechanism", IEEE Transactions on Information Forensics and Security, June 23, 2023, doi: 10.1109/TIFS.2023.3289128
Lavanya Ramakrishnan
2023
Hector G. Martin, Tijana Radivojevic, Jeremy Zucker, Kristofer Bouchard, Jess Sustarich, Sean Peisert, Dan Arnold, Nathan Hillson, Gyorgy Babnigg, Jose M. Marti, Christopher J. Mungall, Gregg T. Beckham, Lucas Waldburger, James Carothers, ShivShankar Sundaram, Deb Agarwal, Blake A. Simmons, Tyler Backman, Deepanwita Banerjee, Deepti Tanjore, Lavanya Ramakrishnan, Anup Singh, "Perspectives for Self-Driving Labs in Synthetic Biology", Current Opinion in Biotechnology, February 2023, doi: 10.1016/j.copbio.2022.102881
2022
MB Simmonds, WJ Riley, DA Agarwal, X Chen, S Cholia, R Crystal-Ornelas, ET Coon, D Dwivedi, VC Hendrix, M Huang, A Jan, Z Kakalia, J Kumar, CD Koven, L Li, M Melara, L Ramakrishnan, DM Ricciuto, AP Walker, W Zhi, Q Zhu, C Varadharajan, Guidelines for Publicly Archiving Terrestrial Model Data to Enhance Usability, Intercomparison, and Synthesis, Data Science Journal, 2022, doi: 10.5334/dsj-2022-003
2021
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014
Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create `workflow snapshots' that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.
Drew Paine, Sarah Poon, Lavanya Ramakrishnan, "Investigating User Experiences with Data Abstractions on High Performance Computing Systems", June 29, 2021, LBNL LBNL-2001374,
Scientific exploration generates expanding volumes of data that commonly require High Performance Computing (HPC) systems to facilitate research. HPC systems are complex ecosystems of hardware and software that frequently are not user friendly. The Usable Data Abstractions (UDA) project set out to build usable software for scientific workflows in HPC environments by undertaking multiple rounds of qualitative user research. Qualitative research investigates how individuals accomplish their work, and our interview-based study surfaced a variety of insights about the experiences of working in and with HPC ecosystems. This report examines multiple facets of the experiences of scientists and developers using and supporting HPC systems. We discuss how stakeholders grasp the design and configuration of these systems, the impacts of abstraction layers on their ability to successfully do work, and the varied perceptions of time that shape this work. Examining the adoption of the Cori HPC at NERSC, we explore the anticipations and lived experiences of users interacting with this system's novel storage feature, the Burst Buffer. We present lessons learned from across these insights to illustrate just some of the challenges HPC facilities and their stakeholders need to account for when procuring and supporting these essential scientific resources to ensure their usability and utility to a variety of scientific practices.
Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", (P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, ACM, June 21, 2021, doi: 10.1145/3456287.3465478
Reproducible research is becoming essential for science to ensure transparency and to build trust. Additionally, reproducibility provides the cornerstone for sharing methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding of the gaps, issues, and challenges for enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing the different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting what worked, what did not work, and what could have worked better in each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience aiming to integrate reproducibility into their everyday pipelines.
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Michael Beach, Drew Paine, Lavanya Ramakrishnan, "Science Capsule - Capturing the Data Life Cycle", Journal of Open Source Software, 2021, 6:2484, doi: 10.21105/joss.02484
D. A. Agarwal, J. Damerow, C. Varadharajan, D. S. Christianson, G. Z. Pastorello, Y.-W. Cheah, L. Ramakrishnan, "Balancing the needs of consumers and producers for scientific data collections", Ecological Informatics, 2021, 62:101251, doi: 10.1016/j.ecoinf.2021.101251
J Müller, B Faybishenko, D Agarwal, S Bailey, C Jiang, Y Ryu, C Tull, L Ramakrishnan, Assessing data change in scientific datasets, Concurrency and Computation: Practice and Experience, 2021, doi: 10.1002/cpe.6245
Katherine Rasmussen
2022
Katherine Rasmussen, Damian Rouson, Naje George, Dan Bonachea, Hussain Kadhem, Brian Friesen, "Agile Acceleration of LLVM Flang Support for Fortran 2018 Parallel Programming", Research Poster at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), November 2022, doi: 10.25344/S4CP4S
The LLVM Flang compiler ("Flang") is currently Fortran 95 compliant, and the frontend can parse Fortran 2018. However, Flang does not have a comprehensive 2018 test suite and does not fully implement the static semantics of the 2018 standard. We are investigating whether agile software development techniques, such as pair programming and test-driven development (TDD), can help Flang to rapidly progress to Fortran 2018 compliance. Because of the paramount importance of parallelism in high-performance computing, we are focusing on Fortran’s parallel features, commonly denoted “Coarray Fortran.” We are developing what we believe are the first exhaustive, open-source tests for the static semantics of Fortran 2018 parallel features, and contributing them to the LLVM project. A related effort involves writing runtime tests for parallel 2018 features and supporting those tests by developing a new parallel runtime library: the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine).
Nikhil Ravi
2023
Robert Currie, Sean Peisert, Anna Scaglione, Aram Shumavon, Nikhil Ravi, "Data Privacy for the Grid: Toward a Data Privacy Standard for Inverter-Based and Distributed Energy Resources", IEEE Power & Energy Magazine, October 1, 2023,
Sachin Kadam, Anna Scaglione, Nikhil Ravi, Sean Peisert, Brent Lunghino, Aram Shumavon, "Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets", Proceedings of the 2023 IEEE International Conference on Smart Applications, Communications and Networking (SmartNets), Istanbul, Turkey, July 25, 2023,
Raksha Ramakrishna, Anna Scaglione, Tong Wu, Nikhil Ravi, Sean Peisert, "Differential Privacy for Class-based Data: A Practical Gaussian Mechanism", June 23, 2023, doi: 10.1109/TIFS.2023.3289128
Nikhil Ravi, Anna Scaglione, Julieta Giraldez, Parth Pradhan, Chuck Moran, Sean Peisert, "Solar Photovoltaic Systems Metadata Inference and Differentially Private Publication", arXiv preprint arXiv:2304.03749, April 7, 2023, doi: 10.48550/arXiv.2304.03749
2021
Sachin Kadam, Anna Scaglione, Nikhil Ravi, Sean Peisert, Brent Lunghino, Aram Shumavon, Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets, arXiv preprint arXiv:2111.11661, November 23, 2021,
Nikhil Ravi, Anna Scaglione, Sachin Kadam, Reinhard Gentz, Sean Peisert, Brent Lunghino, Emmanuel Levijarvi, Aram Shumavon, Differentially Private K-means Clustering Applied to Meter Data Analysis and Synthesis, arXiv preprint arXiv:2112.03801, November 23, 2021,
Nikhil Ravi, Anna Scaglione, Sean Peisert, Colored Noise Mechanism for Differentially Private Clustering, arXiv preprint arXiv:2111.07850, November 15, 2021,
Damian Rouson
2023
Michelle Mills Strout, Damian Rouson, Amir Kamil, Dan Bonachea, Jeremiah Corrado, Paul H. Hargrove, Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran, Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC23), November 12, 2023,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for many of these uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models.
The tutorial is targeted at users with little-to-no parallel programming experience, but everyone is welcome. A partial differential equation example will be demonstrated in all three programming models. That example and others will be provided to attendees in a virtual environment. Attendees will be shown how to compile and run these programming examples, and the virtual environment will remain available to attendees throughout the conference, along with Slack-based interactive tech support.
Come join us to learn about some productive and performant parallel programming models!
Michelle Mills Strout, Damian Rouson, Amir Kamil, Dan Bonachea, Jeremiah Corrado, Paul H. Hargrove, Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran (CUF23), ECP/NERSC/OLCF Tutorial, July 2023,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for many of these uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models. This tutorial should be accessible to users with little-to-no parallel programming experience, and everyone is welcome. A partial differential equation example will be demonstrated in all three programming models, along with performance and scaling results on big machines. That example and others will be provided in a cloud instance and Docker container. Attendees will be shown how to compile and run these programming examples, and will be provided opportunities to experiment with different parameters and code alternatives while being able to ask questions and share their own observations. Come join us to learn about some productive and performant parallel programming models!
Damian Rouson, Producing Software for Science with Class, SIAM Conference on Computational Science and Engineering, March 1, 2023,
- Download File: Rouson-SIAM-CSE-2023.pdf (pdf: 7.5 MB)
The Computer Languages and Systems Software (CLaSS) Group at Berkeley Lab researches and develops programming models, languages, libraries, and applications for parallel and quantum computing. The open-source software under development in CLaSS includes the GASNet-EX networking middleware, the UPC++ partitioned global address space (PGAS) template library, the Berkeley Quantum Synthesis Toolkit (BQSKit), and the MetaHipMer metagenome assembler. This talk will start with an overview of CLaSS software and the software sustainability practices commonly employed across the group. The talk will then dive more deeply into our burgeoning contributions to the ecosystem supporting modern Fortran, including our test development for the LLVM Flang Fortran compiler. This presentation will demonstrate how agile software development techniques are helping to ensure robust front-end support for standard Fortran 2018 parallel programming features. The talk will also present several key insights that inspired our design and development of the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine) parallel runtime library, emphasizing the design choices that help to ensure sustainability. Lastly, the talk will demonstrate the productivity benefits associated with the first Caffeine application in Motility Analysis of T-Cell Histories in Activation (Matcha).
Brad Richardson, Damian Rouson, Harris Snyder, Robert Singelterry, "Framework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran", Workshop on Asynchronous Many-Task Systems and Applications (WAMTA'23), Baton Rouge, LA, February 2023, doi: 10.25344/S4ZC73
Most parallel scientific programs contain compiler directives (pragmas) such as those from OpenMP, explicit calls to runtime library procedures such as those implementing the Message Passing Interface (MPI), or compiler-specific language extensions such as those provided by CUDA. By contrast, the recent Fortran standards empower developers to express parallel algorithms without directly referencing lower-level parallel programming models. Fortran’s parallel features place the language within the Partitioned Global Address Space (PGAS) class of programming models. When writing programs that exploit data-parallelism, application developers often find it straightforward to develop custom parallel algorithms. Problems involving complex, heterogeneous, staged calculations, however, pose much greater challenges. Such applications require careful coordination of tasks in a manner that respects dependencies prescribed by a directed acyclic graph. When rolling one’s own solution proves difficult, extending a customizable framework becomes attractive. The paper presents the design, implementation, and use of the Framework for Extensible Asynchronous Task Scheduling (FEATS), which we believe to be the first task-scheduling tool written in modern Fortran. We describe the benefits and compromises associated with choosing Fortran as the implementation language, and we propose ways in which future Fortran standards can best support the use case in this paper.
Paul H. Hargrove, Dan Bonachea, Johnny Corbino, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'23)", Poster at Exascale Computing Project (ECP) Annual Meeting 2023, January 2023,
The Pagoda project is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. The first component is GASNet-EX, a portable, high-performance, global-address-space communication library. The second component is UPC++, a C++ template library. Together, these libraries enable agile, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
GASNet-EX is a portable, high-performance communications middleware library which leverages hardware support to implement Remote Memory Access (RMA) and Active Message communication primitives. GASNet-EX supports a broad ecosystem of alternative HPC programming models, including UPC++, Legion, Chapel and multiple implementations of UPC and Fortran Coarrays. GASNet-EX is implemented directly over the native APIs for networks of interest in HPC. The tight semantic match of GASNet-EX APIs to the client requirements and hardware capabilities often yields better performance than competing libraries.
UPC++ provides high-level productivity abstractions appropriate for Partitioned Global Address Space (PGAS) programming such as: remote memory access (RMA), remote procedure call (RPC), support for accelerators (e.g. GPUs), and mechanisms for aggressive asynchrony to hide communication costs. UPC++ implements communication using GASNet-EX, delivering high performance and portability from laptops to exascale supercomputers. HPC application software using UPC++ includes: MetaHipMer2 metagenome assembler, SIMCoV viral propagation simulation, NWChemEx TAMM, and graph computation kernels from ExaGraph.
2022
"Berkeley Lab’s Networking Middleware GASNet Turns 20: Now, GASNet-EX is Gearing Up for the Exascale Era", Linda Vu, HPCWire (Lawrence Berkeley National Laboratory CS Area Communications), December 7, 2022, doi: 10.25344/S4BP4G
GASNet Celebrates 20th Anniversary
For 20 years, Berkeley Lab’s GASNet has been fueling developers’ ability to tap the power of massively parallel supercomputers more effectively. The middleware was recently upgraded to support exascale scientific applications.
Katherine Rasmussen, Damian Rouson, Naje George, Dan Bonachea, Hussain Kadhem, Brian Friesen, "Agile Acceleration of LLVM Flang Support for Fortran 2018 Parallel Programming", Research Poster at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), November 2022, doi: 10.25344/S4CP4S
The LLVM Flang compiler ("Flang") is currently Fortran 95 compliant, and the frontend can parse Fortran 2018. However, Flang does not have a comprehensive 2018 test suite and does not fully implement the static semantics of the 2018 standard. We are investigating whether agile software development techniques, such as pair programming and test-driven development (TDD), can help Flang to rapidly progress to Fortran 2018 compliance. Because of the paramount importance of parallelism in high-performance computing, we are focusing on Fortran’s parallel features, commonly denoted “Coarray Fortran.” We are developing what we believe are the first exhaustive, open-source tests for the static semantics of Fortran 2018 parallel features, and contributing them to the LLVM project. A related effort involves writing runtime tests for parallel 2018 features and supporting those tests by developing a new parallel runtime library: the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine).
Damian Rouson, Dan Bonachea, "Caffeine: CoArray Fortran Framework of Efficient Interfaces to Network Environments", Proceedings of the Eighth Annual Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC2022), Dallas, Texas, USA, IEEE, November 2022, doi: 10.25344/S4459B
This paper provides an introduction to the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine), a parallel runtime library built atop the GASNet-EX exascale networking library. Caffeine leverages several non-parallel Fortran features to write type- and rank-agnostic interfaces and corresponding procedure definitions that support parallel Fortran 2018 features, including communication, collective operations, and related services. One major goal is to develop a runtime library that can eventually be considered for adoption by LLVM Flang, enabling that compiler to support the parallel features of Fortran. The paper describes the motivations behind Caffeine's design and implementation decisions, details the current state of Caffeine's development, and previews future work. We explain how the design and implementation offer benefits related to software sustainability, by lowering the barrier to user contributions and reducing complexity through the use of Fortran 2018 C-interoperability features, and to performance, through the use of a lightweight communication substrate.
William F. Godoy, Ritu Arora, Keith Beattie, David E. Bernholdt, Sarah E. Bratt, Daniel S. Katz, Ignacio Laguna, Amiya K. Maji, Addi Malviya-Thakur, Rafael M. Mudafort, Nitin Sukhija, Damian Rouson, Cindy Rubio-Gonzalez, Karan Vahi, "Giving Research Software Engineers a Larger Stage Through the Better Scientific Software Fellowship", Computing in Science & Engineering, October 2022, 24 (5):6-13, doi: 10.1109/MCSE.2023.3253847
The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. The BSSwF’s vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). Case studies from several of the program’s participants illustrate the diverse ways the BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as the BSSwF can help recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations, and ideas for a larger audience.
Paul H. Hargrove, Dan Bonachea, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'22)", Poster at Exascale Computing Project (ECP) Annual Meeting 2022, May 5, 2022,
We present UPC++ and GASNet-EX, distributed libraries which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Procedure Call (RPC) and for Remote Memory Access (RMA) to host and GPU memories. The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
2021
Katherine A. Yelick, Amir Kamil, Damian Rouson, Dan Bonachea, Paul H. Hargrove, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (SC21), Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), November 15, 2021,
UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. UPC++ offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between computation and asynchronous data movement. UPC++ supports simple/regular data structures as well as more elaborate distributed applications where communication is fine-grained and/or irregular. UPC++ provides a uniform abstraction for one-sided RMA between host and GPU/accelerator memories anywhere in the system. UPC++'s support for aggressive asynchrony enables applications to effectively overlap communication and reduce latency stalls, while the underlying GASNet-EX communication library delivers efficient low-overhead RMA/RPC on HPC networks.
This tutorial introduces UPC++, covering the memory and execution models and basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into application proxy examples. We examine a few UPC++ applications with irregular communication (metagenomic assembler and COVID-19 simulation) and describe how they utilize UPC++ to optimize communication performance.
Oliver Rübel
2022
S. Zhang, R. Sadre, B. A. Legg, H. Pyles, T. Perciano, E. W. Bethel, D. Baker, O. Rübel, J. J. De Yoreo, "Rotational dynamics and transition mechanisms of surface-adsorbed proteins", Proceedings of the National Academy of Sciences, April 11, 2022, 119:e2020242119, doi: 10.1073/pnas.2020242119
2021
Hamish A. Carr, Gunther H. Weber, Christopher M. Sewell, Oliver Rübel, Patricia Fasel, James P. Ahrens, "Scalable Contour Tree Computation by Data Parallel Peak Pruning", IEEE Transactions on Visualization and Computer Graphics, 2021, 27:2437-2454, doi: 10.1109/TVCG.2019.2948616
Hamish Carr, Oliver Rübel, Gunther H. Weber, James Ahrens, "Optimization and Augmentation for Data Parallel Contour Trees", IEEE Transactions on Visualization and Computer Graphics, 2021, doi: 10.1109/TVCG.2021.3064385
Anna Scaglione
2023
Robert Currie, Sean Peisert, Anna Scaglione, Aram Shumavon, Nikhil Ravi, "Data Privacy for the Grid: Toward a Data Privacy Standard for Inverter-Based and Distributed Energy Resources", IEEE Power & Energy Magazine, October 1, 2023,
Sachin Kadam, Anna Scaglione, Nikhil Ravi, Sean Peisert, Brent Lunghino, Aram Shumavon, "Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets", Proceedings of the 2023 IEEE International Conference on Smart Applications, Communications and Networking (SmartNets), Istanbul, Turkey, July 25, 2023,
Raksha Ramakrishna, Anna Scaglione, Tong Wu, Nikhil Ravi, Sean Peisert, "Differential Privacy for Class-based Data: A Practical Gaussian Mechanism", June 23, 2023, doi: 10.1109/TIFS.2023.3289128
Nikhil Ravi, Anna Scaglione, Julieta Giraldez, Parth Pradhan, Chuck Moran, Sean Peisert, "Solar Photovoltaic Systems Metadata Inference and Differentially Private Publication", arXiv preprint arXiv:2304.03749, April 7, 2023, doi: 10.48550/arXiv.2304.03749
2022
Daniel Arnold, Sy-Toan Ngo, Ciaran Roberts, Yize Chen, Anna Scaglione, Sean Peisert, "Adam-based Augmented Random Search for Control Policies for Distributed Energy Resource Cyber Attack Mitigation", Proceedings of the 2022 American Control Conference (ACC), June 2022,
2021
Sachin Kadam, Anna Scaglione, Nikhil Ravi, Sean Peisert, Brent Lunghino, Aram Shumavon, Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets, arXiv preprint arXiv:2111.11661, November 23, 2021,
Nikhil Ravi, Anna Scaglione, Sachin Kadam, Reinhard Gentz, Sean Peisert, Brent Lunghino, Emmanuel Levijarvi, Aram Shumavon, Differentially Private K-means Clustering Applied to Meter Data Analysis and Synthesis, arXiv preprint arXiv:2112.03801, November 23, 2021,
Nikhil Ravi, Anna Scaglione, Sean Peisert, Colored Noise Mechanism for Differentially Private Clustering, arXiv preprint arXiv:2111.07850, November 15, 2021,
Ciaran Roberts, Sy-Toan Ngo, Alexandre Milesi, Anna Scaglione, Sean Peisert, Daniel Arnold, "Deep Reinforcement Learning for Mitigating Cyber-Physical DER Voltage Unbalance Attacks", Proceedings of the 2021 American Control Conference (ACC), May 2021, doi: 10.23919/ACC50511.2021.9482815
Oguz Selvitopi
2022
Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, "Methodology for Evaluating the Potential of Disaggregated Memory Systems", RESDIS, https://resdis.github.io/ws/2022/sc/, November 18, 2022,
- Download File: Methodology-for-Evaluating-the-Potential-of-Disaggregated-Memory-Systems.pdf (pdf: 5.1 MB)
2021
Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad, "Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale", 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021, doi: 10.1109/IPDPS49936.2021.00018
Nazanin Jafari, Oguz Selvitopi, Cevdet Aykanat, "Fast shared-memory streaming multilevel graph partitioning", Journal of Parallel and Distributed Computing, January 2021, 147:140-151, doi: 10.1016/j.jpdc.2020.09.004
O Selvitopi, B Brock, I Nisa, A Tripathy, K Yelick, A Buluç, "Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication", Proceedings of the International Conference on Supercomputing, January 2021, 431--442, doi: 10.1145/3447818.3461472
Koushik Sen
2021
Ed Younis, Koushik Sen, Katherine Yelick, Costin Iancu, QFAST: Quantum Synthesis Using a Hierarchical Continuous Circuit Space, Bulletin of the American Physical Society, March 2021,
We present QFAST, a quantum synthesis tool designed to produce short circuits and to scale well in practice. Our contributions are: 1) a novel representation of circuits able to encode placement and topology; 2) a hierarchical approach with an iterative refinement formulation that combines "coarse-grained" fast optimization during circuit structure search with a good, but slower, optimization stage only in the final circuit instantiation. When compared against state-of-the-art techniques, although not always optimal, QFAST can reduce circuits for "time-dependent evolution" algorithms, as used by domain scientists, by 60x in depth. On typical circuits, it provides 4x better depth reduction than the widely used Qiskit and UniversalQ compilers. We also show the composability and tunability of our formulation in terms of circuit depth and running time. For example, we show how to generate shorter circuits by plugging in the best available third party synthesis algorithm at a given hierarchy level. Composability enables portability across chip architectures, which is missing from similar approaches.
QFAST is integrated with Qiskit and available at github.com/bqskit.
Jean Sexton
2021
Jean Sexton, Zarija Lukic, Ann Almgren, Chris Daley, Brian Friesen, Andrew Myers, and Weiqun Zhang, "Nyx: A Massively Parallel AMR Code for Computational Cosmology", The Journal of Open Source Software, July 10, 2021,
Courtney Shafer
2021
Courtney Shafer, Daniel F. Martin, and Esmond G. Ng, "Comparing the Shallow-Shelf and L1L2 Approximations using BISICLES in the Context of MISMIP+ with Buttressing Effects", AGU Fall Meeting, December 13, 2021,
Tamsin L. Edwards, Sophie Nowicki, Ben Marzeion, Regine Hock, Heiko Goelzer, Hélène Seroussi, Nicolas C. Jourdain, Donald A. Slater, Fiona E. Turner, Christopher J. Smith, Christine M. McKenna, Erika Simon, Ayako Abe-Ouchi, Jonathan M. Gregory, Eric Larour, William H. Lipscomb, Antony J. Payne, Andrew Shepherd, Cécile Agosta, Patrick Alexander, Torsten Albrecht, Brian Anderson, Xylar Asay-Davis, Andy Aschwanden, Alice Barthel, Andrew Bliss, Reinhard Calov, Christopher Chambers, Nicolas Champollion, Youngmin Choi, Richard Cullather, Joshua Cuzzone, Christophe Dumas, Denis Felikson, Xavier Fettweis, Koji Fujita, Benjamin K. Galton-Fenzi, Rupert Gladstone, Nicholas R. Golledge, Ralf Greve, Tore Hattermann, Matthew J. Hoffman, Angelika Humbert, Matthias Huss, Philippe Huybrechts, Walter Immerzeel, Thomas Kleiner, Philip Kraaijenbrink, Sébastien Le clec’h, Victoria Lee, Gunter R. Leguy, Christopher M. Little, Daniel P. Lowry, Jan-Hendrik Malles, Daniel F. Martin, Fabien Maussion, Mathieu Morlighem, James F. O’Neill, Isabel Nias, Frank Pattyn, Tyler Pelle, Stephen F. Price, Aurélien Quiquet, Valentina Radić, Ronja Reese, David R. Rounce, Martin Rückamp, Akiko Sakai, Courtney Shafer, Nicole-Jeanne Schlegel, Sarah Shannon, Robin S. Smith, Fiammetta Straneo, Sainan Sun, Lev Tarasov, Luke D. Trusel, Jonas Van Breedam, Roderik van de Wal, Michiel van den Broeke, Ricarda Winkelmann, Harry Zekollari, Chen Zhao, Tong Zhang, Thomas Zwinger, "Projected land ice contributions to twenty-first-century sea level rise", Nature, May 5, 2021, 593:74-82, doi: 10.1038/s41586-021-03302-y
- Download File: Edwards-et-al-2021-Nature-preprint.pdf (pdf: 40 MB)
John M. Shalf
2023
Zhenguo Wu, Liang Yuan Dai, Asher Novick, Madeleine Glick, Ziyi Zhu, Sébastien Rumley, George Michelogiannakis, John Shalf, Keren Bergman, "Peta-Scale Embedded Photonics Architecture for Distributed Deep Learning Applications", IEEE Journal of Lightwave Technology, May 2023,
2022
John Shalf, George Michelogiannakis, Heterogeneous Integration for HPC, OCP Global Summit, October 19, 2022,
- Download File: chiplets_2022.pdf (pdf: 1.2 MB)
George Michelogiannakis, Madeleine Glick, John Shalf, Keren Bergman, Photonics as a Means to Implement Intra-rack Resource Disaggregation, SPIE Photonics West, March 2022,
George Michelogiannakis, Madeleine Glick, John Shalf, Keren Bergman, "Photonics as a means to implement intra-rack resource disaggregation", Proceedings Volume 12027, Metro and Data Center Optical Networks and Short-Reach Links V, March 2022, doi: 10.1117/12.2607317
George Michelogiannakis, Benjamin Klenk, Brandon Cook, Min Yee Teh, Madeleine Glick, Larry Dennison, Keren Bergman, John Shalf, "A Case For Intra-Rack Resource Disaggregation in HPC", ACM Transactions on Architecture and Code Optimization, February 2022,
2021
Md Abdul M Faysal, Shaikh Arifuzzaman, Cy Chan, Maximilian Bremer, Doru Popovici, John Shalf, "HyPC-Map: A Hybrid Parallel Community Detection Algorithm Using Information-Theoretic Approach", HPEC, September 20, 2021,
Georgios Tzimpragos, Jennifer Volk, Dilip Vasudevan, Nestan Tsiskaridze, George Michelogiannakis, Advait Madhavan, John Shalf, Timothy Sherwood, "Temporal Computing With Superconductors", IEEE Micro, March 2021, 41:71-79, doi: 10.1109/MM.2021.3066377
George Michelogiannakis, Min Yeh Teh, Madeleine Glick, John Shalf, Keren Bergman, Maximizing The Impact of Emerging Photonic Switches At The System Level, SPIE Photonics West, March 2021,
- Download File: photonics-west-2021.pdf (pdf: 770 KB)
George Michelogiannakis, Min Yeh Teh, Madeleine Glick, John Shalf, Keren Bergman, "Maximizing the impact of emerging photonic switches at the system level", SPIE 11692, Optical Interconnects XXI, 116920Z, March 2021,
Wissam M. Sid-Lakhdar
2021
Y. Liu, W. M. Sid-Lakhdar, O. Marques, X. Zhu, C. Meng, J. W. Demmel, X. S. Li, "GPTune: multitask learning for autotuning exascale applications", PPoPP, February 17, 2021, doi: 10.1145/3437801.3441621
Irfan Siddiqi
2021
Akel Hashim, Ravi Naik, Alexis Morvan, Jean-Loup Ville, Brad Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin O'Brien, Ian Hincks, Joel Wallman, Joseph V Emerson, David Ivan Santiago, Irfan Siddiqi, Scalable Quantum Computing on a Noisy Superconducting Quantum Processor via Randomized Compiling, Bulletin of the American Physical Society, 2021,
Coherent errors in quantum hardware severely limit the performance of quantum algorithms in an unpredictable manner, and mitigating their impact is necessary for realizing reliable, large-scale quantum computations. Randomized compiling achieves this goal by converting coherent errors into stochastic noise, dramatically reducing unpredictable errors in quantum algorithms and enabling accurate predictions of aggregate performance via cycle benchmarking estimates. In this work, we demonstrate significant performance gains under randomized compiling for both the four-qubit quantum Fourier transform algorithm and for random circuits of variable depth on a superconducting quantum processor. We also validate solution accuracy using experimentally measured error rates. Our results demonstrate that randomized compiling can be utilized to maximally leverage and predict the capabilities of modern-day noisy quantum processors, paving the way forward for scalable quantum computing.
Alex Sim
2023
R. Monga, A. Sim (advisor), K. Wu (advisor), "Comparative Study of the Cache Utilization Trends for Regional Scientific Data Caches", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’23), ACM Student Research Competition (SRC), 2023,
H-C. Yang, L. Jin, A. Lazar, A. Todd-Blick, A. Sim, K. Wu, Q. Chen, C. A. Spurlock, "Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective", Systems, 2023, 11(6):314, doi: 10.3390/systems11060314
R. Shao, A. Sim, K. Wu, J. Kim, "Leveraging History to Predict Abnormal Transfers in Distributed Workflows", Sensors, 2023, 23(12):5485, doi: 10.3390/s23125485
Z. Deng, A. Sim, K. Wu, C. Guok, I. Monga, F. Andrijauskas, F. Wuerthwein, D. Weitzel, "Analyzing Transatlantic Network Traffic Patterns with Scientific Data Caches", 6th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2023), 2023, doi: 10.1145/3589012.3594897
C. Guok, E. Kissel, A. Sim, ESnet's In-Network Caching Pilot, The Network Conference 2023 (TNC'23), 2023,
E. Kissel, A. Sim, C. Guok, Experiences in deploying in-network data caches, 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), 2023,
J. Bellavita, C. Sim, K. Wu, A. Sim, S. Yoo, H. Ito, V. Garonne, E. Lancon, Understanding Data Access Patterns for dCache System, 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), 2023,
C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, F. Wurthwein, D. Davila, H. Newman, J. Balcas, Predicting Resource Usage Trends with Southern California Petabyte Scale Cache, 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), 2023,
S. Kim, A. Sim, K. Wu, S. Byna, Y. Son, H. Eom, "Design and Implementation of I/O Performance Prediction Scheme on HPC Systems through Large-scale Log Analysis", Journal of Big Data, 2023, 10(65), doi: 10.1186/s40537-023-00741-4
C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, F. Wurthwein, D. Davila, H. Newman, J. Balcas, "Effectiveness and predictability of in-network storage cache for Scientific Workflows", International Conference on Computing, Networking and Communication (ICNC 2023), 2023, doi: 10.1109/ICNC57223.2023.10074058
J. Wang, K. Wu, A. Sim, S. Hwangbo, "Locating Partial Discharges in Power Transformers with Convolutional Iterative Filtering", Sensors, 2023, 23, doi: 10.3390/s23041789
H-C. Yang, L. Jin, A. Lazar, A. Todd-Blick, A. Sim, K. Wu, Q. Chen, C. A. Spurlock, Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective, Transportation Research Board 102nd Annual Meeting,, 2023,
J. Bang, A. Sim, G. Lockwood, H. Eom, H. Sung, "Design and Implementation of Burst Buffer Over-Subscription Scheme for HPC Storage Systems", IEEE Access, 2023, doi: 10.1109/ACCESS.2022.3233829
2022
Julian Bellavita, Alex Sim (advisor), John Wu (advisor), "Predicting Scientific Dataset Popularity Using dCache Logs", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’22), ACM Student Research Competition (SRC), Second place winner, 2022,
The dCache installation is a storage management system that acts as a disk cache for high-energy physics (HEP) data. Storage space on dCache is limited relative to persistent storage devices; therefore, a heuristic is needed to determine what data should be kept in the cache. A good cache policy would keep frequently accessed data in the cache, but this requires knowledge of future dataset popularity. We present methods for forecasting the number of times a dataset stored on dCache will be accessed in the future. We present a deep neural network that can predict future dataset accesses accurately, reporting a final normalized loss of 4.6e-8. We present a set of algorithms that can forecast future dataset accesses given an access sequence. Included are two novel algorithms, Backup Predictor and Last N Successors, that outperform other file prediction algorithms. Findings suggest that it is possible to anticipate dataset popularity in advance.
C. Sim, C. Guok (advisor), A. Sim (advisor), K. Wu (advisor), "Data Throughput Performance Trends of Regional Scientific Data Cache", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’22), ACM Student Research Competition (SRC), 2022,
J. Wang, K. Wu, A. Sim, S. Hwangbo, "Feature Engineering and Classification Models for Partial Discharge in Power Transformers", arXiv, 2022, doi: 10.48550/arXiv.2210.12216
L. Jin, A. Lazar, C. Brown, V. Garikapati, B. Sun, S. Ravulaparthy, Q. Chen, A. Sim, K. Wu, T. Wenzel, T. Ho, C. A. Spurlock, "What Makes You Hold onto That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions", Frontiers in Future Transportation, Connected Mobility and Automation, 2022, 3:894654, doi: 10.3389/ffutr.2022.894654
Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, "Design and implementation of dynamic I/O control scheme for large scale distributed file systems", Cluster Computing, 2022, 25(6):1--16, doi: 10.1007/s10586-022-03640-0
R. Han, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, J. Balcas, H. Newman, "Access Trends of In-network Cache for Scientific Data", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA), in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534110
J. Bellavita, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, "Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534111
R. Shao, J. Kim, A. Sim, K. Wu, "Predicting Slow Connections in Scientific Computing", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534112
J. Kim, M. Cafaro, J. Chou, A. Sim, "SNTA’22: The 5th Workshop on Systems and Network Telemetry and Analytics", In the proceedings of The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC'22), 2022, doi: 10.1145/3502181.3535108
D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, W. Arndt, J. Blaschke, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, T. Lehman, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, L. Stephey, R. Thomas, G. Torok, "LBNL Superfacility Project Report", Lawrence Berkeley National Laboratory, 2022, doi: 10.48550/arXiv.2206.11992
Yujing Ma, Florin Rusu, Kesheng Wu, Alexander Sim, 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Pages: 1088--1097, 2022, doi: 10.1109/IPDPSW55747.2022.00177
J. Kim, M. Jin, Y. Homma, A. Sim, W. Kroeger, K. Wu, "Extract Dynamic Information To Improve Time Series Modeling: a Case Study with Scientific Workflow", arXiv, 2022, doi: 10.48550/arXiv.2205.09703
K. Wang, S. Lee, J. Balewski, A. Sim, P. Nugent, A. Agrawal, A. Choudhary, K. Wu, W-K. Liao, "Using Multi-resolution Data to Accelerate Neural Network Training in Scientific Applications", 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2022), 2022, doi: 10.1109/CCGrid54584.2022.00050
B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, "Enhancing IoT Anomaly Detection Performance for Federated Learning", Digital Communications and Networks, Special Issue on Edge Computation and Intelligence, 2022, doi: 10.1016/j.dcan.2022.02.007
A. Sim, E. Kissel, C. Guok, "Deploying in-network caches in support of distributed scientific data sharing", arXiv whitepaper, 2022, doi: 10.48550/arXiv.2203.06843
John Wu, Ben Brown, Paolo Calafiura, Quincey Koziol, Dongeun Lee, Alex Sim, Devesh Tiwari, Support for In-Flight Data Analyses in Scientific Workflows, DOE ASCR Workshop on the Management and Storage of Scientific Data, 2022, doi: 10.2172/1843500
John Wu, Bin Dong, Alex Sim, Automating Data Management Through Unified Runtime Systems, DOE ASCR Workshop on the Management and Storage of Scientific Data, 2022, doi: 10.2172/1843500
A. Pereira, A. Sim, K. Wu, S. Yoo, H. Ito, "Data access pattern analysis for dCache storage system", International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022), 2022,
Ling Jin, Alina Lazar, Caitlin Brown, Bingrong Sun, Venu Garikapati, Srinath Ravulaparthy, Qianmiao Chen, Alexander Sim, Kesheng Wu, Tin Ho, Thomas Wenzel, C. Anna Spurlock, What Makes You Hold on to That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions, Transportation Research Board 101st Annual Meeting, 2022,
2021
J. Bang, C. Kim, K. Wu, A. Sim, S. Byna, H. Sung, H. Eom, "An In-Depth I/O Pattern Analysis in HPC Systems", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021, doi: 10.1109/HiPC53243.2021.00056
S. Lee, Q. Kang, K. Wang, J. Balewski, A. Sim, A. Agrawal, A. Choudhary, P. Nugent, K. Wu, W-K. Liao, "Asynchronous I/O Strategy for Large-Scale Deep Learning Applications", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021, doi: 10.1109/HiPC53243.2021.00046
A. Lazar, L. Jin, C. Brown, C. A. Spurlock, A. Sim, K. Wu, "Performance of the Gold Standard and Machine Learning in Predicting Vehicle Transactions", the 3rd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD 2021), 2021, doi: 10.1109/BigData52589.2021.9671286
J. Cheung, A. Sim, J. Kim, K. Wu, "Performance Prediction of Large Data Transfers", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), ACM Student Research Competition (SRC), 2021,
A. Syal, A. Lazar, J. Kim, A. Sim, K. Wu, "Network traffic performance analysis from passive measurements using gradient boosting machine learning", International Journal of Big Data Intelligence, 2021, 8:13-30, doi: 10.1504/IJBDI.2021.118741
Y. Ma, F. Rusu, K. Wu, A. Sim, Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers, arXiv preprint arXiv:2110.07029, 2021,
E. Copps, A. Sim (Advisor), K. Wu (Advisor), "Analyzing scientific data sharing patterns with in-network data caching", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2021), ACM Student Research Competition (SRC), 2021,
M. Nakashima, A. Sim, Y. Kim, J. Kim, J. Kim, "Automated Feature Selection for Anomaly Detection in Network Traffic Data", ACM Transactions on Management Information Systems (TMIS), 2021, 12:1-28, doi: 10.1145/3446636
A. Lazar, A. Sim, K. Wu, "GPU-based Classification for Wireless Intrusion Detection", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464445
Y. Wang, K. Wu, A. Sim, S. Yoo, S. Misawa, "Access Patterns of Disk Cache for Large Scientific Archive", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464444
E. Copps, H. Zhang, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, E. Fajardo, "Analyzing scientific data sharing patterns with in-network data caching", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464441
Y. Ma, F. Rusu, A. Sim, K. Wu, "Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures", Heterogeneity in Computing Workshop (HCW 2021), in conjunction with the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2021, doi: 10.1109/IPDPSW52791.2021.00012
J. Kim, A. Sim, J. Kim, K. Wu, J. Hahm, Improving Botnet Detection with Recurrent Neural Network and Transfer Learning, arXiv preprint arXiv:2104.12602, 2021,
Ethan Smith
2021
Akel Hashim, Ravi K. Naik, Alexis Morvan, Jean-Loup Ville, Bradley Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin P. O'Brien, Ian Hincks, Joel J. Wallman, Joseph Emerson, Irfan Siddiqi, "Randomized Compiling for Scalable Quantum Computing on a Noisy Superconducting Quantum Processor", Physical Review X, 2021, 11:041039, doi: 10.1103/PhysRevX.11.041039
Akel Hashim, Ravi Naik, Alexis Morvan, Jean-Loup Ville, Brad Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin O'Brien, Ian Hincks, Joel Wallman, Joseph V Emerson, David Ivan Santiago, Irfan Siddiqi, Scalable Quantum Computing on a Noisy Superconducting Quantum Processor via Randomized Compiling, Bulletin of the American Physical Society, 2021,
Coherent errors in quantum hardware severely limit the performance of quantum algorithms in an unpredictable manner, and mitigating their impact is necessary for realizing reliable, large-scale quantum computations. Randomized compiling achieves this goal by converting coherent errors into stochastic noise, dramatically reducing unpredictable errors in quantum algorithms and enabling accurate predictions of aggregate performance via cycle benchmarking estimates. In this work, we demonstrate significant performance gains under randomized compiling for both the four-qubit quantum Fourier transform algorithm and for random circuits of variable depth on a superconducting quantum processor. We also validate solution accuracy using experimentally-measured error rates. Our results demonstrate that randomized compiling can be utilized to maximally-leverage and predict the capabilities of modern-day noisy quantum processors, paving the way forward for scalable quantum computing.
Ishan Srivastava
2023
I. Srivastava, D. R. Ladiges, A. Nonaka, A. L. Garcia, J. B. Bell, "Staggered Scheme for the Compressible Fluctuating Hydrodynamics of Multispecies Fluid Mixtures", Physical Review E, January 24, 2023, 107:015305, doi: 10.1103/PhysRevE.107.015305
2022
D. R. Ladiges, J. G. Wang, I. Srivastava, S. P. Carney, A. Nonaka, A. L. Garcia, A. Donev, J. B. Bell, "Modeling Electrokinetic Flows with the Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm", Physical Review E, November 19, 2022, 106:035104, doi: 10.1103/PhysRevE.106.035104
J. M. Monti, J. T. Clemmer, I. Srivastava, L. E. Silbert, G. S. Grest, J. B. Lechman, "Large-Scale Frictionless Jamming with Power-Law Particle Size Distributions", Physical Review E, September 2, 2022, 106:034901, doi: 10.1103/PhysRevE.106.034901
A. P. Santos, I. Srivastava, L. E. Silbert, J. B. Lechman, G. S. Grest, "Fluctuations and power-law scaling of dry, frictionless granular rheology near the hard-particle limit", Physical Review Fluids, August 19, 2022, 7:084303, doi: 10.1103/PhysRevFluids.7.084303
W. D. Fullmer, R. Porcu, J. Musser, A. S. Almgren, I. Srivastava, "The Divergence of Nearby Trajectories in Soft-Sphere DEM", Particuology, April 1, 2022, 63:1 - 8, doi: 10.1016/j.partic.2021.06.008
2021
J. T. Clemmer, I. Srivastava, G. S. Grest, J. B. Lechman, "Shear is Not Always Simple: Rate-Dependent Effects of Loading Geometry on Granular Rheology", Physical Review Letters, December 22, 2021, 127:268003, doi: 10.1103/PhysRevLett.127.268003
I. Srivastava, L. E. Silbert, J. B. Lechman, G. S. Grest, "Flow and Arrest in Stressed Granular Materials", Soft Matter, December 17, 2021, doi: 10.1039/D1SM01344K
I. Srivastava, S. A. Roberts, J. T. Clemmer, L. E. Silbert, J. B. Lechman, G. S. Grest, "Jamming of Bidisperse Frictional Spheres", Physical Review Research, August 13, 2021, 3:L032042, doi: 10.1103/PhysRevResearch.3.L032042
Houjun Tang
2022
Houjun Tang, Quincey Koziol, John Ravi, and Suren Byna, "Transparent Asynchronous Parallel I/O using Background Threads", IEEE Transactions on Parallel and Distributed Systems, April 4, 2022, 33, doi: 10.1109/TPDS.2021.3090322
2021
Cong Xu, Suparna Bhattacharya, Martin Foltin, Suren Byna, and Paolo Faraboschi, "Data-Aware Storage Tiering for Deep Learning", 6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21, November 21, 2021,
Houjun Tang, Bing Xie, Suren Byna, Phillip Carns, Quincey Koziol, Sudarsun Kannan, Jay Lofstead, and Sarp Oral, "SCTuner: An Auto-tuner Addressing Dynamic I/O Needs on Supercomputer I/O Sub-systems", 6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21, November 21, 2021,
Suren Byna, Houjun Tang, and Quincey Koziol, Automatic and Transparent Scientific Data Management with Object Abstractions, PASC 2021, in a Minisymposium on "Data Movement Orchestration on HPC Systems", July 31, 2021,
Bing Xie, Houjun Tang, Suren Byna, Jesse Hanley, Quincey Koziol, Tonglin Li, Sarp Oral, "Battle of the Defaults: Extracting Performance Characteristics of HDF5 under Production Load", CCGrid 2021, May 31, 2021,
David McCallen, Houjun Tang, Suiwen Wu, Eric Eckert, Junfei Huang, N Anders Petersson, "Coupling of regional geophysics and local soil-structure models in the EQSIM fault-to-structure earthquake simulation framework", The International Journal of High Performance Computing Applications, May 25, 2021, doi: 10.1177/10943420211019118
David McCallen, Anders Petersson, Arthur Rodgers, Arben Pitarka, Mamun Miah, Floriana Petrone, Bjorn Sjogreen, Norman Abrahamson, Houjun Tang, "EQSIM—A multidisciplinary framework for fault-to-structure earthquake simulations on exascale computers part I: Computational models and workflow", Earthquake Spectra, May 1, 2021, 37:707-735, doi: 10.1177/8755293020970982
Jean Luca Bez, Houjun Tang, Bing Xie, David Williams-Young, Rob Latham, Rob Ross, Sarp Oral, Suren Byna, "I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis", 2021 IEEE/ACM Sixth International Parallel Data Systems Workshop (PDSW), January 1, 2021, 15-22, doi: 10.1109/PDSW54622.2021.00008
Tonglin Li, Suren Byna, Quincey Koziol, Houjun Tang, Jean Luca Bez, Qiao Kang, "h5bench: HDF5 I/O Kernel Suite for Exercising HPC I/O Patterns", Cray User Group (CUG) 2021, January 1, 2021,
Will Thacher
2023
Will Thacher, Hans Johansen, Daniel Martin, "A high order Cartesian grid, finite volume method for elliptic interface problems", Journal of Computational Physics, October 15, 2023, 491, doi: 10.1016/j.jcp.2023.112351
David Trebotich
2023
Tim Kneafsey, David Trebotich, Terry Ligocki, "Direct Numerical Simulation of Flow Through Nanoscale Shale Pores in a Mesoscale Sample", Album of Porous Media, edited by E.F. Médici, A.D. Otero, (Springer Cham: April 14, 2023) Pages: 87, doi: 10.1007/978-3-031-23800-0_69
Sergi Molins, David Trebotich, "Pore-Scale Controls on Calcite Dissolution using Direct Numerical Simulations", Album of Porous Media, edited by E.F. Médici, A.D. Otero, (Springer Cham: April 14, 2023) Pages: 135, doi: 10.1007/978-3-031-23800-0_112
David Trebotich, Terry Ligocki, "High Resolution Simulation of Fluid Flow in Press Felts Used in Paper Manufacturing", Album of Porous Media, edited by E.F. Médici, A.D. Otero, (Springer Cham: April 14, 2023) Pages: 132, doi: 10.1007/978-3-031-23800-0_109
2021
T. Groves, N. Ravichandrasekaran, B. Cook, N. Keen, D. Trebotich, N. Wright, B. Alverson, D. Roweth, K. Underwood, "Not All Applications Have Boring Communication Patterns: Profiling Message Matching with BMM", Concurrency and Computation: Practice and Experience, April 26, 2021, doi: 10.1002/cpe.6380
Roel Van Beeumen
2022
M. G. Amankwah, D. Camps, E. W. Bethel, R. Van Beeumen, T. Perciano, "Quantum pixel representations and compression for N-dimensional images", Nature Scientific Reports, May 11, 2022, 12:7712, doi: 10.1038/s41598-022-11024-y
2021
R. Van Beeumen, L. Perisa, D. Kressner, C. Yang, "A Flexible Power Method for Solving Infinite Dimensional Tensor Eigenvalue Problems", January 30, 2021,
Brian Van Straalen
2023
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
2022
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
2021
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
Dilip Vasudevan
2023
Ran Cheng, Christoph Kirst, Dilip Vasudevan, "Superconducting-Oscillatory Neural Network With Pixel Error Detection for Image Recognition", IEEE Transactions on Applied Superconductivity, August 2023, 33:1-7,
Dilip Vasudevan, George Michelogiannakis, "Efficient Temporal Arithmetic Logic Design for Superconducting RSFQ Logic", IEEE Transactions on Applied Superconductivity, March 2023,
2021
George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Bautista, Dilip Vasudevan, Anastasiia Butko, "SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC", IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021,
Georgios Tzimpragos, Jennifer Volk, Dilip Vasudevan, Nestan Tsiskaridze, George Michelogiannakis, Advait Madhavan, John Shalf, Timothy Sherwood, "Temporal Computing With Superconductors", IEEE Micro, March 2021, 41:71-79, doi: 10.1109/MM.2021.3066377
Bin Wang
2021
Serges Love Teutu Talla, Isabelle Kemajou-Brown, Cy Chan, Bin Wang, "A Binary Multi-Subsystems Transportation Networks Estimation using Mobiliti Data", 2021 American Control Conference (ACC), May 25, 2021,
Hengjie Wang
2022
Hengjie Wang, Robert Planas, Aparna Chandramowlishwaran, Ramin Bostanabad, "Mosaic flows: A transferable deep learning framework for solving PDEs on unseen domains", Computer Methods in Applied Mechanics and Engineering, 2022, 389:114424,
J. Galen Wang
2022
D. R. Ladiges, J. G. Wang, I. Srivastava, S. P. Carney, A. Nonaka, A. L. Garcia, A. Donev, J. B. Bell, "Modeling Electrokinetic Flows with the Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm", Physical Review E, November 19, 2022, 106:035104, doi: 10.1103/PhysRevE.106.035104
2021
J. Galen Wang, Roseanna N. Zia, "Vitrification is a spontaneous non-equilibrium transition driven by osmotic pressure", Journal of Physics: Condensed Matter, April 23, 2021, doi: 10.1088/1361-648x/abeec0
Xiange Wang
2023
Nathan A. Kimbrel, Allison E. Ashley-Koch, Xue J. Qin, Jennifer H. Lindquist, Melanie E. Garrett, Michelle F. Dennis, Lauren P. Hair, Jennifer E. Huffman, Daniel A. Jacobson, Ravi K. Madduri, Jodie A. Trafton, Hilary Coon, Anna R. Docherty, Niamh Mullins, Douglas M. Ruderfer, Philip D. Harvey, Benjamin H. McMahon, David W. Oslin, Jean C. Beckham, Elizabeth R. Hauser, Michael A. Hauser, Million Veteran Program Suicide Exemplar Workgroup, International Suicide Genetics Consortium, Veterans Affairs Mid-Atlantic Mental Illness Research Education and Clinical Center Workgroup, Veterans Affairs Million Veteran Program, "Identification of Novel, Replicable Genetic Risk Loci for Suicidal Thoughts and Behaviors Among US Military Veterans", JAMA Psychiatry, February 1, 2023, 80:100-191, doi: 10.1001/jamapsychiatry.2022.3896
2022
Xiange Wang, Rafael Zamora-Resendiz, Courtney D. Shelley, Carrie Manore, Xinlian Liu, David W. Oslin, Benjamin McMahon, Jean C. Beckham, Nathan A. Kimbrel, Silvia Crivelli, "An examination of the association between altitude and suicide deaths, suicide attempts, and suicidal ideation among veterans at both the patient and geospatial level", Journal, July 11, 2022,
Destinee Morrow, Rafael Zamora-Resendiz, Jean C Beckham, Nathan A Kimbrel, David W Oslin, Suzanne Tamang, Million Veteran Program Suicide Exemplar Workgroup, Silvia Crivelli, "A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes", Journal of Psychiatric Research, July 1, 2022, 151:328-338, doi: 10.1016/j.jpsychires.2022.04.009
Daniel Waters
2023
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Paul H. Hargrove, Dan Bonachea, Johnny Corbino, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'23)", Poster at Exascale Computing Project (ECP) Annual Meeting 2023, January 2023,
The Pagoda project is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. The first component is GASNet-EX, a portable, high-performance, global-address-space communication library. The second component is UPC++, a C++ template library. Together, these libraries enable agile, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
GASNet-EX is a portable, high-performance communications middleware library which leverages hardware support to implement Remote Memory Access (RMA) and Active Message communication primitives. GASNet-EX supports a broad ecosystem of alternative HPC programming models, including UPC++, Legion, Chapel and multiple implementations of UPC and Fortran Coarrays. GASNet-EX is implemented directly over the native APIs for networks of interest in HPC. The tight semantic match of GASNet-EX APIs to the client requirements and hardware capabilities often yields better performance than competing libraries.
UPC++ provides high-level productivity abstractions appropriate for Partitioned Global Address Space (PGAS) programming such as: remote memory access (RMA), remote procedure call (RPC), support for accelerators (e.g. GPUs), and mechanisms for aggressive asynchrony to hide communication costs. UPC++ implements communication using GASNet-EX, delivering high performance and portability from laptops to exascale supercomputers. HPC application software using UPC++ includes: MetaHipMer2 metagenome assembler, SIMCoV viral propagation simulation, NWChemEx TAMM, and graph computation kernels from ExaGraph.
2022
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Paul H. Hargrove, Dan Bonachea, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'22)", Poster at Exascale Computing Project (ECP) Annual Meeting 2022, May 5, 2022,
We present UPC++ and GASNet-EX, distributed libraries which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Procedure Call (RPC) and for Remote Memory Access (RMA) to host and GPU memories. The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
2021
Daniel Waters, Colin A. MacLean, Dan Bonachea, Paul H. Hargrove, "Demonstrating UPC++/Kokkos Interoperability in a Heat Conduction Simulation (Extended Abstract)", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S4630V
We describe the replacement of MPI with UPC++ in an existing Kokkos code that simulates heat conduction within a rectangular 3D object, as well as an analysis of the new code’s performance on CUDA accelerators. The key challenges were packing the halos of Kokkos data structures in a way that allowed for UPC++ remote memory access, and streamlining synchronization costs. Additional UPC++ abstractions used included global pointers, distributed objects, remote procedure calls, and futures. We also make use of the device allocator concept to facilitate data management in memory with unique properties, such as GPU memory. Our results demonstrate that despite the algorithm’s good semantic match to message-passing abstractions, straightforward modifications to use UPC++ communication deliver vastly improved performance and scalability in the common case. We find the one-sided UPC++ version written in a natural way exhibits good performance, whereas the message-passing version written in a straightforward way exhibits performance anomalies. We argue this represents a productivity benefit for one-sided communication models.
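The simulation described above is a 3D heat conduction code built on Kokkos with UPC++ halo exchange; as a much-reduced serial analogue (a sketch, not the paper's code), one explicit Jacobi step of the 1D heat equation illustrates the stencil update, where the end cells play the role of the halo that a neighboring process would fill via one-sided RMA before each step:

```python
def jacobi_step(u, alpha=0.25):
    """One explicit finite-difference step of the 1D heat equation.

    u: list of temperatures; u[0] and u[-1] act as 'halo' cells that, in
    the distributed version, a neighboring rank would overwrite (e.g. via
    a one-sided remote put) before each step.
    alpha: dt * diffusivity / dx^2, kept <= 0.5 for numerical stability.
    """
    interior = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
                for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]

# A unit spike of heat spreads to its neighbors after one step.
after = jacobi_step([0.0, 0.0, 1.0, 0.0, 0.0])  # [0.0, 0.25, 0.5, 0.25, 0.0]
```

In the distributed setting, overlapping the halo communication with the interior update is exactly the kind of asynchrony the one-sided UPC++ version exploits.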
Paul H. Hargrove, Dan Bonachea, Colin A. MacLean, Daniel Waters, "GASNet-EX Memory Kinds: Support for Device Memory in PGAS Programming Models", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21) Research Poster, November 2021, doi: 10.25344/S4P306
Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work includes two major components: UPC++ (a C++ template library) and GASNet-EX (a portable, high-performance communication library). This poster describes recent advances in GASNet-EX to efficiently implement Remote Memory Access (RMA) operations to and from memory on accelerator devices such as GPUs. Performance is illustrated via benchmark results from UPC++ and the Legion programming system, both using GASNet-EX as their communications library.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'21)", Poster at Exascale Computing Project (ECP) Annual Meeting 2021, April 2021,
We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC). The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
Gunther H. Weber
2022
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, Dave Pugmire, Silvio Rizzi, Thompson, Will Usher, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "Proximity Portability and In Transit, M-to-N Data Partitioning and Movement in SENSEI", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_20
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, David Camp, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, David Pugmire, Silvio Rizzi, Thompson, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "The SENSEI Generic In Situ Interface: Tool and Processing Portability at Scale", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_13
Sugeerth Murugesan, Mariam Kiran, Bernd Hamann, Gunther H. Weber, "Netostat: Analyzing Dynamic Flow Patterns in High-Speed Networks", Cluster Computing, 2022, doi: 10.1007/s10586-022-03543-0
2021
Jan-Tobias Sohns, Gunther H. Weber, Christoph Garth, "Distributed Task-Parallel Topology-Controlled Volume Rendering", Topological Methods in Data Analysis and Visualization VI: Theory, Algorithms, and Applications, (Springer International Publishing: 2021) Pages: 55-69 doi: 10.1007/978-3-030-83500-2_4
Hamish A. Carr, Gunther H. Weber, Christopher M. Sewell, Oliver Rübel, Patricia Fasel, James P. Ahrens, "Scalable Contour Tree Computation by Data Parallel Peak Pruning", IEEE Transactions on Visualization and Computer Graphics, 2021, 27:2437--2454, doi: 10.1109/TVCG.2019.2948616
Hamish Carr, Oliver Rübel, Gunther H. Weber, James Ahrens, "Optimization and Augmentation for Data Parallel Contour Trees", IEEE Transactions on Visualization and Computer Graphics, 2021, doi: 10.1109/TVCG.2021.3064385
Robbie Sadre, Colin Ophus, Anastasiia Butko, Gunther H Weber, "Deep Learning Segmentation of Complex Features in Atomic-Resolution Phase Contrast Transmission Electron Microscopy Images", Microscopy and Microanalysis, 2021, doi: 10.1017/S1431927621000167
Stefan M. Wild
2023
Raghu Bollapragada, Stefan M. Wild, "Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization", Mathematical Programming Computation, 2023, 15:327--364, doi: 10.1007/s12532-023-00233-9
Tyler H. Chang, Stefan M. Wild, ParMOO: A Python library for parallel multiobjective simulation optimization, Journal of Open Source Software, Pages: 4468 2023, doi: 10.21105/joss.04468
2022
V. Cirigliano, Z. Davoudi, J. Engel, R. J. Furnstahl, G. Hagen, U. Heinz, H. Hergert, M. Horoi, C. W. Johnson, A. Lovato, E. Mereghetti, W. Nazarewicz, A. Nicholson, T. Papenbrock, S. Pastore, M. Plumlee, D. R. Phillips, P. E. Shanahan, S. R. Stroberg, F. Viens, A. Walker-Loud, K. A. Wendt, S. M. Wild, "Towards Precise and Accurate Calculations of Neutrinoless Double-Beta Decay", Journal of Physics G: Nuclear and Particle Physics, 2022, 49:120502, doi: 10.1088/1361-6471/aca03e
Ozge Surer, Filomena M. Nunes, Matthew Plumlee, Stefan M. Wild, "Uncertainty Quantification in Breakup Reactions", Physical Review C, 2022, 106:024607, doi: 10.1103/PhysRevC.106.024607
V. Cirigliano, Z. Davoudi, J. Engel, R. J. Furnstahl, G. Hagen, U. Heinz, H. Hergert, M. Horoi, C. W. Johnson, A. Lovato, E. Mereghetti, W. Nazarewicz, A. Nicholson, T. Papenbrock, S. Pastore, M. Plumlee, D. R. Phillips, P. E. Shanahan, S. R. Stroberg, F. Viens, A. Walker-Loud, K. A. Wendt, S. M. Wild, "Towards Precise and Accurate Calculations of Neutrinoless Double-Beta Decay: Project Scoping Workshop Report", 2022, doi: 10.48550/ARXIV.2207.01085
Aleksandra Ciprijanovic, Diana Kafkes, Gregory Snyder, F. Javier Sanchez, Gabriel Nathan Perdue, Kevin Pedro, Brian Nord, Sandeep Madireddy, Stefan M. Wild, "DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification", Machine Learning: Science and Technology, 2022, 3:035007, doi: 10.1088/2632-2153/ac7f1a
Stephen Hudson, Jeffrey Larson, John-Luke Navarro, Stefan M. Wild, "libEnsemble: A Library to Coordinate the Concurrent Evaluation of Dynamic Ensembles of Calculations", IEEE Transactions on Parallel and Distributed Systems, 2022, 33:977--988, doi: 10.1109/TPDS.2021.3082815
2021
Jed Brown, Yunhui He, Scott MacLachlan, Matt Menickelly, Stefan M. Wild, "Tuning Multigrid Methods with Robust Optimization and Local Fourier Analysis", SIAM Journal on Scientific Computing, 2021, A109--A138, doi: 10.1137/19m1308669
Donald Willcox
2023
H. Klion, R. Jambunathan, M. E. Rowan, E. Yang, D. Willcox, J.-L. Vay, R. Lehe, A. Myers, A. Huebl, W. Zhang, "Particle-in-Cell Simulations of Relativistic Magnetic Reconnection with Advanced Maxwell Solver Algorithms", arXiv preprint, submitted to The Astrophysical Journal, April 20, 2023,
Samuel W. Williams
2023
Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, February 8, 2023,
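This talk (and several Roofline papers below) presents the Roofline performance model, whose central bound states that a kernel's attainable throughput is the minimum of the machine's peak compute rate and the product of the kernel's arithmetic intensity (FLOPs per byte moved) and memory bandwidth. A minimal sketch, using illustrative made-up machine numbers:

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, AI * memory bandwidth).

    arithmetic_intensity: FLOPs performed per byte moved from memory.
    peak_gflops: machine peak compute rate (GFLOP/s).
    bandwidth_gbs: sustained memory bandwidth (GB/s).
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)

# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s bandwidth.
# The "ridge point" sits at AI = 1000 / 100 = 10 FLOPs/byte.
low = attainable_gflops(4.0, 1000.0, 100.0)    # memory-bound: 400 GFLOP/s
high = attainable_gflops(20.0, 1000.0, 100.0)  # compute-bound: 1000 GFLOP/s
```

Kernels left of the ridge point are memory-bandwidth-bound; those right of it are compute-bound, which is the diagnostic the model's log-log plot makes visual.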
2022
Taylor Groves, Chris Daley, Rahulkumar Gayatri, Hai Ah Nam, Nan Ding, Lenny Oliker, Nicholas J. Wright, Samuel Williams, "A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures", PMBS, November 2022,
- Download File: PMBS22_GPU_final.pdf (pdf: 719 KB)
Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, "Methodology for Evaluating the Potential of Disaggregated Memory Systems", RESDIS, https://resdis.github.io/ws/2022/sc/, November 18, 2022,
- Download File: Methodology-for-Evaluating-the-Potential-of-Disaggregated-Memory-Systems.pdf (pdf: 5.1 MB)
Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams, "Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations", MCHPC, November 2022,
- Download File: MCHPC22_final.pdf (pdf: 401 KB)
Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, May 2022,
2021
Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", (BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021,
- Download File: pmbs21-DL-final.pdf (pdf: 632 KB)
Marco Siracusa, Emanuele Del Sozzo, Marco Rabozzi, Lorenzo Di Tucci, Samuel Williams, Donatella Sciuto, Marco Domenico Santambrogio, "A Comprehensive Methodology to Optimize FPGA Designs via the Roofline Model", Transactions on Computers (TC), September 2021, doi: 10.1109/TC.2021.3111761
Tan Nguyen, Colin MacLean, Marco Siracusa, Douglas Doerfler, Nicholas J. Wright, Samuel Williams, "FPGA‐based HPC accelerators: An evaluation on performance and energy efficiency", CCPE, August 22, 2021, doi: 10.1002/cpe.6570
Nan Ding, Muaaz Awan, Samuel Williams, "Instruction Roofline: An insightful visual performance model for GPUs", CCPE, August 4, 2021, doi: 10.1002/cpe.6591
Nan Ding, Yang Liu, Samuel Williams, Xiaoye S. Li, "A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), July 19, 2021,
- Download File: Multi-GPU-SpTRSV-ACDA21-.pdf (pdf: 897 KB)
Charlene Yang, Yunsong Wang, Thorsten Kurth, Steven Farrell, Samuel Williams, "Hierarchical Roofline Performance Analysis for Deep Learning Applications", Intelligent Computing, LNNS, July 15, 2021, doi: 10.1007/978-3-030-80126-7
Douglas Doerfler, Farzad Fatollahi-Fard, Colin MacLean, Tan Nguyen, Samuel Williams, Nicholas J. Wright, Marco Siracusa, "Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs", International Workshop on OpenCL (iWOCL), April 2021, doi: 10.1145/3456669.3456671
Samuel Williams, Roofline Analysis on NVIDIA GPUs, ECP Annual Meeting, April 2021,
- Download File: ECP21-Roofline-2-NVIDIA.pdf (pdf: 14 MB)
Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, April 2021,
- Download File: ECP21-Roofline-1-intro.pdf (pdf: 22 MB)
Tuowen Zhao, Mary Hall, Hans Johansen, Samuel Williams, "Improving Communication by Optimizing On-Node Data Movement with Data Layout", PPoPP, February 2021,
- Download File: PPoPP-Bricks-MPI-final.pdf (pdf: 864 KB)
David Bruce Williams-Young
2021
Karol Kowalski, Raymond Bair, Nicholas P. Bauman, Jeffery S. Boschen, Eric J. Bylaska, Jeff Daily, Wibe A. de Jong, Thom Dunning, Niranjan Govind, Robert J. Harrison, Murat Keceli, Kristopher Keipert, Sriram Krishnamoorthy, Suraj Kumar, Erdal Mutlu, Bruce Palmer, Ajay Panyala, Bo Peng, Ryan M. Richard, T. P. Straatsma, Peter Sushko, Edward F. Valeev, Marat Valiev, Hubertus J. J. van Dam, Jonathan M. Waldrop, David B. Williams-Young, Chao Yang, Marcin Zalewski, Theresa L. Windus, "From NWChem to NWChemEx: Evolving with the Computational Chemistry Landscape", Chemical Reviews, March 31, 2021, doi: 10.1021/acs.chemrev.0c00998
Jean Luca Bez, Houjun Tang, Bing Xie, David Williams-Young, Rob Latham, Rob Ross, Sarp Oral, Suren Byna, "I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis", 2021 IEEE/ACM Sixth International Parallel Data Systems Workshop (PDSW), January 1, 2021, 15-22, doi: 10.1109/PDSW54622.2021.00008
Nicholas J. Wright
2021
Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", (BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021,
- Download File: pmbs21-DL-final.pdf (pdf: 632 KB)
Tan Nguyen, Colin MacLean, Marco Siracusa, Douglas Doerfler, Nicholas J. Wright, Samuel Williams, "FPGA‐based HPC accelerators: An evaluation on performance and energy efficiency", CCPE, August 22, 2021, doi: 10.1002/cpe.6570
Kesheng Wu
2023
R. Monga, A. Sim (advisor), K. Wu (advisor), "Comparative Study of the Cache Utilization Trends for Regional Scientific Data Caches", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’23), ACM Student Research Competition (SRC), 2023,
H-C. Yang, L. Jin, A. Lazar, A. Todd-Blick, A. Sim, K. Wu, Q. Chen, C. A. Spurlock, "Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective", Systems, 2023, 11(6):314, doi: 10.3390/systems11060314
R. Shao, A. Sim, K. Wu, J. Kim, "Leveraging History to Predict Abnormal Transfers in Distributed Workflows", Sensors, 2023, 23(12):5485, doi: 10.3390/s23125485
Z. Deng, A. Sim, K. Wu, C. Guok, I. Monga, F. Andrijauskas, F. Wuerthwein, D. Weitzel, "Analyzing Transatlantic Network Traffic Patterns with Scientific Data Caches", 6th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2023), 2023, doi: 10.1145/3589012.3594897
J. Bellavita, C. Sim, K. Wu, A. Sim, S. Yoo, H. Ito, V. Garonne, E. Lancon, Understanding Data Access Patterns for dCache System, 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), 2023,
C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, F. Wurthwein, D. Davila, H. Newman, J. Balcas, Predicting Resource Usage Trends with Southern California Petabyte Scale Cache, 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), 2023,
S. Kim, A. Sim, K. Wu, S. Byna, Y. Son, H. Eom, "Design and Implementation of I/O Performance Prediction Scheme on HPC Systems through Large-scale Log Analysis", Journal of Big Data, 2023, 10(65), doi: 10.1186/s40537-023-00741-4
C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, F. Wurthwein, D. Davila, H. Newman, J. Balcas, "Effectiveness and predictability of in-network storage cache for Scientific Workflows", International Conference on Computing, Networking and Communication (ICNC 2023), 2023, doi: 10.1109/ICNC57223.2023.10074058
J. Wang, K. Wu, A. Sim, S. Hwangbo, "Locating Partial Discharges in Power Transformers with Convolutional Iterative Filtering", Sensors, 2023, 23, doi: 10.3390/s23041789
H-C. Yang, L. Jin, A. Lazar, A. Todd-Blick, A. Sim, K. Wu, Q. Chen, C. A. Spurlock, Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective, Transportation Research Board 102nd Annual Meeting,, 2023,
2022
Julian Bellavita, Alex Sim (advisor), John Wu (advisor), "Predicting Scientific Dataset Popularity Using dCache Logs", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’22), ACM Student Research Competition (SRC), Second place winner, 2022,
The dCache installation is a storage management system that acts as a disk cache for high-energy physics (HEP) data. Storage space on dCache is limited relative to persistent storage devices; therefore, a heuristic is needed to determine what data should be kept in the cache. A good cache policy would keep frequently accessed data in the cache, but this requires knowledge of future dataset popularity. We present methods for forecasting the number of times a dataset stored on dCache will be accessed in the future. We present a deep neural network that can predict future dataset accesses accurately, reporting a final normalized loss of 4.6e-8. We also present a set of algorithms that can forecast future dataset accesses given an access sequence. Included are two novel algorithms, Backup Predictor and Last N Successors, that outperform other file prediction algorithms. Findings suggest that it is possible to anticipate dataset popularity in advance.
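The Backup Predictor and Last N Successors algorithms named above build on classic successor-based file prediction. As a minimal illustration of that general idea only (a sketch, not the paper's algorithms), a last-successor predictor guesses that whatever followed the current dataset last time will follow it again:

```python
class LastSuccessorPredictor:
    """Toy successor-based access predictor: for each item, remember the
    item that most recently followed it, and predict that successor."""

    def __init__(self):
        self.successor = {}  # item -> item observed immediately after it
        self.current = None  # most recently accessed item

    def observe(self, item):
        """Record one access in the sequence."""
        if self.current is not None:
            self.successor[self.current] = item
        self.current = item

    def predict(self):
        """Predicted next access, or None if the current item is unseen."""
        return self.successor.get(self.current)

p = LastSuccessorPredictor()
for dataset in ["A", "B", "A", "C", "A"]:
    p.observe(dataset)
# "A" was most recently followed by "C", so that is the prediction.
guess = p.predict()  # "C"
```

The paper's variants extend this idea, e.g. by tracking multiple recent successors rather than only the last one.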
C. Sim, C. Guok (advisor), A. Sim (advisor), K. Wu (advisor), "Data Throughput Performance Trends of Regional Scientific Data Cache", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’22), ACM Student Research Competition (SRC), 2022,
J. Wang, K. Wu, A. Sim, S. Hwangbo, "Feature Engineering and Classification Models for Partial Discharge in Power Transformers", arXiv, 2022, doi: 10.48550/arXiv.2210.12216
L. Jin, A. Lazar, C. Brown, V. Garikapati, B. Sun, S. Ravulaparthy, Q. Chen, A. Sim, K. Wu, T. Wenzel, T. Ho, C. A. Spurlock, "What Makes You Hold onto That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions", Frontiers in Future Transportation, Connected Mobility and Automation, 2022, 3:894654, doi: 10.3389/ffutr.2022.894654
Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, "Design and implementation of dynamic I/O control scheme for large scale distributed file systems", Cluster Computing, 2022, 25(6):1--16, doi: 10.1007/s10586-022-03640-0
- Download File: wu2022.bib (bib: 22 KB)
R. Han, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, J. Balcas, H. Newman, "Access Trends of In-network Cache for Scientific Data", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA), in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534110
J. Bellavita, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, "Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534111
R. Shao, J. Kim, A. Sim, K. Wu, "Predicting Slow Connections in Scientific Computing", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534112
Yujing Ma, Florin Rusu, Kesheng Wu, Alexander Sim, 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Pages: 1088--1097 2022, doi: 10.1109/IPDPSW55747.2022.00177
J. Kim, M. Jin, Y. Homma, A. Sim, W. Kroeger, K. Wu, "Extract Dynamic Information To Improve Time Series Modeling: a Case Study with Scientific Workflow", arXiv, 2022, doi: 10.48550/arXiv.2205.09703
K. Wang, S. Lee, J. Balewski, A. Sim, P. Nugent, A. Agrawal, A. Choudhary, K. Wu, W-K. Liao, "Using Multi-resolution Data to Accelerate Neural Network Training in Scientific Applications", 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2022), 2022, doi: 10.1109/CCGrid54584.2022.00050
B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, "Enhancing IoT Anomaly Detection Performance for Federated Learning", Digital Communications and Networks, Special Issue on Edge Computation and Intelligence, 2022, doi: 10.1016/j.dcan.2022.02.007
Lipeng Wan, Axel Huebl, Junmin Gu, Franz Poeschel, Ana Gainaru, Ruonan Wang, Jieyang Chen, Xin Liang, Dmitry Ganyushin, Todd Munson, Ian Foster, Jean-Luc Vay, Norbert Podhorszki, Kesheng Wu, Scott Klasky, "Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization", IEEE Transactions on Parallel and Distributed Systems, 2022, 33:878-890, doi: 10.1109/TPDS.2021.3100784
John Wu, Ben Brown, Paolo Calafiura, Quincey Koziol, Dongeun Lee, Alex Sim, Devesh Tiwari, Support for In-Flight Data Analyses in Scientific Workflows, DOE ASCR Workshop on the Management and Storage of Scientific Data, 2022, doi: 10.2172/1843500
John Wu, Bin Dong, Alex Sim, Automating Data Management Through Unified Runtime Systems, DOE ASCR Workshop on the Management and Storage of Scientific Data, 2022, doi: 10.2172/1843500
A. Pereira, A. Sim, K. Wu, S. Yoo, H. Ito, "Data access pattern analysis for dCache storage system", International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022), 2022,
Ling Jin, Alina Lazar, Caitlin Brown, Bingrong Sun, Venu Garikapati, Srinath Ravulaparthy, Qianmiao Chen, Alexander Sim, Kesheng Wu, Tin Ho, Thomas Wenzel, C. Anna Spurlock, What Makes You Hold on to That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions, Transportation Research Board 101st Annual Meeting, 2022,
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, Dave Pugmire, Silvio Rizzi, Thompson, Will Usher, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "Proximity Portability and In Transit, M-to-N Data Partitioning and Movement in SENSEI", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_20
E. Wes Bethel, Burlen Loring, Utkarsh Ayatchit, David Camp, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, David Pugmire, Silvio Rizzi, Thompson, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "The SENSEI Generic In Situ Interface: Tool and Processing Portability at Scale", In Situ Visualization for Computational Science, ( 2022) doi: 10.1007/978-3-030-81627-8_13
2021
J. Bang, C. Kim, K. Wu, A. Sim, S. Byna, H. Sung, H. Eom, "An In-Depth I/O Pattern Analysis in HPC Systems", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021, doi: 10.1109/HiPC53243.2021.00056
S. Lee, Q. Kang, K. Wang, J. Balewski, A. Sim, A. Agrawal, A. Choudhary, P. Nugent, K. Wu, W-K. Liao, "Asynchronous I/O Strategy for Large-Scale Deep Learning Applications", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021, doi: 10.1109/HiPC53243.2021.00046
A. Lazar, L. Jin, C. Brown, C. A. Spurlock, A. Sim, K. Wu, "Performance of the Gold Standard and Machine Learning in Predicting Vehicle Transactions", the 3rd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD 2021), 2021, doi: 10.1109/BigData52589.2021.9671286
J. Cheung, A. Sim, J. Kim, K. Wu, "Performance Prediction of Large Data Transfers", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), ACM Student Research Competition (SRC), 2021,
B. Mohammed, M. Kiran, N. Krishnaswamy, K. Wu, "Predicting WAN Traffic Volumes using Fourier and Multivariate SARIMA Approach", International Journal of Big Data Intelligence, November 3, 2021, doi: 10.1504/IJBDI.2021.118742
A. Syal, A. Lazar, J. Kim, A. Sim, K. Wu, "Network traffic performance analysis from passive measurements using gradient boosting machine learning", International Journal of Big Data Intelligence, 2021, 8:13-30, doi: 10.1504/IJBDI.2021.118741
Y. Ma, F. Rusu, K. Wu, A. Sim, Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers, arXiv preprint arXiv:2110.07029, 2021,
E. Copps, A. Sim (Advisor), K. Wu (Advisor), "Analyzing scientific data sharing patterns with in-network data caching", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2021), ACM Student Research Competition (SRC), 2021,
A. Lazar, A. Sim, K. Wu, "GPU-based Classification for Wireless Intrusion Detection", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464445
Y. Wang, K. Wu, A. Sim, S. Yoo, S. Misawa, "Access Patterns of Disk Cache for Large Scientific Archive", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464444
E. Copps, H. Zhang, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, E. Fajardo, "Analyzing scientific data sharing patterns with in-network data caching", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464441
Y. Ma, F. Rusu, A. Sim, K. Wu, "Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures", Heterogeneity in Computing Workshop (HCW 2021), in conjunction with the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2021, doi: 10.1109/IPDPSW52791.2021.00012
J. Kim, A. Sim, J. Kim, K. Wu, J. Hahm, Improving Botnet Detection with Recurrent Neural Network and Transfer Learning, arXiv preprint arXiv:2104.12602, 2021,
Donghun Koo, Jaehwan Lee, Jialin Liu, Eun-Kyu Byun, Jae-Hyuck Kwak, Glenn K Lockwood, Soonwook Hwang, Katie Antypas, Kesheng Wu, Hyeonsang Eom, "An empirical study of I/O separation for burst buffers in HPC systems", Journal of Parallel and Distributed Computing, 2021, 148:96-108, doi: 10.1016/j.jpdc.2020.10.007
Chao Yang
2021
Karol Kowalski, Raymond Bair, Nicholas P. Bauman, Jeffery S. Boschen, Eric J. Bylaska, Jeff Daily, Wibe A. de Jong, Thom Dunning, Niranjan Govind, Robert J. Harrison, Murat Keceli, Kristopher Keipert, Sriram Krishnamoorthy, Suraj Kumar, Erdal Mutlu, Bruce Palmer, Ajay Panyala, Bo Peng, Ryan M. Richard, T. P. Straatsma, Peter Sushko, Edward F. Valeev, Marat Valiev, Hubertus J. J. van Dam, Jonathan M. Waldrop, David B. Williams-Young, Chao Yang, Marcin Zalewski, Theresa L. Windus, "From NWChem to NWChemEx: Evolving with the Computational Chemistry Landscape", Chemical Reviews, March 31, 2021, doi: 10.1021/acs.chemrev.0c00998
J. Goings, H. Hu, C. Yang, X. Li, "Reinforcement Learning Configuration Interaction", March 31, 2021,
R. Van Beeumen, L. Perisa, D. Kressner, C. Yang, "A Flexible Power Method for Solving Infinite Dimensional Tensor Eigenvalue Problems", January 30, 2021,
Jackie Zhi Yao
2022
Z. Yao, R. Jambunathan, Y. Zeng, and A. Nonaka, "A Massively Parallel Time-Domain Coupled Electrodynamics-Micromagnetics Solver", International Journal of High Performance Computing Applications, January 10, 2022, accepted,
2021
Meriam Gay Bautista, Zhi Jackie Yao, Anastasiia Butko, Mariam Kiran, Mekena Metcalf, "Towards Automated Superconducting Circuit Calibration using Deep Reinforcement Learning", 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Tampa, FL, USA, IEEE, August 23, 2021, pp. 462-46, doi: 10.1109/ISVLSI51109.2021.00091
Katherine Yelick
2021
Katherine A. Yelick, Amir Kamil, Damian Rouson, Dan Bonachea, Paul H. Hargrove, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (SC21), Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), November 15, 2021,
UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. UPC++ offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between computation and asynchronous data movement. UPC++ supports simple/regular data structures as well as more elaborate distributed applications where communication is fine-grained and/or irregular. UPC++ provides a uniform abstraction for one-sided RMA between host and GPU/accelerator memories anywhere in the system. UPC++'s support for aggressive asynchrony enables applications to effectively overlap communication and reduce latency stalls, while the underlying GASNet-EX communication library delivers efficient low-overhead RMA/RPC on HPC networks.
This tutorial introduces UPC++, covering the memory and execution models and basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into application proxy examples. We examine a few UPC++ applications with irregular communication (metagenomic assembler and COVID-19 simulation) and describe how they utilize UPC++ to optimize communication performance.
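The overlap pattern the abstract describes — issue communication asynchronously, perform independent local work, then wait on the future only when its result is needed — can be sketched in plain Python. This is a conceptual analogy using threads, not the UPC++ API; `remote_get` is a hypothetical stand-in for a one-sided RMA get:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def remote_get(value, delay=0.05):
    """Stand-in for an asynchronous one-sided get: data arrives after a latency."""
    time.sleep(delay)
    return value * 2

def local_compute(n):
    """Independent local work that can overlap the in-flight 'communication'."""
    return sum(i * i for i in range(n))

with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(remote_get, 21)   # issue the async "RMA"
    partial = local_compute(10_000)     # overlap it with local compute
    data = fut.result()                 # block only when the data is needed

print(data, partial)
```

In UPC++ itself the same shape is expressed with `upcxx::rget`/`upcxx::rput` returning futures, but the scheduling idea — hide communication latency behind independent computation — is the same.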
Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç, "BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 2021, doi: 10.1101/464420
Ed Younis, Koushik Sen, Katherine Yelick, Costin Iancu, QFAST: Quantum Synthesis Using a Hierarchical Continuous Circuit Space, Bulletin of the American Physical Society, March 2021,
We present QFAST, a quantum synthesis tool designed to produce short circuits and to scale well in practice. Our contributions are: 1) a novel representation of circuits able to encode placement and topology; 2) a hierarchical approach with an iterative refinement formulation that combines "coarse-grained" fast optimization during circuit structure search with a good, but slower, optimization stage only in the final circuit instantiation. When compared against state-of-the-art techniques, although not always optimal, QFAST can reduce circuits for "time-dependent evolution" algorithms, as used by domain scientists, by 60x in depth. On typical circuits, it provides 4x better depth reduction than the widely used Qiskit and UniversalQ compilers. We also show the composability and tunability of our formulation in terms of circuit depth and running time. For example, we show how to generate shorter circuits by plugging in the best available third party synthesis algorithm at a given hierarchy level. Composability enables portability across chip architectures, which is missing from similar approaches.
QFAST is integrated with Qiskit and available at github.com/bqskit.
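Synthesis tools of this kind score each candidate circuit structure by its distance to the target unitary. A minimal illustration of one common objective (a Hilbert-Schmidt-style distance, shown here as a generic sketch, not QFAST's actual code):

```python
import numpy as np

def hs_distance(u, v):
    """Distance in [0, 1]; 0 when u and v agree up to a global phase."""
    n = u.shape[0]
    return 1.0 - abs(np.trace(u.conj().T @ v)) / n

# Target: a CNOT. A matching candidate scores ~0; the identity scores poorly.
cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
identity = np.eye(4, dtype=complex)

print(hs_distance(cnot, cnot))      # exact match -> ~0.0
print(hs_distance(cnot, identity))  # poor candidate -> 0.5
```

The hierarchical approach in the abstract amounts to minimizing such an objective cheaply during structure search, then re-optimizing precisely only for the final instantiation.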
O Selvitopi, B Brock, I Nisa, A Tripathy, K Yelick, A Buluç, "Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication", Proceedings of the International Conference on Supercomputing, January 2021, 431--442, doi: 10.1145/3447818.3461472
G Guidi, M Ellis, A Buluç, K Yelick, D Culler, "10 years later: Cloud computing is closing the performance gap", ICPE 2021 - Companion of the ACM/SPEC International Conference on Performance Engineering, January 1, 2021, 41--48, doi: 10.1145/3447545.3451183
Ed Younis
2022
Mathias Weiden, Justin Kalloor, John Kubiatowicz, Ed Younis, Costin Iancu, "Wide Quantum Circuit Optimization with Topology Aware Synthesis", Third International Workshop on Quantum Computing Software, November 13, 2022,
Unitary synthesis is an optimization technique that can achieve optimal gate counts while mapping quantum circuits to restrictive qubit topologies. Synthesis algorithms are limited in scalability by their exponentially growing run times. Application to wide circuits requires partitioning into smaller components. In this work, we explore methods to reduce depth and multi-qubit gate count of wide, mapped quantum circuits using synthesis. We present TopAS, a topology-aware synthesis tool that preconditions quantum circuits before mapping. Partitioned subcircuits are optimized and fitted to sparse subtopologies to balance the opposing demands of synthesis and mapping algorithms. Compared to state-of-the-art wide-circuit synthesis algorithms, TopAS is able to reduce depth on average by 35.2% and CNOT count by 11.5% for mesh topologies. Compared to the optimization and mapping algorithms of Qiskit and Tket, TopAS is able to reduce CNOT counts by 30.3% and depth by 38.2% on average.
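The partitioning step the abstract mentions — cutting a wide circuit into subcircuits small enough for exponential-cost synthesis — can be sketched as a greedy pass over the gate list. This is an illustrative toy, not the TopAS algorithm; the gate names and the k=3 qubit bound are arbitrary:

```python
def partition(gates, k=3):
    """Greedily group consecutive gates into blocks touching at most k qubits."""
    blocks, current, active = [], [], set()
    for gate in gates:
        _, qubits = gate
        if len(active | set(qubits)) <= k:
            current.append(gate)        # gate fits in the current block
            active |= set(qubits)
        else:
            blocks.append(current)      # close the block, start a new one
            current, active = [gate], set(qubits)
    if current:
        blocks.append(current)
    return blocks

# A CNOT ladder over 5 qubits splits into two 3-qubit blocks.
circuit = [("cx", (0, 1)), ("cx", (1, 2)), ("cx", (2, 3)), ("cx", (3, 4))]
print(partition(circuit, k=3))
```

Each resulting block spans at most k qubits, so its unitary is at most 2^k x 2^k, which is what makes per-block synthesis tractable.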
2021
Ed Younis, Koushik Sen, Katherine Yelick, Costin Iancu, QFAST: Quantum Synthesis Using a Hierarchical Continuous Circuit Space, Bulletin of the American Physical Society, March 2021,
We present QFAST, a quantum synthesis tool designed to produce short circuits and to scale well in practice. Our contributions are: 1) a novel representation of circuits able to encode placement and topology; 2) a hierarchical approach with an iterative refinement formulation that combines "coarse-grained" fast optimization during circuit structure search with a good, but slower, optimization stage only in the final circuit instantiation. When compared against state-of-the-art techniques, although not always optimal, QFAST can reduce circuits for "time-dependent evolution" algorithms, as used by domain scientists, by 60x in depth. On typical circuits, it provides 4x better depth reduction than the widely used Qiskit and UniversalQ compilers. We also show the composability and tunability of our formulation in terms of circuit depth and running time. For example, we show how to generate shorter circuits by plugging in the best available third party synthesis algorithm at a given hierarchy level. Composability enables portability across chip architectures, which is missing from similar approaches.
QFAST is integrated with Qiskit and available at github.com/bqskit.
Rafael Zamora-Resendiz
2023
Nathan A. Kimbrel, Allison E. Ashley-Koch, Xue J. Qin, Jennifer H. Lindquist, Melanie E. Garrett, Michelle F. Dennis, Lauren P. Hair, Jennifer E. Huffman, Daniel A. Jacobson, Ravi K. Madduri, Jodie A. Trafton, Hilary Coon, Anna R. Docherty, Niamh Mullins, Douglas M. Ruderfer, Philip D. Harvey, Benjamin H. McMahon, David W. Oslin, Jean C. Beckham, Elizabeth R. Hauser, Michael A. Hauser, Million Veteran Program Suicide Exemplar Workgroup, International Suicide Genetics Consortium, Veterans Affairs Mid-Atlantic Mental Illness Research Education and Clinical Center Workgroup, Veterans Affairs Million Veteran Program, "Identification of Novel, Replicable Genetic Risk Loci for Suicidal Thoughts and Behaviors Among US Military Veterans", JAMA Psychiatry, February 1, 2023, 80:100-191, doi: 10.1001/jamapsychiatry.2022.3896
2022
Destinee Morrow, Rafael Zamora-Resendiz, Jean C Beckham, Nathan A Kimbrel, David W Oslin, Suzanne Tamang, Million Veteran Program Suicide Exemplar Workgroup, Silvia Crivelli, "A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes", Journal of Psychiatric Research, July 1, 2022, 151:328-338, doi: 10.1016/j.jpsychires.2022.04.009
Wei Zhang
2021
Wei Zhang, Suren Byna, Hyogi Sim, Sangkeun Lee, Sudharshan Vazhkudai, and Yong Chen, "Exploiting User Activeness for Data Retention in HPC Systems", International Conference for High Performance Computing, Networking, Storage and Analysis (SC '21), November 21, 2021, doi: 10.1145/3458817.3476201
Weiqun Zhang
2023
H. Klion, R. Jambunathan, M. E. Rowan, E. Yang, D. Willcox, J.-L. Vay, R. Lehe, A. Myers, A. Huebl, W. Zhang, "Particle-in-Cell Simulations of Relativistic Magnetic Reconnection with Advanced Maxwell Solver Algorithms", arXiv preprint, submitted to The Astrophysical Journal, April 20, 2023,
2021
Jean Sexton, Zarija Lukic, Ann Almgren, Chris Daley, Brian Friesen, Andrew Myers, and Weiqun Zhang, "Nyx: A Massively Parallel AMR Code for Computational Cosmology", The Journal Of Open Source Software, July 10, 2021,
Weiqun Zhang, Andrew Myers, Kevin Gott, Ann Almgren and John Bell, "AMReX: Block-Structured Adaptive Mesh Refinement for Multiphysics Applications", The International Journal of High Performance Computing Applications, June 12, 2021,
Jordan Musser, Ann S Almgren, William D Fullmer, Oscar Antepara, John B Bell, Johannes Blaschke, Kevin Gott, Andrew Myers, Roberto Porcu, Deepak Rangarajan, Michele Rosso, Weiqun Zhang, and Madhava Syamlal, "MFIX-Exa: A Path Towards Exascale CFD-DEM Simulations", The International Journal of High Performance Computing Applications, April 16, 2021,
Wibe Albert de Jong
2021
Karol Kowalski, Raymond Bair, Nicholas P. Bauman, Jeffery S. Boschen, Eric J. Bylaska, Jeff Daily, Wibe A. de Jong, Thom Dunning, Niranjan Govind, Robert J. Harrison, Murat Keceli, Kristopher Keipert, Sriram Krishnamoorthy, Suraj Kumar, Erdal Mutlu, Bruce Palmer, Ajay Panyala, Bo Peng, Ryan M. Richard, T. P. Straatsma, Peter Sushko, Edward F. Valeev, Marat Valiev, Hubertus J. J. van Dam, Jonathan M. Waldrop, David B. Williams-Young, Chao Yang, Marcin Zalewski, Theresa L. Windus, "From NWChem to NWChemEx: Evolving with the Computational Chemistry Landscape", Chemical Reviews, March 31, 2021, doi: 10.1021/acs.chemrev.0c00998
Other
2023
Nicholson Koukpaizan, Roofline Analysis using AMD Tools on AMD GPUs, ECP Annual Meeting, February 2023,
Neil Mehta, Roofline Performance Analysis on NVIDIA GPUs, ECP Annual Meeting, February 2023,
JaeHyuk Kwack, Roofline Performance Analysis w/ Intel Advisor on Intel CPUs & GPUs, ECP Annual Meeting, February 2023,
2022
JaeHyuk Kwack, Roofline Performance Analysis w/ Intel Advisor on Intel CPUs & GPUs, ECP Annual Meeting, May 2022,
Neil Mehta, Roofline on NVIDIA at NERSC, ECP Annual Meeting, May 2022,
S. Dhawan, A. Goobar, M. Smith, J. Johansson, M. Rigault, J. Nordin, R. Biswas, D. Goldstein, P. Nugent, Y. -L. Kim, A. A. Miller, M. J. Graham, M. Medford, M. M. Kasliwal, S. R. Kulkarni, Dmitry A. Duev, E. Bellm, P. Rosnet, R. Riddle, J. Sollerman, The Zwicky Transient Facility Type Ia supernova survey: first data release and results, Monthly Notices of the RAS, Pages: 2228-2241 2022, doi: 10.1093/mnras/stab3093
Yuan Qi Ni, Dae-Sik Moon, Maria R. Drout, Abigail Polin, David J. Sand, Santiago González-Gaitán, Sang Chul Kim, Youngdae Lee, Hong Soo Park, D. Andrew Howell, Peter E. Nugent, Anthony L. Piro, Peter J. Brown, Lluís Galbany, Jamison Burke, Daichi Hiramatsu, Griffin Hosseinzadeh, Stefano Valenti, Niloufar Afsariardchi, Jennifer E. Andrews, John Antoniadis, Iair Arcavi, Rachael L. Beaton, K. Azalee Bostroem, Raymond G. Carlberg, S. Bradley Cenko, Sang-Mok Cha, Yize Dong, Avishay Gal-Yam, Joshua Haislip, Thomas W. -S. Holoien, Sean D. Johnson, Vladimir Kouprianov, Yongseok Lee, Christopher D. Matzner, Nidia Morrell, Curtis McCully, Giuliano Pignata, Daniel E. Reichart, Jeffrey Rich, Stuart D. Ryder, Nathan Smith, Samuel Wyatt, Sheng Yang, Infant-phase reddening by surface Fe-peak elements in a normal type Ia supernova, Nature Astronomy, 2022, doi: 10.1038/s41550-022-01603-4
Melissa L. Graham, Christoffer Fremling, Daniel A. Perley, Rahul Biswas, Christopher A. Phillips, Jesper Sollerman, Peter E. Nugent, Sarafina Nance, Suhail Dhawan, Jakob Nordin, Ariel Goobar, Adam Miller, James D. Neill, Xander J. Hall, Matthew J. Hankins, Dmitry A. Duev, Mansi M. Kasliwal, Mickael Rigault, Eric C. Bellm, David Hale, Przemek Mróz, S. R. Kulkarni, Supernova siblings and their parent galaxies in the Zwicky Transient Facility Bright Transient Survey, Monthly Notices of the RAS, Pages: 241-254 2022, doi: 10.1093/mnras/stab3802
2021
Jonathan Madsen, Roofline Instrumentation with TiMemory, ECP Annual Meeting, April 2021,
Jonathan Madsen, Roofline Model using NSight Compute, ECP Annual Meeting, April 2021,