Recent Publications
2025
J. Kim, A. Sim, K. Wu, J. Kim, "Improving Slow Transfer Predictions: Generative Methods Compared", IEEE International Conference on Computing, Networking and Communications (ICNC 2025), 2025
B. Fan, A. Sim, K. Wu, J. Kim, "Conditional Recurrent Neural Networks for Enhancing Throughput Prediction and Slow File Transfers Detection in Large Science Workflows", 22nd IEEE Consumer Communications & Networking Conference (CCNC 2025), 2025
M. Adams, P. Wang, J. Merson, K. Huck, M. Knepley, "A performance portable, fully implicit Landau collision operator with batched linear solvers", SIAM Journal on Scientific Computing, January 1, 2025
Modern accelerators use hierarchical parallel programming models that enable massive multithreading within a processing element (PE), with multiple PEs per device driven by traditional processes. Batching is a technique for exposing PE-level parallelism in algorithms that have traditionally run on MPI processes or multiple threads within a single process. Opportunities for batching arise in, for example, kinetic discretizations of magnetized plasmas where collisions are advanced in velocity space at each spatial point independently.
This paper builds on previous work on a high-performance, fully nonlinear Landau collision operator by batching the linear solver, batching the spatial point problems, and adding new support for multiple grids for multiscale, multi-species problems. An anisotropic relaxation verification test that agrees well with previously published results and analytical models is presented. Performance results from NVIDIA A100 and AMD MI250X nodes are presented with a hardware utilization analysis for each architecture. The entire implicit Landau operator time advance is implemented in Kokkos for performance portability, runs entirely on the device, and is available in the PETSc numerical library.
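The batching idea above can be illustrated with a small sketch. This Python code is not the paper's Kokkos/PETSc implementation: it assumes small dense systems and swaps in plain Gaussian elimination for the paper's batched linear solver, and the names `solve_small` and `batched_solve` are hypothetical. It only shows the structure: many independent per-point systems handed to one batched call, where on an accelerator each system could map to its own processing element.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve_small(a: Matrix, b: Vector) -> Vector:
    """Solve one small dense system a @ x = b (hypothetical helper).

    Gaussian elimination with partial pivoting stands in for the paper's
    batched linear solver, which is not reproduced here.
    """
    n = len(b)
    # Augmented copy [a | b] so the caller's batch data is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def batched_solve(a_batch: List[Matrix], b_batch: List[Vector]) -> List[Vector]:
    """Solve a batch of independent systems, e.g. one per spatial point.

    The systems share no data, so on an accelerator each could be given to
    its own processing element; this sketch simply iterates over the batch.
    """
    return [solve_small(a, b) for a, b in zip(a_batch, b_batch)]


if __name__ == "__main__":
    a_batch = [[[4.0, 1.0], [1.0, 3.0]], [[2.0, 0.0], [0.0, 5.0]]]
    b_batch = [[1.0, 2.0], [4.0, 10.0]]
    print(batched_solve(a_batch, b_batch))
```

Because each system in the batch is independent, the list comprehension in `batched_solve` is the part a device implementation would parallelize; the per-system solver itself stays serial.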
2024
Dan Bonachea, Katherine Rasmussen, Brad Richardson, Damian Rouson, "Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.5", Lawrence Berkeley National Laboratory Tech Report, December 2024, LBNL 2001636, doi: 10.25344/S4CG6G
This document specifies an interface to support the parallel features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a proposed solution in which the runtime library is primarily responsible for implementing coarray allocation, deallocation and accesses, image synchronization, atomic operations, events, teams and collective subroutines. In this interface, the compiler is responsible for transforming the invocation of Fortran-level parallel features into procedure calls to the necessary PRIF subroutines. The interface is designed for portability across shared- and distributed-memory machines, different operating systems, and multiple architectures. Implementations of this interface are intended as an augmentation for the compiler's own runtime library. With an implementation-agnostic interface, alternative parallel runtime libraries may be developed that support the same interface. One benefit of this approach is the ability to vary the communication substrate. A central aim of this document is to define a parallel runtime interface in standard Fortran syntax, which enables us to leverage Fortran to succinctly express various properties of the procedure interfaces, including argument attributes.
I. Mahmud, P. Zuk, C. Wang, M. Kiran, K. Wu, K. Thareja, K. Raghavan, A. Mandal, E. Deelman, "DISTRI: Development and Integration of Simulation Tools for Resilient Infrastructure", 5th International Workshop on Big Data & AI Tools, Models, and Use Cases for Innovative Scientific Discovery (BTSD), 2024
B. Dong, A. Nayak, K. Wu, V. Tribaldos, J. Ajo-Franklin, Q. Zhang, S. Byna, F. Guo, P. Dobson, A. Sim, "TensorSearch: Parallel Similarity Search on Tensors", IEEE International Conference on Big Data (BigData), 2024
Hyunju Oh, Wei Zhang, Christopher D. Rickett, Sreenivas R. Sukumar, Suren Byna, "Evaluating Performance Trade-offs of Caching Strategies for AI-Powered Querying Systems", 2024 IEEE International Conference on Big Data (IEEE BigData 2024), Washington, DC, USA, 2024
Camera-ready in preparation
Xuan Jiang, Raja Sengupta, James Demmel, Samuel Williams, "Large scale multi-GPU based parallel traffic simulation for accelerated traffic assignment and propagation", Transportation Research Part C: Emerging Technologies, December 2024, 169:104873, doi: 10.1016/j.trc.2024.104873
Jean Luca Bez, Analyzing Parallel I/O, ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), BoF, 2024
P. Zuk, H. Jin, I. Mahmud, K. Raghavan, K. Thareja, S. Wu, P. Balaprakash, F. Cappello, Z. Chen, E. Deelman, S. Di, A. Hamade, M. Kiran, A. Mandal, E. Scott, C. Wang, K. Wu, SWARM: Scientific Workflow Applications on Resilient Metasystem, ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), BoF, 2024
Jean Luca Bez, Drishti: I/O Insights for All, ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), 2024
P. Zuk, H. Jin, I. Mahmud, K. Raghavan, K. Thareja, S. Wu, P. Balaprakash, F. Cappello, Z. Chen, E. Deelman, S. Di, A. Hamade, M. Kiran, A. Mandal, E. Scott, C. Wang, K. Wu, "SWARM: Scientific Workflow Applications on Resilient Metasystem", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), 2024
Jean Luca Bez, IO500: The High-Performance Storage Community, ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), BoF, 2024
E. Wang, A. Sim, K. Wu, "Comparing Cache Utilization Trends for Regional Scientific Caches with Transfer Learning Models", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), ACM Student Research Competition (SRC), 2024
M. Sudarshan, A. Sim, K. Wu, "Predicting Dataset Popularity for Improved Distributed Content Caching in High Energy Physics", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), ACM Student Research Competition (SRC), 2024
Rajeev Jain, Houjun Tang, Akash Dhruv, Suren Byna, "Enabling Data Reduction for Flash-X Simulations", 10th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD), 2024
Junmin Gu, John Wu, Paul Lin, CS Chang, Seong-Hoe Ku, Stephane Ethier, Jong Choi, Accurate in-situ in-transit analysis of particle diffusion for large-scale tokamak simulation, ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), 2024
V. Lakshminarayana, C. Oguchi, A. Sim, K. Wu, D. Ghosal, "A Study of a Deterministic Networking Framework for Latency Critical Large Scientific Data Transfers", 11th Annual International Workshop on Innovating the Network for Data-Intensive Science (INDIS 2024), 2024
Dan Bonachea, Katherine Rasmussen, Brad Richardson, Damian Rouson, "Parallel Runtime Interface for Fortran (PRIF): A Multi-Image Solution for LLVM Flang", Tenth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC2024), Atlanta, Georgia, USA, IEEE, November 2024, doi: 10.25344/S4N017
Fortran compilers that provide support for Fortran’s native parallel features often do so with a runtime library that depends on details of both the compiler implementation and the communication library, while others provide limited or no support at all. This paper introduces a new generalized interface that is both compiler- and runtime-library-agnostic, providing flexibility while fully supporting all of Fortran’s parallel features. The Parallel Runtime Interface for Fortran (PRIF) was developed to be portable across shared- and distributed-memory systems, with varying operating systems, toolchains and architectures. It achieves this by defining a set of Fortran procedures corresponding to each of the parallel features defined in the Fortran standard that may be invoked by a Fortran compiler and implemented by a runtime library. PRIF aims to be used as the solution for LLVM Flang to provide parallel Fortran support. This paper also briefly describes our PRIF prototype implementation: Caffeine.
Jean Luca Bez, Suren Byna, "Exploring the Proactive Data Containers Runtime System in VAST - A Case Study", 9th International Parallel Data Systems Workshop (PDSW), 2024
Wei Zhang, Houjun Tang, Suren Byna, "BULKI - Binary Unified Layout for Key-value Interchange", 9th International Parallel Data Systems Workshop (PDSW), 2024
Damian Rouson, Baboucarr Dibba, Katherine Rasmussen, Brad Richardson, David Torres, Yunhao Zhang, Ethan Gutmann, Kareem Ergawy, Michael Klemm, Sameer Shende, Just Write Fortran: Experiences with a Language-Based Alternative to MPI+X, Talk at IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2024, doi: 10.25344/S4H88D
Fortran 2023, with its "do concurrent" and coarray parallel programming features, displaces many uses of extra-language parallel programming models such as MPI, OpenMP, and OpenACC. The Cray, Intel, LFortran, LLVM, and NVIDIA compilers automatically parallelize do concurrent in shared memory. The Cray, Intel, and GNU compilers support coarrays in shared- and distributed-memory, while the NAG compiler supports coarrays in shared memory. Thus, language-based parallelism is emerging as a portable alternative to MPI+X.
This talk will present experiences with automatic "do concurrent" parallelization in the deep learning library Inference-Engine and with coarray communication in the Intermediate Complexity Atmospheric Research (ICAR) model.
M. Schreyer, T. Sattarov, A. Sim, K. Wu, "Imb-FinDiff: Conditional Diffusion Models for Class Imbalance Synthesis of Financial Tabular Data", 5th ACM International Conference on AI in Finance (ICAIF'24), 2024, doi: 10.1145/3677052.3698659
Jhe-Yu Liou, Muaaz Awan, Kirtus Leyba, Petr Sulc, Steven Hofmeyr, Carole-Jean Wu, Stephanie Forrest, "Evolving to find optimizations humans miss: using evolutionary computation to improve GPU code for bioinformatics applications", ACM Transactions on Evolutionary Learning and Optimization, November 15, 2024, doi: 10.1145/3703920
Sterling Smith, Zichuan Anthony Xing, Torrin Bechtel, Severin Denk, Earl DeShazer, Orso Meneghini, Tom Neiser, Laurie Stephey, Oscar Antepara, Christopher Mitchell Clark, Eli Dart, Pengfei Ding, Sean Flanagan, Raffi Nazikian, David Schissel, Christine Simpson, Nicholas Tyler, Thomas D. Uram, Samuel Williams, "Expediting Higher Fidelity Plasma State Reconstructions for the DIII-D National Fusion Facility Using Leadership Class Computing Resources", Extreme-Scale Experiment-in-the-Loop Computing (XLOOP), November 2024
Oscar Antepara, Samuel Williams, Max Carlson, Jerry Watkins, "Performance Portable Optimizations of an Ice-sheet Modeling Code on GPU-supercomputers", Performance, Portability & Productivity in HPC (P3HPC), November 2024
Oscar Antepara, Samuel Williams, Hans Johansen, Mary Hall, "High-Performance, Scalable Geometric Multigrid via Fine-Grain Data Blocking for GPUs", Performance, Portability & Productivity in HPC (P3HPC), November 10, 2024
Brian Austin, Dhruva Kulkarni, Brandon Cook, Samuel Williams, Nicholas J. Wright, "System-Wide Roofline Profiling - a Case Study on NERSC’s Perlmutter Supercomputer", Performance Modeling, Benchmarking, and Simulation (PMBS), November 2024
Shashank Subramanian, Ermal Rrapaj, Peter Harrington, Smeet Chheda, Steven Farrell, Brian Austin, Samuel Williams, Nicholas Wright, Wahid Bhimji, "Comprehensive Performance Modeling and System Design Insights for Foundation Models", Performance Modeling, Benchmarking, and Simulation (PMBS), November 2024
Nan Ding, Brian Austin, Yang Liu, Neil Mehta, Steven Farrell, Johannes P. Blaschke, Leonid Oliker, Hai Ah Nam, Nicholas J. Wright, Samuel Williams, "A Workflow Roofline Model for End-to-End Workflow Performance Analysis", Supercomputing (SC), November 2024
Sean R Miller, Matthew Schipper, Lars G Fritsche, Ralph Jiang, Garth Strohbehn, Erkin Ötleş, Benjamin H McMahon, Silvia Crivelli, Rafael Zamora‐Resendiz, Nithya Ramnath, Shinjae Yoo, Xin Dai, Kamya Sankar, Donna M Edwards, Steven G Allen, Michael D Green, Alex K Bryant, "Pan‐Cancer Survival Impact of Immune Checkpoint Inhibitors in a National Healthcare System", November 7, 2024
A. Sim, E. Wang, R. Monga, J. Balcas, K. Wu, C. Guok, I. Monga, D. Davila, F. Wurthwein, H. Newman, Comparing Cache Utilization Trends for Regional Data Caches, 27th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2024), 2024
J. Aldrich, A. Sim, K. Wu, S. Yoo, H. Ito, V. Garonne, E. Lancon, "Exploring Data Caching Policy with Data Access Patterns from dCache Logs", 27th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2024), 2024
M Scot Breitenfeld, Houjun Tang, Huihuo Zheng, Jordan Henderson, Suren Byna, "HDF5 in the Exascale Era: Delivering Efficient and Scalable Parallel I/O for Exascale Applications", The International Journal of High Performance Computing Applications, October 16, 2024, doi: 10.1177/10943420241288244
Katherine Rasmussen, Damian Rouson, Dan Bonachea, Brad Richardson, "A Full-Stack Exploration of Language-Based Parallelism in Fortran 2023", Poster at CARLA2024: Latin America High Performance Computing Conference, September 30, 2024, doi: 10.25344/S4RP5K
This poster explores native parallel features in Fortran 2023 through the lens of supporting applications with libraries, compilers, and parallel runtimes. The language revision informally named Fortran 2008 introduced parallelism in the form of Single Program Multiple Data (SPMD) execution with two broad feature sets: (1) loop-level parallelism via do concurrent and (2) a Partitioned Global Address Space (PGAS) comprised of distributed “coarray” data structures. Fortran’s native parallelism has demonstrated high performance [1] and reduced the burden of inserting what sometimes amounts to more directives than code. Several compilers support both feature sets, typically by translating do concurrent into serial do loops annotated by parallel directives and by translating SPMD/PGAS features into direct calls to a communication library. Our research focuses primarily on two questions: (1) can the compiler’s parallel runtime library be developed in the language being compiled (Fortran) and (2) can we define an interface to the runtime that liberates compilers from being hardwired to one runtime and vice versa. We are answering these questions by developing the Parallel Runtime Interface for Fortran (PRIF) [2] and the Co-Array Fortran Framework of Efficient Interfaces to Network Environments (Caffeine) [3]. Caffeine is initially targeting adoption by LLVM Flang, a new open-source Fortran compiler developed by a broad community in industry, academia, and government labs. We are also exploring the use of these features in Inference-Engine, a deep learning library designed to facilitate neural network training and inference for high-performance computing applications written in modern Fortran.
Jie Li, George Michelogiannakis, Samuel Maloney, Brandon Cook, Estela Suarez, John Shalf, "Job Scheduling in High Performance Computing Systems with Disaggregated Memory Resources", IEEE International Conference on Cluster Computing (CLUSTER), September 2024, doi: 10.1109/CLUSTER59578.2024.00033
Leyba K, Hofmeyr S, Forrest S, Cannon J, Moses M, "SIMCoV-GPU: Accelerating an Agent-Based Model for Exascale", HPDC '24, August 30, 2024, doi: 10.1145/3625549.3658692
Mahesh Lakshminarasimhan, Oscar Antepara, Tuowen Zhao, Benjamin Sepanski, Protonu Basu, Hans Johansen, Mary Hall, Samuel Williams, "Bricks: A high-performance portability layer for computations on block-structured grids", The International Journal of High Performance Computing Applications (IJHPCA), August 19, 2024, doi: 10.1177/1094342024126828
Shakila Shafiq, Md. Sazzadur Rahman, Shamim Ahmed Shaon, Imtiaz Mahmud, A. S. M. Sanwar Hosen, "A Review on Software-Defined Networking for Internet of Things Inclusive of Distributed Computing, Blockchain, and Mobile Network Technology: Basics, Trends, Challenges, and Future Research Potentials", International Journal of Distributed Sensor Networks, August 13, 2024, doi: 10.1155/2024/9006405
Mahesh Lakshminarasimhan, Mary Hall, Samuel Williams, Oscar Antepara, "BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs", Proceedings of the 53rd International Conference on Parallel Processing (ICPP), August 12, 2024
David J. Torres, Damian Rouson, "Investigating the ecological fallacy through sampling distributions constructed from finite populations", Monte Carlo Methods and Applications, August 2024, doi: 10.1515/mcma-2024-2013
Correlation coefficients and linear regression values computed from group averages can differ from those computed using individual scores. Analyses of this observation, known as the ecological fallacy, often assume that all individual scores in a population are available. In many situations, however, one must use a sample from the larger population. In such cases, the computed correlation coefficient and linear regression values will depend on the sample that is chosen and on the underlying sampling distribution. For normally distributed variables and random samples drawn from infinitely large continuous distributions, the sampling distribution of correlation coefficients and linear regression values for group averages is identical to the sampling distribution for individuals. In practice, however, data are often acquired by sampling without replacement from a finite population. Our objective is to demonstrate through Monte Carlo simulations that the sampling distributions for correlation and linear regression are also similar for individuals and group averages when sampling without replacement from normally distributed variables. These simulations suggest that, when a random sample is selected from a population, correlation coefficients and linear regression values computed from individual scores are not more accurate estimates of the population values than those computed from group averages, as long as the sample size is the same.
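A minimal Monte Carlo sketch of the comparison described above, in Python. Several choices here are assumptions, not the paper's setup: individuals are assigned to groups at random, the finite population has 10,000 individuals drawn from a bivariate normal with correlation 0.5, groups have 10 members, and both individuals and group averages are sampled 50 at a time without replacement. All function names are hypothetical, and only the correlation coefficient is estimated (a regression slope could be tracked the same way).

```python
import math
import random
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = math.fsum(xs) / n, math.fsum(ys) / n
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_population(n_people, rho, rng):
    """Finite population of (x, y) scores from a bivariate normal with correlation rho."""
    pop = []
    for _ in range(n_people):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pop.append((z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return pop


def group_averages(pop, group_size, rng):
    """Assumption: individuals are assigned to equal-size groups at random."""
    shuffled = pop[:]
    rng.shuffle(shuffled)
    groups = [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]
    return [(mean(x for x, _ in g), mean(y for _, y in g)) for g in groups]


def sampling_distribution(units, sample_size, trials, rng):
    """Correlation coefficients from repeated samples drawn without replacement."""
    rs = []
    for _ in range(trials):
        sample = rng.sample(units, sample_size)  # sampling without replacement
        rs.append(pearson([x for x, _ in sample], [y for _, y in sample]))
    return rs


rng = random.Random(0)
people = make_population(n_people=10_000, rho=0.5, rng=rng)
groups = group_averages(people, group_size=10, rng=rng)  # 1,000 group averages

r_ind = sampling_distribution(people, sample_size=50, trials=2_000, rng=rng)
r_grp = sampling_distribution(groups, sample_size=50, trials=2_000, rng=rng)
pop_r_ind = pearson([x for x, _ in people], [y for _, y in people])
pop_r_grp = pearson([x for x, _ in groups], [y for _, y in groups])
print(f"individuals:    population r = {pop_r_ind:.3f}, "
      f"sample mean r = {mean(r_ind):.3f}, sd = {stdev(r_ind):.3f}")
print(f"group averages: population r = {pop_r_grp:.3f}, "
      f"sample mean r = {mean(r_grp):.3f}, sd = {stdev(r_grp):.3f}")
```

Comparing the spread (`sd`) of the two sampling distributions at equal sample size is the kind of check the abstract describes; with random grouping, as assumed here, the two spreads should come out close.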
David McCallen, Arben Pitarka, Houjun Tang, Ramesh Pankajakshan, Anders Petersson, Mamun Miah, "Transformational Regional-Scale Earthquake Simulations with the DOE EarthQuake SIMulation Exascale Framework", Scientific Impact of the Exascale Computing Project (ECP), August 1, 2024, doi: 10.1109/MCSE.2024.3397768
Will Thacher, Hans Johansen, Daniel Martin, "A high order cut-cell method for solving the shallow-shelf equations", Journal of Computational Science, August 1, 2024, 80, doi: 10.1016/j.jocs.2024.102319
Alexander V. Dudchenko, Oluwamayowa O. Amusat, "Neural Networks for Prediction of Complex Chemistry in Water Treatment Process Optimization", Proceedings of the 10th International Conference on Foundations of Computer-Aided Process Design (FOCAPD 2024), Denver, PSE Press, July 19, 2024, 3:267-274, doi: 10.69997/sct.107047
Samuele Ferracin, Akel Hashim, Jean-Loup Ville, Ravi Naik, Arnaud Carignan-Dugas, Hammam Qassim, Alexis Morvan, David I. Santiago, Irfan Siddiqi, Joel J. Wallman, "Efficiently improving the performance of noisy quantum computers", Quantum, 2024, 8:1410, doi: 10.22331/q-2024-07-15-1410
Dan Bonachea, Katherine Rasmussen, Brad Richardson, Damian Rouson, "Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.4", Lawrence Berkeley National Laboratory Tech Report, July 12, 2024, LBNL 2001604, doi: 10.25344/S4WG64
This document specifies an interface to support the parallel features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a proposed solution in which the runtime library is responsible for coarray allocation, deallocation and accesses, image synchronization, atomic operations, events, and teams. In this interface, the compiler is responsible for transforming the invocation of Fortran-level parallel features into procedure calls to the necessary PRIF procedures. The interface is designed for portability across shared- and distributed-memory machines, different operating systems, and multiple architectures. Implementations of this interface are intended as an augmentation for the compiler’s own runtime library. With an implementation-agnostic interface, alternative parallel runtime libraries may be developed that support the same interface. One benefit of this approach is the ability to vary the communication substrate. A central aim of this document is to define a parallel runtime interface in standard Fortran syntax, which enables us to leverage Fortran to succinctly express various properties of the procedure interfaces, including argument attributes.
Hiniduma, K., Byna, S., Bez, J. L., Madduri, R., "AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI", 36th International Conference on Scientific and Statistical Database Management (SSDBM 2024), 2024
J. Gu, P. Lin, K. Wu, S.-H. Ku, C.S. Chang, R. Hager, A. Scheinberg, J. Choi, "Efficient Streaming Analysis of High-Resolution Plasma Transport", 36th International Conference on Scientific and Statistical Database Management (SSDBM 2024), 2024
Oluwamayowa O. Amusat, Alexander V. Dudchenko, Adam A. Atia, Timothy Bartholomew, "Cost-optimal Selection of pH Control for Mineral Scaling Prevention in High Recovery Reverse Osmosis Desalination", Proceedings of the 10th International Conference on Foundations of Computer-Aided Process Design (FOCAPD 2024), Denver, PSE Press, July 9, 2024, 3:253-260, doi: 10.69997/sct.143335
Egersdoerfer, C., Sareen, A., Bez, J. L., Byna, S., Dai, D., "ION: Navigating HPC I/O Optimization Journey using Large Language Models", 16th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage'24), 2024, doi: 10.1145/3655038.3665950
David Trebotich, Randolph R Settgast, Terry Ligocki, William Tobin, Gregory H Miller, Sergi Molins, Carl I Steefel, "A multiphysics coupling framework for exascale simulation of subsurface fracture evolution", Frontiers in High Performance Computing, June 30, 2024, 2, doi: 10.3389/fhpcp.2024.1416727
John Bachan, Jianlan Ye, Xuan Jiang, Tan Nguyen, Mahesh Natarajan, Maximilian Bremer, Cy Chan, "Devastator: A Scalable Parallel Discrete Event Simulation Framework for Modern C++", 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (SIGSIM PADS ’24), June 24, 2024
Alex K Bryant, Rafael Zamora‐Resendiz, Xin Dai, Destinee Morrow, Yuewei Lin, Kassidy M Jungles, James M Rae, Akshay Tate, Ashley N Pearson, Ralph Jiang, Lars Fritsche, Theodore S Lawrence, Weiping Zou, Matthew Schipper, Nithya Ramnath, Shinjae Yoo, Silvia Crivelli, Michael D Green, "Artificial intelligence to unlock real‐world evidence in clinical oncology: A primer on recent advances", Cancer Medicine, June 20, 2024, doi: 10.1002/cam4.7253
Dan Bonachea, Katherine Rasmussen, Brad Richardson, Damian Rouson, Parallel Runtime Interface for Fortran (PRIF): A Compiler/Runtime-Library Agnostic Interface to Support the Parallel Features of Fortran 2023, Platform for Advanced Scientific Computing (PASC) Modern Fortran Minisymposium, June 5, 2024
Fortran 2023 natively supports single-program, multiple-data parallel programming with a partitioned global address space and collective subroutines, synchronization, atomics, locks, and more. Each of the four actively developed compilers that support Fortran’s parallel features uses its own parallel runtime library. The Parallel Runtime Interface for Fortran (PRIF) proposes to liberate compiler development from reliance on a single runtime and empower runtime developers to support more than one compiler. PRIF also aims to broaden the community of runtime developers to include the Fortran compiler’s users: Fortran programmers. PRIF does so by specifying the interface in Fortran, which makes it attractive to write the parallel runtime library in Fortran. Additionally, PRIF has been designed to be portable across both shared and distributed memory, varying architectures, as well as different operating systems. In this talk, I will describe the motivation behind the development of PRIF, describe the design of the interface itself and the benefits of adopting it. I will also provide a brief status report on the first PRIF implementation: Caffeine.
Nan Ding, Pieter Maris, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, LeAnn Lindsey, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, Samuel Williams, "Evaluating the potential of disaggregated memory systems for HPC applications", Concurrency and Computation, Practice and Experience (CCPE), May 2024, doi: 10.1002/cpe.8147
Damian Rouson, What Happens to a Dream Deferred? Chasing Automatic Offloading in Fortran 2023, Keynote Talk at the Nineteenth International Workshop on Automatic Performance Tuning (iWAPT 2024), May 31, 2024
In 1951, Harlem Renaissance poet Langston Hughes asked this talk's titular question at the outset of a poem entitled "Harlem." Six years later, IBM mathematician John Backus developed Fortran, the world's first widely used high-level programming language. Backus went on to explore functional programming and to highlight the functional style in his Turing Award lecture in 1977, a year that also demarcates what one might consider the end of the classical era of Fortran. This talk will demonstrate how modern Fortran began to deliver on Backus's functional programming dream, starting with pure procedures in the 1995 standard. The talk will further demonstrate how this style culminated in a powerful and flexible facility for expressing independent iterations via the "do concurrent" construct, which the Fortran standard committee included in Fortran 2008 with the intention to facilitate automatic Graphics Processing Unit (GPU) programming. Fortran 2008 was published in 2010, but it took another decade for compilers to deliver on the promise of automatic GPU offloading. This talk will detail the trials and tribulations of Berkeley Lab's Fortran team in chasing the automatic offloading dream in our Inference-Engine deep learning library and Matcha high-performance computing (HPC) application.
Bin Dong, Kesheng Wu, Suren Byna, "The Art of Sparsity: Mastering High-Dimensional Tensor Storage", 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 27, 2024
Jie Li, George Michelogiannakis, Brandon Cook, John Shalf, Yong Chen, "Scheduling and Allocation of Disaggregated Memory Resources in HPC Systems", IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, May 2024
Hammad Ather, Jean Luca Bez, Yankun Xia, Suren Byna, "Drilling Down I/O Bottlenecks with Cross-layer I/O Profile Exploration", 38th IEEE International Parallel & Distributed Processing Symposium, San Francisco, CA, USA, May 27, 2024
Neeraj Rajesh, Keith Bateman, Jean Luca Bez, Suren Byna, Anthony Kougkas, Xian-He Sun, "TunIO: An AI-powered Framework for Optimizing HPC I/O", 38th IEEE International Parallel & Distributed Processing Symposium, San Francisco, CA, USA, May 27, 2024
D.K. Sung, Y. Son, A. Sim, K. Wu, S. Byna, H. Tang, H. Eom, C. Kim, S. Kim, "A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis", 38th IEEE International Parallel & Distributed Processing Symposium (IPDPS2024), 2024
Sergi Molins, David Trebotich, Carl I. Steefel, "Approaches for the simulation of coupled processes in evolving fractured porous media enabled by exascale computing", Computing in Science & Engineering, May 23, 2024, doi: 10.1109/MCSE.2024.3403983
Dan Bonachea, Paul H. Hargrove, "GASNet-EX Specification Collection, Revision 2024.5.0", Lawrence Berkeley National Laboratory Tech Report, May 2024, LBNL 2001595, doi: 10.25344/S4160B
GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in emerging exascale systems. It provides network-independent, high-performance communication primitives including Remote Memory Access (RMA) and Active Messages (AM). GASNet-EX is an evolution of the popular GASNet communication system, building upon over 20 years of lessons learned, and the primary goals are high performance, interface portability, and expressiveness. The library has been used to implement parallel programming models and libraries such as UPC, UPC++, Fortran coarrays, Legion, Chapel, and many others.
This anthology collects together the four separate volumes that currently comprise the GASNet-EX specification, as of the 2024.5.0 release of GASNet-EX.
Wei Zhang, Houjun Tang, Suren Byna, "IDIOMS: Index-powered Distributed Object-centric Metadata Search for Scientific Data Management", The 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2024), Philadelphia, PA, USA, IEEE, May 9, 2024, doi: 10.1109/CCGrid59990.2024.00072
D McCallen, A Pitarka, H Tang, R Pankajakshan, NA Petersson, M Miah, "Transformational Regional-Scale Earthquake Simulations with the DOE EarthQuake SIMulation (EQSIM) Exascale Framework", Computing in Science & Engineering, May 8, 2024, doi: 10.1109/MCSE.2024.3397768
David McCallen, Arben Pitarka, Houjun Tang, Ramesh Pankajakshan, N Anders Petersson, Mamun Miah, Junfei Huang, "Regional-scale fault-to-structure earthquake simulations with the EQSIM framework: Workflow maturation and computational performance on GPU-accelerated exascale platforms", Earthquake Spectra, May 3, 2024, 40(3):1615-1652, doi: 10.1177/87552930241246235
Dan Bonachea, Katherine Rasmussen, Brad Richardson, Damian Rouson, "Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.3", Lawrence Berkeley National Laboratory Tech Report, May 3, 2024, LBNL 2001590, doi: 10.25344/S4501W
This document specifies an interface to support the parallel features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a proposed solution in which the runtime library is responsible for coarray allocation, deallocation and accesses, image synchronization, atomic operations, events, and teams. In this interface, the compiler is responsible for transforming the invocation of Fortran-level parallel features into procedure calls to the necessary PRIF procedures. The interface is designed for portability across shared- and distributed-memory machines, different operating systems, and multiple architectures. Implementations of this interface are intended as an augmentation for the compiler’s own runtime library. With an implementation-agnostic interface, alternative parallel runtime libraries may be developed that support the same interface. One benefit of this approach is the ability to vary the communication substrate. A central aim of this document is to define a parallel runtime interface in standard Fortran syntax, which enables us to leverage Fortran to succinctly express various properties of the procedure interfaces, including argument attributes.
Lois Curfman McInnes, Paige Kinsley, Daniel Martin, Suzanne Parete-Koon, Sreeranjani (Jini) Ramprakash, "Building a Diverse and Inclusive HPC Community for Mission-Driven Team Science", Computing in Science & Engineering, April 12, 2024, 25(5):31-38, doi: 10.1109/MCSE.2023.3348943
Patricia Gonzalez-Guerrero, Kylie Huch, Nirmalendu Patra, Thom Popovici, George Michelogiannakis, "Towards practical superconducting accelerators for machine learning using U-SFQ", ACM Journal on Emerging Technologies in Computing Systems, April 2024,
Ankur Agrawal, Akash V. Dixit, Tanay Roy, Srivatsan Chakram, Kevin He, Ravi K. Naik, David I. Schuster, Aaron Chou, "Stimulated Emission of Signal Photons from Dark Matter Waves", Physical Review Letters, 2024, 132:140801, doi: 10.1103/PhysRevLett.132.140801
Hofmeyr S, Buluç A, Riley R, Egan R, Selvitopi O, Oliker L, Yelick K, Shakya M, Youtsey B, Azad A, "Exabiome: Advancing Microbial Science through Exascale Computing", Computing in Science & Engineering, April 1, 2024, doi: 10.1109/MCSE.2024.3402546
Brad Richardson, Damian Rouson, Harris Snyder, Robert Singleterry, "Scheduling and Performance of Asynchronous Tasks in Fortran 2018 with FEATS", SN Computer Science, March 2024, 5 (354), doi: 10.1007/s42979-024-02682-y
Most parallel scientific programs contain compiler directives (pragmas) such as those from OpenMP, explicit calls to runtime library procedures such as those implementing the Message Passing Interface (MPI), or compiler-specific language extensions such as those provided by CUDA. By contrast, recent Fortran standards empower developers to express parallel algorithms without directly referencing lower-level parallel programming models. Fortran’s parallel features place the language within the Partitioned Global Address Space (PGAS) class of programming models. When writing programs that exploit data parallelism, application developers often find it straightforward to develop custom parallel algorithms. Problems involving complex, heterogeneous, staged calculations, however, pose much greater challenges. Such applications require careful coordination of tasks in a manner that respects dependencies prescribed by a directed acyclic graph. When rolling one’s own solution proves difficult, extending a customizable framework becomes attractive. The paper presents the design, implementation, and use of the Framework for Extensible Asynchronous Task Scheduling (FEATS), which we believe to be the first task scheduling tool written in modern Fortran. We describe the benefits and compromises associated with choosing Fortran as the implementation language, and we propose ways in which future Fortran standards can best support the use case in this paper.
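To illustrate what "coordination of tasks in a manner that respects dependencies prescribed by a directed acyclic graph" means, here is a minimal, hypothetical sketch in Python. It is not FEATS's API (FEATS is written in Fortran); the function `schedule` and its `tasks`/`deps` dictionaries are assumptions made for illustration. It uses Kahn's topological-ordering approach and runs tasks one at a time, whereas an asynchronous framework would hand every task in the ready set to parallel workers.

```python
from collections import deque


def schedule(tasks, deps):
    """Run tasks in an order that respects a dependency DAG (hypothetical helper).

    tasks: dict mapping task name -> callable taking a dict of upstream results
    deps:  dict mapping task name -> list of task names it depends on
    Returns (execution order, dict of results).
    """
    # Count unmet dependencies for each task and record reverse edges.
    remaining = {t: len(deps.get(t, [])) for t in tasks}
    dependents = {t: [] for t in tasks}
    for t, upstream in deps.items():
        for u in upstream:
            dependents[u].append(t)

    # Tasks with no dependencies are ready immediately.
    ready = deque(t for t, n in remaining.items() if n == 0)
    results, order = {}, []
    while ready:
        t = ready.popleft()
        # A task runs only once every upstream result is available.
        results[t] = tasks[t]({u: results[u] for u in deps.get(t, [])})
        order.append(t)
        # Completing t may make its dependents ready.
        for d in dependents[t]:
            remaining[d] -= 1
            if remaining[d] == 0:
                ready.append(d)

    if len(order) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return order, results


# Usage: a "diamond" DAG where b and c both depend on a, and d depends on b and c.
tasks = {
    "a": lambda r: 1,
    "b": lambda r: r["a"] + 1,
    "c": lambda r: r["a"] * 10,
    "d": lambda r: r["b"] + r["c"],
}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
order, results = schedule(tasks, deps)
print(order, results["d"])
```

In this sketch, `b` and `c` become ready at the same moment once `a` finishes; that is exactly the point at which a parallel scheduler could run them concurrently.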
L. Zhou, Q. Lin, K. Chowdhury, S. Masood, A. Eichenberger, H. Min, A. Sim, J. Wang, Y. Wang, K. Wu, B. Yuan, J. Zou, "Serving Deep Learning Model in Relational Databases", 27th International Conference on Extending Database Technology (EDBT2024), 2024,
Oluwamayowa Amusat, Adam Atia, Timothy Bartholomew, Alexander Dudchenko, Cost-Optimization of Process-Scale Desalination Systems Incorporating Surrogate-based Water Chemistry Models, INFORMS Optimization Society Conference, March 22, 2024,
Imtiaz Mahmud, Mariam Kiran, Ewa Deelman, Anirban Mandal, Prasanna Balaprakash, Krishnan Raghavan, Hongwei Jin, Cong Wang, Komal Thareja, George Papadimitriou, Investigating BBRv3’s Performance in Large Science File Transfer on FABRIC, KNIT’8 Workshop, San Diego, CA, USA, March 19, 2024,
R. Frehner, K. Wu, A. Sim, J. Kim, K. Stockinger, "Detecting Anomalies in Time Series Using Kernel Density Approaches", IEEE Access, 2024, doi: 10.1109/ACCESS.2024.3371891
R. Han, M. Zheng, S. Byna, H. Tang, B. Dong, D. Dai, Y. Chen, D. Kim, J. Hassoun, D. Thorsley, M. Wolf, "PROV-IO: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems", IEEE Transactions on Parallel and Distributed Systems, March 14, 2024,
Oluwamayowa O Amusat, Adam A Atia, Alexander V Dudchenko, Timothy V Bartholomew, "Modeling Framework for Cost Optimization of Process-Scale Desalination Systems with Mineral Scaling and Precipitation", ACS ES&T Engineering, March 8, 2024, doi: 10.1021/acsestengg.3c00537
Jan Balewski, Mercy G Amankwah, Roel Van Beeumen, E Wes Bethel, Talita Perciano, Daan Camps, "Quantum-parallel vectorized data encodings and computations on trapped-ion and transmon QPUs", Scientific Reports, February 10, 2024, 14, doi: 10.1038/s41598-024-53720-x
David Trebotich, "Exascale CFD in Heterogeneous Systems", Journal of Fluids Engineering, February 9, 2024, 146(4):041104, doi: 10.1115/1.4064534
- Download File: FE-23-1357_AuthorProof.pdf (pdf: 1.5 MB)
George Michelogiannakis, John Shalf, Chiplets for HPC, OCP Summit, February 6, 2024,
- Download File: georgem_hpc.pptx.pdf (pdf: 6.3 MB)
Jean Luca Bez, Houjun Tang, Scot Breitenfeld, Huihuo Zheng, Wei-Keng Liao, Kaiyuan Hou, Zanhua Huang, Suren Byna, "h5bench: Exploring HDF5 Access Patterns Performance in Pre-Exascale Platforms", Concurrency and Computation: Practice and Experience (CCPE), January 31, 2024,
Sayera Dhaubhadel, Kumkum Ganguly, Ruy M Ribeiro, Judith D Cohn, James M Hyman, Nicolas W Hengartner, Beauty Kolade, Anna Singley, Tanmoy Bhattacharya, Patrick Finley, Drew Levin, Haedi Thelen, Kelly Cho, Lauren Costa, Yuk-Lam Ho, Amy C Justice, John Pestian, Daniel Santel, Rafael Zamora-Resendiz, Silvia Crivelli, Suzanne Tamang, Susana Martins, Jodie Trafton, David W Oslin, Jean C Beckham, Nathan A Kimbrel, Benjamin H McMahon, "High dimensional predictions of suicide risk in 4.2 million US Veterans using ensemble transfer learning", Scientific Reports, January 20, 2024,
Long B Nguyen, Yosep Kim, Akel Hashim, Noah Goss, Brian Marinelli, Bibek Bhandari, Debmalya Das, Ravi K Naik, John Mark Kreikebaum, Andrew N Jordan, others, "Programmable Heisenberg interactions between Floquet qubits", Nature Physics, 2024, 20:240-246, doi: 10.1038/s41567-023-02326-7
Zhe Bai, Abdelilah Essiari, Talita Perciano, Kristofer E Bouchard, "AutoCT: Automated CT registration, segmentation, and quantification", SoftwareX, January 5, 2024, 26, doi: 10.1016/j.softx.2024.101673
Oliver T, Varghese N, Roux S, Schulz F, Huntemann M, Clum A, Foster B, Foster B, Riley R, LaButti K, Egan R, Hajek P, Mukherjee S, Ovchinnikova G, Reddy TBK, Calhoun S, Hayes RD, Rohwer RR, Zhou Z, Daum C, Copeland A, Chen I-MA, Ivanova NN, Kyrpides NC, Mouncey NJ, del Rio TG, Grigoriev IV, Hofmeyr S, Oliker L, Yelick K, Anantharaman K, McMahon KD, Woyke T, Eloe-Fadrosh EA, "Coassembly and binning of a twenty-year metagenomic time-series from Lake Mendota", Nature Scientific Data, January 1, 2024, doi: 10.1038/S41597-024-03826-8
2023
Damian Rouson, Brad Richardson, Dan Bonachea, Katherine Rasmussen, "Parallel Runtime Interface for Fortran (PRIF) Design Document, Revision 0.2", Lawrence Berkeley National Laboratory Tech Report, December 20, 2023, LBNL 2001563, doi: 10.25344/S4DG6S
This design document proposes an interface to support the parallel features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a proposed solution in which the runtime library is responsible for coarray allocation, deallocation and accesses, image synchronization, atomic operations, events, and teams. In this interface, the compiler is responsible for transforming the invocation of Fortran-level parallel features into procedure calls to the necessary PRIF procedures. The interface is designed for portability across shared- and distributed-memory machines, different operating systems, and multiple architectures. Implementations of this interface are intended as an augmentation for the compiler’s own runtime library. With an implementation-agnostic interface, alternative parallel runtime libraries may be developed that support the same interface. One benefit of this approach is the ability to vary the communication substrate. A central aim of this document is to define a parallel runtime interface in standard Fortran syntax, which enables us to leverage Fortran to succinctly express various properties of the procedure interfaces, including argument attributes.
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2023.9.0", Lawrence Berkeley National Laboratory Tech Report LBNL-2001561, December 2023, doi: 10.25344/S4J592
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
C. M. Oguchi, D. Ghosal, A. Sim, K. Wu, "Counterfactual Analysis: A Case Study on Impact of External Events on Building Energy Consumption", International Workshop on Big Data Analytics for Sustainability (BDA4S), 2023,
A. Sharma, X. Li, H. Guan, G. Sun, L. Zhang, L. Wang, K. Wu, L. Cao, E. Zhu, A. Sim, T. Wu, J. Zou, "Automatic Data Transformation Using Large Language Model – An Experimental Study on Building Energy Data", IEEE International Conference on Big Data (BigData), 2023,
C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, D. Hazen, F. Würthwein, D. Davila, H. Newman, J. Balcas, "Predicting Resource Utilization Trends with Southern California Petabyte Scale Cache", 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP2023), 2023, doi: 10.1051/epjconf/202429501044
J. Bellavita, C. Sim, K. Wu, A. Sim, S. Yoo, H. Ito, V. Garonne, E. Lancon, "Understanding Data Access Patterns for dCache System", 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP2023), 2023, doi: 10.1051/epjconf/202429501053
A. Sim, E. Kissel, D. Hazen, C. Guok, "Experiences in deploying in-network data caches", 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP2023), 2023, doi: 10.1051/epjconf/202429507018
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.9.0", Lawrence Berkeley National Laboratory Tech Report LBNL-2001560, December 2023, doi: 10.25344/S4P01J
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes. UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Daniel F. Martin, Steven B. Roberts, Hans Johansen, David J Gardner, Carol S Woodward, "Impacts of improved time evolution in BISICLES using SUNDIALS", December 14, 2023,
- Download File: AGU2023Sundials.pdf (pdf: 1 MB)
Duncan Carpenter, Anjali Sandip, Samuel Kachuck, Daniel Martin, "Does Damaged Ice affect Ice Sheet Evolution?", American Geophysical Union Fall Meeting, December 14, 2023,
- Download File: CarpenterAGU2023.pdf (pdf: 3.1 MB)
J. W. Chung, A. Sim, B. Quiter, Y. Wu, W. Zhao, K. Wu, "Preparing Spectral Data for Machine Learning: A Study of Geological Classification from Aerial Surveys", Machine Learning and the Physical Sciences Workshop (ML4PS), 2023,
Chenxu Niu, Wei Zhang, Suren Byna, Yong Chen, "PSQS: Parallel Semantic Querying Service for Self-describing File Formats", 2023 IEEE International Conference on Big Data (BigData), December 1, 2023, doi: 10.1109/BigData59044.2023.10386205
Hamza Errahmouni Barkam, Sanggeon Yun, Hanning Chen, Paul Gensler, Albi Mema, Andrew Ding, George Michelogiannakis, Hussam Amrouch, Mohsen Imani, "Reliable hyperdimensional reasoning on unreliable emerging technologies", IEEE/ACM International Conference on Computer Aided Design (ICCAD), November 2023,
George Michelogiannakis, Yehia Arafa, Brandon Cook, Liang Yuan Dai, Abdel-Hameed Badawy, Madeleine Glick, Yuyang Wang, Keren Bergman, John Shalf, "Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics", IEEE International Conference on Cluster Computing (CLUSTER), November 2023,
Jordan Hines, Marie Lu, Ravi K. Naik, Akel Hashim, Jean-Loup Ville, Brad Mitchell, John Mark Kreikebaum, David I. Santiago, Stefan Seritan, Erik Nielsen, Robin Blume-Kohout, Kevin Young, Irfan Siddiqi, Birgitta Whaley, Timothy Proctor, "Demonstrating Scalable Randomized Benchmarking of Universal Gate Sets", Phys. Rev. X, 2023, 13:041030, doi: 10.1103/PhysRevX.13.041030
J. Gu, P. Lin, K. Wu, S-H. Ku, C.S. Chang, R.M. Churchill, J. Choi, N. Podhorszki, S. Klasky, "Unraveling Diffusion in Fusion Plasma: A Case Study of In Situ Processing and Particle Sorting", In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV'23), 2023,
I. Mahmud, G. Papadimitriou, G. Wang, M. Kiran, A. Mandal, E. Deelman, "Elephants Sharing the Highway: Studying TCP Fairness in Large Transfers over High Throughput Links", 10th International Workshop on Innovating the Network for Data Intensive Science (INDIS 2023), 2023, doi: 10.1145/3624062.3624594
Nan Ding, Muhammad Haseeb, Taylor Groves, Samuel Williams, Evaluating the Performance of One-sided Communication on CPUs and GPUs, 2023 International Workshop on Performance, Portability & Productivity in HPC, November 13, 2023,
- Download File: ws_p3hpc112.pdf (pdf: 4.7 MB)
Yang Liu, Nan Ding, Piyush Sao, Samuel Williams, Xiaoye Sherry Li, "Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters", Supercomputing (SC), November 2023,
- Download File: SC23_3DSpTRSV_final.pdf (pdf: 2.9 MB)
Oscar Antepara, Samuel Williams, Scott Kruger, Torrin Bechtel, Joseph McClenaghan, Lang Lao, "Performance-Portable GPU Acceleration of the EFIT Tokamak Plasma Equilibrium Reconstruction Code", Workshop on Accelerator Programming and Directives (WACCPD), November 2023,
- Download File: WACCPD23_EFIT_final.pdf (pdf: 697 KB)
Oscar Antepara, Hans Johansen, Samuel Williams, Tuowen Zhao, Samantha Hirsch, Priya Goyal, Mary Hall, "Performance portability evaluation of blocked stencil computations on GPUs", International Workshop on Performance, Portability & Productivity in HPC (P3HPC), November 2023,
- Download File: P3HPC23_bricks_final-v4.pdf (pdf: 684 KB)
Julian Bellavita, Mathias Jacquelin, Esmond G. Ng, Dan Bonachea, Johnny Corbino, Paul H. Hargrove, "symPACK: A GPU-Capable Fan-Out Sparse Cholesky Solver", 2023 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM'23), ACM, November 13, 2023, doi: 10.1145/3624062.3624600
Sparse symmetric positive definite systems of equations are ubiquitous in scientific workloads and applications. Parallel sparse Cholesky factorization is the method of choice for solving such linear systems. Therefore, the development of parallel sparse Cholesky codes that can efficiently run on today’s large-scale heterogeneous distributed-memory platforms is of vital importance. Modern supercomputers offer nodes that contain a mix of CPUs and GPUs. To fully utilize the computing power of these nodes, scientific codes must be adapted to offload expensive computations to GPUs.
We present symPACK, a GPU-capable parallel sparse Cholesky solver that uses one-sided communication primitives and remote procedure calls provided by the UPC++ library. We also utilize the UPC++ "memory kinds" feature to enable efficient communication of GPU-resident data. We show that on a number of large problems, symPACK outperforms comparable state-of-the-art GPU-capable Cholesky factorization codes by up to 14x on the NERSC Perlmutter supercomputer.
R. Monga, A. Sim (advisor), K. Wu (advisor), "Comparative Study of the Cache Utilization Trends for Regional Scientific Data Caches", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’23), ACM Student Research Competition (SRC), First place winner, 2023,
Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian Jin, Houjun Tang, Jean Sexton, Sheng Di, Kai Zhao, Bo Fang, Zarija Lukić, Franck Cappello, James Ahrens, Dingwen Tao, "AMRIC: A novel in situ lossy compression framework for efficient I/O in adaptive mesh refinement applications", SC23: International Conference for High Performance Computing, Networking, Storage and Analysis, November 12, 2023, doi: 10.1145/3581784.3613212
Jakob Luettgau, Shane Snyder, Tyler Reddy, Nikolaus Awtrey, Kevin Harms, Jean Luca Bez, Rui Wang, Rob Latham, Philip Carns, "Enabling Agile Analysis of I/O Performance Data with PyDarshan", Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, USA, Association for Computing Machinery, November 12, 2023, 1380–1391, doi: 10.1145/3624062.3624207
Nan Ding, Muhammad Haseeb, Taylor Groves, Samuel Williams, "Evaluating the Performance of One-sided Communication on CPUs and GPUs", 2023 International Workshop on Performance, Portability & Productivity in HPC, November 12, 2023,
- Download File: OneSided_MPI_P3HPC_.pdf (pdf: 2.5 MB)
Michelle Mills Strout, Damian Rouson, Amir Kamil, Dan Bonachea, Jeremiah Corrado, Paul H. Hargrove, Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran, Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC23), November 12, 2023,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for these many uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models.
The tutorial is targeted for users with little-to-no parallel programming experience, but everyone is welcome. A partial differential equation example will be demonstrated in all three programming models. That example and others will be provided to attendees in a virtual environment. Attendees will be shown how to compile and run these programming examples, and the virtual environment will remain available to attendees throughout the conference, along with Slack-based interactive tech support.
Come join us to learn about some productive and performant parallel programming models!
Rafael Zamora-Resendiz, David W. Oslin, Dina Hooshyar, Silvia Crivelli, "Using Electronic Health Record Metadata to Predict Housing Instability amongst Veterans", Preventive Medicine Reports, November 7, 2023,
E Wes Bethel, Mercy G Amankwah, Jan Balewski, Roel Van Beeumen, Daan Camps, Daniel Huang, Talita Perciano, "Quantum computing and visualization: A disruptive technological change ahead", IEEE Computer Graphics and Applications, November 6, 2023, 43, doi: 10.1109/MCG.2023.3316932
Meriam Gay Bautista, Darren Lyles, Kylie Huch, Patricia Gonzalez-Guerrero, George Michelogiannakis, "Area Efficient Asynchronous SFQ Pulse Round-Robin Distribution Network", IEEE Transactions on Circuits and Systems I: Regular Papers, November 2023,
Alexander Anferov, Shannon P. Harvey, Fanghui Wan, Kan-Heng Lee, Jonathan Simon, David I. Schuster, "Low-loss Millimeter-wave Resonators with an Improved Coupling Structure", arXiv.org, 2023,
George Michelogiannakis, Yehia Arafa, Brandon Cook, Liang Yuan Dai, Abdel-Hameed Badawy, Madeleine Glick, Keren Bergman, John Shalf, Efficient Intra-Rack Resource Disaggregation in HPC Using Co-Packaged DWDM Photonics, IEEE Cluster 2023, November 1, 2023,
- Download File: ieee_cluster_photonics_disaggregation_2023.pdf (pdf: 1.1 MB)
Tong Wu, Anna Scaglione, Adrian Petru Surani, Daniel Arnold, Sean Peisert, "Network-Constrained Reinforcement Learning for Optimal EV Charging Control", Proceedings of the IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), October 2023,
C.S. Chang, S-H. Ku, R. Hager, J. Choi, D. Pugmire, S. Klasky, Scott, A. Loarte, R. Pitts, J. Gu, J. Wu, The role of turbulent separatrix tangle in the improvement of the integrated pedestal/heat exhaust issue for stationary operation in ITER and Fusion Reactors, APS Division of Plasma Physics Meeting, 2023,
Will Thacher, Hans Johansen, Daniel Martin, "A high order Cartesian grid, finite volume method for elliptic interface problems", Journal of Computational Physics, October 15, 2023, 491, doi: 10.1016/j.jcp.2023.112351
E. Mercado, H. T. Jung, C. Kim, A. L. Garcia, A. J. Nonaka, and J. B. Bell, "Surface Coverage Dynamics for Reversible Dissociative Adsorption on Finite Linear Lattices", J. Chem. Phys., October 12, 2023, 159:144107,
Maximilian Bremer, Nirmalendu Patra, Tan Nguyen, Dilip Vasudevan, Cy Chan, "Benefits of Optimistic Parallel Discrete Event Simulation for Network-on-Chip Simulation", 2023 IEEE/ACM 27th International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Singapore, October 2, 2023, doi: 10.1109/DS-RT58998.2023.00013
R. Jambunathan, Z. Yao, R. Lombardini, A. Rodriguez, and A. Nonaka, "Two-Fluid Physical Modeling of Superconducting Resonators in the ARTEMIS Framework", Computer Physics Communications, October 2, 2023, 291:108836,
Andrew M. Bartolo, Mohamed M. Sabry Aly, George Michelogiannakis, Subhasish Mitra, "MC-ELMM: Multi-Chip Endurance-Limited Memory Management", MEMSYS: Proceedings of the International Symposium on Memory Systems, October 2023,
Robert Currie, Sean Peisert, Anna Scaglione, Aram Shumavon, Nikhil Ravi, "Data Privacy for the Grid: Toward a Data Privacy Standard for Inverter-Based and Distributed Energy Resources", IEEE Power & Energy Magazine, October 1, 2023,
Jean Luca Bez, Suren Byna, Shadi Ibrahim, "I/O Access Patterns in HPC Applications: A 360-Degree Survey", ACM Computing Surveys, September 15, 2023, 56, doi: 10.1145/3611007
A. Dubey, T. Ben-Nun, B. L. Chamberlain, B. R. de Supinski, D. Rouson, "Performance on HPC Platforms Is Possible Without C++", Computing in Science & Engineering, September 2023, 25 (5):48-52, doi: 10.1109/MCSE.2023.3329330
Computing at large scales has become extremely challenging due to increasing heterogeneity in both hardware and software. More and more scientific workflows must tackle a range of scales and use machine learning and AI intertwined with more traditional numerical modeling methods, placing more demands on computational platforms. These constraints indicate a need to fundamentally rethink the way computational science is done and the tools that are needed to enable these complex workflows. The current set of C++-based solutions may not suffice, and relying exclusively upon C++ may not be the best option, especially because several newer languages and boutique solutions offer more robust design features to tackle the challenges of heterogeneity. In June 2023, we held a mini symposium that explored the use of newer languages and heterogeneity solutions that are not tied to C++ and that offer options beyond template metaprogramming and Parallel.For for performance and portability. We describe some of the presentations and discussion from the mini symposium in this article.
P. Kumar, A. Nonaka, R. Jambunathan, G. Pahwa, S. Salahuddin, and Z. Yao, "FerroX: A GPU-accelerated, 3D Phase-Field Simulation Framework for Modeling Ferroelectric Devices", Computer Physics Communications, September 1, 2023, 290:108757,
J. G. Wang, D. R. Ladiges, I. Srivastava, S. P. Carney, A. J. Nonaka, A. L. Garcia, J. B. Bell, "Steric effects in induced-charge electro-osmosis for strong electric fields", Physical Review Fluids, August 29, 2023, 8:083702,
"Pagoda Updates PGAS Programming With Scalable Data Structures And Aggressively Asynchronous Communication", Rob Farber, Exascale Computing Project News, August 28, 2023, doi: 10.25344/S4SP4H
Akel Hashim, Stefan Seritan, Timothy Proctor, Kenneth Rudinger, Noah Goss, Ravi K Naik, John Mark Kreikebaum, David I Santiago, Irfan Siddiqi, "Benchmarking quantum logic operations relative to thresholds for fault tolerance", npj Quantum Information, 2023, 9:109, doi: 10.1038/s41534-023-00764-y
André Ramos Carneiro, Jean Luca Bez, Carla Osthoff, Lucas Mello Schnorr, Philippe O.A. Navaux, "Uncovering I/O demands on HPC platforms: Peeking under the hood of Santos Dumont", Journal of Parallel and Distributed Computing, August 18, 2023, 182, doi: 10.1016/j.jpdc.2023.104744
Riley R, Bowers RM, Camargo AP, Campbell A, Egan R, Eloe-Fadrosh EA, Foster B, Hofmeyr S, Huntemann M, Kellom M, Kimbrel JA, Oliker L, Yelick K, Pett-Ridge J, Salamov A, Varghese NJ, Clum A, "Terabase-Scale Coassembly of a Tropical Soil Microbiome", Microbiology Spectrum, August 17, 2023, doi: 10.1128/SPECTRUM.00200-23
S. Bevan, S. Cornford, L. Gilbert, I. Otosaka, D. Martin, T. Surawy-Stepney, "Amundsen Sea Embayment ice-sheet mass-loss predictions to 2050 calibrated using observations of velocity and elevation change", Journal of Glaciology, August 14, 2023, 1-11, doi: 10.1017/jog.2023.57
GM Wallace, Z Bai, N Bertelli, EW Bethel, T Perciano, S Shiraiwa, JC Wright, "Towards Fast, Accurate Predictions of RF Simulations via Data-driven Modeling: Forward and Lateral Models", Conference, AIP Publishing, August 1, 2023, 2984, doi: 10.1063/5.0162422
Jim Basney, Sean Peisert, Scott Russell, Kelli Shute, Bart Miller, Kathy Benninger, "A Vision for Securing NSF's Essential Scientific Cyberinfrastructure - Trusted CI Five-Year Strategic Plan (2024-2029)", Trusted CI Report, August 1, 2023, doi: 10.5281/zenodo.8193607
Hao Li, Han Cai, Joseph Forman, Ran Cheng, et al., "Transport Properties of NbN Thin Films Patterned With a Focused Helium Ion Beam", IEEE Transactions on Applied Superconductivity, August 2023,
Ran Cheng, Christoph Kirst, Dilip Vasudevan, "Superconducting-Oscillatory Neural Network With Pixel Error Detection for Image Recognition", IEEE Transactions on Applied Superconductivity, August 2023, 33:1-7,
Michelle Mills Strout, Damian Rouson, Amir Kamil, Dan Bonachea, Jeremiah Corrado, Paul H. Hargrove, Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran (CUF23), ECP/NERSC/OLCF Tutorial, July 2023,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for these many uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models. This tutorial should be accessible to users with little-to-no parallel programming experience, and everyone is welcome. A partial differential equation example will be demonstrated in all three programming models along with performance and scaling results on big machines. That example and others will be provided in a cloud instance and Docker container. Attendees will be shown how to compile and run these programming examples, and provided opportunities to experiment with different parameters and code alternatives while being able to ask questions and share their own observations. Come join us to learn about some productive and performant parallel programming models!
Sachin Kadam, Anna Scaglione, Nikhil Ravi, Sean Peisert, Brent Lunghino, Aram Shumavon, "Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets", Proceedings of the 2023 IEEE International Conference on Smart Applications, Communications and Networking (SmartNets), Istanbul, Turkey, July 25, 2023,
H. Klion, R. Jambunathan, M. E. Rowan, E. Yang, D. Willcox, J.-L. Vay, R. Lehe, A. Myers, A. Huebl, W. Zhang, "Particle-in-Cell Simulations of Relativistic Magnetic Reconnection with Advanced Maxwell Solver Algorithms", The Astrophysical Journal, July 13, 2023, 952,
"Art Inspiring Quantum Computing at the Advanced Quantum Testbed" (Spanish-language article: "Arte inspirando la informática cuántica en el Advanced Quantum Testbed"), Monica Hernandez, July 7, 2023,
"Art Inspiring a Quantum-Ready Vision at the Advanced Quantum Testbed", Monica Hernandez, July 7, 2023,
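"Reported Success in Generating High-Fidelity Two-Qutrit Entangling Quantum Operations" (Spanish-language article: "Éxito reportado en la generación de operaciones cuánticas entrelazadas de dos cutrits con alta fidelidad"), Monica Hernandez, July 6, 2023,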
"Success Generating Two-Qutrit Entangling Gates With High Fidelity", Monica Hernandez, July 6, 2023,
H-C. Yang, L. Jin, A. Lazar, A. Todd-Blick, A. Sim, K. Wu, Q. Chen, C. A. Spurlock, "Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective", Systems, 2023, 11(6):314, doi: 10.3390/systems11060314
Sean Peisert, "On Software Infrastructure: Develop, Prove, Profit? [From the Editors]", IEEE Security & Privacy, July 2023, doi: 10.1109/MSEC.2023.3273492
Mohammed A. Alhussaini, Zachary M. Binger, Bianca M. Souza-Chaves, Oluwamayowa O. Amusat, Jangho Park, Timothy V. Bartholomew, Dan Gunter, Andrea Achilli, "Analysis of backwash settings to maximize net water production in an engineering-scale ultrafiltration system for water reuse", Journal of Water Process Engineering, 2023, 53, doi: 10.1016/j.jwpe.2023.103761
R. Shao, A. Sim, K. Wu, J. Kim, "Leveraging History to Predict Abnormal Transfers in Distributed Workflows", Sensors, 2023, 23(12):5485, doi: 10.3390/s23125485
Raksha Ramakrishna, Anna Scaglione, Tong Wu, Nikhil Ravi, Sean Peisert, "Differential Privacy for Class-based Data: A Practical Gaussian Mechanism", IEEE Transactions on Information Forensics and Security, June 23, 2023, doi: 10.1109/TIFS.2023.3289128
Z. Deng, A. Sim, K. Wu, C. Guok, I. Monga, F. Andrijauskas, F. Wuerthwein, D. Weitzel, "Analyzing Transatlantic Network Traffic Patterns with Scientific Data Caches", 6th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2023), 2023, doi: 10.1145/3589012.3594897
Bin Dong, Jean Luca Bez, Suren Byna, "AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis", In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’23), June 16, 2023,
- Download File: IODiagnose-final.pdf (pdf: 1.9 MB)
Alexander Anferov, Kan-Heng Lee, Fang Zhao, Jonathan Simon, David I. Schuster, "Improved Coherence in Optically-Defined Niobium Trilayer Junction Qubits", arXiv.org, 2023,
C. Guok, E. Kissel, A. Sim, ESnet's In-Network Caching Pilot, The Network Conference 2023 (TNC'23), 2023,
Paul H. Hargrove, PGAS Programming Models: My 20-year Perspective, Keynote for 10th Annual Chapel Implementers and Users Workshop (CHIUW 2023), June 2, 2023, doi: 10.25344/S4K59C
Paul H. Hargrove has been involved in the world of Partitioned Global Address Space (PGAS) programming models since 1999, before he knew such a thing existed. Early involvement in the GASNet communications library as used in implementations of UPC, Titanium and Co-array Fortran convinced Paul that one could have productivity and performance without sacrificing one for the other. Since then he has been among the apostates who work to overturn the belief that message-passing is the only (or best) way to program for High-Performance Computing (HPC). Paul has been fortunate to witness the history of the PGAS community through several rare opportunities, including interactions made possible by the wide adoption of GASNet and through operating a PGAS booth at the annual SC conferences from 2007 to 2017. In this talk, Paul will share some highlights of his experiences across 24 years of PGAS history. Among these is the DARPA High Productivity Computing Systems (HPCS) project which helped give birth to Chapel.
Jie Li, George Michelogiannakis, Brandon Cook, Dulanya Cooray, Yong Chen, "Analyzing Resource Utilization in an HPC System: A Case Study of NERSC Perlmutter", ISC High Performance, Elsevier, May 2023,
George Michelogiannakis, Analyzing Resource Utilization in an HPC System: A Case Study of NERSC’s Perlmutter, ISC High Performance, May 2023,
- Download File: isc2023.pdf (pdf: 1.1 MB)
Hammad Ather, Jean Luca Bez, Boyana Norris, Suren Byna, "Illuminating the I/O Optimization Path of Scientific Applications", High Performance Computing: 38th International Conference, ISC High Performance 2023, Hamburg, Germany, May 21–25, 2023, Proceedings, Springer-Verlag, May 21, 2023, 22–41, doi: 10.1007/978-3-031-32041-5_2
The existing parallel I/O stack is complex and difficult to tune because of the interdependencies among the many factors that affect the performance of data movement between storage and compute systems. When performance is slower than expected, end-users, developers, and system administrators rely on I/O profiling and tracing information to pinpoint the root causes of inefficiencies. Although numerous tools collect I/O metrics on production systems, it is not obvious, unless one is an I/O expert, where the I/O bottlenecks are, what their root causes are, and what to do to solve them. Hence, there is a gap between the currently available metrics, the issues they represent, and the application of optimizations that would mitigate performance slowdowns. An I/O specialist often checks for common problems before diving into the specifics of each application and workload. Streamlining such analysis, investigation, and recommendations could close this gap without requiring a specialist to intervene in every case. In this paper, we propose Drishti, a novel interactive, user-oriented visualization and analysis framework. Drishti helps users pinpoint various root causes of I/O performance problems and provides a set of actionable recommendations for improving performance based on the observed characteristics of an application. We evaluate the applicability and correctness of Drishti using four use cases from distinct science domains and demonstrate its value to end-users, developers, and system administrators seeking to improve an application’s I/O performance.
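To make the idea of automated I/O triage concrete, here is a minimal sketch of a rule-based checker of the kind the abstract describes: it inspects a few per-job counters and emits a recommendation whenever a rule fires. It is not Drishti's code. The counter names (`small_reads`, `misaligned_requests`, and so on), the 10% thresholds, and the recommendation wording are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class IOProfile:
    """Hypothetical per-job I/O counters, loosely modeled on what I/O profilers collect."""
    total_reads: int
    small_reads: int            # reads below some small-size cutoff (hypothetical)
    total_writes: int
    small_writes: int
    misaligned_requests: int
    total_requests: int


# Hypothetical thresholds: the fraction of requests that triggers a finding.
SMALL_REQUEST_THRESHOLD = 0.10
MISALIGNED_THRESHOLD = 0.10


def diagnose(profile: IOProfile) -> list[str]:
    """Return one human-readable recommendation for each rule that fires."""
    findings = []
    if profile.total_reads and profile.small_reads / profile.total_reads > SMALL_REQUEST_THRESHOLD:
        findings.append("High fraction of small reads: consider buffering or collective I/O.")
    if profile.total_writes and profile.small_writes / profile.total_writes > SMALL_REQUEST_THRESHOLD:
        findings.append("High fraction of small writes: consider aggregating writes.")
    if (profile.total_requests
            and profile.misaligned_requests / profile.total_requests > MISALIGNED_THRESHOLD):
        findings.append("Many misaligned requests: align accesses to file-system stripe boundaries.")
    return findings


profile = IOProfile(total_reads=1000, small_reads=800, total_writes=200,
                    small_writes=5, misaligned_requests=0, total_requests=1200)
for finding in diagnose(profile):
    print(finding)
```

Only the small-read rule fires for this example profile. A real tool would derive such counters from profiling or tracing logs and would calibrate its thresholds per system.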
Popovici DT, Awan MG, Guidi G, Egan R, Hofmeyr S, Oliker L, Yelick K, "Designing Efficient SIMD Kernels for High Performance Sequence Alignment", 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 19, 2023, doi: 10.1109/IPDPSW59300.2023.00038
Zhenguo Wu, Liang Yuan Dai, Asher Novick, Madeleine Glick, Ziyi Zhu, Sébastien Rumley, George Michelogiannakis, John Shalf, Keren Bergman, "Peta-Scale Embedded Photonics Architecture for Distributed Deep Learning Applications", IEEE Journal of Lightwave Technology, May 2023,
John Ravi, Suren Byna, Quincey Koziol, Houjun Tang, Michela Becchi, "Evaluating Asynchronous Parallel I/O on HPC Systems", 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 15, 2023, doi: 10.1109/IPDPS54959.2023.00030
Md Kamal Hossain Chowdhury, Houjun Tang, Jean Luca Bez, Purushotham V. Bangalore, Suren Byna, "Efficient Asynchronous I/O with Request Merging", 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, FL, USA, IEEE, 2023, 628-636, doi: 10.1109/IPDPSW59300.2023.00107
Nabil Abubaker, Orhun Caglayan, M. Ozan Karsavuran, Cevdet Aykanat, "Minimizing Staleness and Communication Overhead in Distributed SGD for Collaborative Filtering", IEEE Transactions on Computers, May 2023, doi: 10.1109/TC.2023.3275107
Hammad Ather, Jean Luca Bez, Boyana Norris, Suren Byna, "Illuminating the I/O Optimization Path of Scientific Applications", International Conference on High Performance Computing (ISC'23), Springer Nature Switzerland, May 10, 2023, 22-41, doi: 10.1007/978-3-031-32041-5_2
E. Kissel, A. Sim, C. Guok, Experiences in deploying in-network data caches, 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), 2023,
J. Bellavita, C. Sim, K. Wu, A. Sim, S. Yoo, H. Ito, V. Garonne, E. Lancon, Understanding Data Access Patterns for dCache System, 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), 2023,
C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, F. Wurthwein, D. Davila, H. Newman, J. Balcas, Predicting Resource Usage Trends with Southern California Petabyte Scale Cache, 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), 2023,
S. Kim, A. Sim, K. Wu, S. Byna, Y. Son, H. Eom, "Design and Implementation of I/O Performance Prediction Scheme on HPC Systems through Large-scale Log Analysis", Journal of Big Data, 2023, 10(65), doi: 10.1186/s40537-023-00741-4
Kylie Huch, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Hyperdimensional Associative Memory Circuit for Scalable Machine Learning", IEEE Transactions on Applied Superconductivity, May 2023,
Tim Kneafsey, David Trebotich, Terry Ligocki, "Direct Numerical Simulation of Flow Through Nanoscale Shale Pores in a Mesoscale Sample", Album of Porous Media, edited by E.F. Médici, A.D. Otero, (Springer Cham: April 14, 2023) Pages: 87, doi: 10.1007/978-3-031-23800-0_69
Sergi Molins, David Trebotich, "Pore-Scale Controls on Calcite Dissolution using Direct Numerical Simulations", Album of Porous Media, edited by E.F. Médici, A.D. Otero, (Springer Cham: April 14, 2023) Pages: 135, doi: 10.1007/978-3-031-23800-0_112
David Trebotich, Terry Ligocki, "High Resolution Simulation of Fluid Flow in Press Felts Used in Paper Manufacturing", Album of Porous Media, edited by E.F. Médici, A.D. Otero, (Springer Cham: April 14, 2023) Pages: 132, doi: 10.1007/978-3-031-23800-0_109
"Innovating quantum computers with fluxonium processors", Monica Hernandez, News release, April 11, 2023,
Nikhil Ravi, Anna Scaglione, Julieta Giraldez, Parth Pradhan, Chuck Moran, Sean Peisert, "Solar Photovoltaic Systems Metadata Inference and Differentially Private Publication", arXiv preprint arXiv:2304.03749, April 7, 2023, doi: 10.48550/arXiv.2304.03749
Patricia Gonzalez-Guerrero, Kylie Huch, Nirmalendu Patra, Thom Popovici, George Michelogiannakis, "An Area Efficient Superconducting Unary CNN Accelerator", IEEE 24th International Symposium on Quality Electronic Design (ISQED), IEEE, April 2023,
Sean Peisert, "The First 20 Years of IEEE Security & Privacy [From the Editors]", IEEE Security & Privacy, April 1, 2023, 21(2):4-6, doi: 10.1109/MSEC.2023.3236420
George Cybenko, Carl Landwehr, Shari Lawrence Pfleeger, Sean Peisert, A 20th Anniversary Episode Chat With S&P Editors, IEEE Security & Privacy, Pages: 9-16, April 2023, doi: 10.1109/MSEC.2023.3239179
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 31, 2023, LBNL 2001516, doi: 10.25344/S46W2J
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
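The future-based asynchrony described above (issue operations without blocking, then run dependent work once its inputs arrive) can be illustrated by analogy. The sketch below uses Python's `asyncio`, not UPC++; `remote_get` is a hypothetical stand-in for a one-sided read, with latency simulated by `asyncio.sleep`. Two independent fetches are issued at once, and the dependent addition runs only after both complete.

```python
import asyncio


async def remote_get(store: dict, key: str, latency: float = 0.01) -> int:
    """Hypothetical stand-in for a non-blocking one-sided read with simulated latency."""
    await asyncio.sleep(latency)
    return store[key]


async def main() -> int:
    store = {"a": 20, "b": 22}
    # Issue both "communication" operations up front; neither waits for the other.
    task_a = asyncio.create_task(remote_get(store, "a"))
    task_b = asyncio.create_task(remote_get(store, "b"))
    # The dependent computation runs only once both dependencies are satisfied.
    a, b = await asyncio.gather(task_a, task_b)
    return a + b


result = asyncio.run(main())
print(result)  # 42
```

In UPC++ itself, the analogous pattern attaches continuation callbacks to `upcxx::future` objects with `.then()` and combines futures with `upcxx::when_all`; the specification gives the exact semantics.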
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single-program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
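The PGAS memory model described above, in which each process owns a shared segment that any process can read or write one-sidedly through a global pointer, can be mimicked inside a single Python process. This toy model is only an illustration: `GlobalPtr`, `SharedHeap`, `rput`, and `rget` are hypothetical names that loosely echo UPC++ terminology, and no real communication takes place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalPtr:
    """A global address: the owning rank plus an offset into that rank's shared segment."""
    rank: int
    offset: int


class SharedHeap:
    """Toy global address space: one fixed-size shared segment per simulated process."""

    def __init__(self, n_ranks: int, segment_size: int):
        self.segments = [[0] * segment_size for _ in range(n_ranks)]
        self.next_free = [0] * n_ranks

    def allocate(self, rank: int, count: int) -> GlobalPtr:
        """Bump-allocate `count` slots in `rank`'s segment and return a global pointer."""
        start = self.next_free[rank]
        if start + count > len(self.segments[rank]):
            raise MemoryError(f"segment on rank {rank} exhausted")
        self.next_free[rank] = start + count
        return GlobalPtr(rank, start)

    def rput(self, value: int, ptr: GlobalPtr) -> None:
        """One-sided write: the owning rank takes no part in the transfer."""
        self.segments[ptr.rank][ptr.offset] = value

    def rget(self, ptr: GlobalPtr) -> int:
        """One-sided read from any rank's segment."""
        return self.segments[ptr.rank][ptr.offset]


heap = SharedHeap(n_ranks=4, segment_size=16)
p = heap.allocate(rank=3, count=2)               # two slots that "live" on rank 3
heap.rput(99, GlobalPtr(p.rank, p.offset + 1))   # write to the second slot
print(heap.rget(GlobalPtr(3, 1)))                # 99
```

The key point the model captures is that a global pointer names both the owner and the location, so any process can address any shared datum without the owner's involvement.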
Monica Hernandez, "Quantum Systems Accelerator 2023 Impact Report", Impact Report, March 17, 2023,
Raghu Bollapragada, Stefan M. Wild, "Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization", Mathematical Programming Computation, 2023, 15:327-364, doi: 10.1007/s12532-023-00233-9
Nabil Abubaker, M. Ozan Karsavuran, Cevdet Aykanat, "Scaling Stratified Stochastic Gradient Descent for Distributed Matrix Completion", IEEE Transactions on Knowledge and Data Engineering, March 2023, doi: 10.1109/TKDE.2023.3253791
Dilip Vasudevan, George Michelogiannakis, "Efficient Temporal Arithmetic Logic Design for Superconducting RSFQ Logic", IEEE Transactions on Applied Superconductivity, March 2023,
Daniel Finn, Matthew Knepley, Joseph Pusztay and Mark Adams, "A Numerical Study of Landau Damping with PETSc-PIC", CAMCoS, March 1, 2023, doi: 10.2140/camcos.2023.18.135
- Download File: Finn2023-LD.pdf (pdf: 2.7 MB)
Damian Rouson, Producing Software for Science with Class, SIAM Conference on Computational Science and Engineering, March 1, 2023,
- Download File: Rouson-SIAM-CSE-2023.pdf (pdf: 7.5 MB)
The Computer Languages and Systems Software (CLaSS) Group at Berkeley Lab researches and develops programming models, languages, libraries, and applications for parallel and quantum computing. The open-source software under development in CLaSS includes the GASNet-EX networking middleware, the UPC++ partitioned global address space (PGAS) template library, the Berkeley Quantum Synthesis Toolkit (BQSKit), and the MetaHipMer metagenome assembler. This talk will start with an overview of CLaSS software and the software sustainability practices commonly employed across the group. The talk will then dive more deeply into our burgeoning contributions to the ecosystem supporting modern Fortran, including our test development for the LLVM Flang Fortran compiler. This presentation will demonstrate how agile software development techniques are helping to ensure robust front-end support for standard Fortran 2018 parallel programming features. The talk will also present several key insights that inspired our design and development of the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine) parallel runtime library, emphasizing the design choices that help to ensure sustainability. Lastly, the talk will demonstrate the productivity benefits associated with the first Caffeine application: Motility Analysis of T-Cell Histories in Activation (Matcha).
McCoy H, Hofmeyr S, Yelick K, Pandey P, "High-Performance Filters for GPUs", Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, February 25, 2023, doi: 10.1145/3572848.3577507
C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, F. Wurthwein, D. Davila, H. Newman, J. Balcas, "Effectiveness and predictability of in-network storage cache for Scientific Workflows", International Conference on Computing, Networking and Communication (ICNC 2023), 2023, doi: 10.1109/ICNC57223.2023.10074058
Johnny Corbino, UPC++’s Crucial Role in Quantum Chemistry, UPC++ Community BOF Virtual Symposium, February 16, 2023, doi: 10.25344/S4XG6F
Brad Richardson, Damian Rouson, Harris Snyder, Robert Singleterry, "Framework for Extensible, Asynchronous Task Scheduling (FEATS) in Fortran", Workshop on Asynchronous Many-Task Systems and Applications (WAMTA'23), Baton Rouge, LA, February 2023, doi: 10.25344/S4ZC73
Most parallel scientific programs contain compiler directives (pragmas) such as those from OpenMP, explicit calls to runtime library procedures such as those implementing the Message Passing Interface (MPI), or compiler-specific language extensions such as those provided by CUDA. By contrast, the recent Fortran standards empower developers to express parallel algorithms without directly referencing lower-level parallel programming models. Fortran’s parallel features place the language within the Partitioned Global Address Space (PGAS) class of programming models. When writing programs that exploit data-parallelism, application developers often find it straightforward to develop custom parallel algorithms. Problems involving complex, heterogeneous, staged calculations, however, pose much greater challenges. Such applications require careful coordination of tasks in a manner that respects dependencies prescribed by a directed acyclic graph. When rolling one’s own solution proves difficult, extending a customizable framework becomes attractive. The paper presents the design, implementation, and use of the Framework for Extensible Asynchronous Task Scheduling (FEATS), which we believe to be the first task-scheduling tool written in modern Fortran. We describe the benefits and compromises associated with choosing Fortran as the implementation language, and we propose ways in which future Fortran standards can best support the use case in this paper.
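Running each task only after its predecessors in the directed acyclic graph have finished is the core scheduling constraint mentioned above. The Python sketch below shows one common way to honor it: a Kahn-style topological sort that releases tasks in "waves" as their dependencies complete, so tasks within a wave could run concurrently. It is not FEATS, which is written in Fortran, and the task names in the example are hypothetical.

```python
def schedule_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave depends only on tasks in earlier waves.

    `deps` maps each task to the set of tasks it depends on. Tasks within one wave
    are mutually independent and could be dispatched to workers in parallel.
    """
    remaining = {task: set(pre) for task, pre in deps.items()}
    # Include tasks that appear only as a dependency of some other task.
    for pre in deps.values():
        for task in pre:
            remaining.setdefault(task, set())
    dependents: dict[str, list[str]] = {task: [] for task in remaining}
    for task, pre in remaining.items():
        for p in pre:
            dependents[p].append(task)

    ready = sorted(task for task, pre in remaining.items() if not pre)
    waves, done = [], 0
    while ready:
        waves.append(ready)
        done += len(ready)
        next_ready = []
        for finished in ready:
            for d in dependents[finished]:
                remaining[d].discard(finished)
                if not remaining[d]:          # last dependency just completed
                    next_ready.append(d)
        ready = sorted(next_ready)
    if done != len(remaining):
        raise ValueError("dependency graph contains a cycle")
    return waves


deps = {
    "mesh": set(),
    "material_props": set(),
    "assemble": {"mesh", "material_props"},
    "solve": {"assemble"},
    "postprocess": {"solve", "mesh"},
}
print(schedule_waves(deps))
# [['material_props', 'mesh'], ['assemble'], ['solve'], ['postprocess']]
```

A production scheduler would typically dispatch each task as soon as it becomes ready rather than waiting for a whole wave, but the dependency bookkeeping is the same.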
Ziqian Li, Tanay Roy, David Rodriguez Perez, Kan-Heng Lee, Eliot Kapit, David I. Schuster, "Autonomous error correction of a single logical qubit using two transmons", arXiv.org, 2023,
Nicholson Koukpaizan, Roofline Analysis using AMD Tools on AMD GPUs, ECP Annual Meeting, February 2023,
Neil Mehta, Roofline Performance Analysis on NVIDIA GPUs, ECP Annual Meeting, February 2023,
JaeHyuk Kwack, Roofline Performance Analysis w/Intel Advisor on Intel CPUs & GPUs, ECP Annual Meeting, February 2023,
Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, February 8, 2023,
J. Wang, K. Wu, A. Sim, S. Hwangbo, "Locating Partial Discharges in Power Transformers with Convolutional Iterative Filtering", Sensors, 2023, 23, doi: 10.3390/s23041789
Tyler H. Chang, Stefan M. Wild, ParMOO: A Python library for parallel multiobjective simulation optimization, Journal of Open Source Software, Pages: 4468, 2023, doi: 10.21105/joss.04468
Nathan A. Kimbrel, Allison E. Ashley-Koch, Xue J. Qin, Jennifer H. Lindquist, Melanie E. Garrett, Michelle F. Dennis, Lauren P. Hair, Jennifer E. Huffman, Daniel A. Jacobson, Ravi K. Madduri, Jodie A. Trafton, Hilary Coon, Anna R. Docherty, Niamh Mullins, Douglas M. Ruderfer, Philip D. Harvey, Benjamin H. McMahon, David W. Oslin, Jean C. Beckham, Elizabeth R. Hauser, Michael A. Hauser, Million Veteran Program Suicide Exemplar Workgroup, International Suicide Genetics Consortium, Veterans Affairs Mid-Atlantic Mental Illness Research Education and Clinical Center Workgroup, Veterans Affairs Million Veteran Program, "Identification of Novel, Replicable Genetic Risk Loci for Suicidal Thoughts and Behaviors Among US Military Veterans", JAMA Psychiatry, February 1, 2023, 80:100-191, doi: 10.1001/jamapsychiatry.2022.3896
Hector G. Martin, Tijana Radivojevic, Jeremy Zucker, Kristofer Bouchard, Jess Sustarich, Sean Peisert, Dan Arnold, Nathan Hillson, Gyorgy Babnigg, Jose M. Marti, Christopher J. Mungall, Gregg T. Beckham, Lucas Waldburger, James Carothers, ShivShankar Sundaram, Deb Agarwal, Blake A. Simmons, Tyler Backman, Deepanwita Banerjee, Deepti Tanjore, Lavanya Ramakrishnan, Anup Singh, "Perspectives for Self-Driving Labs in Synthetic Biology", Current Opinion in Biotechnology, February 2023, doi: 10.1016/j.copbio.2022.102881
I. Srivastava, D. R. Ladiges, A. Nonaka, A. L. Garcia, J. B. Bell, "Staggered Scheme for the Compressible Fluctuating Hydrodynamics of Multispecies Fluid Mixtures", Physical Review E, January 24, 2023, 107:015305, doi: 10.1103/PhysRevE.107.015305
Paul H. Hargrove, Dan Bonachea, Johnny Corbino, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'23)", Poster at Exascale Computing Project (ECP) Annual Meeting 2023, January 2023,
The Pagoda project is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. The first component is GASNet-EX, a portable, high-performance, global-address-space communication library. The second component is UPC++, a C++ template library. Together, these libraries enable agile, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
GASNet-EX is a portable, high-performance communications middleware library which leverages hardware support to implement Remote Memory Access (RMA) and Active Message communication primitives. GASNet-EX supports a broad ecosystem of alternative HPC programming models, including UPC++, Legion, Chapel and multiple implementations of UPC and Fortran Coarrays. GASNet-EX is implemented directly over the native APIs for networks of interest in HPC. The tight semantic match of GASNet-EX APIs to the client requirements and hardware capabilities often yields better performance than competing libraries.
UPC++ provides high-level productivity abstractions appropriate for Partitioned Global Address Space (PGAS) programming such as: remote memory access (RMA), remote procedure call (RPC), support for accelerators (e.g. GPUs), and mechanisms for aggressive asynchrony to hide communication costs. UPC++ implements communication using GASNet-EX, delivering high performance and portability from laptops to exascale supercomputers. HPC application software using UPC++ includes: MetaHipMer2 metagenome assembler, SIMCoV viral propagation simulation, NWChemEx TAMM, and graph computation kernels from ExaGraph.
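Active Messages, one of the two primitive families GASNet-EX implements, pair each message with a handler that runs on the receiving side when the message arrives. The single-process Python toy below illustrates only that idea: `Endpoint`, `register`, `send`, and `poll` are hypothetical names, not the GASNet-EX API, and "delivery" is simply a queue.

```python
from collections import deque
from typing import Callable


class Endpoint:
    """Toy endpoint: a handler table plus an inbox of (handler_index, args) messages."""

    def __init__(self) -> None:
        self.handlers: list[Callable[..., None]] = []
        self.inbox: deque = deque()
        self.state: dict[str, int] = {"counter": 0}

    def register(self, fn: Callable[..., None]) -> int:
        """Register a handler; its table index is what travels in the message header."""
        self.handlers.append(fn)
        return len(self.handlers) - 1

    def send(self, target: "Endpoint", handler_index: int, *args) -> None:
        """Enqueue a message for the target; the sender does not wait for the handler."""
        target.inbox.append((handler_index, args))

    def poll(self) -> int:
        """Run the handler for every pending message; return how many were processed."""
        processed = 0
        while self.inbox:
            index, args = self.inbox.popleft()
            self.handlers[index](self, *args)
            processed += 1
        return processed


def add_to_counter(ep: Endpoint, amount: int) -> None:
    """Handler executed on the receiving endpoint."""
    ep.state["counter"] += amount


a, b = Endpoint(), Endpoint()
# Every endpoint registers handlers in the same order, so indices agree (SPMD style).
a.register(add_to_counter)
h = b.register(add_to_counter)
a.send(b, h, 5)
a.send(b, h, 7)
print(b.poll(), b.state["counter"])  # 2 12
```

Sending a small table index instead of a function pointer is what lets the same handler be identified consistently across separate processes.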
George Michelogiannakis, A Case for Intra-Rack Resource Disaggregation for HPC, HiPEAC conference 2023, January 17, 2023,
- Download File: disaggregation.pptx.pdf (pdf: 1.3 MB)
Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Gay Bautista, George Michelogiannakis, "PaST-NoC: A Packet-Switched Superconducting Temporal NoC", IEEE Transactions on Applied Superconductivity, January 2023,
H-C. Yang, L. Jin, A. Lazar, A. Todd-Blick, A. Sim, K. Wu, Q. Chen, C. A. Spurlock, Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective, Transportation Research Board 102nd Annual Meeting, 2023,
J. Bang, A. Sim, G. Lockwood, H. Eom, H. Sung, "Design and Implementation of Burst Buffer Over-Subscription Scheme for HPC Storage Systems", IEEE Access, 2023, doi: 10.1109/ACCESS.2022.3233829
"Singleton Sieving: Overcoming the Memory/Speed Trade-Off in Exascale k-mer Analysis", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA23), January 1, 2023, doi: 10.25344/S4TP4T
2022
Monica Hernandez, Quantum Computing Workshop Brings Classical Control Systems Into Focus, News release, December 20, 2022,
V. Cirigliano, Z. Davoudi, J. Engel, R. J. Furnstahl, G. Hagen, U. Heinz, H. Hergert, M. Horoi, C. W. Johnson, A. Lovato, E. Mereghetti, W. Nazarewicz, A. Nicholson, T. Papenbrock, S. Pastore, M. Plumlee, D. R. Phillips, P. E. Shanahan, S. R. Stroberg, F. Viens, A. Walker-Loud, K. A. Wendt, S. M. Wild, "Towards Precise and Accurate Calculations of Neutrinoless Double-Beta Decay", Journal of Physics G: Nuclear and Particle Physics, 2022, 49:120502, doi: 10.1088/1361-6471/aca03e
Daniel Martin, Samuel Kachuck, Joanna Millstein, Brent Minchew, "Examining the Sensitivity of Ice Sheet Models to Updates in Rheology (n=4)", AGU Fall Meeting, December 15, 2022,
- Download File: AGU2022-1.pdf (pdf: 508 KB)
"Jumpstarting the Future Quantum Workforce", Monica Hernandez, Feature, December 13, 2022,
S. S. Sawant, Z. Yao, R. Jambunathan, A. Nonaka, "Characterization of Transmission Lines in Microelectronic Circuits Using the ARTEMIS Solver", IEEE Journal on Multiscale and Multiphysics Computational Techniques, December 12, 2022, 8:31-39,
"Berkeley Lab’s Networking Middleware GASNet Turns 20: Now, GASNet-EX is Gearing Up for the Exascale Era", Linda Vu, HPCWire (Lawrence Berkeley National Laboratory CS Area Communications), December 7, 2022, doi: 10.25344/S4BP4G
GASNet Celebrates 20th Anniversary
For 20 years, Berkeley Lab’s GASNet has been fueling developers’ ability to tap the power of massively parallel supercomputers more effectively. The middleware was recently upgraded to support exascale scientific applications.
Noah Goss, Alexis Morvan, Brian Marinelli, Bradley K Mitchell, Long B Nguyen, Ravi K Naik, Larry Chen, Christian Jünger, John Mark Kreikebaum, David I Santiago, et al., "High-fidelity qutrit entangling gates for superconducting circuits", Nature Communications, 2022, 13:7481, doi: 10.1038/s41467-022-34851-z
Ammar Haydari, Chen-Nee Chuah, Michael Zhang, Jane Macfarlane, Sean Peisert, "Differentially Private Map Matching for Mobility Trajectories", Proceedings of the 2022 Annual Computer Security Applications Conference (ACSAC), Austin, TX, ACM, December 2022, doi: 10.1145/3564625.3567974
D. Fan, D. E. Willcox, C. DeGrendele, M. Zingale, A. Nonaka, "Neural Networks for Nuclear Reactions in MAESTROeX", The Astrophysical Journal, November 29, 2022, 940,
Melissa L. Graham, Robert A. Knop, Thomas Kennedy, Peter E. Nugent, Eric Bellm, Márcio Catelan, Avi Patel, Hayden Smotherman, Monika Soraisam, Steven Stetzler, Lauren N. Aldoroty, Autumn Awbrey, Karina Baeza-Villagra, Pedro H. Bernardinelli, Federica Bianco, Dillon Brout, Riley Clarke, William I. Clarkson, Thomas Collett, James R. A. Davenport, Shenming Fu, John E. Gizis, Ari Heinze, Lei Hu, Saurabh W. Jha, Mario Jurić, J. Bryce Kalmbach, Alex Kim, Chien-Hsiu Lee, Chris Lidman, Mark Magee, Clara E. Martínez-Vázquez, Thomas Matheson, Gautham Narayan, Antonella Palmese, Christopher A. Phillips, Markus Rabus, Armin Rest, Nicolás Rodríguez-Segovia, Rachel Street, A. Katherina Vivas, Lifan Wang, Nicholas Wolf, Jiawen Yang, "Deep drilling in the time domain with DECam: Survey characterization", Monthly Notices of the Royal Astronomical Society, November 2022,
X. Li, Y. Liu, P. Lin, P. Sao, "Newly released capabilities in distributed-memory SuperLU sparse direct solver", ACM Transactions on Mathematical Software, November 19, 2022,
- Download File: 3577197.pdf (pdf: 1.1 MB)
D. R. Ladiges, J. G. Wang, I. Srivastava, S. P. Carney, A. Nonaka, A. L. Garcia, A. Donev, J. B. Bell, "Modeling Electrokinetic Flows with the Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm", Physical Review E, November 19, 2022, 106:035104, doi: 10.1103/PhysRevE.106.035104
Taylor Groves, Chris Daley, Rahulkumar Gayatri, Hai Ah Nam, Nan Ding, Lenny Oliker, Nicholas J. Wright, Samuel Williams, "A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures", PMBS, November 2022,
- Download File: PMBS22_GPU_final.pdf (pdf: 719 KB)
Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, LeAnn Lindsey, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, Methodology for Evaluating the Potential of Disaggregated Memory Systems, https://resdis.github.io/ws/2022/sc/, November 18, 2022,
- Download File: RESDIS22_Disaggregated_memory_Nan.pdf (pdf: 3.8 MB)
Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, "Methodology for Evaluating the Potential of Disaggregated Memory Systems", RESDIS, https://resdis.github.io/ws/2022/sc/, November 18, 2022,
- Download File: Methodology-for-Evaluating-the-Potential-of-Disaggregated-Memory-Systems.pdf (pdf: 5.1 MB)
Jean Luca Bez, Visualizing I/O bottlenecks with DXT Explorer 2.0, Analyzing Parallel I/O (BoF), held in conjunction with SC22, 2022,
Andrew Adams, Emily K. Adams, Dan Gunter, Ryan Kiser, Mark Krenz, Sean Peisert, John Zage, "Roadmap for Securing Operational Technology in NSF Scientific Research", Trusted CI Report, November 16, 2022, doi: 10.5281/zenodo.7327987
Julian Bellavita, Alex Sim (advisor), John Wu (advisor), "Predicting Scientific Dataset Popularity Using dCache Logs", The ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’22), ACM Student Research Competition (SRC), Second place winner, 2022,
The dCache installation is a storage management system that acts as a disk cache for high-energy physics (HEP) data. Storage space on dCache is limited relative to persistent storage devices; therefore, a heuristic is needed to determine what data should be kept in the cache. A good cache policy would keep frequently accessed data in the cache, but this requires knowledge of future dataset popularity. We present methods for forecasting the number of times a dataset stored on dCache will be accessed in the future. We present a deep neural network that can predict future dataset accesses accurately, reporting a final normalized loss of 4.6e-8. We also present a set of algorithms that can forecast future dataset accesses given an access sequence, including two novel algorithms, Backup Predictor and Last N Successors, that outperform other file prediction algorithms. Findings suggest that it is possible to anticipate dataset popularity in advance.
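For context on sequence-based file prediction, the sketch below implements the classic last-successor heuristic: after seeing file X, predict whichever file most recently followed X. This is a well-known simple baseline, not the paper's Backup Predictor or Last N Successors algorithm, whose details are not given here; the dataset names in the example are made up.

```python
def last_successor_accuracy(accesses: list[str]) -> float:
    """Replay an access sequence, predicting each next access with the last-successor rule.

    Returns the fraction of accesses (after the first) that were predicted correctly.
    When a file has never been seen before, no prediction exists and it counts as a miss.
    """
    last_successor: dict[str, str] = {}
    hits = 0
    for prev, curr in zip(accesses, accesses[1:]):
        if last_successor.get(prev) == curr:
            hits += 1
        last_successor[prev] = curr   # remember what most recently followed `prev`
    n_predictions = max(len(accesses) - 1, 1)
    return hits / n_predictions


trace = ["ds_a", "ds_b", "ds_c", "ds_a", "ds_b", "ds_c", "ds_a", "ds_d"]
print(round(last_successor_accuracy(trace), 3))  # 0.429
```

The first pass through the repeating pattern yields only misses while the table fills; the second pass is predicted perfectly, and the final switch to `ds_d` is missed, giving 3 hits out of 7 predictions.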
Katherine Rasmussen, Damian Rouson, Naje George, Dan Bonachea, Hussain Kadhem, Brian Friesen, "Agile Acceleration of LLVM Flang Support for Fortran 2018 Parallel Programming", Research Poster at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), November 2022, doi: 10.25344/S4CP4S
The LLVM Flang compiler ("Flang") is currently Fortran 95 compliant, and the frontend can parse Fortran 2018. However, Flang does not have a comprehensive 2018 test suite and does not fully implement the static semantics of the 2018 standard. We are investigating whether agile software development techniques, such as pair programming and test-driven development (TDD), can help Flang to rapidly progress to Fortran 2018 compliance. Because of the paramount importance of parallelism in high-performance computing, we are focusing on Fortran’s parallel features, commonly denoted “Coarray Fortran.” We are developing what we believe are the first exhaustive, open-source tests for the static semantics of Fortran 2018 parallel features, and contributing them to the LLVM project. A related effort involves writing runtime tests for parallel 2018 features and supporting those tests by developing a new parallel runtime library: the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine).
C. Sim, C. Guok (advisor), A. Sim (advisor), K. Wu (advisor), "Data Throughput Performance Trends of Regional Scientific Data Cache", The ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’22), ACM Student Research Competition (SRC), 2022,
Jean Luca Bez, Hammad Ather, Suren Byna, "Drishti: Guiding End-Users in the I/O Optimization Journey", PDSW 2022, held in conjunction with SC22, 2022,
Paul H. Hargrove, Dan Bonachea, "GASNet-EX RMA Communication Performance on Recent Supercomputing Systems", 5th Annual Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM'22), November 2022, doi: 10.25344/S40C7D
Partitioned Global Address Space (PGAS) programming models, typified by systems such as Unified Parallel C (UPC) and Fortran coarrays, expose one-sided Remote Memory Access (RMA) communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.
GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in emerging exascale machines. The library is an evolution of the popular GASNet communication system, building upon 20 years of lessons learned. We present microbenchmark results which demonstrate the RMA performance of GASNet-EX is competitive with MPI implementations on four recent, high-impact, production HPC systems. These results are an update relative to previously published results on older systems. The networks measured here are representative of hardware currently used in six of the top ten fastest supercomputers in the world, and all of the exascale systems on the U.S. DOE road map.
Rajeev Jain, Houjun Tang, Akash Dhruv, J Austin Harris, Suren Byna, "Accelerating Flash-X simulations with asynchronous I/O", https://ieeexplore.ieee.org/abstract/document/10026923/, November 13, 2022, doi: 10.1109/PDSW56643.2022.00008
Sian Jin, Dingwen Tao, Houjun Tang, Sheng Di, Suren Byna, Zarija Lukic, Franck Cappello, "Accelerating parallel write via deeply integrating predictive lossy compression with HDF5", SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, November 13, 2022, doi: 10.1109/SC41404.2022.00066
Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams, "Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations", MCHPC, November 2022,
- Download File: MCHPC22_final.pdf (pdf: 401 KB)
Mathias Weiden, Justin Kalloor, John Kubiatowicz, Ed Younis, Costin Iancu, "Wide Quantum Circuit Optimization with Topology Aware Synthesis", Third International Workshop on Quantum Computing Software, November 13, 2022,
Unitary synthesis is an optimization technique that can achieve optimal gate counts while mapping quantum circuits to restrictive qubit topologies. Synthesis algorithms are limited in scalability by their exponentially growing run times, so applying them to wide circuits requires partitioning into smaller components. In this work, we explore methods to reduce depth and multi-qubit gate count of wide, mapped quantum circuits using synthesis. We present TopAS, a topology aware synthesis tool that preconditions quantum circuits before mapping. Partitioned subcircuits are optimized and fitted to sparse subtopologies to balance the opposing demands of synthesis and mapping algorithms. Compared to state-of-the-art wide circuit synthesis algorithms, TopAS is able to reduce depth on average by 35.2% and CNOT count by 11.5% for mesh topologies. Compared to the optimization and mapping algorithms of Qiskit and Tket, TopAS is able to reduce CNOT counts by 30.3% and depth by 38.2% on average.
Damian Rouson, Dan Bonachea, "Caffeine: CoArray Fortran Framework of Efficient Interfaces to Network Environments", Proceedings of the Eighth Annual Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC2022), Dallas, Texas, USA, IEEE, November 2022, doi: 10.25344/S4459B
This paper provides an introduction to the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine), a parallel runtime library built atop the GASNet-EX exascale networking library. Caffeine leverages several non-parallel Fortran features to write type- and rank-agnostic interfaces and corresponding procedure definitions that support parallel Fortran 2018 features, including communication, collective operations, and related services. One major goal is to develop a runtime library that can eventually be considered for adoption by LLVM Flang, enabling that compiler to support the parallel features of Fortran. The paper describes the motivations behind Caffeine's design and implementation decisions, details the current state of Caffeine's development, and previews future work. We explain how the design and implementation offer benefits related to software sustainability, by lowering the barrier to user contributions and reducing complexity through the use of Fortran 2018 C-interoperability features, and related to high performance, through the use of a lightweight communication substrate.
M. Wang, Y. Liu, P. Ghysels, A. C. Yucel, "VoxImp: Impedance Extraction Simulator for Voxelized Structures", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, November 2, 2022, doi: 10.1109/TCAD.2022.3218768
Mestan Firat Celiktug, M. Ozan Karsavuran, Seher Acer, Cevdet Aykanat, "Simultaneous Computational and Data Load Balancing in Distributed-Memory Setting", SIAM Journal on Scientific Computing, November 2022, 44(6):C399-C424, doi: 10.1137/22M1485772
Nabil Abubaker, M. Ozan Karsavuran, Cevdet Aykanat, "Scalable Unsupervised ML: Latency Hiding in Distributed Sparse Tensor Decomposition", IEEE Transactions on Parallel and Distributed Systems, November 2022, 33(11):3028-3040, doi: 10.1109/TPDS.2021.3128827
J. Wang, K. Wu, A. Sim, S. Hwangbo, "Feature Engineering and Classification Models for Partial Discharge in Power Transformers", arXiv, 2022, doi: 10.48550/arXiv.2210.12216
Yilun Xu, Gang Huang, Jan Balewski, Alexis Morvan, Kasra Nowrouzi, David I. Santiago, Ravi K. Naik, Brad Mitchell, Irfan Siddiqi, "Automatic Qubit Characterization and Gate Optimization with QubiC", ACM Transactions on Quantum Computing, 2022, doi: 10.1145/3529397
George Michelogiannakis, Intra-Rack Resource Disaggregation Using Emerging Photonics, OCP Global Summit, October 19, 2022,
- Download File: disaggregation_2022.pdf (pdf: 953 KB)
John Shalf, George Michelogiannakis, Heterogeneous Integration for HPC, OCP Global Summit, October 19, 2022,
- Download File: chiplets_2022.pdf (pdf: 1.2 MB)
Oluwamayowa O. Amusat, Tim Bartholomew, Adam A. Atia, Cost optimization of desalination systems using WaterTAP incorporating detailed water chemistry models, 2022 INFORMS Annual Meeting, 2022,
"The Sparks That Ignited Curiosity: How Quantum Researchers Found Their Path", Monica Hernandez, Feature, October 14, 2022,
"Curiosity About Quantum Computing: How Five Scientists Found Their Specialization", Monica Hernandez, Feature in Spanish, October 14, 2022,
"The Advanced Quantum Testbed at Berkeley Lab Leads Scientific Advances for Quantum Computing", Monica Hernandez, Feature in Spanish, October 14, 2022,
"How Berkeley Lab’s Advanced Quantum Testbed Paves Breakthroughs for Quantum Computing", Monica Hernandez, Feature, October 14, 2022,
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-Flux Shift Register for Race Logic and Its Applications", IEEE Transactions on Circuits and Systems I: Regular Papers, October 2022,
Jean Luca Bez, Where's the Bottleneck?, Berkeley Lab Research SLAM, October 7, 2022,
Yang Liu, Jian Song, Robert Burridge, Jianliang Qian, "A Fast Butterfly-compressed Hadamard-Babich Integrator for High-Frequency Helmholtz Equations in Inhomogeneous Media with Arbitrary Sources", SIAM Multiscale Modeling and Simulation, October 6, 2022,
- Download File: 2210-v2.02698.pdf (pdf: 38 MB)
William F. Godoy, Ritu Arora, Keith Beattie, David E. Bernholdt, Sarah E. Bratt, Daniel S. Katz, Ignacio Laguna, Amiya K. Maji, Addi Malviya-Thakur, Rafael M. Mudafort, Nitin Sukhija, Damian Rouson, Cindy Rubio-Gonzalez, Karan Vahi, "Giving Research Software Engineers a Larger Stage Through the Better Scientific Software Fellowship", Computing in Science & Engineering, October 2022, 24(5):6-13, doi: 10.1109/MCSE.2023.3253847
The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. The BSSwF’s vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). Case studies from several of the program’s participants illustrate the diverse ways the BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as the BSSwF can help recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations, and ideas for a larger audience.
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001480, doi: 10.25344/S4M59P
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "SoK: Limitations of Confidential Computing via TEEs for High-Performance Compute Systems", Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), September 2022,
Mateusz Pusz, Gašper Ažman, Bengt Gustafsson, Colin MacLean, Corentin Jabot, "Universal Template Parameters", ISO C++ Standard Mailing, September 2022,
This paper proposes a unified model for universal template parameters (UTPs) and dependent names, enabling more comprehensive and consistent template metaprogramming. Universal template parameters allow for a generic apply and other higher-order template metafunctions, including certain type traits.
Chenxu Niu, Wei Zhang, Suren Byna, Yong Chen, "Kv2vec: A Distributed Representation Method for Key-value Pairs from Metadata Attributes", 2022 IEEE Conference on High Performance Extreme Computing (HPEC), September 19, 2022, doi: 10.1109/HPEC55821.2022.9926389
Venkitesh Ayyar, Robert Knop, Autumn Awbrey, Alexis Andersen, Peter Nugent, "Identifying Transient Candidates in the Dark Energy Survey Using Convolutional Neural Networks", Publications of the Astronomical Society of the Pacific, September 2022, 134:094501,
The ability to discover new transient candidates via image differencing without direct human intervention is an important task in observational astronomy. For this kind of image classification problem, machine learning techniques such as Convolutional Neural Networks (CNNs) have shown remarkable success. In this work, we present the results of an automated transient candidate identification on images with CNNs for an extant data set from the Dark Energy Survey Supernova program, whose main focus was on using Type Ia supernovae for cosmology. By performing an architecture search of CNNs, we identify networks that efficiently select non-artifacts (e.g., supernovae, variable stars, AGN, etc.) from artifacts (image defects, mis-subtractions, etc.), achieving the efficiency of previous work performed with random forests, without the need to expend any effort in feature identification. The CNNs also help us identify a subset of mislabeled images. After relabeling the images in this subset, the resulting classification with CNNs is significantly better than previous results, lowering the false positive rate by 27% at a fixed missed detection rate of 0.05.
J. M. Monti, J. T. Clemmer, I. Srivastava, L. E. Silbert, G. S. Grest, J. B. Lechman, "Large-Scale Frictionless Jamming with Power-Law Particle Size Distributions", Physical Review E, September 2, 2022, 106:034901, doi: 10.1103/PhysRevE.106.034901
Alvin Oliver Glova, Yukai Yang, Yiyao Wan, Zhizhou Zhang, George Michelogiannakis, Jonathan Balkind, Timothy Sherwood, "Establishing Cooperative Computation with Hardware Embassies", IEEE International Symposium on Secure and Private Execution Environment Design, September 2022,
"How the Five National Quantum Information Science Research Centers Harness the Quantum Revolution", Hannah Adams, Pete Genzer, Monica Hernandez, Leah Hesla, Scott Jones, Elizabeth Rosenthal, Denise Yazak, August 26, 2022,
M. Zingale, M. P. Katz, A. Nonaka, and M. Rasmussen, "An Improved Method for Coupling Hydrodynamics with Astrophysical Reaction Networks", Astrophysical Journal, August 25, 2022, 936,
"QIS Innovation Across the Growing R&D Ecosystem", Monica Hernandez, Feature, August 25, 2022,
A. P. Santos, I. Srivastava, L. E. Silbert, J. B. Lechman, G. S. Grest, "Fluctuations and power-law scaling of dry, frictionless granular rheology near the hard-particle limit", Physical Review Fluids, August 19, 2022, 7:084303, doi: 10.1103/PhysRevFluids.7.084303
Gregory Wallace, Zhe Bai, Robbie Sadre, Talita Perciano, Nicola Bertelli, Syun'ichi Shiraiwa, Wes Bethel, John Wright, "Towards fast and accurate predictions of radio frequency power deposition and current profile via data-driven modelling: applications to lower hybrid current drive", Journal of Plasma Physics, August 18, 2022, 88:895880401, doi: 10.1017/S0022377822000708
Liou J-Y, Awan M, Hofmeyr S, Forrest S, Wu C-J, "Understanding the Power of Evolutionary Computation for GPU Code Optimization", 2022 IEEE International Symposium on Workload Characterization (IISWC), August 11, 2022, doi: 10.1109/IISWC55918.2022.00025
Ozge Surer, Filomena M. Nunes, Matthew Plumlee, Stefan M. Wild, "Uncertainty Quantification in Breakup Reactions", Physical Review C, 2022, 106:024607, doi: 10.1103/PhysRevC.106.024607
Monica Hernandez, Optimizing SWAP Networks for Quantum Computing, News release, August 4, 2022,
Jean Luca Bez, Suren Byna, April 2019 Darshan counters from the Cori supercomputer [Data set], Zenodo, 2022, doi: 10.5281/zenodo.6476501
Anne M. Felden, Daniel F. Martin, Esmond G. Ng, "SUHMO: an AMR SUbglacial Hydrology MOdel v1.0", Geosci. Model Dev. Discuss., July 27, 2022,
- Download File: gmd-2022-190.pdf (pdf: 5.5 MB)
Yize Chen, Yuanyuan Shi, Daniel Arnold, Sean Peisert, "SAVER: Safe Learning-Based Controller for Real-Time Voltage Regulation", Proceedings of the 2022 IEEE Power Engineering Society (PES) General Meeting, Denver, CO, July 2022,
M.F. Adams, D.P. Brennan, M.G. Knepley, P. Wang, "Landau collision operator in the CUDA programming model applied to thermal quench plasmas", 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 15, 2022, doi: 10.1109/IPDPS53621.2022.00020
- Download File: d9e4ee12-a919-480f-bafe-db3c81602b4d.pdf (pdf: 1.6 MB)
Emily K. Adams, Daniel Gunter, Ryan Kiser, Mark Krenz, Sean Peisert, Susan Sons, John Zage, "Findings of the 2022 Trusted CI Study on the Security of Operational Technology in NSF Scientific Research", Trusted CI Report, July 15, 2022, doi: 10.5281/zenodo.6828675
"QSA Scientists Participated in ‘QIS For Everyone’ Briefing", Monica Hernandez, Feature, July 13, 2022,
Xiange Wang, Rafael Zamora-Resendiz, Courtney D. Shelley, Carrie Manore, Xinlian Liu, David W. Oslin, Benjamin McMahon, Jean C. Beckham, Nathan A. Kimbrel, Silvia Crivelli, "An examination of the association between altitude and suicide deaths, suicide attempts, and suicidal ideation among veterans at both the patient and geospatial level", Journal of Psychiatric Research, July 11, 2022,
Akel Hashim, Rich Rines, Victory Omole, Ravi K. Naik, John Mark Kreikebaum, David I. Santiago, Frederic T. Chong, Irfan Siddiqi, Pranav Gokhale, "Optimized SWAP networks with equivalent circuit averaging for QAOA", Phys. Rev. Research, 2022, 033028, doi: 10.1103/PhysRevResearch.4.033028
V. Cirigliano, Z. Davoudi, J. Engel, R. J. Furnstahl, G. Hagen, U. Heinz, H. Hergert, M. Horoi, C. W. Johnson, A. Lovato, E. Mereghetti, W. Nazarewicz, A. Nicholson, T. Papenbrock, S. Pastore, M. Plumlee, D. R. Phillips, P. E. Shanahan, S. R. Stroberg, F. Viens, A. Walker-Loud, K. A. Wendt, S. M. Wild, "Towards Precise and Accurate Calculations of Neutrinoless Double-Beta Decay: Project Scoping Workshop Report", 2022, doi: 10.48550/ARXIV.2207.01085
Destinee Morrow, Rafael Zamora-Resendiz, Jean C Beckham, Nathan A Kimbrel, David W Oslin, Suzanne Tamang, Million Veteran Program Suicide Exemplar Workgroup, Silvia Crivelli, "A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes", Journal of Psychiatric Research, July 1, 2022, 151:328-338, doi: 10.1016/j.jpsychires.2022.04.009
L. Jin, A. Lazar, C. Brown, V. Garikapati, B. Sun, S. Ravulaparthy, Q. Chen, A. Sim, K. Wu, T. Wenzel, T. Ho, C. A. Spurlock, "What Makes You Hold onto That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions", Frontiers in Future Transportation, Connected Mobility and Automation, 2022, 3:894654, doi: 10.3389/ffutr.2022.894654
Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, "Design and implementation of dynamic I/O control scheme for large scale distributed file systems", Cluster Computing, 2022, 25(6):1-16, doi: 10.1007/s10586-022-03640-0
- Download File: wu2022.bib (bib: 22 KB)
R. Han, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, J. Balcas, H. Newman, "Access Trends of In-network Cache for Scientific Data", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA), in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534110
J. Bellavita, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, "Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534111
R. Shao, J. Kim, A. Sim, K. Wu, "Predicting Slow Connections in Scientific Computing", 5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2022, doi: 10.1145/3526064.3534112
J. Kim, M. Cafaro, J. Chou, A. Sim, "SNTA’22: The 5th Workshop on Systems and Network Telemetry and Analytics", In the proceedings of The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC'22), 2022, doi: 10.1145/3502181.3535108
Jean Luca Bez, Ahmad Maroof Karimi, Arnab K. Paul, Bing Xie, Suren Byna, Philip Carns, Sarp Oral, Feiyi Wang, Jesse Hanley, "Access Patterns and Performance Behaviors of Multi-layer Supercomputer I/O Subsystems under Production Load", 31st International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC '22), Association for Computing Machinery, June 27, 2022, 43–55, doi: 10.1145/3502181.3531461
Bin Dong, Alex Popescu, Veronica Rodriguez Tribaldos, Suren Byna, Jonathan Ajo-Franklin, Kesheng Wu, "Real-time and post-hoc compression for data from Distributed Acoustic Sensing", Computers & Geosciences, June 24, 2022, 105181,
- Download File: wu2022.bib (bib: 22 KB)
Jonathan Ajo‐Franklin, Verónica Rodríguez Tribaldos, Avinash Nayak, Feng Cheng, Robert Mellors, Benxin Chi, Todd Wood, Michelle Robertson, Cody Rotermund, Eric Matzel, Dennise C. Templeton, Christina Morency, Kesheng Wu, Bin Dong, Patrick Dobson, "The Imperial Valley Dark Fiber Project: Toward Seismic Studies Using DAS and Telecom Infrastructure for Geothermal Applications", Seismological Research Letters, June 24, 2022,
Runzhou Han, Suren Byna, Houjun Tang, Bin Dong, Mai Zheng, "PROV-IO: An I/O-Centric Provenance Framework for Scientific Data on HPC Systems", HPDC 2022, June 23, 2022,
Srivatsan Chakram, Kevin He, Akash V. Dixit, Andrew E. Oriani, Ravi K. Naik, Nelson Leung, Hyeokshin Kwon, Wen-Long Ma, Liang Jiang, David I. Schuster, "Multimode photon blockade", Nature Physics, 2022, doi: 10.1038/s41567-022-01630-y
D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, W. Arndt, J. Blaschke, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, T. Lehman, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, L. Stephey, R. Thomas, G. Torok, "LBNL Superfacility Project Report", Lawrence Berkeley National Laboratory, 2022, doi: 10.48550/arXiv.2206.11992
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, Kylie Huch, George Michelogiannakis, "Superconducting Digital DIT Butterfly Unit for Fast Fourier Transform Using Race Logic", 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), IEEE, June 2022, 441-445,
Dan Bonachea, Paul H. Hargrove, An Introduction to GASNet-EX for Chapel Users, 9th Annual Chapel Implementers and Users Workshop (CHIUW 2022), June 10, 2022,
Have you ever typed "export CHPL_COMM=gasnet"? If you’ve used Chapel with multi-locale support on a system without "Cray" in the model name, then you’ve probably used GASNet. Did you ever wonder what GASNet is? What GASNet should mean to you? This talk aims to answer those questions and more. Chapel has system-specific implementations of multi-locale communication for Cray-branded systems including the Cray XC and HPE Cray EX lines. On other systems, Chapel communication uses the GASNet communication library embedded in third-party/gasnet. In this talk, that third-party will introduce itself to you in the first person.
Daniel Arnold, Sy-Toan Ngo, Ciaran Roberts, Yize Chen, Anna Scaglione, Sean Peisert, "Adam-based Augmented Random Search for Control Policies for Distributed Energy Resource Cyber Attack Mitigation", Proceedings of the 2022 American Control Conference (ACC), June 2022,
Hengrui Luo, Younghyun Cho, James W. Demmel, Xiaoye S. Li, Yang Liu, "Hybrid models for mixed variables in Bayesian optimization", June 6, 2022,
K. Ibrahim, L. Oliker, "Preprocessing Pipeline Optimization for Scientific Deep-Learning Workloads", IPDPS 22, June 3, 2022,
- Download File: SciML-optimization-12.pdf (pdf: 17 MB)
Xiaoxia Zhang, Degang Chen, Hong Yu, Guoyin Wang, Houjun Tang, Kesheng Wu, "Improving nonnegative matrix factorization with advanced graph regularization", Information Sciences, June 1, 2022, 597:125-143, doi: 10.1016/j.ins.2022.03.008
Jean Luca Bez, Suren Byna, Understanding I/O Behavior with Interactive Darshan Log Analysis, Exascale Computing Project (ECP) Community Days BoF, 2022,
Qiang Du, Dan Wang, Tong Zhou, Antonio Gilardi, Mariam Kiran, Bashir Mohammed, Derun Li, Russell Wilcox, "Experimental beam combining stabilization using machine learning trained while phases drift", Advanced Solid State Lasers 2022, Optica Publishing Group, June 1, 2022, 30:12639-, doi: 10.1364/OE.450255
Yujing Ma, Florin Rusu, Kesheng Wu, Alexander Sim, 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022, 1088-1097, doi: 10.1109/IPDPSW55747.2022.00177
- Download File: wu2022.bib (bib: 22 KB)
Yang Liu, "A comparative study of butterfly-enhanced direct integral and differential equation solvers for high-frequency electromagnetic analysis involving inhomogeneous dielectrics", May 29, 2022,
- Download File: comparative_study-v2.pdf (pdf: 3.3 MB)
Monica Hernandez, Breakthrough in Quantum Universal Gate Sets: A High-Fidelity iToffoli Gate, News release, May 24, 2022,
J. Kim, M. Jin, Y. Homma, A. Sim, W. Kroeger, K. Wu, "Extract Dynamic Information To Improve Time Series Modeling: a Case Study with Scientific Workflow", arXiv, 2022, doi: 10.48550/arXiv.2205.09703
Huihuo Zheng, Venkatram Vishwanath, Quincey Koziol, Houjun Tang, John Ravi, John Mainzer, Suren Byna, "HDF5 Cache VOL: Efficient and scalable parallel I/O through caching data on node-local storage", 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 16, 2022, doi: 10.1109/CCGrid54584.2022.00015
K. Wang, S. Lee, J. Balewski, A. Sim, P. Nugent, A. Agrawal, A. Choudhary, K. Wu, W-K. Liao, "Using Multi-resolution Data to Accelerate Neural Network Training in Scientific Applications", 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2022), 2022, doi: 10.1109/CCGrid54584.2022.00050
M. G. Amankwah, D. Camps, E. W. Bethel, R. Van Beeumen, T. Perciano, "Quantum pixel representations and compression for N-dimensional images", Nature Scientific Reports, May 11, 2022, 12:7712, doi: 10.1038/s41598-022-11024-y
Paul H. Hargrove, Dan Bonachea, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'22)", Poster at Exascale Computing Project (ECP) Annual Meeting 2022, May 5, 2022,
We present UPC++ and GASNet-EX, distributed libraries which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Procedure Call (RPC) and for Remote Memory Access (RMA) to host and GPU memories. The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
Yosep Kim, Alexis Morvan, Long B Nguyen, Ravi K Naik, Christian Jünger, Larry Chen, John Mark Kreikebaum, David I Santiago, Irfan Siddiqi, "High-fidelity three-qubit iToffoli gate for fixed-frequency superconducting qubits", Nature Physics, 2022, 1-6, doi: 10.1038/s41567-022-01590-3
JaeHyuk Kwack, Roofline Performance Analysis with Intel Advisor on Intel CPUs & GPUs, ECP Annual Meeting, May 2022,
- Download File: ECP22-Roofline-4-Intel-and-ALCF.pdf (pdf: 14 MB)
Neil Mehta, Roofline on NVIDIA at NERSC, ECP Annual Meeting, May 2022,
- Download File: ECP22-Roofline-2-NVIDIA-and-NERSC.pdf (pdf: 2.6 MB)
Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, May 2022,
Mark Adams, Satish Balay, Oana Marin, Lois Curfman McInnes, Richard Tran Mills, Todd Munson, Hong Zhang, Junchao Zhang, Jed Brown, Victor Eijkhout, Jacob Faibussowitsch, Matthew Knepley, Fande Kong, Scott Kruger, Patrick Sanan, Barry F. Smith, Hong Zhang, "The PETSc Community as Infrastructure", Computing in Science & Engineering, May 1, 2022, 24, doi: 10.1109/MCSE.2022.3169974
- Download File: PetscInfrusturcure.pdf (pdf: 1.3 MB)
The communities that develop and support open-source scientific software packages are crucial to the utility and success of such packages. Moreover, they form an important part of the human infrastructure that enables scientific progress. This article discusses aspects of the Portable Extensible Toolkit for Scientific Computation community, its organization, and technical approaches that enable community members to help each other efficiently and effectively.
B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, "Enhancing IoT Anomaly Detection Performance for Federated Learning", Digital Communications and Networks, Special Issue on Edge Computation and Intelligence, 2022, doi: 10.1016/j.dcan.2022.02.007
Maximilian Bremer, John Bachan, Cy Chan, Clint Dawson, "Adaptive total variation stable local timestepping for conservation laws", Journal of Computational Physics, April 21, 2022,
"Inspiring High Schoolers to Learn Quantum Computing", Monica Hernandez, Feature, April 14, 2022,
"Meet QSA’s Early-Career Researchers Advancing the QIS Frontier", Monica Hernandez, Feature, April 14, 2022,
"AQT-Zurich Instruments Partnership Enables Groundbreaking Quantum Information Science", Monica Hernandez, Feature, April 14, 2022,
Monica Hernandez, "Advanced Quantum Testbed 2021 Progress Report", Progress Report, April 14, 2022,
S. Zhang, R. Sadre, B. A. Legg, H. Pyles, T. Perciano, E. W. Bethel, D. Baker, O. Rübel, J. J. D. Yoreo, "Rotational dynamics and transition mechanisms of surface-adsorbed proteins", Proceedings of the National Academy of Sciences, April 11, 2022, 119:e2020242119, doi: 10.1073/pnas.2020242119
Lipeng Wan, Axel Huebl, Junmin Gu, Franz Poeschel, Ana Gainaru, Ruonan Wang, Jieyang Chen, Xin Liang, Dmitry Ganyushin, Todd Munson, Ian Foster, Jean-Luc Vay, Norbert Podhorszki, Kesheng Wu, Scott Klasky, "Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization", IEEE Transactions on Parallel and Distributed Systems, 2022, 33:878-890, doi: 10.1109/TPDS.2021.3100784
Houjun Tang, Quincey Koziol, John Ravi, Suren Byna, "Transparent Asynchronous Parallel I/O using Background Threads", IEEE Transactions on Parallel and Distributed Systems, April 4, 2022, 33, doi: 10.1109/TPDS.2021.3090322
Meyer F, Fritz A, Deng Z-L, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh H-J, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC, "Critical Assessment of Metagenome Interpretation: the second round of challenges", Nature Methods, April 1, 2022, doi: 10.1038/S41592-022-01431-4
W. D. Fullmer, R. Porcu, J. Musser, A. S. Almgren, I. Srivastava, "The Divergence of Nearby Trajectories in Soft-Sphere DEM", Particuology, April 1, 2022, 63:1-8, doi: 10.1016/j.partic.2021.06.008
M. Avaylon, R. Sadre, Z. Bai, T. Perciano, "Adaptable Deep Learning and Probabilistic Graphical Model System for Semantic Segmentation", Advances in Artificial Intelligence and Machine Learnin, March 31, 2022, 2:288--302, doi: 10.54364/AAIML.2022.1119
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features RPC communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001452, doi: 10.25344/S4530J
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
J. Bell, A. Nonaka, A. L. Garcia, G. Eyink, "Thermal Fluctuations in the Dissipation Range of Homogeneous Isotropic Turbulence", J. Fluid Mech., March 24, 2022, 939,
Adrián P. Diéguez, Margarita Amor, Ramón Doallo, Akira Nukada, Satoshi Matsuoka, "Efficient high-precision integer multiplication on the GPU", The International Journal of High Performance Computing Applications, March 2022, 36:356-369, doi: 10.1177/10943420221077964
A. Sim, E. Kissel, C. Guok, "Deploying in-network caches in support of distributed scientific data sharing", arXiv whitepaper, 2022, doi: 10.48550/arXiv.2203.06843
George Michelogiannakis, Madeleine Glick, John Shalf, Keren Bergman, Photonics as a Means to Implement Intra-rack Resource Disaggregation, SPIE Photonics West, March 2022,
Samuel B. Kachuck, Morgan Whitcomb, Jeremy N. Bassis, Daniel F. Martin, Stephen F. Price, "Simulating ice-shelf extent using damage mechanics", Journal of Glaciology, March 7, 2022, 68(271):987-998, doi: 10.1017/jog.2022.12
George Michelogiannakis, Madeleine Glick, John Shalf, Keren Bergman, "Photonics as a means to implement intra-rack resource disaggregation", Proceedings Volume 12027, Metro and Data Center Optical Networks and Short-Reach Links V, March 2022, doi: 10.1117/12.2607317
Paolo Calafiura and others, Artificial Intelligence for High Energy Physics, edited by Paolo Calafiura, David Rousseau, Kazuhiro Terao, (World Scientific: March 1, 2022) doi: 10.1142/12200
Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Darren Lyles, George Michelogiannakis, Temporal and SFQ Pulse-Streams Encoding for Area-Efficient Superconducting Accelerators, 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), February 2022,
- Download File: asplos2022-presentation.pdf (pdf: 1.7 MB)
Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Darren Lyles, George Michelogiannakis, "Temporal and SFQ Pulse-Streams Encoding for Area-Efficient Superconducting Accelerators", 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22), ACM, February 2022,
- Download File: asplos2022.pdf (pdf: 1.9 MB)
Monica Hernandez, Joe Chew, Open Sourced Control Hardware for Quantum Computers, News release, February 24, 2022,
X. Zhu, Y. Liu, P. Ghysels, D. Bindal, X. S. Li, "GPTuneBand: multi-task and multi-fidelity Bayesian optimization for autotuning large-scale high performance computing applications", SIAM PP, February 23, 2022,
- Download File: GPTuneBand.pdf (pdf: 1.4 MB)
"QSA’s Science Breakthroughs in 2021", Monica Hernandez, Feature, February 17, 2022,
Jean Luca Bez, Towards Understanding I/O Behavior with Interactive Exploration, Berkeley Lab’s Computing Sciences Area 2022 Postdoc Symposium, 2022,
George Michelogiannakis, Benjamin Klenk, Brandon Cook, Min Yee Teh, Madeleine Glick, Larry Dennison, Keren Bergman, John Shalf, "A Case For Intra-Rack Resource Disaggregation in HPC", ACM Transactions on Architecture and Code Optimization, February 2022,
Aleksandra Ciprijanovic, Diana Kafkes, Gregory Snyder, F. Javier Sanchez, Gabriel Nathan Perdue, Kevin Pedro, Brian Nord, Sandeep Madireddy, Stefan M. Wild, "DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification", Machine Learning: Science and Technology, 2022, 3:035007, doi: 10.1088/2632-2153/ac7f1a
Hannah Klion, Alexander Tchekhovskoy, Daniel Kasen, Adithan Kathirgamaraju, Eliot Quataert, Rodrigo Fernandez, "The impact of r-process heating on the dynamics of neutron star merger accretion disc winds and their electromagnetic radiation", Monthly Notices of the RAS, 2022, 510:2968-2979, doi: 10.1093/mnras/stab3583
John Wu, Ben Brown, Paolo Calafiura, Quincey Koziol, Dongeun Lee, Alex Sim, Devesh Tiwari, Support for In-Flight Data Analyses in Scientific Workflows, DOE ASCR Workshop on the Management and Storage of Scientific Data, 2022, doi: 10.2172/1843500
John Wu, Bin Dong, Alex Sim, Automating Data Management Through Unified Runtime Systems, DOE ASCR Workshop on the Management and Storage of Scientific Data, 2022, doi: 10.2172/1843500
A. Pereira, A. Sim, K. Wu, S. Yoo, H. Ito, "Data access pattern analysis for dCache storage system", International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022), 2022,
Ling Jin, Alina Lazar, Caitlin Brown, Bingrong Sun, Venu Garikapati, Srinath Ravulaparthy, Qianmiao Chen, Alexander Sim, Kesheng Wu, Tin Ho, Thomas Wenzel, C. Anna Spurlock, What Makes You Hold on to That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions, Transportation Research Board 101st Annual Meeting, 2022,
- Download File: wu2022.bib (bib: 22 KB)
Z. Yao, R. Jambunathan, Y. Zeng, and A. Nonaka, "A Massively Parallel Time-Domain Coupled Electrodynamics-Micromagnetics Solver", International Journal of High Performance Computing Applications, January 10, 2022, accepted,
J. V. Pusztay, M. G. Knepley, and M. F. Adams, "Conservative Projection Between FEM and Particle Bases", SIAM Journal on Scientific Computing, January 1, 2022, doi: 10.1137/21M145407
- Download File: ffce2dc7-07bf-41ec-b97c-7971797b7cc5.pdf (pdf: 782 KB)
Stephen Hudson, Jeffrey Larson, John-Luke Navarro, Stefan M. Wild, "libEnsemble: A Library to Coordinate the Concurrent Evaluation of Dynamic Ensembles of Calculations", IEEE Transactions on Parallel and Distributed Systems, 2022, 33:977-988, doi: 10.1109/TPDS.2021.3082815
Bin Dong, Kesheng Wu, Suren Byna, User-Defined Tensor Data Analysis, SpringerBrief, (January 1, 2022)
Alina Lazar, others, Accelerating the Inference of the Exa.TrkX Pipeline, 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded - Towards Sustainable, Diverse, Performant and Effective Scientific Computing, 2022,
Chun-Yi Wang, others, Reconstruction of Large Radius Tracks with the Exa.TrkX pipeline, 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded - Towards Sustainable, Diverse, Performant and Effective Scientific Computing, 2022,
Sunanda Banerjee, others, Detector and Beamline Simulation for Next-Generation High Energy Physics Experiments, 2022 Snowmass Summer Study, 2022,
Meghna Bhattacharya, others, Portability: A Necessary Approach for Future Scientific Software, 2022 Snowmass Summer Study, 2022,
Christopher D. Jones, Kyle Knoepfel, Paolo Calafiura, Charles Leggett, Vakhtang Tsulaia, Evolution of HEP Processing Frameworks, 2022 Snowmass Summer Study, 2022,
Savannah Thais, Paolo Calafiura, Grigorios Chachamis, Gage DeZoort, Javier Duarte, Sanmay Ganguly, Michael Kagan, Daniel Murnane, Mark S. Neubauer, Kazuhiro Terao, Graph Neural Networks in Particle Physics: Implementations, Innovations, and Challenges, 2022 Snowmass Summer Study, 2022,
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, Dave Pugmire, Silvio Rizzi, Thompson, Will Usher, Gunther H. Weber, Brad Whitlock, Wolf, Kesheng Wu, "Proximity Portability and In Transit, M-to-N Data Partitioning and Movement in SENSEI", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_20
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, David Camp, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, David Pugmire, Silvio Rizzi, Thompson, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu, "The SENSEI Generic In Situ Interface: Tool and Processing Portability at Scale", In Situ Visualization for Computational Science, (2022) doi: 10.1007/978-3-030-81627-8_13
Sugeerth Murugesan, Mariam Kiran, Bernd Hamann, Gunther H. Weber, "Netostat: Analyzing Dynamic Flow Patterns in High-Speed Networks", Cluster Computing, 2022, doi: 10.1007/s10586-022-03543-0
H Weierbach, AR Lima, JD Willard, VC Hendrix, DS Christianson, M Lubich, C Varadharajan, Stream Temperature Predictions for River Basin Management in the Pacific Northwest and Mid-Atlantic Regions Using Machine Learning, Water (Switzerland), 2022, doi: 10.3390/w14071032
M Galloway, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei, BeyondPlanck III. Commander3, 2022,
M Galloway, M Reinecke, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei, BeyondPlanck VIII. Efficient Sidelobe Convolution and Correction through Spin Harmonics, 2022,
TL Svalheim, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, M Galloway, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei, A Zonca, BeyondPlanck X. Bandpass and beam leakage corrections, 2022,
D Herman, B Hensley, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, M Galloway, S Gerakakis, E Gjerløw, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei, BeyondPlanck XVI. Limits on Large-Scale Polarized Anomalous Microwave Emission from Planck LFI and WMAP, 2022,
KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, M Galloway, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, M Tomasi, DJ Watts, IK Wehus, A Zacchei, BeyondPlanck XIV. Intensity foreground sampling, degeneracies and priors, 2022,
LiteBIRD Collaboration, E Allys, K Arnold, J Aumont, R Aurlien, S Azzoni, C Baccigalupi, AJ Banday, R Banerji, RB Barreiro, N Bartolo, L Bautista, D Beck, S Beckman, M Bersanelli, F Boulanger, M Brilenkov, M Bucher, E Calabrese, P Campeti, A Carones, FJ Casas, A Catalano, V Chan, K Cheung, Y Chinone, SE Clark, F Columbro, G D'Alessandro, PD Bernardis, TD Haan, EDL Hoz, MD Petris, SD Torre, P Diego-Palazuelos, T Dotani, JM Duval, T Elleflot, HK Eriksen, J Errard, T Essinger-Hileman, F Finelli, R Flauger, C Franceschet, U Fuskeland, M Galloway, K Ganga, M Gerbino, M Gervasi, RT Génova-Santos, T Ghigna, S Giardiello, E Gjerløw, J Grain, F Grupp, A Gruppuso, JE Gudmundsson, NW Halverson, P Hargrave, T Hasebe, M Hasegawa, M Hazumi, S Henrot-Versillé, B Hensley, LT Hergt, D Herman, E Hivon, RA Hlozek, AL Hornsby, Y Hoshino, J Hubmayr, K Ichiki, T Iida, H Imada, H Ishino, G Jaehnig, N Katayama, A Kato, R Keskitalo, T Kisner, Y Kobayashi, A Kogut, K Kohri, E Komatsu, K Komatsu, K Konishi, N Krachmalnicoff, CL Kuo, L Lamagna, M Lattanzi, AT Lee, C Leloup, F Levrier, E Linder, G Luzzi, J Macias-Perez, B Maffei, D Maino, S Mandelli, E Martínez-González, S Masi, M Massa, S Matarrese, FT Matsuda, T Matsumura, L Mele, M Migliaccio, Y Minami, A Moggi, J Montgomery, L Montier, G Morgante, B Mot, Y Nagano, T Nagasaki, R Nagata, R Nakano, T Namikawa, F Nati, P Natoli, S Nerval, F Noviello, K Odagiri, S Oguri, H Ohsaki, L Pagano, A Paiella, D Paoletti, A Passerini, G Patanchon, F Piacentini, M Piat, G Polenta, D Poletti, T Prouvé, G Puglisi, D Rambaud, C Raum, S Realini, M Reinecke, M Remazeilles, A Ritacco, G Roudil, JA Rubino-Martin, M Russell, H Sakurai, Y Sakurai, M Sasaki, D Scott, Y Sekimoto, K Shinozaki, M Shiraishi, P Shirron, G Signorelli, F Spinella, S Stever, R Stompor, S Sugiyama, RM Sullivan, A Suzuki, TL Svalheim, E Switzer, R Takaku, H Takakura, Y Takase, A Tartari, Y Terao, J Thermeau, H Thommesen, KL Thompson, M Tomasi, M Tominaga, M Tristram, M Tsuji, M Tsujimoto, L Vacher, P Vielva, N Vittorio, W Wang, K Watanuki, IK Wehus, J Weller, B Westbrook, J Wilms, EJ Wollack, J Yumoto, M Zannoni, Probing Cosmic Inflation with the LiteBIRD Cosmic Microwave Background Polarization Survey, 2022,
DJ Watts, M Galloway, HT Ihle, KJ Andersen, R Aurlien, R Banerji, A Basyrov, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, JR Eskilt, MK Foss, C Franceschet, U Fuskeland, S Galeotta, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, JB Jewell, A Karakci, E Keihänen, R Keskitalo, JGS Lunde, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, M San, NO Stutzer, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, IK Wehus, A Zacchei, From BeyondPlanck to Cosmoglobe: Preliminary WMAP Q-band analysis, 2022,
P Diego-Palazuelos, JR Eskilt, Y Minami, M Tristram, RM Sullivan, AJ Banday, RB Barreiro, HK Eriksen, KM Górski, R Keskitalo, E Komatsu, E Martínez-González, D Scott, P Vielva, IK Wehus, "Cosmic Birefringence from the Planck Data Release 4", Physical Review Letters, 2022, 128:091302, doi: 10.1103/physrevlett.128.091302
C Varadharajan, AP Appling, B Arora, DS Christianson, VC Hendrix, V Kumar, AR Lima, J Müller, S Oliver, M Ombadi, T Perciano, JM Sadler, H Weierbach, JD Willard, Z Xu, J Zwart, "Can machine learning accelerate process understanding and decision-relevant predictions of river water quality?", Hydrological Processes, January 1, 2022, 36, doi: 10.1002/hyp.14565
S. Dhawan, A. Goobar, M. Smith, J. Johansson, M. Rigault, J. Nordin, R. Biswas, D. Goldstein, P. Nugent, Y. -L. Kim, A. A. Miller, M. J. Graham, M. Medford, M. M. Kasliwal, S. R. Kulkarni, Dmitry A. Duev, E. Bellm, P. Rosnet, R. Riddle, J. Sollerman, The Zwicky Transient Facility Type Ia supernova survey: first data release and results, Monthly Notices of the RAS, Pages: 2228-2241 2022, doi: 10.1093/mnras/stab3093
Yuan Qi Ni, Dae-Sik Moon, Maria R. Drout, Abigail Polin, David J. Sand, Santiago González-Gaitán, Sang Chul Kim, Youngdae Lee, Hong Soo Park, D. Andrew Howell, Peter E. Nugent, Anthony L. Piro, Peter J. Brown, Lluís Galbany, Jamison Burke, Daichi Hiramatsu, Griffin Hosseinzadeh, Stefano Valenti, Niloufar Afsariardchi, Jennifer E. Andrews, John Antoniadis, Iair Arcavi, Rachael L. Beaton, K. Azalee Bostroem, Raymond G. Carlberg, S. Bradley Cenko, Sang-Mok Cha, Yize Dong, Avishay Gal-Yam, Joshua Haislip, Thomas W. -S. Holoien, Sean D. Johnson, Vladimir Kouprianov, Yongseok Lee, Christopher D. Matzner, Nidia Morrell, Curtis McCully, Giuliano Pignata, Daniel E. Reichart, Jeffrey Rich, Stuart D. Ryder, Nathan Smith, Samuel Wyatt, Sheng Yang, Infant-phase reddening by surface Fe-peak elements in a normal type Ia supernova, Nature Astronomy, 2022, doi: 10.1038/s41550-022-01603-4
Melissa L. Graham, Christoffer Fremling, Daniel A. Perley, Rahul Biswas, Christopher A. Phillips, Jesper Sollerman, Peter E. Nugent, Sarafina Nance, Suhail Dhawan, Jakob Nordin, Ariel Goobar, Adam Miller, James D. Neill, Xander J. Hall, Matthew J. Hankins, Dmitry A. Duev, Mansi M. Kasliwal, Mickael Rigault, Eric C. Bellm, David Hale, Przemek Mróz, S. R. Kulkarni, Supernova siblings and their parent galaxies in the Zwicky Transient Facility Bright Transient Survey, Monthly Notices of the RAS, Pages: 241-254 2022, doi: 10.1093/mnras/stab3802
MB Simmonds, WJ Riley, DA Agarwal, X Chen, S Cholia, R Crystal-Ornelas, ET Coon, D Dwivedi, VC Hendrix, M Huang, A Jan, Z Kakalia, J Kumar, CD Koven, L Li, M Melara, L Ramakrishnan, DM Ricciuto, AP Walker, W Zhi, Q Zhu, C Varadharajan, Guidelines for Publicly Archiving Terrestrial Model Data to Enhance Usability, Intercomparison, and Synthesis, Data Science Journal, 2022, doi: 10.5334/dsj-2022-003
C Varadharajan, VC Hendrix, DS Christianson, M Burrus, C Wong, SS Hubbard, DA Agarwal, BASIN-3D: A brokering framework to integrate diverse environmental data, Computers and Geosciences, 2022, doi: 10.1016/j.cageo.2021.105024
B Faybishenko, R Versteeg, G Pastorello, D Dwivedi, C Varadharajan, D Agarwal, Challenging problems of quality assurance and quality control (QA/QC) of meteorological time series data, Stochastic Environmental Research and Risk Assessment, Pages: 1049-1062 2022, doi: 10.1007/s00477-021-02106-w
Sean Peisert, Unsafe at Any Clock Speed: the Insecurity of Computer System Design, Implementation, and Operation [From the Editors], IEEE Security & Privacy, Pages: 4-9 January 2022, doi: 10.1109/MSEC.2021.3127086
Hengjie Wang, Robert Planas, Aparna Chandramowlishwaran, Ramin Bostanabad, "Mosaic flows: A transferable deep learning framework for solving PDEs on unseen domains", Computer Methods in Applied Mechanics and Engineering, 2022, 389:114424,
F Molz, B Faybishenko, D Agarwal, A broad exploration of nonlinear dynamics in microbial systems motivated by chemostat experiments producing deterministic chaos, 2022,
2021
Zhe Bai, Liqian Peng, "Non-intrusive nonlinear model reduction via machine learning approximations to low-dimensional operators", Advanced Modeling and Simulation in Engineering Sciences, 2021, 8:28, doi: 10.1186/s40323-021-00213-5
Melanie E. Moses, Steven Hofmeyr, Judy L Cannon, Akil Andrews, Rebekah Gridley, Monica Hinga, Kirtus Leyba, Abigail Pribisova, Vanessa Surjadidjaja, Humayra Tasnim, Stephanie Forrest, "Spatially distributed infection increases viral load in a computational model of SARS-CoV-2 lung infection", PLOS Computational Biology, December 2021, 17(12), doi: 10.1371/journal.pcbi.1009735
J. T. Clemmer, I. Srivastava, G. S. Grest, J. B. Lechman, "Shear is Not Always Simple: Rate-Dependent Effects of Loading Geometry on Granular Rheology", Physical Review Letters, December 22, 2021, 127:268003, doi: 10.1103/PhysRevLett.127.268003
"Advancing Quantum Engineering: A Must-Do for Quantum Computing", Monica Hernandez, Feature, December 20, 2021,
Y. Cho, J. W. Demmel, X. S. Li, Y. Liu, H. Luo, "Enhancing autotuning capability with a history database", IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 20, 2021,
- Download File: GPTuneHistoryDB.pdf (pdf: 390 KB)
Qiao Kang, Scot Breitenfeld, Kaiyuan Hou, Wei-keng Liao, Robert Ross, and Suren Byna, "Optimizing Performance of Parallel I/O Accesses to Non-contiguous Blocks in Multiple Array Variables", IEEE BigData 2021 conference, December 19, 2021,
I. Srivastava, L. E. Silbert, J. B. Lechman, G. S. Grest, "Flow and Arrest in Stressed Granular Materials", Soft Matter, December 17, 2021, doi: 10.1039/D1SM01344K
J. Bang, C. Kim, K. Wu, A. Sim, S. Byna, H. Sung, H. Eom, "An In-Depth I/O Pattern Analysis in HPC Systems", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021, doi: 10.1109/HiPC53243.2021.00056
S. Lee, Q. Kang, K. Wang, J. Balewski, A. Sim, A. Agrawal, A. Choudhary, P. Nugent, K. Wu, W-K. Liao, "Asynchronous I/O Strategy for Large-Scale Deep Learning Applications", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021, doi: 10.1109/HiPC53243.2021.00046
Samuel Benjamin Kachuck, Morgan Whitcomb, Jeremy N Bassis, Daniel F Martin, and Stephen F Price, "When are (simulations of) ice shelves stable? Stabilizing forces in fracture-permitting models", AGU Fall Meeting, December 16, 2021,
Daniel F. Martin, Stephen L. Cornford, Esmond G. Ng, Impact of Improved Bedrock Geometry and Basal Friction Relations on Antarctic Vulnerability to Regional Ice Shelf Collapse, American Geophysical Union Fall Meeting, December 15, 2021,
A. Lazar, L. Jin, C. Brown, C. A. Spurlock, A. Sim, K. Wu, "Performance of the Gold Standard and Machine Learning in Predicting Vehicle Transactions", the 3rd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD 2021), 2021, doi: 10.1109/BigData52589.2021.9671286
"How the Advanced Quantum Testbed Prepares the New Quantum Workforce", Monica Hernandez, Feature, December 14, 2021,
Andrew Adams, Kay Avila, Elisa Heymann, Mark Krenz, Jason R. Lee, Barton Miller, Sean Peisert, "Guide to Securing Scientific Software", Trusted CI Report, December 14, 2021, doi: 10.5281/zenodo.5777646
James R. Clavin, Yue Huang, Xin Wang, Pradeep M. Prakash, Sisi Duan, Jianwu Wang, Sean Peisert, "A Framework for Evaluating BFT", Proceedings of the IEEE International Conference on Parallel and Distributed Systems (ICPADS), IEEE, December 2021,
Courtney Shafer, Daniel F Martin and Esmond G Ng, "Comparing the Shallow-Shelf and L1L2 Approximations using BISICLES in the Context of MISMIP+ with Buttressing Effects", AGU Fall Meeting, December 13, 2021,
Anne M. Felden, Daniel F. Martin, Esmond G. Ng, SUHMO: An SUbglacial Hydrology MOdel based on the Chombo AMR framework, American Geophysical Union Fall Meeting, December 13, 2021,
Ammar Haydari, Michael Zhang, Chen-Nee Chuah, Jane Macfarlane, Sean Peisert, Adaptive Differential Privacy Mechanism for Aggregated Mobility Dataset, arXiv preprint arXiv:2112.08487, December 10, 2021,
Monica Hernandez, Crucial Leap in Error Mitigation for Quantum Computers, News release, December 9, 2021,
Shen Sheng, Mariam Kiran, Bashir Mohammed, "DynamicDeepFlow: An Approach for Identifying Changes in Network Traffic Flow Using Unsupervised Clustering", (BEST PAPER) 4th International Conference on Machine Learning for Networking (MLN'2021), December 6, 2021,
R. Mills, M.F. Adams, S. Balay, J. Brown, A. Dener, M. Knepley, S. Kruger, H. Morgan, T. Munson, K. Rupp, B. Smith, S. Zampini, H. Zhang, J. Zhang, "Toward performance-portable PETSc for GPU-based exascale systems", Parallel Computing, December 1, 2021, 108, doi: 10.1016/j.parco.2021.102831
The Portable Extensible Toolkit for Scientific Computation (PETSc) library delivers scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization. The PETSc design for performance portability addresses fundamental GPU accelerator challenges and stresses flexibility and extensibility by separating the programming model used by the application from that used by the library, and it enables application developers to use their preferred programming model, such as Kokkos, RAJA, SYCL, HIP, CUDA, or OpenCL, on upcoming exascale systems. A blueprint for using GPUs from PETSc-based codes is provided, and case studies emphasize the flexibility and high performance achieved on current GPU-based systems.
Andrew Myers, Ann Almgren, Diana Almorim, John Bell, Luca Fedeli, Lixin Ge, Kevin Gott, David Grote, Mark Hogan, Axel Huebl, Revathi Jambunathan, Remi Lehe, Cho Ng, Michael Rowan, Olga Shapoval, Maxence Thevenet, Jean-Luc Vay, Henri Vincenti, Eloise Yang, Neil Zaim, Weiqun Zhang, Yin Zhao, Edoardo Zoni, "Porting WarpX to GPU-accelerated platforms", Parallel Computing, December 1, 2021,
Yize Chen, Yuanyuan Shi, Daniel Arnold, Sean Peisert, SAVER: Safe Learning-Based Controller for Real-Time Voltage Regulation, arXiv preprint arXiv:2111.15152, November 30, 2021,
Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin, Sean Peisert, W. Bradley Holtz, Anil Aswani, Dipankar Dwivedi, Haruko Wainwright, Ghanshyam Pilania, Benjamin Nachman, Babetta L. Marrone, Nicola Falco, Prabhat, Daniel Arnold, Alejandro Wolf-Yadlin, Sarah Powers, Sharlee Climer, Quinn Jackson, Ty Carlson, Michael Sohn, Petrus Zwart, Neeraj Kumar, Amy Justice, Claire Tomlin, Daniel Jacobson, Gos Micklem, Georgios V. Gkoutos, Peter J. Bickel, Jean-Baptiste Cazier, Juliane Müller, Bobbie-Jo Webb-Robertson, Rick Stevens, Mark Anderson, Ken Kreutz-Delgado, Michael W. Mahoney, James B. Brown, Learning from Learning Machines: a New Generation of AI Technology to Meet the Needs of Science, arXiv preprint arXiv:2111.13786, November 27, 2021,
André Ramos Carneiro, Jean Luca Bez, Carla Osthoff, Lucas Mello Schnorr, Phillipe Olivier Alexandre Navaux, "HPC Data Storage at a Glance: The Santos Dumont Experience", IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), IEEE, November 26, 2021, 157-166, doi: 10.1109/SBAC-PAD53543.2021.00027
Akel Hashim, Ravi K. Naik, Alexis Morvan, Jean-Loup Ville, Bradley Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin P. O'Brien, Ian Hincks, Joel J. Wallman, Joseph Emerson, Irfan Siddiqi, "Randomized Compiling for Scalable Quantum Computing on a Noisy Superconducting Quantum Processor", Physical Review X, 2021, 11:041039, doi: 10.1103/PhysRevX.11.041039
Sachin Kadam, Anna Scaglione, Nikhil Ravi, Sean Peisert, Brent Lunghino, Aram Shumavon, Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets, arXiv preprint arXiv:2111.11661, November 23, 2021,
Nikhil Ravi, Anna Scaglione, Sachin Kadam, Reinhard Gentz, Sean Peisert, Brent Lunghino, Emmanuel Levijarvi, Aram Shumavon, Differentially Private K-means Clustering Applied to Meter Data Analysis and Synthesis, arXiv preprint arXiv:2112.03801, November 23, 2021,
Wei Zhang, Suren Byna, Hyogi Sim, Sangkeun Lee, Sudharshan Vazhkudai, and Yong Chen, "Exploiting User Activeness for Data Retention in HPC Systems", International Conference for High Performance Computing, Networking, Storage and Analysis (SC '21), November 21, 2021, doi: 10.1145/3458817.3476201
- Download File: 3458817.3476201-2.pdf (pdf: 1.5 MB)
Cong Xu, Suparna Bhattacharya, Martin Foltin, Suren Byna, and Paolo Faraboschi, "Data-Aware Storage Tiering for Deep Learning", 6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21, November 21, 2021,
Houjun Tang, Bing Xie, Suren Byna, Phillip Carns, Quincey Koziol, Sudarsun Kannan, Jay Lofstead, and Sarp Oral, "SCTuner: An Auto-tuner Addressing Dynamic I/O Needs on Supercomputer I/O Sub-systems", 6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21, November 21, 2021,
Daniel Waters, Colin A. MacLean, Dan Bonachea, Paul H. Hargrove, "Demonstrating UPC++/Kokkos Interoperability in a Heat Conduction Simulation (Extended Abstract)", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S4630V
We describe the replacement of MPI with UPC++ in an existing Kokkos code that simulates heat conduction within a rectangular 3D object, as well as an analysis of the new code’s performance on CUDA accelerators. The key challenges were packing the halos in Kokkos data structures in a way that allowed for UPC++ remote memory access, and streamlining synchronization costs. Additional UPC++ abstractions used included global pointers, distributed objects, remote procedure calls, and futures. We also make use of the device allocator concept to facilitate data management in memory with unique properties, such as GPUs. Our results demonstrate that despite the algorithm’s good semantic match to message passing abstractions, straightforward modifications to use UPC++ communication deliver vastly improved performance and scalability in the common case. We find the one-sided UPC++ version written in a natural way exhibits good performance, whereas the message-passing version written in a straightforward way exhibits performance anomalies. We argue this represents a productivity benefit for one-sided communication models.
Kenneth Rudinger, Craig W Hogle, Ravi K Naik, Akel Hashim, Daniel Lobser, David I Santiago, Matthew D Grace, Erik Nielsen, Timothy Proctor, Stefan Seritan, others, "Experimental Characterization of Crosstalk Errors with Simultaneous Gate Set Tomography", PRX Quantum, 2021, 2:040338, doi: 10.1103/PRXQuantum.2.040338
Dan Gunter, Oluwamayowa Amusat, Tim Bartholomew, Markus Drouven, "Santa Barbara Desalination Digital Twin Technical Report", LBNL Technical Report, 2021, LBNL-2001437,
Franz Poeschel, Juncheng E, William F. Godoy, Norbert Podhorszki, Scott Klasky, Greg Eisenhauer, Philip E. Davis, Lipeng Wan, Ana Gainaru, Junmin Gu, Fabian Koller, René Widera, Michael Bussmann, Axel Huebl, "Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2", Smoky Mountains Computational Sciences and Engineering Conference (SMC2021), 2021,
Amir Kamil, Dan Bonachea, "Optimization of Asynchronous Communication Operations through Eager Notifications", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S42C71
UPC++ is a C++ library implementing the Asynchronous Partitioned Global Address Space (APGAS) model. We propose an enhancement to the completion mechanisms of UPC++ used to synchronize communication operations that is designed to reduce overhead for on-node operations. Our enhancement permits eager delivery of completion notification in cases where the data transfer semantics of an operation happen to complete synchronously, for example due to the use of shared-memory bypass. This semantic relaxation allows removing significant overhead from the critical path of the implementation in such cases. We evaluate our results on three different representative systems using a combination of microbenchmarks and five variations of the HPCChallenge RandomAccess benchmark implemented in UPC++ and run on a single node to accentuate the impact of locality. We find that in RMA versions of the benchmark written in a straightforward manner (without manually optimizing for locality), the new eager notification mode can provide up to a 25% speedup when synchronizing with promises and up to a 13.5x speedup when synchronizing with conjoined futures. We also evaluate our results using a graph matching application written with UPC++ RMA communication, where we measure overall speedups of as much as 11% in single-node runs of the unmodified application code, due to our transparent enhancements.
J. Cheung, A. Sim, J. Kim, K. Wu, "Performance Prediction of Large Data Transfers", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), ACM Student Research Competition (SRC), 2021,
Paul H. Hargrove, Dan Bonachea, Colin A. MacLean, Daniel Waters, "GASNet-EX Memory Kinds: Support for Device Memory in PGAS Programming Models", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21) Research Poster, November 2021, doi: 10.25344/S4P306
Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work includes two major components: UPC++ (a C++ template library) and GASNet-EX (a portable, high-performance communication library). This poster describes recent advances in GASNet-EX to efficiently implement Remote Memory Access (RMA) operations to and from memory on accelerator devices such as GPUs. Performance is illustrated via benchmark results from UPC++ and the Legion programming system, both using GASNet-EX as their communications library.
Jean Luca Bez, Visualizing Darshan Extended Traces, Analyzing Parallel I/O (BoF), held in conjunction with SC21, 2021,
V. Dumont, C. Garner, A. Trivedi, C. Jones, V. Ganapati, J. Mueller, T. Perciano, M. Kiran, and M. Day, "HYPPO: A Surrogate-Based Multi-Level Parallelism Tool for Hyperparameter Optimization", 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), November 15, 2021,
Nikhil Ravi, Anna Scaglione, Sean Peisert, Colored Noise Mechanism for Differentially Private Clustering, arXiv preprint arXiv:2111.07850, November 15, 2021,
Bashir Mohammed, Mariam Kiran, Bjoern Enders, "NetGraf: An End-to-End Learning Network Monitoring Service", 2021 IEEE Workshop on Innovating the Network for Data-Intensive Science (INDIS), November 15, 2021, doi: 10.1109/INDIS54524.2021.00007
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014
Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create "workflow snapshots" that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.
Katherine A. Yelick, Amir Kamil, Damian Rouson, Dan Bonachea, Paul H. Hargrove, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (SC21), Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), November 15, 2021,
UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. UPC++ offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between computation and asynchronous data movement. UPC++ supports simple/regular data structures as well as more elaborate distributed applications where communication is fine-grained and/or irregular. UPC++ provides a uniform abstraction for one-sided RMA between host and GPU/accelerator memories anywhere in the system. UPC++'s support for aggressive asynchrony enables applications to effectively overlap communication and reduce latency stalls, while the underlying GASNet-EX communication library delivers efficient low-overhead RMA/RPC on HPC networks.
This tutorial introduces UPC++, covering the memory and execution models and basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into application proxy examples. We examine a few UPC++ applications with irregular communication (metagenomic assembler and COVID-19 simulation) and describe how they utilize UPC++ to optimize communication performance.
Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", (BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021,
- Download File: pmbs21-DL-final.pdf (pdf: 632 KB)
Tan Nguyen, Erich Strohmaier, John Shalf, "Facilitating CoDesign with Automatic Code Similarity Learning", 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), November 14, 2021,
Monica Hernandez, How a Novel Radio Frequency Control System Enhances Quantum Computers, News release, November 9, 2021,
Bradley K. Mitchell, Ravi K. Naik, Alexis Morvan, Akel Hashim, John Mark Kreikebaum, Brian Marinelli, Wim Lavrijsen, Kasra Nowrouzi, David I. Santiago, Irfan Siddiqi, "Hardware-Efficient Microwave-Activated Tunable Coupling between Superconducting Qubits", Physical Review Letters, 2021, 127:200502, doi: 10.1103/PhysRevLett.127.200502
Ran Cheng, Uday S. Goteti, Harrison Walker, Keith M. Krause, Luke Oeding, Michael C. Hamilton, "Toward Learning in Neuromorphic Circuits Based on Quantum Phase Slip Junctions", Frontiers in Neuroscience, November 8, 2021,
S. B. Sayed, Y. Liu, L. J. Gomez, A. C. Yucel, "A butterfly-accelerated volume integral equation solver for broad permittivity and large-scale electromagnetic analysis", arxiv-preprint, November 5, 2021,
"Rising Talent in Quantum Computing: Meet Early Career Researchers at QSA", Monica Hernandez, November 4, 2021,
B Mohammed, M Kiran, N Krishnaswamy, Keshang Wu, "Predicting WAN Traffic Volumes using Fourier and Multivariate SARIMA Approach", International Journal of Big Data Intelligence, November 3, 2021, doi: 10.1504/IJBDI.2021.118742
A. Syal, A. Lazar, J. Kim, A. Sim, K. Wu, "Network traffic performance analysis from passive measurements using gradient boosting machine learning", International Journal of Big Data Intelligence, 2021, 8:13-30, doi: 10.1504/IJBDI.2021.118741
"K-12 Career Talk: A Day in the Life of an AQT scientist", Monica Hernandez, Feature, October 22, 2021,
"El Advanced Quantum Testbed avanza tecnologías y talento para la computación cuántica", Monica Hernandez, Feature in Spanish, October 13, 2021,
"The Advanced Quantum Testbed Propels Quantum Information Technologies and Talent", Monica Hernandez, Feature, October 13, 2021,
Y. Ma, F. Rusu, K. Wu, A. Sim, Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers, arXiv preprint arXiv:2110.07029, 2021,
Yize Chen, Daniel Arnold, Yuanyuan Shi, Sean Peisert, Understanding the Safety Requirements for Learning-based Power Systems Operations, arXiv preprint arXiv:2110.04983, October 11, 2021,
M Kiran, B Mohammed, Q Du, D Wang, S Shen, R Wilcox, "Controlling Laser Beam Combining via an Active Reinforcement Learning Algorithm", Advanced Solid State Lasers 2021, Washington, DC, United States, October 4, 2021,
N.B. Bonnheim, M.F. Adams, T. Wu, T.M. Keaveny, "The Role of Vertebral Porosity and Implant Loading Mode on Bone-Tissue Stress in the Human Vertebral Body Following Lumbar Total Disc Arthroplasty", Spine, October 1, 2021, E1022-E1030, doi: 10.1097/BRS.0000000000004023
Pietro Benedusi, Michael L Minion, Rolf Krause, "An experimental comparison of a space-time multigrid method with PFASST for a reaction-diffusion problem", Computers & Mathematics with Applications, October 1, 2021,
- Download File: Benedusi-Minion-Krause.pdf (pdf: 372 KB)
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Yilun Xu, Gang Huang, Jan Balewski, Ravi Naik, Alexis Morvan, Bradley Mitchell, Kasra Nowrouzi, David I. Santiago, Irfan Siddiqi, "QubiC: An Open-Source FPGA-Based Control and Measurement System for Superconducting Quantum Information Processors", IEEE Transactions on Quantum Engineering, 2021, 2:1-11, doi: 10.1109/TQE.2021.3116540
Andrew Adams, Kay Avila, Elisa Heymann, Mark Krenz, Jason R. Lee, Barton Miller, Sean Peisert, "The State of the Scientific Software World: Findings of the 2021 Trusted CI Software Assurance Annual Challenge Interviews", Trusted CI Report, September 29, 2021,
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001425, doi: 10.25344/S4XK53
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
"Why QSA Advances 2D Materials for Quantum Computing", Monica Hernandez, Feature, September 28, 2021,
H. Luo, J.W. Demmel, Y. Cho, X. S. Li, Y. Liu, "Non-smooth Bayesian optimization in tuning problems", arxiv-preprint, September 21, 2021,
Monica Hernandez, Raising the Bar in Error Characterization for Qutrit-Based Quantum Computing, News release, September 20, 2021,
Md Abdul M Faysal, Shaikh Arifuzzaman, Cy Chan, Maximilian Bremer, Doru Popovici, John Shalf, "HyPC-Map: A Hybrid Parallel Community Detection Algorithm Using Information-Theoretic Approach", HPEC, September 20, 2021,
E. Copps, A. Sim (Advisor), K. Wu (Advisor), "Analyzing scientific data sharing patterns with in-network data caching", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2021), ACM Student Research Competition (SRC), 2021,
Bo Fang, Daoce Wang, Sian Jin, Quincey Koziol, Zhao Zhang, Qiang Guan, Suren Byna, Sriram Krishnamoorthy, and Dingwen Tao, "Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights", IEEE Cluster 2021, September 1, 2021,
Marco Siracusa, Emanuele Del Sozzo, Marco Rabozzi, Lorenzo Di Tucci, Samuel Williams, Donatella Sciuto, Marco Domenico Santambrogio, "A Comprehensive Methodology to Optimize FPGA Designs via the Roofline Model", Transactions on Computers (TC), September 2021, doi: 10.1109/TC.2021.3111761
Srivatsan Chakram, Andrew E. Oriani, Ravi K. Naik, Akash V. Dixit, Kevin He, Ankur Agrawal, Hyeokshin Kwon, David I. Schuster, "Seamless High-Q Microwave Cavities for Multimode Circuit Quantum Electrodynamics", Physical Review Letters, 2021, 127:107701, doi: 10.1103/PhysRevLett.127.107701
G Koolstra, N Stevenson, S Barzili, L Burns, K Siva, S Greenfield, W Livingston, A Hashim, RK Naik, JM Kreikebaum, KP O'Brien, DI Santiago, J Dressel, I Siddiqi, "Monitoring fast superconducting qubit dynamics using a neural network", Preprint, August 2021,
Tommaso Buvoli, Michael Minion, "IMEX Runge-Kutta Parareal for Non-diffusive Equations", Springer Proceedings in Mathematics & Statistics, August 25, 2021,
Sebastian Götschel, Michael Minion, Daniel Ruprecht, Robert Speck, "Twelve Ways To Fool The Masses When Giving Parallel-In-Time Results", Springer Proceedings in Mathematics & Statistics, August 25, 2021,
- Download File: Twelve-Ways.pdf (pdf: 847 KB)
Meriam Gay Bautista, Zhi Jackie Yao, Anastasiia Butko, Mariam Kiran, Mekena Metcalf, "Towards Automated Superconducting Circuit Calibration using Deep Reinforcement Learning", 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Tampa, FL, USA, IEEE, August 23, 2021, pp. 462-46, doi: 10.1109/ISVLSI51109.2021.00091
Tan Nguyen, Colin MacLean, Marco Siracusa, Douglas Doerfler, Nicholas J. Wright, Samuel Williams, "FPGA‐based HPC accelerators: An evaluation on performance and energy efficiency", CCPE, August 22, 2021, doi: 10.1002/cpe.6570
"How the Quantum Systems Accelerator Set A Shared Direction in Electronic Controls for Quantum Computing", Monica Hernandez, Feature, August 20, 2021,
I. Srivastava, S. A. Roberts, J. T. Clemmer, L. E. Silbert, J. B. Lechman, G. S. Grest, "Jamming of Bidisperse Frictional Spheres", Physical Review Research, August 13, 2021, 3:L032042, doi: 10.1103/PhysRevResearch.3.L032042
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-flux Shift Buffer for Race Logic", 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), August 2021,
Dan Bonachea, "UPC++ as_eager Working Group Draft, Revision 2020.6.2", Lawrence Berkeley National Laboratory Tech Report, August 9, 2021, LBNL 2001416, doi: 10.25344/S4FK5R
This draft proposes an extension for a new future-based completion variant that can be more effectively streamlined for RMA and atomic access operations that happen to be satisfied at runtime using purely node-local resources. Many such operations are most efficiently performed synchronously using load/store instructions on shared-memory mappings, where the actual access may only require a few CPU instructions. In such cases we believe it’s critical to minimize the overheads imposed by the UPC++ runtime and completion queues, in order to enable efficient operation on hierarchical node hardware using shared-memory bypass.
The new upcxx::{source,operation}_cx::as_eager_future() completion variant accomplishes this goal by relaxing the current restriction that future-returning access operations must return a non-ready future whose completion is deferred until a subsequent explicit invocation of user-level progress. This relaxation allows access operations that are completed synchronously to instead return a ready future, thereby avoiding most or all of the runtime costs associated with deferment of future completion and subsequent mandatory entry into the progress engine.
We additionally propose to make this new as_eager_future() completion variant the new default completion for communication operations that currently default to returning a future. This should encourage use of the streamlined variant, and may provide performance improvements to some codes without source changes. A mechanism is proposed to restore the legacy behavior on-demand for codes that might happen to rely on deferred completion for correctness.
Finally, we propose a new as_eager_promise() completion variant that extends analogous improvements to promise-based completion, and corresponding changes to the default behavior of as_promise().
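The difference between deferred and eager completion can be illustrated with a toy model. The sketch below is plain C++ and is not the UPC++ API. `ToyRuntime`, `ToyFuture`, `local_put`, `wait` and `Completion` are hypothetical names, and "progress" is reduced to draining a queue of pending notifications. It shows only the semantic point of the proposal: an operation whose data transfer finishes synchronously can return an already-ready future, instead of one that becomes ready only after a later progress call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Toy model only (hypothetical, not UPC++): a future is a shared "done" flag.
struct ToyFuture {
    std::shared_ptr<bool> done;
    bool ready() const { return *done; }
};

enum class Completion { deferred, eager };

class ToyRuntime {
    std::vector<std::function<void()>> pending_;  // deferred notifications
public:
    // A node-local "put" whose data movement completes synchronously,
    // as a plain store would under shared-memory bypass.
    ToyFuture local_put(int& dest, int value, Completion mode) {
        dest = value;                              // the transfer is already done
        auto flag = std::make_shared<bool>(false);
        if (mode == Completion::eager) {
            *flag = true;                          // notify immediately
        } else {
            pending_.push_back([flag] { *flag = true; });  // notify on progress()
        }
        return ToyFuture{flag};
    }
    // User-level progress: deliver every deferred notification.
    void progress() {
        for (auto& notify : pending_) notify();
        pending_.clear();
    }
    // Waiting on a non-ready future must enter progress(); a ready one skips it.
    void wait(const ToyFuture& f) {
        if (!f.ready()) progress();
        assert(f.ready());
    }
    std::size_t pending() const { return pending_.size(); }
};
```

In the deferred mode, `wait()` has to enter `progress()` even though the store has already happened. In the eager mode it returns without touching the queue. In this toy, that skipped step stands in for the runtime overhead the draft aims to remove.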
Wei Zhang, Software Release: ActiveDR v1.0.6, August 7, 2021, doi: 10.5281/zenodo.5168853
Nan Ding, Muaaz Awan, Samuel Williams, "Instruction Roofline: An insightful visual performance model for GPUs", CCPE, August 4, 2021, doi: 10.1002/cpe.6591
Ran Cheng, Uday S. Goteti, Michael C. Hamilton, "High-Speed and Low-Power Superconducting Neuromorphic Circuits Based on Quantum Phase-Slip Junctions", IEEE Transactions on Applied Superconductivity, August 2021,
Suren Byna, Houjun Tang, and Quincey Koziol, Automatic and Transparent Scientific Data Management with Object Abstractions, PASC 2021, in a Minisymposium on "Data Movement Orchestration on HPC Systems", July 31, 2021,
"Leading with Breakthrough Science at the Advanced Quantum Testbed User Program", Monica Hernandez, Feature, July 29, 2021,
Nan Ding, Samuel Williams, Yang Liu, Xiaoye S. Li, A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver, July 19, 2021,
- Download File: multiGPU_SpTRSV_ACDA21-v2.pdf (pdf: 3.7 MB)
Nan Ding, Yang Liu, Samuel Williams, Xiaoye S. Li, "A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), July 19, 2021,
- Download File: Multi-GPU-SpTRSV-ACDA21-.pdf (pdf: 897 KB)
Charlene Yang, Yunsong Wang, Thorsten Kurth, Steven Farrell, Samuel Williams, "Hierarchical Roofline Performance Analysis for Deep Learning Applications", Intelligent Computing, LNNS, July 15, 2021, doi: 10.1007/978-3-030-80126-7
Jean Sexton, Zarija Lukic, Ann Almgren, Chris Daley, Brian Friesen, Andrew Myers, and Weiqun Zhang, "Nyx: A Massively Parallel AMR Code for Computational Cosmology", The Journal of Open Source Software, July 10, 2021,
M. Nakashima, A. Sim, Y. Kim, J. Kim, J. Kim, "Automated Feature Selection for Anomaly Detection in Network Traffic Data", ACM Transactions on Management Information Systems (TMIS), 2021, 12:1-28, doi: 10.1145/3446636
Drew Paine, Sarah Poon, Lavanya Ramakrishnan, "Investigating User Experiences with Data Abstractions on High Performance Computing Systems", June 29, 2021, LBNL-2001374,
Scientific exploration generates expanding volumes of data that commonly require High Performance Computing (HPC) systems to facilitate research. HPC systems are complex ecosystems of hardware and software that frequently are not user friendly. The Usable Data Abstractions (UDA) project set out to build usable software for scientific workflows in HPC environments by undertaking multiple rounds of qualitative user research. Qualitative research investigates how individuals accomplish their work, and our interview-based study surfaced a variety of insights about the experiences of working in and with HPC ecosystems. This report examines multiple facets of the experiences of scientists and developers using and supporting HPC systems. We discuss how stakeholders grasp the design and configuration of these systems, the impacts of abstraction layers on their ability to successfully do work, and the varied perceptions of time that shape this work. Examining the adoption of the Cori HPC system at NERSC, we explore the anticipations and lived experiences of users interacting with this system's novel storage feature, the Burst Buffer. We present lessons learned from across these insights to illustrate some of the challenges HPC facilities and their stakeholders need to account for when procuring and supporting these essential scientific resources to ensure their usability and utility to a variety of scientific practices.
Thomas M Evans, Andrew Siegel, Erik W Draeger, Jack Deslippe, Marianne M Francois, Timothy C Germann, William E Hart, Daniel F Martin, "A survey of software implementations used by application codes in the Exascale Computing Project", The International Journal of High Performance Computing Applications, June 25, 2021, doi: 10.1177/10943420211028940
- Download File: ijhpc-2021.pdf (pdf: 242 KB)
Élie Genois, Jonathan A. Gross, Agustin Di Paolo, Noah J. Stevenson, Gerwin Koolstra, Akel Hashim, Irfan Siddiqi, Alexandre Blais, "Quantum-tailored machine-learning characterization of a superconducting qubit", Preprint, June 24, 2021,
Robin J Dolleman, Debadi Chakraborty, Daniel R Ladiges, Herre SJ van der Zant, John E Sader, Peter G Steeneken, "Squeeze-film effect on atomically thin resonators in the high-pressure limit", Submitted to Nano Letters, June 24, 2021,
"The Quantum Systems Accelerator Hosts First Industry Roundtable", Monica Hernandez, Feature, June 22, 2021,
Yang Liu, Pieter Ghysels, Lisa Claus, Xiaoye Sherry Li, "Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations", SIAM J. Sci. Comput., June 22, 2021,
Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", (P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, ACM, June 21, 2021, doi: 10.1145/3456287.3465478
Reproducible research is becoming essential for science to ensure transparency and to build trust. Additionally, reproducibility provides the cornerstone for sharing methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding of the gaps, issues, and challenges in enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting the things that worked, that did not work, and that could have worked better for each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience who aim to integrate reproducibility in their everyday pipelines.
A. Lazar, A. Sim, K. Wu, "GPU-based Classification for Wireless Intrusion Detection", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464445
Y. Wang, K. Wu, A. Sim, S. Yoo, S. Misawa, "Access Patterns of Disk Cache for Large Scientific Archive", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464444
E. Copps, H. Zhang, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, E. Fajardo, "Analyzing scientific data sharing patterns with in-network data caching", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464441
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Michael Beach, Drew Paine, Lavanya Ramakrishnan, "Science Capsule - Capturing the Data Life Cycle", Journal of Open Source Software, 2021, 6:2484, doi: 10.21105/joss.02484
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "Enabling Design Space Exploration for RISC-V Secure Compute Environments", Proceedings of the Fifth Workshop on Computer Architecture Research with RISC-V (CARRV), (co-located with ISCA 2021), June 17, 2021,
"AQT Positions Itself as Hub for Quantum Computing Startups", Monica Hernandez, Feature, June 16, 2021,
E. W. Bethel, C. Heinemann, and T. Perciano, "Performance Tradeoffs in Shared-memory Platform Portable Implementations of a Stencil Kernel", Eurographics Symposium on Parallel Graphics and Visualization, June 14, 2021,
Weiqun Zhang, Andrew Myers, Kevin Gott, Ann Almgren and John Bell, "AMReX: Block-Structured Adaptive Mesh Refinement for Multiphysics Applications", The International Journal of High Performance Computing Applications, June 12, 2021,
Efficient scientific data discovery over self-describing file formats, Wei Zhang, June 1, 2021,
C Varadharajan, Z Kakalia, E Alper, EL Brodie, M Burrus, RWH Carroll, D Christianson, W Dong, V Hendrix, M Henderson, S Hubbard, D Johnson, R Versteeg, KH Williams, DA Agarwal, The Colorado East River Community Observatory Data Collection, Hydrological Processes 35(6), 2021, doi: 10.22541/au.161962485.54378235/v1
Bing Xie, Houjun Tang, Suren Byna, Jesse Hanley, Quincey Koziol, Tonglin Li, Sarp Oral, "Battle of the Defaults: Extracting Performance Characteristics of HDF5 under Production Load", CCGrid 2021, May 31, 2021,
Ciaran Roberts, Sy-Toan Ngo, Alexandre Milesi, Anna Scaglione, Sean Peisert, Daniel Arnold, "Deep Reinforcement Learning for Mitigating Cyber-Physical DER Voltage Unbalance Attacks", Proceedings of the 2021 American Control Conference (ACC), May 2021, doi: 10.23919/ACC50511.2021.9482815
Serges Love Teutu Talla, Isabelle Kemajou-Brown, Cy Chan, Bin Wang, "A Binary Multi-Subsystems Transportation Networks Estimation using Mobiliti Data", 2021 American Control Conference (ACC), May 25, 2021,
David McCallen, Houjun Tang, Suiwen Wu, Eric Eckert, Junfei Huang, N Anders Petersson, "Coupling of regional geophysics and local soil-structure models in the EQSIM fault-to-structure earthquake simulation framework", The International Journal of High Performance Computing Applications, May 25, 2021, doi: 10.1177/10943420211019118
Maximilian Bremer, John Bachan, Cy Chan, and Clint Dawson, "Speculative Parallel Execution for Local Timestepping", 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, May 21, 2021,
George Michelogiannakis, SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC, IEEE International Parallel and Distributed Processing Symposium, May 2021,
- Download File: ipdps-2021-2.pptx (pptx: 1.7 MB)
Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad, "Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale", 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021, doi: 10.1109/IPDPS49936.2021.00018
Y. Ma, F. Rusu, A. Sim, K. Wu, "Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures", Heterogeneity in Computing Workshop (HCW 2021), in conjunction with the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2021, doi: 10.1109/IPDPSW52791.2021.00012
Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert, "Performance Analysis of Scientific Computing Workloads on General Purpose TEEs", Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), IEEE, May 2021, doi: 10.1109/IPDPS49936.2021.00115
George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Bautista, Dilip Vasudevan, Anastasiia Butko, "SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC", IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021,
L. Fedeli, A. Sainte-Marie, N. Zaim, M. Thevenet, J. L. Vay, A. Myers, F. Quere, and H. Vincenti, "Probing strong-field QED with Doppler-boosted petawatt-class lasers", Physical Review Letters, May 10, 2021,
Tamsin L. Edwards, Sophie Nowicki, Ben Marzeion, Regine Hock, Heiko Goelzer, Hélène Seroussi, Nicolas C. Jourdain, Donald A. Slater, Fiona E. Turner, Christopher J. Smith, Christine M. McKenna, Erika Simon, Ayako Abe-Ouchi, Jonathan M. Gregory, Eric Larour, William H. Lipscomb, Antony J. Payne, Andrew Shepherd, Cécile Agosta, Patrick Alexander, Torsten Albrecht, Brian Anderson, Xylar Asay-Davis, Andy Aschwanden, Alice Barthel, Andrew Bliss, Reinhard Calov, Christopher Chambers, Nicolas Champollion, Youngmin Choi, Richard Cullather, Joshua Cuzzone, Christophe Dumas, Denis Felikson, Xavier Fettweis, Koji Fujita, Benjamin K. Galton-Fenzi, Rupert Gladstone, Nicholas R. Golledge, Ralf Greve, Tore Hattermann, Matthew J. Hoffman, Angelika Humbert, Matthias Huss, Philippe Huybrechts, Walter Immerzeel, Thomas Kleiner, Philip Kraaijenbrink, Sébastien Le clec’h, Victoria Lee, Gunter R. Leguy, Christopher M. Little, Daniel P. Lowry, Jan-Hendrik Malles, Daniel F. Martin, Fabien Maussion, Mathieu Morlighem, James F. O’Neill, Isabel Nias, Frank Pattyn, Tyler Pelle, Stephen F. Price, Aurélien Quiquet, Valentina Radić, Ronja Reese, David R. Rounce, Martin Rückamp, Akiko Sakai, Courtney Shafer, Nicole-Jeanne Schlegel, Sarah Shannon, Robin S. Smith, Fiammetta Straneo, Sainan Sun, Lev Tarasov, Luke D. Trusel, Jonas Van Breedam, Roderik van de Wal, Michiel van den Broeke, Ricarda Winkelmann, Harry Zekollari, Chen Zhao, Tong Zhang, Thomas Zwinger, "Projected land ice contributions to twenty-first-century sea level rise", Nature, May 5, 2021, 593:74-82, doi: 10.1038/s41586-021-03302-y
- Download File: Edwards-et-al-2021-Nature-preprint.pdf (pdf: 40 MB)
David McCallen, Anders Petersson, Arthur Rodgers, Arben Pitarka, Mamun Miah, Floriana Petrone, Bjorn Sjogreen, Norman Abrahamson, Houjun Tang, "EQSIM—A multidisciplinary framework for fault-to-structure earthquake simulations on exascale computers part I: Computational models and workflow", Earthquake Spectra, May 1, 2021, 37:707-735, doi: 10.1177/8755293020970982
D. A. Agarwal, J. Damerow, C. Varadharajan, D. S. Christianson, G. Z. Pastorello, Y.-W. Cheah, L. Ramakrishnan, "Balancing the needs of consumers and producers for scientific data collections", Ecological Informatics, 2021, 62:101251, doi: 10.1016/j.ecoinf.2021.101251
Sean Peisert, "Trustworthy Scientific Computing", Communications of the ACM (CACM), May 2021, doi: 10.1145/3457191
Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç, "BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 2021, doi: 10.1101/464420
T. Groves, N. Ravichandrasekaran, B. Cook, N. Keen, D. Trebotich, N. Wright, B. Alverson, D. Roweth, K. Underwood, "Not All Applications Have Boring Communication Patterns: Profiling Message Matching with BMM", Concurrency and Computation: Practice and Experience, April 26, 2021, doi: 10.1002/cpe.6380
J. Kim, A. Sim, J. Kim, K. Wu, J. Hahm, "Improving Botnet Detection with Recurrent Neural Network and Transfer Learning", arXiv preprint arXiv:2104.12602, 2021,
Douglas Doerfler, Farzad Fatollahi-Fard, Colin MacLean, Tan Nguyen, Samuel Williams, Nicholas J. Wright, Marco Siracusa, "Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs", International Workshop on OpenCL (iWOCL), April 2021, doi: 10.1145/3456669.3456671
J. Galen Wang, Roseanna N. Zia, "Vitrification is a spontaneous non-equilibrium transition driven by osmotic pressure", Journal of Physics: Condensed Matter, April 23, 2021, doi: 10.1088/1361-648x/abeec0
Sherwood Richers, Don E. Willcox, Nicole M. Ford, and Andrew Myers, "Particle-in-cell simulation of the neutrino fast flavor instability", Physical Review D, April 20, 2021,
Jordan Musser, Ann S Almgren, William D Fullmer, Oscar Antepara, John B Bell, Johannes Blaschke, Kevin Gott, Andrew Myers, Roberto Porcu, Deepak Rangarajan, Michele Rosso, Weiqun Zhang, and Madhava Syamlal, "MFIX-Exa: A Path Towards Exascale CFD-DEM Simulations", The International Journal of High Performance Computing Applications, April 16, 2021,
Jonathan Madsen, Roofline Instrumentation with TiMemory, ECP Annual Meeting, April 2021,
- Download File: ECP21-Roofline-7-TiMemory.pdf (pdf: 490 KB)
Khaled Ibrahim, Roofline on GPUs (advanced topics), ECP Annual Meeting, April 2021,
- Download File: ECP21-Roofline-6-advanced.pdf (pdf: 15 MB)
Jonathan Madsen, Roofline Model using NSight Compute, ECP Annual Meeting, April 2021,
- Download File: ECP21-Roofline-3-NERSC.pdf (pdf: 4 MB)
Samuel Williams, Roofline Analysis on NVIDIA GPUs, ECP Annual Meeting, April 2021,
- Download File: ECP21-Roofline-2-NVIDIA.pdf (pdf: 14 MB)
Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, April 2021,
- Download File: ECP21-Roofline-1-intro.pdf (pdf: 22 MB)
Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'21)", Poster at Exascale Computing Project (ECP) Annual Meeting 2021, April 2021,
We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC). The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
Marco Pritoni, Drew Paine, Gabriel Fierro, Cory Mosiman, Michael Poplawski, Joel Bender, Jessica Granderson, "Metadata Schemas and Ontologies for Building Energy Applications: A Critical Review and Use Case Analysis", Energies, April 6, 2021, doi: 10.3390/en14072024
Digital and intelligent buildings are critical to realizing efficient building energy operations and a smart grid. With the increasing digitalization of processes throughout the life cycle of buildings, data exchanged between stakeholders and between building systems have grown significantly. However, a lack of semantic interoperability between data in different systems is still prevalent and hinders the development of energy-oriented applications that can be reused across buildings, limiting the scalability of innovative solutions. Addressing this challenge, our review paper systematically reviews metadata schemas and ontologies that are at the foundation of semantic interoperability necessary to move toward improved building energy operations. The review finds 40 schemas that span different phases of the building life cycle, most of which cover commercial building operations and, in particular, control and monitoring systems. The paper’s deeper review and analysis of five popular schemas identify several gaps in their ability to fully facilitate the work of a building modeler attempting to support three use cases: energy audits, automated fault detection and diagnosis, and optimal control. Our findings demonstrate that building modelers focused on energy use cases will find it difficult, labor intensive, and costly to create, sustain, and use semantic models with existing ontologies. This underscores the significant work still to be done to enable interoperable, usable, and maintainable building models. We make three recommendations for future work by the building modeling and energy communities: a centralized repository with a search engine for relevant schemas, the development of more use cases, and better harmonization and standardization of schemas in collaboration with industry to facilitate their adoption by stakeholders addressing varied energy-focused use cases.
Fabio Massacci, Trent Jaeger, Sean Peisert, "SolarWinds and the Challenges of Patching: Can We Ever Stop Dancing With the Devil?", IEEE Security & Privacy, April 2021, 14-19, doi: 10.1109/MSEC.2021.3050433
Sean Peisert, Bruce Schneier, Hamed Okhravi, Fabio Massacci, Terry Benzel, Carl Landwehr, Mohammad Mannan, Jelena Mirkovic, Atul Prakash, James Bret Michael, "Perspectives on the SolarWinds Incident", IEEE Security & Privacy, April 2021, 7-13, doi: 10.1109/MSEC.2021.3051235
Daniel R. Ladiges, Sean P. Carney, Andrew Nonaka, Katherine Klymko, Guy C. Moore, Alejandro L. Garcia, Sachin R. Natesh, Aleksandar Donev, John B. Bell, "A Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm for Modeling Electrolytes", Physical Review Fluids, April 1, 2021, 6(4):044309,
Karol Kowalski, Raymond Bair, Nicholas P. Bauman, Jeffery S. Boschen, Eric J. Bylaska, Jeff Daily, Wibe A. de Jong, Thom Dunning, Niranjan Govind, Robert J. Harrison, Murat Keceli, Kristopher Keipert, Sriram Krishnamoorthy, Suraj Kumar, Erdal Mutlu, Bruce Palmer, Ajay Panyala, Bo Peng, Ryan M. Richard, T. P. Straatsma, Peter Sushko, Edward F. Valeev, Marat Valiev, Hubertus J. J. van Dam, Jonathan M. Waldrop, David B. Williams-Young, Chao Yang, Marcin Zalewski, Theresa L. Windus, "From NWChem to NWChemEx: Evolving with the Computational Chemistry Landscape", Chemical Reviews, March 31, 2021, doi: 10.1021/acs.chemrev.0c00998
J. Goings, H. Hu, C. Yang, X. Li, "Reinforcement Learning Configuration Interaction", March 31, 2021,
Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2021.3.0", Lawrence Berkeley National Laboratory Tech Report, March 31, 2021, LBNL 2001388, doi: 10.25344/S4K881
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
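To make the "graph of asynchronous operations" idea concrete, here is a minimal sketch that uses Python's `asyncio` purely as a stand-in for UPC++ futures; it is not the UPC++ API (which is C++). The names `rget`, `rput`, and `REMOTE_SEGMENT` are hypothetical, and "remote" memory is simulated by a local dictionary with an artificial delay.

```python
import asyncio

# Hypothetical stand-in for a memory segment owned by another process.
REMOTE_SEGMENT = {"x": 3, "y": 4}


async def rget(key, latency=0.01):
    """Toy non-blocking 'remote get': yields a value after simulated latency."""
    await asyncio.sleep(latency)
    return REMOTE_SEGMENT[key]


async def rput(key, value, latency=0.01):
    """Toy non-blocking 'remote put' into the simulated segment."""
    await asyncio.sleep(latency)
    REMOTE_SEGMENT[key] = value


async def dependent_graph():
    # Issue both gets before waiting on either, so their latencies overlap.
    fx = asyncio.create_task(rget("x"))
    fy = asyncio.create_task(rget("y"))
    # Conjoin the two futures (similar in spirit to UPC++ when_all).
    x, y = await asyncio.gather(fx, fy)
    # Continuation: runs only once both dependencies are satisfied.
    await rput("sum_sq", x * x + y * y)
    return REMOTE_SEGMENT["sum_sq"]


result = asyncio.run(dependent_graph())
```

The two gets are in flight at the same time, and the put is a continuation that fires only after both values arrive, which is the dependency pattern the abstract describes; in UPC++ itself the equivalent is expressed with futures and callbacks rather than coroutines.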
Georgios Tzimpragos, Jennifer Volk, Dilip Vasudevan, Nestan Tsiskaridze, George Michelogiannakis, Advait Madhavan, John Shalf, Timothy Sherwood, "Temporal Computing With Superconductors", IEEE Micro, March 2021, 41:71-79, doi: 10.1109/MM.2021.3066377
Ed Younis, Koushik Sen, Katherine Yelick, Costin Iancu, QFAST: Quantum Synthesis Using a Hierarchical Continuous Circuit Space, Bulletin of the American Physical Society, March 2021,
We present QFAST, a quantum synthesis tool designed to produce short circuits and to scale well in practice. Our contributions are: 1) a novel representation of circuits able to encode placement and topology; 2) a hierarchical approach with an iterative refinement formulation that combines "coarse-grained" fast optimization during circuit structure search with a good, but slower, optimization stage only in the final circuit instantiation. When compared against state-of-the-art techniques, although not always optimal, QFAST can reduce circuits for "time-dependent evolution" algorithms, as used by domain scientists, by 60x in depth. On typical circuits, it provides 4x better depth reduction than the widely used Qiskit and UniversalQ compilers. We also show the composability and tunability of our formulation in terms of circuit depth and running time. For example, we show how to generate shorter circuits by plugging in the best available third party synthesis algorithm at a given hierarchy level. Composability enables portability across chip architectures, which is missing from similar approaches.
QFAST is integrated with Qiskit and available at github.com/bqskit.
Akel Hashim, Ravi Naik, Alexis Morvan, Jean-Loup Ville, Brad Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin O’Brien, Ian Hincks, Joel Wallman, Joseph V Emerson, David Ivan Santiago, Irfan Siddiqi, Scalable Quantum Computing on a Noisy Superconducting Quantum Processor via Randomized Compiling, Bulletin of the American Physical Society, 2021,
Coherent errors in quantum hardware severely limit the performance of quantum algorithms in an unpredictable manner, and mitigating their impact is necessary for realizing reliable, large-scale quantum computations. Randomized compiling achieves this goal by converting coherent errors into stochastic noise, dramatically reducing unpredictable errors in quantum algorithms and enabling accurate predictions of aggregate performance via cycle benchmarking estimates. In this work, we demonstrate significant performance gains under randomized compiling for both the four-qubit quantum Fourier transform algorithm and for random circuits of variable depth on a superconducting quantum processor. We also validate solution accuracy using experimentally measured error rates. Our results demonstrate that randomized compiling can be utilized to maximally leverage and predict the capabilities of modern-day noisy quantum processors, paving the way forward for scalable quantum computing.
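The claim that randomized compiling turns coherent errors into stochastic noise can be checked on a toy model. This sketch assumes a single qubit whose "cycles" are idle gates each followed by a small coherent over-rotation R_z(ε); randomized compiling is modelled as a uniformly random Pauli frame inserted before the error and undone after it, averaged exactly with superoperators. The error model, ε, and cycle count are illustrative assumptions, not the experimental setup of the paper.

```python
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]


def rz(theta):
    """Coherent over-rotation about Z by angle theta (assumed error model)."""
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])


def superop(U):
    """Superoperator of rho -> U rho U^dagger acting on column-stacked vec(rho)."""
    return np.kron(U.conj(), U)


def process_infidelity(S):
    """1 - process fidelity to the identity channel for one qubit (d^2 = 4)."""
    return 1.0 - np.trace(S).real / 4.0


def coherent_infidelity(eps, n_cycles):
    # The same unitary error hits every cycle, so the rotation angles add up.
    return process_infidelity(np.linalg.matrix_power(superop(rz(eps)), n_cycles))


def twirled_infidelity(eps, n_cycles):
    # Each cycle: random Pauli P, then the error, then P^dagger to undo the frame.
    # Averaging over the four Paulis gives the expected (Pauli-twirled) channel.
    E = rz(eps)
    S_avg = sum(superop(P.conj().T @ E @ P) for P in PAULIS) / 4.0
    # Frames are drawn independently per cycle, so averaged channels compose.
    return process_infidelity(np.linalg.matrix_power(S_avg, n_cycles))
```

With ε = 0.05 and 20 cycles, the coherent infidelity is (1 - cos 1)/2, about 0.23, while the twirled channel gives (1 - cos^20 0.05)/2, about 0.012. For a single cycle the two are identical, so twirling does not change the per-cycle average error; it changes how errors accumulate (roughly quadratically in the cycle count for coherent errors, roughly linearly after twirling), which is what makes aggregate performance predictable.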
Dan Bonachea, GASNet-EX: A High-Performance, Portable Communication Library for Exascale, Berkeley Lab – CS Seminar, March 10, 2021,
- Download File: GASNet-2021-LBL-seminar-slides.pdf (pdf: 9.1 MB)
Partitioned Global Address Space (PGAS) models, pioneered by languages such as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.
GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building on 20 years of lessons learned. We describe several features and enhancements that have been introduced to address the needs of modern runtimes and exploit the hardware capabilities of emerging systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI implementations on current systems. GASNet-EX provides communication services that help to deliver speedups in HPC applications written using the UPC++ library, enabling new science on pre-exascale systems.
Yang Liu, Xin Xing, Han Guo, Eric Michielssen, Pieter Ghysels, Xiaoye Sherry Li, "Butterfly factorization via randomized matrix-vector multiplications", SIAM J. Sci. Comput., March 9, 2021,
George Michelogiannakis, Min Yeh Teh, Madeleine Glick, John Shalf, Keren Bergman, Maximizing The Impact of Emerging Photonic Switches At The System Level, SPIE Photonics West, March 2021,
- Download File: photonics-west-2021.pdf (pdf: 770 KB)
George Michelogiannakis, Min Yeh Teh, Madeleine Glick, John Shalf, Keren Bergman, "Maximizing the impact of emerging photonic switches at the system level", SPIE 11692, Optical Interconnects XXI, 116920Z, March 2021,
M. Ozan Karsavuran, Seher Acer, Cevdet Aykanat, Medium-Grain Partitioning for Sparse Tensor Decomposition, SIAM Conference on Computational Science and Engineering (CSE21), 2021,
Thijs Steel, Daan Camps, Karl Meerbergen, Raf Vandebril, "A Multishift, Multipole Rational QZ Method with Aggressive Early Deflation", SIAM Journal on Matrix Analysis and Applications, February 19, 2021, 42:753-774, doi: 10.1137/19M1249631
In the article “A Rational QZ Method” by D. Camps, K. Meerbergen, and R. Vandebril [SIAM J. Matrix Anal. Appl., 40 (2019), pp. 943-972], we introduced rational QZ (RQZ) methods. Our theoretical examinations revealed that the convergence of the RQZ method is governed by rational subspace iteration, thereby generalizing the classical QZ method, whose convergence relies on polynomial subspace iteration. Moreover, the RQZ method operates on a pencil more general than a Hessenberg-upper triangular pencil, namely, a Hessenberg pencil, which is a pencil consisting of two Hessenberg matrices. However, the RQZ method can only be made competitive with advanced QZ implementations by using crucial add-ons such as small-bulge multishift sweeps, aggressive early deflation, and optimal packing. In this paper we develop these techniques for the RQZ method. In the numerical experiments we compare the results with state-of-the-art routines for the generalized eigenvalue problem and show that the presented method is competitive in terms of speed and accuracy.
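The distinction between polynomial and rational subspace iteration can be illustrated on a single vector. This is a simplified sketch, not the RQZ algorithm (which applies such iterations implicitly to whole subspaces through bulge chasing on a Hessenberg pencil): polynomial iteration repeatedly applies B^{-1}A to the pencil (A, B), while rational iteration applies (A - σB)^{-1}B for a chosen pole σ. The function names and the test pencil are illustrative assumptions.

```python
import numpy as np


def polynomial_iteration(A, B, v, steps):
    """Power iteration on the pencil (A, B): v <- B^{-1} A v.

    Converges toward the eigenvector of the eigenvalue of largest magnitude.
    """
    for _ in range(steps):
        v = np.linalg.solve(B, A @ v)
        v /= np.linalg.norm(v)
    return v


def rational_iteration(A, B, v, pole, steps):
    """Shift-and-invert iteration: v <- (A - pole*B)^{-1} B v.

    Converges toward the eigenvector of the eigenvalue closest to `pole`.
    """
    for _ in range(steps):
        v = np.linalg.solve(A - pole * B, B @ v)
        v /= np.linalg.norm(v)
    return v


def rayleigh_quotient(A, B, v):
    """Generalized Rayleigh quotient; equals the eigenvalue if A v = lambda B v."""
    return (v.conj() @ A @ v) / (v.conj() @ B @ v)
```

A test pencil with known eigenvalues can be built as A = M D N, B = M N with M and N invertible, so the eigenvalues of (A, B) are the diagonal of D. Polynomial iteration can only pick out the dominant eigenvalue, whereas a pole near an interior eigenvalue λ lets rational iteration converge to it at a rate of |λ - σ| / |λ' - σ| per step, where λ' is the next-closest eigenvalue to σ.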
Y. Liu, W. M. Sid-Lakhdar, O. Marques, X. Zhu, C. Meng, J. W. Demmel, X. S. Li, "GPTune: multitask learning for autotuning exascale applications", PPoPP, February 17, 2021, doi: 10.1145/3437801.3441621
J-L Vay, Ann Almgren, LD Amorim, John Bell, L Fedeli, L Ge, K Gott, DP Grote, M Hogan, A Huebl, R Jambunathan, R Lehe, A Myers, C Ng, M Rowan, O Shapoval, M Thevenet, H Vincenti, E Yang, N Zaim, W Zhang, Y Zhao and E Zoni, "Modeling of a chain of three plasma accelerator stages with the WarpX electromagnetic PIC code on GPUs", Physics of Plasmas, February 9, 2021,
Tuowen Zhao, Mary Hall, Hans Johansen, Samuel Williams, "Improving Communication by Optimizing On-Node Data Movement with Data Layout", PPoPP, February 2021,
- Download File: PPoPP-Bricks-MPI-final.pdf (pdf: 864 KB)