2025
D. Sung, S. Kim, S. Lee, H. Tang, A. Sim, K. Wu, Y. Son,
"TSALA: Improving Performance Prediction in Large-Scale Systems through Temporal System and Application Log Analysis",
39th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2025),
2025,
S. Kim, S. Kim, C. Kim, A. Sim, K. Wu, H. Tang,
"SWIFTN: Accelerating Quantum Circuit Simulation Through Tensor Optimization",
25th IEEE International Symposium on Cluster, Cloud, and Internet Computing (CCGrid 2025),
2025,
The Fortran programming language standard added features supporting single-program, multiple-data (SPMD) parallel programming and loop parallelism beginning with Fortran 2008. In Fortran, SPMD programming involves the creation of a fixed number of images (instances) of a program that execute asynchronously in shared or distributed memory, except where a program uses specific synchronization mechanisms. Fortran’s “coarray” distributed data structures offer a subscripted, multidimensional array notation defining a partitioned global address space (PGAS). One image can use this notation for one-sided access to another image’s slice of a coarray.
The CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine) provides a runtime library that supports Fortran’s SPMD features. Caffeine implements inter-process communication by building atop the GASNet-EX exascale networking middleware library. Caffeine is the first implementation of the compiler- and runtime-agnostic Parallel Runtime Interface for Fortran (PRIF) specification. Any compiler that targets PRIF can use any runtime that supports PRIF. Caffeine supports researching the novel approach of writing most of a compiler’s parallel runtime library in the language being compiled: Caffeine is primarily implemented using Fortran’s non-parallel features, with a thin C-language layer that invokes the external GASNet-EX communication library. Exploring this approach in open source lowers a barrier to contributions from the compiler’s users: Fortran programmers. Caffeine also facilitates research such as investigating various optimization opportunities that exploit specific hardware such as shared memory or specific interconnects.
As of Fortran 2023, the collective subroutine intrinsics
(CO_BROADCAST, CO_MAX, CO_MIN, CO_REDUCE, and CO_SUM) may only be
executed over the current team, as defined by the CHANGE TEAM
construct. This becomes very awkward when one needs to execute such a
collective over an ancestor team, because there is no way to express
that directly without closing the CHANGE TEAM construct, and invoking
END TEAM may have undesired side effects such as deallocating
team-specific coarrays. It would also be convenient to allow
collectives directly over a child team without forcing the
synchronization side effects associated with a CHANGE TEAM to that
child team.
The collective subroutines of Fortran should support execution in a
specified team that is not the current team.
Paper PASSED by roll call vote at INCITS/US Fortran Programming Language Standards Technical Committee meeting #235
Gary Klimowicz, Dan Bonachea, Aury Shafran,
"Fortran preprocessor requirements",
INCITS/US Fortran Programming Language Standards Technical Committee (J3/25-114r2),
February 2025,
Many existing Fortran projects make extensive use of C preprocessor
directives and macro expansion, despite the lack of an FPP standard.
This is usually done to tailor the code to specific environments, such
as target compilers or machines.
Unfortunately, more complex use cases fail to be portable between
different implementations. This is enough of a problem that WG 5 raised
this as the number 2 issue to address in Fortran 202y, behind generics.
This is not a new problem, as evidenced by the J3 discussions from the
mid 1990s. The introduction of CoCo in Fortran 95 did not solve the
problem, either, because it was not a mandatory part of the standard and
because it was not compatible with the preprocessor syntax used by many
existing Fortran projects.
This document attempts to define the requirements for a mandatory
Fortran preprocessor based on the preprocessor syntax already in common
use today. The guiding principle is to promote Fortran program
portability by defining consistent syntax and semantics of a useful
subset of CPP. Some FPP behavior will be slightly different from CPP, in
order to accommodate some Fortran idiosyncrasies.
A major overarching goal of this effort is to standardize de facto
current practice for preprocessing in Fortran compilers and code. It is
the standard's responsibility to standardize syntax in order to settle
minor divergences that have arisen amongst pre-standard FPP
implementations, to the detriment of portability for end users.
Paper PASSED by unanimous consent at INCITS/US Fortran Programming Language Standards Technical Committee meeting #235
J. Kim, A. Sim, K. Wu, J. Kim,
"Improving Slow Transfer Predictions: Generative Methods Compared",
IEEE International Conference on Computing, Networking and Communications (ICNC 2025),
2025,
George Michelogiannakis,
Reliable Novel Compute Methods for Unreliable Environments,
Georgia Tech CRNCH summit,
February 13, 2025,
B. Fan, A. Sim, K. Wu, J. Kim,
"Conditional Recurrent Neural Networks for Enhancing Throughput Prediction and Slow File Transfers Detection in Large Science Workflows",
22nd IEEE Consumer Communications & Networking Conference (CCNC 2025),
2025,
Modern accelerators use hierarchical parallel programming models that enable massive multithreading within a processing element (PE), with multiple PEs per device driven by traditional processes. Batching is a technique for exposing PE-level parallelism in algorithms that have traditionally run on MPI processes or multiple threads within a single process. Opportunities for batching arise in, for example, kinetic discretizations of magnetized plasmas where collisions are advanced in velocity space at each spatial point independently.
This paper builds on previous work on a high-performance, fully nonlinear, Landau collision operator by batching the linear solver, as well as batching the spatial point problems and adding new support for multiple grids for multiscale, multi-species problems. An anisotropic relaxation verification test that agrees well with previously published results and analytical models is presented. The performance results from NVIDIA A100 and AMD MI250X nodes are presented with hardware utilization analysis for each architecture. The entire implicit Landau operator time advance is implemented in Kokkos for performance portability, runs entirely on the device, and is available in the PETSc numerical library.
2024
This document specifies an interface to support the parallel features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a proposed solution in which the runtime library is primarily responsible for implementing coarray allocation, deallocation and accesses, image synchronization, atomic operations, events, teams and collective subroutines. In this interface, the compiler is responsible for transforming the invocation of Fortran-level parallel features into procedure calls to the necessary PRIF subroutines. The interface is designed for portability across shared- and distributed-memory machines, different operating systems, and multiple architectures. Implementations of this interface are intended as an augmentation for the compiler's own runtime library. With an implementation-agnostic interface, alternative parallel runtime libraries may be developed that support the same interface. One benefit of this approach is the ability to vary the communication substrate. A central aim of this document is to define a parallel runtime interface in standard Fortran syntax, which enables us to leverage Fortran to succinctly express various properties of the procedure interfaces, including argument attributes.
I. Mahmud, P. Zuk, C. Wang, M. Kiran, K. Wu, K. Thareja, K. Raghavan, A. Mandal, E. Deelman,
"DISTRI: Development and Integration of Simulation Tools for Resilient Infrastructure",
5th International Workshop on Big Data & AI Tools, Models, and Use Cases for Innovative Scientific Discovery (BTSD),
2024,
B. Dong, A. Nayak, K. Wu, V. Tribaldos, J. Ajo-Franklin, Q. Zhang, S. Byna, F. Guo, P. Dobson, A. Sim,
"TensorSearch: Parallel Similarity Search on Tensors",
IEEE International Conference on Big Data (BigData),
2024,
Hyunju Oh, Wei Zhang, Christopher D. Rickett, Sreenivas R. Sukumar, Suren Byna,
"Evaluating Performance Trade-offs of Caching Strategies for AI-Powered Querying Systems",
2024 IEEE International Conference on Big Data (IEEE BigData 2024),
Washington DC, USA,
2024,
doi: 10.1109/BigData62323.2024.10825819
With the rapid growth of accumulated data from
various scientific domains, traditional data management systems
face challenges in supporting complicated queries, such as pattern
search, on massive amounts of data. To serve sophisticated
queries through capturing precise features from data, recent
data management systems seek to use artificial intelligence
(AI) within the querying process. However, the characteristics
of the AI inference workflow within the querying process, such as
intensive computation and costly demands on computing
resources, become a bottleneck for AI-powered query systems.
In this paper, we provide a generalization of the AI inference
workflow in the context of AI-powered data discovery and we
introduce three different caching strategies corresponding to
each stage in the AI inference workflow. We provide in-depth
performance evaluation on the impact of these caching strategies
through a series of strong scaling experiments. Our experimental
results show that the AI-powered data querying performance can
be significantly improved by applying different caching strategies.
Xuan Jiang, Raja Sengupta, James Demmel, Samuel Williams,
"Large scale multi-GPU based parallel traffic simulation for accelerated traffic assignment and propagation",
Transportation Research Part C: Emerging Technologies,
December 2024,
169:104873,
doi: 10.1016/j.trc.2024.104873
Jean Luca Bez,
Analyzing Parallel I/O,
ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), BoF,
2024,
P. Zuk, H. Jin, I. Mahmud, K. Raghavan, K. Thareja, S. Wu, P. Balaprakash, F. Cappello, Z. Chen, E. Deelman, S. Di, A. Hamade, M. Kiran, A. Mandal, E. Scott, C. Wang, K. Wu,
SWARM: Scientific Workflow Applications on Resilient Metasystem,
ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), BoF,
2024,
Jean Luca Bez,
Drishti: I/O Insights for All,
ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24),
2024,
P. Zuk, H. Jin, I. Mahmud, K. Raghavan, K. Thareja, S. Wu, P. Balaprakash, F. Cappello, Z. Chen, E. Deelman, S. Di, A. Hamade, M. Kiran, A. Mandal, E. Scott, C. Wang, K. Wu,
"SWARM: Scientific Workflow Applications on Resilient Metasystem",
ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24),
2024,
Jean Luca Bez,
IO500: The High-Performance Storage Community,
ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24), BoF,
2024,
Rajeev Jain, Houjun Tang, Akash Dhruv, Suren Byna,
"Enabling Data Reduction for Flash-X Simulations",
10th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD),
2024,
Junmin Gu, John Wu, Paul Lin, CS Chang, Seong-Hoe Ku, Stephane Ethier, Jong Choi,
Accurate in-situ in-transit analysis of particle diffusion for large-scale tokamak simulation,
ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’24),
2024,
V. Lakshminarayana, C. Oguchi, A. Sim, K. Wu, D. Ghosal,
"A Study of a Deterministic Networking Framework for Latency Critical Large Scientific Data Transfers",
11th Annual International Workshop on Innovating the Network for Data-Intensive Science (INDIS 2024),
2024,
Fortran compilers that provide support for Fortran’s native parallel features often do so with a runtime library that depends on details of both the compiler implementation and the communication library, while others provide limited or no support at all. This paper introduces a new generalized interface that is both compiler- and runtime-library-agnostic, providing flexibility while fully supporting all of Fortran’s parallel features. The Parallel Runtime Interface for Fortran (PRIF) was developed to be portable across shared- and distributed-memory systems, with varying operating systems, toolchains and architectures. It achieves this by defining a set of Fortran procedures corresponding to each of the parallel features defined in the Fortran standard that may be invoked by a Fortran compiler and implemented by a runtime library. PRIF aims to be used as the solution for LLVM Flang to provide parallel Fortran support. This paper also briefly describes our PRIF prototype implementation: Caffeine.
Jordan A. Welsman, Gunther H. Weber, Oluwamayowa O. Amusat, Anna Giannakou, Lavanya Ramakrishnan,
"Enhancing Electron Microscopy Image Classification Using Data Augmentation",
SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis,
IEEE,
November 17, 2024,
64-71,
doi: 10.1109/SCW63240.2024.00016
Jean Luca Bez, Suren Byna,
"Exploring the Proactive Data Containers Runtime System in VAST - A Case Study",
9th International Parallel Data Systems Workshop (PDSW),
2024,
Wei Zhang, Houjun Tang, Suren Byna,
"BULKI - Binary Unified Layout for Key-value Interchange",
9th International Parallel Data Systems Workshop (PDSW),
2024,
Damian Rouson, Baboucarr Dibba, Katherine Rasmussen, Brad Richardson, David Torres, Yunhao Zhang, Ethan Gutmann, Kareem Ergawy, Michael Klemm, Sameer Shende,
Just Write Fortran: Experiences with a Language-Based Alternative to MPI+X,
Talk at IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM),
November 2024,
doi: 10.25344/S4H88D
Fortran 2023, with its "do concurrent" and coarray parallel programming features, displaces many uses of extra-language parallel programming models such as MPI, OpenMP, and OpenACC. The Cray, Intel, LFortran, LLVM, and NVIDIA compilers automatically parallelize do concurrent in shared memory. The Cray, Intel, and GNU compilers support coarrays in shared- and distributed-memory, while the NAG compiler supports coarrays in shared memory. Thus, language-based parallelism is emerging as a portable alternative to MPI+X.
This talk will present experiences with automatic "do concurrent" parallelization in the deep learning library Inference-Engine and with coarray communication in the Intermediate Complexity Atmospheric Research (ICAR) model.
PAW-ATM24
M. Schreyer, T. Sattarov, A. Sim, K. Wu,
"Imb-FinDiff: Conditional Diffusion Models for Class Imbalance Synthesis of Financial Tabular Data",
5th ACM International Conference on AI in Finance (ICAIF'24),
2024,
doi: 10.1145/3677052.3698659
Sterling Smith, Zichuan Anthony Xing, Torrin Bechtel, Severin Denk, Earl DeShazer, Orso Meneghini, Tom Neiser, Laurie Stephey, Oscar Antepara, Christopher Mitchell Clark, Eli Dart, Pengfei Ding, Sean Flanagan, Raffi Nazikian, David Schissel, Christine Simpson, Nicholas Tyler, Thomas D. Uram, Samuel Williams,
"Expediting Higher Fidelity Plasma State Reconstructions for the DIII-D National Fusion Facility Using Leadership Class Computing Resources",
Extreme-Scale Experiment-in-the-Loop Computing (XLOOP),
November 2024,
Oscar Antepara, Samuel Williams, Max Carlson, Jerry Watkins,
"Performance Portable Optimizations of an Ice-sheet Modeling Code on GPU-supercomputers",
Performance, Portability & Productivity in HPC (P3HPC),
November 2024,
Oscar Antepara, Samuel Williams, Hans Johansen, Mary Hall,
"High-Performance, Scalable Geometric Multigrid via Fine-Grain Data Blocking for GPUs",
Performance, Portability & Productivity in HPC (P3HPC),
November 10, 2024,
Brian Austin, Dhruva Kulkarni, Brandon Cook, Samuel Williams, Nicholas J. Wright,
"System-Wide Roofline Profiling - a Case Study on NERSC’s Perlmutter Supercomputer",
Performance Modeling, Benchmarking, and Simulation (PMBS),
November 2024,
Shashank Subramanian, Ermal Rrapaj, Peter Harrington, Smeet Chheda, Steven Farrell, Brian Austin, Samuel Williams, Nicholas Wright, Wahid Bhimji,
"Comprehensive Performance Modeling and System Design Insights for Foundation Models",
Performance Modeling, Benchmarking, and Simulation (PMBS),
November 2024,
Sean R Miller, Matthew Schipper, Lars G Fritsche, Ralph Jiang, Garth Strohbehn, Erkin Ötleş, Benjamin H McMahon, Silvia Crivelli, Rafael Zamora‐Resendiz, Nithya Ramnath, Shinjae Yoo, Xin Dai, Kamya Sankar, Donna M Edwards, Steven G Allen, Michael D Green, Alex K Bryant,
"Pan‐Cancer Survival Impact of Immune Checkpoint Inhibitors in a National Healthcare System",
November 7, 2024,
A. Sim, E. Wang, R. Monga, J. Balcas, K. Wu, C. Guok, I. Monga, D. Davila, F. Wurthwein, H. Newman,
Comparing Cache Utilization Trends for Regional Data Caches,
27th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2024),
2024,
J. Aldrich, A. Sim, K. Wu, S. Yoo, H. Ito, V. Garonne, E. Lancon,
"Exploring Data Caching Policy with Data Access Patterns from dCache Logs",
27th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2024),
2024,
Noah Goss, Samuele Ferracin, Akel Hashim, Arnaud Carignan-Dugas, John Mark Kreikebaum, Ravi K Naik, David I Santiago, Irfan Siddiqi,
"Extending the computational reach of a superconducting qutrit processor",
npj Quantum Information,
2024,
10:101,
doi: 10.1038/s41534-024-00892-z
This poster explores native parallel features in Fortran 2023 through the lens of supporting applications with libraries, compilers, and parallel runtimes. The language revision informally named Fortran 2008 introduced parallelism in the form of Single Program Multiple Data (SPMD) execution with two broad feature sets: (1) loop-level parallelism via do concurrent and (2) a Partitioned Global Address Space (PGAS) comprised of distributed “coarray” data structures. Fortran’s native parallelism has demonstrated high performance [1] and reduced the burden of inserting what sometimes amounts to more directives than code. Several compilers support both feature sets, typically by translating do concurrent into serial do loops annotated by parallel directives and by translating SPMD/PGAS features into direct calls to a communication library. Our research focuses primarily on two questions: (1) can the compiler’s parallel runtime library be developed in the language being compiled (Fortran) and (2) can we define an interface to the runtime that liberates compilers from being hardwired to one runtime and vice versa. We are answering these questions by developing the Parallel Runtime Interface for Fortran (PRIF) [2] and the Co-Array Fortran Framework of Efficient Interfaces to Network Environments (Caffeine) [3]. Caffeine is initially targeting adoption by LLVM Flang, a new open-source Fortran compiler developed by a broad community in industry, academia, and government labs. We are also exploring the use of these features in Inference-Engine, a deep learning library designed to facilitate neural network training and inference for high-performance computing applications written in modern Fortran.
CARLA'2024
Mahesh Lakshminarasimhan, Oscar Antepara, Tuowen Zhao, Benjamin Sepanski, Protonu Basu, Hans Johansen, Mary Hall, Samuel Williams,
"Bricks: A high-performance portability layer for computations on block-structured grids",
The International Journal of High Performance Computing Applications (IJHPCA),
August 19, 2024,
doi: 10.1177/10943420241268288
Shakila Shafiq, Md. Sazzadur Rahman, Shamim Ahmed Shaon, Imtiaz Mahmud, A. S. M. Sanwar Hosen,
"A Review on Software-Defined Networking for Internet of Things Inclusive of Distributed Computing, Blockchain, and Mobile Network Technology: Basics, Trends, Challenges, and Future Research Potentials",
International Journal of Distributed Sensor Networks,
August 13, 2024,
doi: 10.1155/2024/9006405
Mahesh Lakshminarasimhan, Mary Hall, Samuel Williams, Oscar Antepara,
"BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs",
Proceedings of the 53rd International Conference on Parallel Processing (ICPP),
August 12, 2024,
Correlation coefficients and linear regression values computed from group averages can differ from those computed using individual scores. This observation, known as the ecological fallacy, often assumes that all the individual scores are available from a population. In many situations, one must use a sample from the larger population. In such cases, the computed correlation coefficient and linear regression values will depend on the sample that is chosen and the underlying sampling distribution. For normally distributed variables and random samples drawn from infinitely large continuous distributions, the sampling distribution of correlation coefficients and linear regression values for group averages will be identical to the sampling distribution for individuals. However, data acquired in practice are often obtained by sampling without replacement from a finite population. Our objective is to demonstrate through Monte Carlo simulations that the sampling distributions for correlation and linear regression will also be similar for individuals and group averages when sampling without replacement from normally distributed variables. These simulations suggest that when a random sample is selected from a population, the correlation coefficients and linear regression values computed from individual scores will not estimate the population values more accurately than those computed from group averages, as long as the sample size is the same.
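The kind of Monte Carlo comparison this abstract describes can be illustrated with a minimal sketch. It is not the authors' code: the function names, the population size, the group size, the target correlation of 0.5, and the choice to form groups at random (so there are no group-level effects) are all assumptions made here for illustration. It draws samples without replacement from a finite population and compares the sampling distribution of the Pearson correlation for individuals against that for group averages at the same sample size.

```python
import random
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def make_population(rng, n_individuals, rho):
    """Finite population of (x, y) pairs drawn from a bivariate normal
    with correlation rho (hypothetical setup, not from the paper)."""
    pop = []
    for _ in range(n_individuals):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        pop.append((x, y))
    return pop


def group_averages(pop, group_size):
    """Average consecutive chunks of the population. Because individuals
    are generated independently, this amounts to random group assignment."""
    averages = []
    for i in range(0, len(pop), group_size):
        chunk = pop[i:i + group_size]
        gx = sum(x for x, _ in chunk) / len(chunk)
        gy = sum(y for _, y in chunk) / len(chunk)
        averages.append((gx, gy))
    return averages


def sampling_distribution(rng, units, sample_size, trials):
    """Correlations from repeated samples drawn without replacement."""
    results = []
    for _ in range(trials):
        sample = rng.sample(units, sample_size)  # without replacement
        xs, ys = zip(*sample)
        results.append(pearson_r(xs, ys))
    return results


def run(seed=0, rho=0.5, n_individuals=20000, group_size=10,
        sample_size=30, trials=500):
    rng = random.Random(seed)
    pop = make_population(rng, n_individuals, rho)
    groups = group_averages(pop, group_size)
    r_individuals = sampling_distribution(rng, pop, sample_size, trials)
    r_groups = sampling_distribution(rng, groups, sample_size, trials)
    return r_individuals, r_groups


if __name__ == "__main__":
    r_ind, r_grp = run()
    print("individuals:    mean r = %.3f, sd = %.3f" % (mean(r_ind), stdev(r_ind)))
    print("group averages: mean r = %.3f, sd = %.3f" % (mean(r_grp), stdev(r_grp)))
```

Under these assumptions the two printed means and standard deviations should come out close to each other. Populations with real group-level structure, which is where the ecological fallacy typically arises, would need a different generator than `make_population`.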
David McCallen, Arben Pitarka, Houjun Tang, Ramesh Pankajakshan, Anders Petersson, Mamun Miah,
"Transformational Regional-Scale Earthquake Simulations with the DOE EarthQuake SIMulation Exascale Framework",
Scientific Impact of the Exascale Computing Project (ECP),
August 1, 2024,
doi: 10.1109/MCSE.2024.3397768
Samuele Ferracin, Akel Hashim, Jean-Loup Ville, Ravi Naik, Arnaud Carignan-Dugas, Hammam Qassim, Alexis Morvan, David I. Santiago, Irfan Siddiqi, Joel J. Wallman,
"Efficiently improving the performance of noisy quantum computers",
Quantum,
2024,
8:1410,
doi: 10.22331/q-2024-07-15-1410
This document specifies an interface to support the parallel features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a proposed solution in which the runtime library is responsible for coarray allocation, deallocation and accesses, image synchronization, atomic operations, events, and teams. In this interface, the compiler is responsible for transforming the invocation of Fortran-level parallel features into procedure calls to the necessary PRIF procedures. The interface is designed for portability across shared- and distributed-memory machines, different operating systems, and multiple architectures. Implementations of this interface are intended as an augmentation for the compiler’s own runtime library. With an implementation-agnostic interface, alternative parallel runtime libraries may be developed that support the same interface. One benefit of this approach is the ability to vary the communication substrate. A central aim of this document is to define a parallel runtime interface in standard Fortran syntax, which enables us to leverage Fortran to succinctly express various properties of the procedure interfaces, including argument attributes.
Hiniduma, K., Byna, S., Bez, J. L., Madduri, R.,
"AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI",
36th International Conference on Scientific and Statistical Database Management (SSDBM 2024),
2024,
J. Gu, P. Lin, K. Wu, S.-H. Ku, C.S. Chang, R. Hager, A. Scheinberg, J. Choi,
"Efficient Streaming Analysis of High-Resolution Plasma Transport",
36th International Conference on Scientific and Statistical Database Management (SSDBM 2024),
2024,
Egersdoerfer, C., Sareen, A., Bez, J. L., Byna, S., Dai, D.,
"ION: Navigating HPC I/O Optimization Journey using Large Language Models",
16th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage'24),
2024,
doi: 10.1145/3655038.3665950
Fortran 2023 natively supports single-program, multiple-data parallel programming with a partitioned global address space and collective subroutines, synchronization, atomics, locks, and more. Each of the four actively developed compilers that support Fortran’s parallel features uses its own parallel runtime library. The Parallel Runtime Interface for Fortran (PRIF) proposes to liberate compiler development from reliance on a single runtime and empower runtime developers to support more than one compiler. PRIF also aims to broaden the community of runtime developers to include the Fortran compiler’s users: Fortran programmers. PRIF does so by specifying the interface in Fortran, which makes it attractive to write the parallel runtime library in Fortran. Additionally, PRIF has been designed to be portable across both shared and distributed memory, varying architectures, as well as different operating systems. In this talk, I will describe the motivation behind the development of PRIF, describe the design of the interface itself and the benefits of adopting it. I will also provide a brief status report on the first PRIF implementation: Caffeine.
PASC'24 site
Nan Ding, Pieter Maris, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, LeAnn Lindsey, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, Samuel Williams,
"Evaluating the potential of disaggregated memory systems for HPC applications",
Concurrency and Computation: Practice and Experience (CCPE),
May 2024,
doi: 10.1002/cpe.8147
In 1951, Harlem Renaissance poet Langston Hughes asked this talk's titular question at the outset of a poem entitled "Harlem." Six years later, IBM mathematician John Backus developed Fortran, the world's first widely used high-level programming language. Backus went on to explore functional programming and to highlight the functional style in his Turing Award lecture in 1977, a year that also demarcates what one might consider the end of the classical era of Fortran. This talk will demonstrate how modern Fortran began to deliver on Backus's functional programming dream, starting with pure procedures in the 1995 standard. The talk will further demonstrate how this style culminated in a powerful and flexible facility for expressing independent iterations via the "do concurrent" construct, which the Fortran standard committee included in Fortran 2008 with the intention to facilitate automatic Graphics Processing Unit (GPU) programming. Fortran 2008 was published in 2010, but it took another decade for compilers to deliver on the promise of automatic GPU offloading. This talk will detail the trials and tribulations of Berkeley Lab's Fortran team in chasing the automatic offloading dream in our Inference-Engine deep learning library and Matcha high-performance computing (HPC) application.
Bin Dong, Kesheng Wu, Suren Byna,
"The Art of Sparsity: Mastering High-Dimensional Tensor Storage",
2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW),
May 27, 2024,
Hammad Ather, Jean Luca Bez, Yankun Xia, Suren Byna,
"Drilling Down I/O Bottlenecks with Cross-layer I/O Profile Exploration",
38th IEEE International Parallel & Distributed Processing Symposium,
San Francisco, CA, USA,
May 27, 2024,
Neeraj Rajesh, Keith Bateman, Jean Luca Bez, Suren Byna, Anthony Kougkas, Xian-He Sun,
"TunIO: An AI-powered Framework for Optimizing HPC I/O",
38th IEEE International Parallel & Distributed Processing Symposium,
San Francisco, CA, USA,
May 27, 2024,
D.K. Sung, Y. Son, A. Sim, K. Wu, S. Byna, H. Tang, H. Eom, C. Kim, S. Kim,
"A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis",
38th IEEE International Parallel & Distributed Processing Symposium (IPDPS2024),
2024,
GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in emerging exascale systems. It provides network-independent, high-performance communication primitives including Remote Memory Access (RMA) and Active Messages (AM). GASNet-EX is an evolution of the popular GASNet communication system, building upon over 20 years of lessons learned, and the primary goals are high performance, interface portability, and expressiveness. The library has been used to implement parallel programming models and libraries such as UPC, UPC++, Fortran coarrays, Legion, Chapel, and many others.
This anthology collects together the four separate volumes that currently comprise the GASNet-EX specification, as of the 2024.5.0 release of GASNet-EX.
Patricia Gonzalez-Guerrero, Kylie Huch, Nirmalendu Patra, Thom Popovici, George Michelogiannakis,
"Towards practical superconducting accelerators for machine learning using U-SFQ",
ACM Journal on Emerging Technologies in Computing Systems,
April 2024,
Ankur Agrawal, Akash V. Dixit, Tanay Roy, Srivatsan Chakram, Kevin He, Ravi K. Naik, David I. Schuster, Aaron Chou,
"Stimulated Emission of Signal Photons from Dark Matter Waves",
Physical Review Letters,
2024,
132:140801,
doi: 10.1103/PhysRevLett.132.140801
Hofmeyr S, Buluç A, Riley R, Egan R, Selvitopi O, Oliker L, Yelick K, Shakya M, Youtsey B, Azad A,
"Exabiome: Advancing Microbial Science through Exascale Computing",
Computing in Science & Engineering,
April 1, 2024,
doi: 10.1109/MCSE.2024.3402546
Most parallel scientific programs contain compiler directives (pragmas) such as those from OpenMP, explicit calls to runtime library procedures such as those implementing the Message Passing Interface (MPI), or compiler-specific language extensions such as those provided by CUDA. By contrast, the recent Fortran standards empower developers to express parallel algorithms without directly referencing lower-level parallel programming models. Fortran’s parallel features place the language within the Partitioned Global Address Space (PGAS) class of programming models. When writing programs that exploit data parallelism, application developers often find it straightforward to develop custom parallel algorithms. Problems involving complex, heterogeneous, staged calculations, however, pose much greater challenges. Such applications require careful coordination of tasks in a manner that respects dependencies prescribed by a directed acyclic graph. When rolling one’s own solution proves difficult, extending a customizable framework becomes attractive. The paper presents the design, implementation, and use of the Framework for Extensible Asynchronous Task Scheduling (FEATS), which we believe to be the first task scheduling tool written in modern Fortran. We describe the benefits and compromises associated with choosing Fortran as the implementation language, and we propose ways in which future Fortran standards can best support the use case in this paper.
L. Zhou, Q. Lin, K. Chowdhury, S. Masood, A. Eichenberger, H. Min, A. Sim, J. Wang, Y. Wang, K. Wu, B. Yuan, J. Zou,
"Serving Deep Learning Model in Relational Databases",
27th International Conference on Extending Database Technology (EDBT2024),
2024,
Oluwamayowa Amusat, Adam Atia, Timothy Bartholomew, Alexander Dudchenko,
Cost-Optimization of Process-Scale Desalination Systems Incorporating Surrogate-based Water Chemistry Models,
INFORMS Optimization Society Conference,
March 22, 2024,
Imtiaz Mahmud, Mariam Kiran, Ewa Deelman, Anirban Mandal, Prasanna Balaprakash, Krishnan Raghavan, Hongwei Jin, Cong Wang, Komal Thareja, George Papadimitriou,
Investigating BBRv3’s Performance in Large Science File Transfer on FABRIC,
KNIT’8 Workshop, San Diego, CA, USA,
March 19, 2024,
R. Frehner, K. Wu, A. Sim, J. Kim, K. Stockinger,
"Detecting Anomalies in Time Series Using Kernel Density Approaches",
IEEE Access,
2024,
doi: 10.1109/ACCESS.2024.3371891
R. Han, M. Zheng, S. Byna, H. Tang, B. Dong, D. Dai, Y. Chen, D. Kim, J. Hassoun, D. Thorsley, M. Wolf,
"PROV-IO: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems",
IEEE Transactions on Parallel and Distributed Systems,
March 14, 2024,
Jan Balewski, Mercy G Amankwah, Roel Van Beeumen, E Wes Bethel, Talita Perciano, Daan Camps,
"Quantum-parallel vectorized data encodings and computations on trapped-ion and transmon QPUs",
Scientific Reports,
February 10, 2024,
14,
doi: 10.1038/s41598-024-53720-x
George Michelogiannakis, John Shalf,
Chiplets for HPC,
OCP Summit,
February 6, 2024,
Jean Luca Bez, Houjun Tang, Scot Breitenfeld, Huihuo Zheng, Wei-Keng Liao, Kaiyuan Hou, Zanhua Huang, Suren Byna,
"h5bench: Exploring HDF5 Access Patterns Performance in Pre-Exascale Platforms",
Concurrency and Computation: Practice and Experience (CCPE),
January 31, 2024,
Sayera Dhaubhadel, Kumkum Ganguly, Ruy M Ribeiro, Judith D Cohn, James M Hyman, Nicolas W Hengartner, Beauty Kolade, Anna Singley, Tanmoy Bhattacharya, Patrick Finley, Drew Levin, Haedi Thelen, Kelly Cho, Lauren Costa, Yuk-Lam Ho, Amy C Justice, John Pestian, Daniel Santel, Rafael Zamora-Resendiz, Silvia Crivelli, Suzanne Tamang, Susana Martins, Jodie Trafton, David W Oslin, Jean C Beckham, Nathan A Kimbrel, Benjamin H McMahon,
"High dimensional predictions of suicide risk in 4.2 million US Veterans using ensemble transfer learning",
Scientific Reports,
January 20, 2024,
Long B Nguyen, Yosep Kim, Akel Hashim, Noah Goss, Brian Marinelli, Bibek Bhandari, Debmalya Das, Ravi K Naik, John Mark Kreikebaum, Andrew N Jordan, others,
"Programmable Heisenberg interactions between Floquet qubits",
Nature Physics,
2024,
20:240-246,
doi: 10.1038/s41567-023-02326-7
Oliver T, Varghese N, Roux S, Schulz F, Huntemann M, Clum A, Foster B, Foster B, Riley R, LaButti K, Egan R, Hajek P, Mukherjee S, Ovchinnikova G, Reddy TBK, Calhoun S, Hayes RD, Rohwer RR, Zhou Z, Daum C, Copeland A, Chen I-MA, Ivanova NN, Kyrpides NC, Mouncey NJ, del Rio TG, Grigoriev IV, Hofmeyr S, Oliker L, Yelick K, Anantharaman K, McMahon KD, Woyke T, Eloe-Fadrosh EA,
"Coassembly and binning of a twenty-year metagenomic time-series from Lake Mendota",
Nature Scientific Data,
January 1, 2024,
doi: 10.1038/S41597-024-03826-8
2023
This design document proposes an interface to support the parallel features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a proposed solution in which the runtime library is responsible for coarray allocation, deallocation and accesses, image synchronization, atomic operations, events, and teams. In this interface, the compiler is responsible for transforming the invocation of Fortran-level parallel features into procedure calls to the necessary PRIF procedures. The interface is designed for portability across shared- and distributed-memory machines, different operating systems, and multiple architectures. Implementations of this interface are intended as an augmentation for the compiler’s own runtime library. With an implementation-agnostic interface, alternative parallel runtime libraries may be developed that support the same interface. One benefit of this approach is the ability to vary the communication substrate. A central aim of this document is to define a parallel runtime interface in standard Fortran syntax, which enables us to leverage Fortran to succinctly express various properties of the procedure interfaces, including argument attributes.
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
C. M. Oguchi, D. Ghosal, A. Sim, K. Wu,
"Counterfactual Analysis: A Case Study on Impact of External Events on Building Energy Consumption",
International Workshop on Big Data Analytics for Sustainability (BDA4S),
2023,
A. Sharma, X. Li, H. Guan, G. Sun, L. Zhang, L. Wang, K. Wu, L. Cao, E. Zhu, A. Sim, T. Wu, J. Zou,
"Automatic Data Transformation Using Large Language Model – An Experimental Study on Building Energy Data",
IEEE International Conference on Big Data (BigData),
2023,
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters,
"UPC++ v1.0 Programmer’s Guide, Revision 2023.9.0",
Lawrence Berkeley National Laboratory Tech Report LBNL-2001560,
December 2023,
doi: 10.25344/S4P01J
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes. UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Duncan Carpenter, Anjali Sandip, Samuel Kachuck, Daniel Martin,
"Does Damaged Ice affect Ice Sheet Evolution?",
American Geophysical Union Fall Meeting,
December 14, 2023,
J. W. Chung, A. Sim, B. Quiter, Y. Wu, W. Zhao, K. Wu,
"Preparing Spectral Data for Machine Learning: A Study of Geological Classification from Aerial Surveys",
Machine Learning and the Physical Sciences Workshop (ML4PS),
2023,
Hamza Errahmouni Barkam, Sanggeon Yun, Hanning Chen, Paul Gensler, Albi Mema, Andrew Ding, George Michelogiannakis, Hussam Amrouch, Mohsen Imani,
"Reliable hyperdimensional reasoning on unreliable emerging technologies",
IEEE/ACM International Conference on Computer Aided Design (ICCAD),
November 2023,
Jordan Hines, Marie Lu, Ravi K. Naik, Akel Hashim, Jean-Loup Ville, Brad Mitchell, John Mark Kreikebaum, David I. Santiago, Stefan Seritan, Erik Nielsen, Robin Blume-Kohout, Kevin Young, Irfan Siddiqi, Birgitta Whaley, Timothy Proctor,
"Demonstrating Scalable Randomized Benchmarking of Universal Gate Sets",
Phys. Rev. X,
2023,
041030,
doi: 10.1103/PhysRevX.13.041030
J. Gu, P. Lin, K. Wu, S-H. Ku, C.S. Chang, R.M. Churchill, J. Choi, N. Podhorszki, S. Klasky,
"Unraveling Diffusion in Fusion Plasma: A Case Study of In Situ Processing and Particle Sorting",
In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV'23),
2023,
I. Mahmud, G. Papadimitriou, G. Wang, M. Kiran, A. Mandal, E. Deelman,
"Elephants Sharing the Highway: Studying TCP Fairness in Large Transfers over High Throughput Links",
10th International Workshop on Innovating the Network for Data Intensive Science (INDIS 2023),
2023,
doi: 10.1145/3624062.3624594
Yang Liu, Nan Ding, Piyush Sao, Samuel Williams, Xiaoye Sherry Li,
"Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters",
Supercomputing (SC),
November 2023,
Oscar Antepara, Samuel Williams, Scott Kruger, Torrin Bechtel, Joseph McClenaghan, Lang Lao,
"Performance-Portable GPU Acceleration of the EFIT Tokamak Plasma Equilibrium Reconstruction Code",
Workshop on Accelerator Programming and Directives (WACCPD),
November 2023,
Oscar Antepara, Hans Johansen, Samuel Williams, Tuowen Zhao, Samantha Hirsch, Priya Goyal, Mary Hall,
"Performance portability evaluation of blocked stencil computations on GPUs",
International Workshop on Performance, Portability & Productivity in HPC (P3HPC),
November 2023,
Sparse symmetric positive definite systems of equations are ubiquitous in scientific workloads and applications. Parallel sparse Cholesky factorization is the method of choice for solving such linear systems. Therefore, the development of parallel sparse Cholesky codes that can efficiently run on today’s large-scale heterogeneous distributed-memory platforms is of vital importance. Modern supercomputers offer nodes that contain a mix of CPUs and GPUs. To fully utilize the computing power of these nodes, scientific codes must be adapted to offload expensive computations to GPUs.
We present symPACK, a GPU-capable parallel sparse Cholesky solver that uses one-sided communication primitives and remote procedure calls provided by the UPC++ library. We also utilize the UPC++ "memory kinds" feature to enable efficient communication of GPU-resident data. We show that on a number of large problems, symPACK outperforms comparable state-of-the-art GPU-capable Cholesky factorization codes by up to 14x on the NERSC Perlmutter supercomputer.
R. Monga, A. Sim (advisor), K. Wu (advisor),
"Comparative Study of the Cache Utilization Trends for Regional Scientific Data Caches",
ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’23), ACM Student Research Competition (SRC), First place winner,
2023,
Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian Jin, Houjun Tang, Jean Sexton, Sheng Di, Kai Zhao, Bo Fang, Zarija Lukić, Franck Cappello, James Ahrens, Dingwen Tao,
"AMRIC: A novel in situ lossy compression framework for efficient I/O in adaptive mesh refinement applications",
SC23: International Conference for High Performance Computing, Networking, Storage and Analysis,
November 12, 2023,
doi: 10.1145/3581784.3613212
Jakob Luettgau, Shane Snyder, Tyler Reddy, Nikolaus Awtrey, Kevin Harms, Jean Luca Bez, Rui Wang, Rob Latham, Philip Carns,
"Enabling Agile Analysis of I/O Performance Data with PyDarshan",
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis,
Denver, CO, USA,
Association for Computing Machinery,
November 12, 2023,
1380–1391,
doi: 10.1145/3624062.3624207
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for these many uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models.
The tutorial is targeted for users with little-to-no parallel programming experience, but everyone is welcome. A partial differential equation example will be demonstrated in all three programming models. That example and others will be provided to attendees in a virtual environment. Attendees will be shown how to compile and run these programming examples, and the virtual environment will remain available to attendees throughout the conference, along with Slack-based interactive tech support.
Come join us to learn about some productive and performant parallel programming models!
SC23 event page
E Wes Bethel, Mercy G Amankwah, Jan Balewski, Roel Van Beeumen, Daan Camps, Daniel Huang, Talita Perciano,
"Quantum computing and visualization: A disruptive technological change ahead",
IEEE Computer Graphics and Applications,
November 6, 2023,
43,
doi: 10.1109/MCG.2023.3316932
Meriam Gay Bautista, Darren Lyles, Kylie Huch, Patricia Gonzalez-Guerrero, George Michelogiannakis,
"Area Efficient Asynchronous SFQ Pulse Round-Robin Distribution Network",
IEEE Transactions on Circuits and Systems I: Regular Papers,
November 2023,
George Michelogiannakis, Yehia Arafa, Brandon Cook, Liang Yuan Dai, Abdel-Hameed Badawy, Madeleine Glick, Keren Bergman, John Shalf,
Efficient Intra-Rack Resource Disaggregation in HPC Using Co-Packaged DWDM Photonics,
IEEE Cluster 2023,
November 1, 2023,
Tong Wu, Anna Scaglione, Adrian Petru Surani, Daniel Arnold, Sean Peisert,
"Network-Constrained Reinforcement Learning for Optimal EV Charging Control",
Proceedings of the IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm),
October 2023,
C.S. Chang, S-H. Ku, R. Hager, J. Choi, D. Pugmire, S. Klasky, Scott, A. Loarte, R. Pitts, J. Gu, J. Wu,
The role of turbulent separatrix tangle in the improvement of the integrated pedestal/heat exhaust issue for stationary operation in ITER and Fusion Reactors,
APS Division of Plasma Physics Meeting,
2023,
Maximilian Bremer, Nirmalendu Patra, Tan Nguyen, Dilip Vasudevan, Cy Chan,
"Benefits of Optimistic Parallel Discrete Event Simulation for Network-on-Chip Simulation",
2023 IEEE/ACM 27th International Symposium on Distributed Simulation and Real Time Applications (DS-RT),
Singapore,
October 2, 2023,
doi: 10.1109/DS-RT58998.2023.00013
Robert Currie, Sean Peisert, Anna Scaglione, Aram Shumavon, Nikhil Ravi,
"Data Privacy for the Grid: Toward a Data Privacy Standard for Inverter-Based and Distributed Energy Resources",
IEEE Power & Energy Magazine,
October 1, 2023,
Computing at large scales has become extremely challenging due to increasing heterogeneity in both hardware and software. More and more scientific workflows must tackle a range of scales and use machine learning and AI intertwined with more traditional numerical modeling methods, placing more demands on computational platforms. These constraints indicate a need to fundamentally rethink the way computational science is done and the tools that are needed to enable these complex workflows. The current set of C++-based solutions may not suffice, and relying exclusively upon C++ may not be the best option, especially because several newer languages and boutique solutions offer more robust design features to tackle the challenges of heterogeneity. In June 2023, we held a mini symposium that explored the use of newer languages and heterogeneity solutions that are not tied to C++ and that offer options beyond template metaprogramming and Parallel.For for performance and portability. We describe some of the presentations and discussion from the mini symposium in this article.
Akel Hashim, Stefan Seritan, Timothy Proctor, Kenneth Rudinger, Noah Goss, Ravi K Naik, John Mark Kreikebaum, David I Santiago, Irfan Siddiqi,
"Benchmarking quantum logic operations relative to thresholds for fault tolerance",
npj Quantum Information,
2023,
9:109,
doi: 10.1038/s41534-023-00764-y
André Ramos Carneiro, Jean Luca Bez, Carla Osthoff, Lucas Mello Schnorr, Philippe O.A. Navaux,
"Uncovering I/O demands on HPC platforms: Peeking under the hood of Santos Dumont",
Journal of Parallel and Distributed Computing,
August 18, 2023,
182,
doi: 10.1016/j.jpdc.2023.104744
Riley R, Bowers RM, Camargo AP, Campbell A, Egan R, Eloe-Fadrosh EA, Foster B, Hofmeyr S, Huntemann M, Kellom M, Kimbrel JA, Oliker L, Yelick K, Pett-Ridge J, Salamov A, Varghese NJ, Clum A,
"Terabase-Scale Coassembly of a Tropical Soil Microbiome",
Microbiology Spectrum,
August 17, 2023,
doi: 10.1128/SPECTRUM.00200-23
GM Wallace, Z Bai, N Bertelli, EW Bethel, T Perciano, S Shiraiwa, JC Wright,
"Towards Fast, Accurate Predictions of RF Simulations via Data-driven Modeling: Forward and Lateral Models",
AIP Conference Proceedings,
AIP Publishing,
August 1, 2023,
2984,
doi: 10.1063/5.0162422
Hao Li, Han Cai, Joseph Forman, Ran Cheng, et al.,
"Transport Properties of NbN Thin Films Patterned With a Focused Helium Ion Beam",
IEEE Transactions on Applied Superconductivity,
August 2023,
Ran Cheng, Christoph Kirst, Dilip Vasudevan,
"Superconducting-Oscillatory Neural Network With Pixel Error Detection for Image Recognition",
IEEE Transactions on Applied Superconductivity,
August 2023,
33:1-7,
A majority of HPC system users utilize scripting languages such as Python to prototype their computations, coordinate their large executions, and analyze the data resulting from their computations. Python is great for these many uses, but it frequently falls short when significantly scaling up the amount of data and computation, as required to fully leverage HPC system resources. In this tutorial, we show how example computations such as heat diffusion, k-mer counting, file processing, and distributed maps can be written to efficiently leverage distributed computing resources in the Chapel, UPC++, and Fortran parallel programming models. This tutorial should be accessible to users with little-to-no parallel programming experience, and everyone is welcome. A partial differential equation example will be demonstrated in all three programming models along with performance and scaling results on big machines. That example and others will be provided in a cloud instance and Docker container. Attendees will be shown how to compile and run these programming examples, and provided opportunities to experiment with different parameters and code alternatives while being able to ask questions and share their own observations. Come join us to learn about some productive and performant parallel programming models!
Sachin Kadam, Anna Scaglione, Nikhil Ravi, Sean Peisert, Brent Lunghino, Aram Shumavon,
"Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets",
Proceedings of the 2023 IEEE International Conference on Smart Applications, Communications and Networking (SmartNets),
Istanbul, Turkey,
July 25, 2023,
H-C. Yang, L. Jin, A. Lazar, A. Todd-Blick, A. Sim, K. Wu, Q. Chen, C. A. Spurlock,
"Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective",
Systems,
2023,
11(6):314,
doi: 10.3390/systems11060314
R. Shao, A. Sim, K. Wu, J. Kim,
"Leveraging History to Predict Abnormal Transfers in Distributed Workflows",
Sensors,
2023,
23(12):5485,
doi: 10.3390/s23125485
Bin Dong, Jean Luca Bez, Suren Byna,
"AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis",
In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’23),
June 16, 2023,
Paul H. Hargrove has been involved in the world of Partitioned Global Address Space (PGAS) programming models since 1999, before he knew such a thing existed. Early involvement in the GASNet communications library as used in implementations of UPC, Titanium and Co-array Fortran convinced Paul that one could have productivity and performance without sacrificing one for the other. Since then he has been among the apostates who work to overturn the belief that message-passing is the only (or best) way to program for High-Performance Computing (HPC). Paul has been fortunate to witness the history of the PGAS community through several rare opportunities, including interactions made possible by the wide adoption of GASNet and through operating a PGAS booth at the annual SC conferences from 2007 to 2017. In this talk, Paul will share some highlights of his experiences across 24 years of PGAS history. Among these is the DARPA High Productivity Computing Systems (HPCS) project which helped give birth to Chapel.
CHIUW 2023 website
George Michelogiannakis,
Analyzing Resource Utilization in an HPC System: A Case Study of NERSC’s Perlmutter,
ISC High Performance,
May 2023,
Hammad Ather, Jean Luca Bez, Boyana Norris, Suren Byna,
"Illuminating the I/O Optimization Path of Scientific Applications",
High Performance Computing: 38th International Conference, ISC High Performance 2023, Hamburg, Germany, May 21–25, 2023, Proceedings,
Hamburg, Germany,
Springer-Verlag,
May 21, 2023,
22–41,
doi: 10.1007/978-3-031-32041-5_2
The existing parallel I/O stack is complex and difficult to tune due to the interdependencies among multiple factors that impact the performance of data movement between storage and compute systems. When performance is slower than expected, end-users, developers, and system administrators rely on I/O profiling and tracing information to pinpoint the root causes of inefficiencies. Despite the numerous tools that collect I/O metrics on production systems, it is not obvious (unless one is an I/O expert) where the I/O bottlenecks are, what their root causes are, or what to do to solve them. Hence, there is a gap between the currently available metrics, the issues they represent, and the application of optimizations that would mitigate performance slowdowns. An I/O specialist often checks for common problems before diving into the specifics of each application and workload. Streamlining such analysis, investigation, and recommendations could close this gap without requiring a specialist to intervene in every case. In this paper, we propose a novel interactive, user-oriented visualization and analysis framework called Drishti. This framework helps users pinpoint various root causes of I/O performance problems and provides a set of actionable recommendations for improving performance based on the observed characteristics of an application. We evaluate the applicability and correctness of Drishti using four use cases from distinct science domains and demonstrate its value to end-users, developers, and system administrators when seeking to improve an application’s I/O performance.
Popovici DT, Awan MG, Guidi G, Egan R, Hofmeyr S, Oliker L, Yelick K,
"Designing Efficient SIMD Kernels for High Performance Sequence Alignment",
2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW),
May 19, 2023,
doi: 10.1109/IPDPSW59300.2023.00038
John Ravi, Suren Byna, Quincey Koziol, Houjun Tang, Michela Becchi,
"Evaluating Asynchronous Parallel I/O on HPC Systems",
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS),
May 15, 2023,
doi: 10.1109/IPDPS54959.2023.00030
Md Kamal Hossain Chowdhury, Houjun Tang, Jean Luca Bez, Purushotham V. Bangalore, Suren Byna,
"Efficient Asynchronous I/O with Request Merging",
2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW),
St. Petersburg, FL, USA,
IEEE,
2023,
628-636,
doi: 10.1109/IPDPSW59300.2023.00107
J. Bellavita, C. Sim, K. Wu, A. Sim, S. Yoo, H. Ito, V. Garonne, E. Lancon,
Understanding Data Access Patterns for dCache System,
26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023),
2023,
C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, F. Wurthwein, D. Davila, H. Newman, J. Balcas,
Predicting Resource Usage Trends with Southern California Petabyte Scale Cache,
26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023),
2023,
S. Kim, A. Sim, K. Wu, S. Byna, Y. Son, H. Eom,
"Design and Implementation of I/O Performance Prediction Scheme on HPC Systems through Large-scale Log Analysis",
Journal of Big Data,
2023,
10(65),
doi: 10.1186/s40537-023-00741-4
Sean Peisert,
"The First 20 Years of IEEE Security & Privacy [From the Editors]",
IEEE Security & Privacy,
April 1, 2023,
21(2):4-6,
doi: 10.1109/MSEC.2023.3236420
George Cybenko, Carl Landwehr, Shari Lawrence Pfleeger, Sean Peisert,
A 20th Anniversary Episode Chat With S&P Editors,
IEEE Security & Privacy,
Pages: 9-16
April 2023,
doi: 10.1109/MSEC.2023.3239179
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters,
"UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0",
Lawrence Berkeley National Laboratory Tech Report,
March 30, 2023,
LBNL 2001517,
doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Raghu Bollapragada, Stefan M. Wild,
"Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization",
Mathematical Programming Computation,
2023,
15:327-364,
doi: 10.1007/s12532-023-00233-9
Damian Rouson,
Producing Software for Science with Class,
SIAM Conference on Computational Science and Engineering,
March 1, 2023,
The Computer Languages and Systems Software (CLaSS) Group at Berkeley Lab researches and develops programming models, languages, libraries, and applications for parallel and quantum computing. The open-source software under development in CLaSS includes the GASNet-EX networking middleware, the UPC++ partitioned global address space (PGAS) template library, the Berkeley Quantum Synthesis Toolkit (BQSKit), and the MetaHipMer metagenome assembler. This talk will start with an overview of CLaSS software and the software sustainability practices commonly employed across the group. The talk will then dive more deeply into our burgeoning contributions to the ecosystem supporting modern Fortran, including our test development for the LLVM Flang Fortran compiler. This presentation will demonstrate how agile software development techniques are helping to ensure robust front-end support for standard Fortran 2018 parallel programming features. The talk will also present several key insights that inspired our design and development of the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine) parallel runtime library, emphasizing the design choices that help to ensure sustainability. Lastly, the talk will demonstrate the productivity benefits associated with the first Caffeine application in Motility Analysis of T-Cell Histories in Activation (Matcha).
SIAM Session
McCoy H, Hofmeyr S, Yelick K, Pandey P,
"High-Performance Filters for GPUs",
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming,
February 25, 2023,
doi: 10.1145/3572848.3577507
C. Sim, K. Wu, A. Sim, I. Monga, C. Guok, F. Wurthwein, D. Davila, H. Newman, J. Balcas,
"Effectiveness and predictability of in-network storage cache for Scientific Workflows",
International Conference on Computing, Networking and Communication (ICNC 2023),
2023,
doi: 10.1109/ICNC57223.2023.10074058
Most parallel scientific programs contain compiler directives (pragmas) such as those from OpenMP, explicit calls to runtime library procedures such as those implementing the Message Passing Interface (MPI), or compiler-specific language extensions such as those provided by CUDA. By contrast, the recent Fortran standards empower developers to express parallel algorithms without directly referencing lower-level parallel programming models. Fortran’s parallel features place the language within the Partitioned Global Address Space (PGAS) class of programming models. When writing programs that exploit data-parallelism, application developers often find it straightforward to develop custom parallel algorithms. Problems involving complex, heterogeneous, staged calculations, however, pose much greater challenges. Such applications require careful coordination of tasks in a manner that respects dependencies prescribed by a directed acyclic graph. When rolling one’s own solution proves difficult, extending a customizable framework becomes attractive. The paper presents the design, implementation, and use of the Framework for Extensible Asynchronous Task Scheduling (FEATS), which we believe to be the first task-scheduling tool written in modern Fortran. We describe the benefits and compromises associated with choosing Fortran as the implementation language, and we propose ways in which future Fortran standards can best support the use case in this paper.
J. Wang, K. Wu, A. Sim, S. Hwangbo,
"Locating Partial Discharges in Power Transformers with Convolutional Iterative Filtering",
Sensors,
2023,
23,
doi: 10.3390/s23041789
Nathan A. Kimbrel, Allison E. Ashley-Koch, Xue J. Qin, Jennifer H. Lindquist, Melanie E. Garrett, Michelle F. Dennis, Lauren P. Hair, Jennifer E. Huffman, Daniel A. Jacobson, Ravi K. Madduri, Jodie A. Trafton, Hilary Coon, Anna R. Docherty, Niamh Mullins, Douglas M. Ruderfer, Philip D. Harvey, Benjamin H. McMahon, David W. Oslin, Jean C. Beckham, Elizabeth R. Hauser, Michael A. Hauser, Million Veteran Program Suicide Exemplar Workgroup, International Suicide Genetics Consortium, Veterans Affairs Mid-Atlantic Mental Illness Research Education and Clinical Center Workgroup, Veterans Affairs Million Veteran Program,
"Identification of Novel, Replicable Genetic Risk Loci for Suicidal Thoughts and Behaviors Among US Military Veterans",
JAMA Psychiatry,
February 1, 2023,
80:100-191,
doi: 10.1001/jamapsychiatry.2022.3896
Hector G. Martin, Tijana Radivojevic, Jeremy Zucker, Kristofer Bouchard, Jess Sustarich, Sean Peisert, Dan Arnold, Nathan Hillson, Gyorgy Babnigg, Jose M. Marti, Christopher J. Mungall, Gregg T. Beckham, Lucas Waldburger, James Carothers, ShivShankar Sundaram, Deb Agarwal, Blake A. Simmons, Tyler Backman, Deepanwita Banerjee, Deepti Tanjore, Lavanya Ramakrishnan, Anup Singh,
"Perspectives for Self-Driving Labs in Synthetic Biology",
Current Opinion in Biotechnology,
February 2023,
doi: 10.1016/j.copbio.2022.102881
The Pagoda project is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. The first component is GASNet-EX, a portable, high-performance, global-address-space communication library. The second component is UPC++, a C++ template library. Together, these libraries enable agile, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
GASNet-EX is a portable, high-performance communications middleware library which leverages hardware support to implement Remote Memory Access (RMA) and Active Message communication primitives. GASNet-EX supports a broad ecosystem of alternative HPC programming models, including UPC++, Legion, Chapel and multiple implementations of UPC and Fortran Coarrays. GASNet-EX is implemented directly over the native APIs for networks of interest in HPC. The tight semantic match of GASNet-EX APIs to the client requirements and hardware capabilities often yields better performance than competing libraries.
UPC++ provides high-level productivity abstractions appropriate for Partitioned Global Address Space (PGAS) programming such as: remote memory access (RMA), remote procedure call (RPC), support for accelerators (e.g. GPUs), and mechanisms for aggressive asynchrony to hide communication costs. UPC++ implements communication using GASNet-EX, delivering high performance and portability from laptops to exascale supercomputers. HPC application software using UPC++ includes: MetaHipMer2 metagenome assembler, SIMCoV viral propagation simulation, NWChemEx TAMM, and graph computation kernels from ExaGraph.
George Michelogiannakis,
A Case for Intra-Rack Resource Disaggregation for HPC,
HiPEAC conference 2023,
January 17, 2023,
Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Gay Bautista, George Michelogiannakis,
"PaST-NoC: A Packet-Switched Superconducting Temporal NoC",
IEEE Transactions on Applied Superconductivity,
January 2023,
H-C. Yang, L. Jin, A. Lazar, A. Todd-Blick, A. Sim, K. Wu, Q. Chen, C. A. Spurlock,
Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective,
Transportation Research Board 102nd Annual Meeting,
2023,
J. Bang, A. Sim, G. Lockwood, H. Eom, H. Sung,
"Design and Implementation of Burst Buffer Over-Subscription Scheme for HPC Storage Systems",
IEEE Access,
2023,
doi: 10.1109/ACCESS.2022.3233829
"Singleton Sieving: Overcoming the Memory/Speed Trade-Off in Exascale k-mer Analysis",
SIAM Conference on Applied and Computational Discrete Algorithms (ACDA23),
January 1, 2023,
doi: 10.25344/S4TP4T
2022
V. Cirigliano, Z. Davoudi, J. Engel, R. J. Furnstahl, G. Hagen, U. Heinz, H. Hergert, M. Horoi, C. W. Johnson, A. Lovato, E. Mereghetti, W. Nazarewicz, A. Nicholson, T. Papenbrock, S. Pastore, M. Plumlee, D. R. Phillips, P. E. Shanahan, S. R. Stroberg, F. Viens, A. Walker-Loud, K. A. Wendt, S. M. Wild,
"Towards Precise and Accurate Calculations of Neutrinoless Double-Beta Decay",
Journal of Physics G: Nuclear and Particle Physics,
2022,
49:120502,
doi: 10.1088/1361-6471/aca03e
Daniel Martin, Samuel Kachuck, Joanna Millstein, Brent Minchew,
"Examining the Sensitivity of Ice Sheet Models to Updates in Rheology (n=4)",
AGU Fall Meeting,
December 15, 2022,
GASNet Celebrates 20th Anniversary
For 20 years, Berkeley Lab’s GASNet has been fueling developers’ ability to tap the power of massively parallel supercomputers more effectively. The middleware was recently upgraded to support exascale scientific applications.
Noah Goss, Alexis Morvan, Brian Marinelli, Bradley K Mitchell, Long B Nguyen, Ravi K Naik, Larry Chen, Christian Jünger, John Mark Kreikebaum, David I Santiago, et al.,
"High-fidelity qutrit entangling gates for superconducting circuits",
Nature Communications,
2022,
13:7481,
doi: 10.1038/s41467-022-34851-z
Ammar Haydari, Chen-Nee Chuah, Michael Zhang, Jane Macfarlane, Sean Peisert,
"Differentially Private Map Matching for Mobility Trajectories",
Proceedings of the 2022 Annual Computer Security Applications Conference (ACSAC),
Austin, TX,
ACM,
December 2022,
doi: 10.1145/3564625.3567974
D. Fan, D. E. Willcox, C. DeGrendele, M. Zingale, and A. Nonaka,
"Neural Networks for Nuclear Reactions in MAESTROeX",
The Astrophysical Journal,
November 29, 2022,
940,
Melissa L. Graham, Robert A. Knop, Thomas Kennedy, Peter E. Nugent, Eric Bellm, Márcio Catelan, Avi Patel, Hayden Smotherman, Monika Soraisam, Steven Stetzler, Lauren N. Aldoroty, Autumn Awbrey, Karina Baeza-Villagra, Pedro H. Bernardinelli, Federica Bianco, Dillon Brout, Riley Clarke, William I. Clarkson, Thomas Collett, James R. A. Davenport, Shenming Fu, John E. Gizis, Ari Heinze, Lei Hu, Saurabh W. Jha, Mario Jurić, J. Bryce Kalmbach, Alex Kim, Chien-Hsiu Lee, Chris Lidman, Mark Magee, Clara E. Martínez-Vázquez, Thomas Matheson, Gautham Narayan, Antonella Palmese, Christopher A. Phillips, Markus Rabus, Armin Rest, Nicolás Rodríguez-Segovia, Rachel Street, A. Katherina Vivas, Lifan Wang, Nicholas Wolf, Jiawen Yang,
"Deep drilling in the time domain with DECam: Survey characterization",
Monthly Notices of the Royal Astronomical Society,
November 2022,
Taylor Groves, Chris Daley, Rahulkumar Gayatri, Hai Ah Nam, Nan Ding, Lenny Oliker, Nicholas J. Wright, Samuel Williams,
"A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures",
PMBS,
November 2022,
Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, LeAnn Lindsey, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright,
Methodology for Evaluating the Potential of Disaggregated Memory Systems,
https://resdis.github.io/ws/2022/sc/,
November 18, 2022,
Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright,
"Methodology for Evaluating the Potential of Disaggregated Memory Systems",
RESDIS, https://resdis.github.io/ws/2022/sc/,
November 18, 2022,
Andrew Adams, Emily K. Adams, Dan Gunter, Ryan Kiser, Mark Krenz, Sean Peisert, John Zage,
"Roadmap for Securing Operational Technology in NSF Scientific Research",
Trusted CI Report,
November 16, 2022,
doi: 10.5281/zenodo.7327987
Julian Bellavita, Alex Sim (advisor), John Wu (advisor),
"Predicting Scientific Dataset Popularity Using dCache Logs",
ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’22), ACM Student Research Competition (SRC), Second place winner,
2022,
The dCache installation is a storage management system that acts as a disk cache for high-energy physics (HEP) data. Storage space on dCache is limited relative to persistent storage devices; therefore, a heuristic is needed to determine what data should be kept in the cache. A good cache policy would keep frequently accessed data in the cache, but this requires knowledge of future dataset popularity. We present methods for forecasting the number of times a dataset stored on dCache will be accessed in the future. We present a deep neural network that can predict future dataset accesses accurately, reporting a final normalized loss of 4.6e-8. We present a set of algorithms that can forecast future dataset accesses given an access sequence. Included are two novel algorithms, Backup Predictor and Last N Successors, that outperform other file prediction algorithms. Findings suggest that it is possible to anticipate dataset popularity in advance.
Katherine Rasmussen, Damian Rouson, Naje George, Dan Bonachea, Hussain Kadhem, Brian Friesen,
"Agile Acceleration of LLVM Flang Support for Fortran 2018 Parallel Programming",
Research Poster at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22),
November 2022,
doi: 10.25344/S4CP4S
The LLVM Flang compiler ("Flang") is currently Fortran 95 compliant, and the frontend can parse Fortran 2018. However, Flang does not have a comprehensive 2018 test suite and does not fully implement the static semantics of the 2018 standard. We are investigating whether agile software development techniques, such as pair programming and test-driven development (TDD), can help Flang to rapidly progress to Fortran 2018 compliance. Because of the paramount importance of parallelism in high-performance computing, we are focusing on Fortran’s parallel features, commonly denoted “Coarray Fortran.” We are developing what we believe are the first exhaustive, open-source tests for the static semantics of Fortran 2018 parallel features, and contributing them to the LLVM project. A related effort involves writing runtime tests for parallel 2018 features and supporting those tests by developing a new parallel runtime library: the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine).
C. Sim, C. Guok (advisor), A. Sim (advisor), K. Wu (advisor),
"Data Throughput Performance Trends of Regional Scientific Data Cache",
ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’22), ACM Student Research Competition (SRC),
2022,
Jean Luca Bez, Hammad Ather, Suren Byna,
"Drishti: Guiding End-Users in the I/O Optimization Journey",
PDSW 2022, held in conjunction with SC22,
2022,
Partitioned Global Address Space (PGAS) programming models, typified by systems such as Unified Parallel C (UPC) and Fortran coarrays, expose one-sided Remote Memory Access (RMA) communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.
GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in emerging exascale machines. The library is an evolution of the popular GASNet communication system, building upon 20 years of lessons learned. We present microbenchmark results which demonstrate the RMA performance of GASNet-EX is competitive with MPI implementations on four recent, high-impact, production HPC systems. These results are an update relative to previously published results on older systems. The networks measured here are representative of hardware currently used in six of the top ten fastest supercomputers in the world, and all of the exascale systems on the U.S. DOE road map.
Rajeev Jain, Houjun Tang, Akash Dhruv, J Austin Harris, Suren Byna,
"Accelerating flash-x simulations with asynchronous I/O",
https://ieeexplore.ieee.org/abstract/document/10026923/,
November 13, 2022,
doi: 10.1109/PDSW56643.2022.00008
Benjamin Sepanski, Tuowen Zhao, Hans Johansen, Samuel Williams,
"Maximizing Performance Through Memory Hierarchy-Driven Data Layout Transformations",
MCHPC,
November 2022,
Mathias Weiden, Justin Kalloor, John Kubiatowicz, Ed Younis, Costin Iancu,
"Wide Quantum Circuit Optimization with Topology Aware Synthesis",
Third International Workshop on Quantum Computing Software,
November 13, 2022,
Unitary synthesis is an optimization technique that can achieve optimal gate counts while mapping quantum circuits to restrictive qubit topologies. Synthesis algorithms are limited in scalability by their exponentially growing run times. Application to wide circuits requires partitioning into smaller components. In this work, we explore methods to reduce depth and multi-qubit gate count of wide, mapped quantum circuits using synthesis. We present TopAS, a topology-aware synthesis tool that preconditions quantum circuits before mapping. Partitioned subcircuits are optimized and fitted to sparse subtopologies to balance the opposing demands of synthesis and mapping algorithms. Compared to state-of-the-art wide circuit synthesis algorithms, TopAS is able to reduce depth on average by 35.2% and CNOT count by 11.5% for mesh topologies. Compared to the optimization and mapping algorithms of Qiskit and Tket, TopAS is able to reduce CNOT counts by 30.3% and depth by 38.2% on average.
This paper provides an introduction to the CoArray Fortran Framework of Efficient Interfaces to Network Environments (Caffeine), a parallel runtime library built atop the GASNet-EX exascale networking library. Caffeine leverages several non-parallel Fortran features to write type- and rank-agnostic interfaces and corresponding procedure definitions that support parallel Fortran 2018 features, including communication, collective operations, and related services. One major goal is to develop a runtime library that can eventually be considered for adoption by LLVM Flang, enabling that compiler to support the parallel features of Fortran. The paper describes the motivations behind Caffeine's design and implementation decisions, details the current state of Caffeine's development, and previews future work. We explain how the design and implementation offer benefits related to software sustainability by lowering the barrier to user contributions, reducing complexity through the use of Fortran 2018 C-interoperability features, and delivering high performance through the use of a lightweight communication substrate.
George Michelogiannakis,
Intra-Rack Resource Disaggregation Using Emerging Photonics,
OCP global summit,
October 19, 2022,
John Shalf, George Michelogiannakis,
Heterogeneous Integration for HPC,
OCP global summit,
October 19, 2022,
Oluwamayowa O. Amusat, Tim Barthlomew, Adam A. Atia,
Cost optimization of desalination systems using WaterTAP incorporating detailed water chemistry models,
2022 INFORMS Annual Meeting,
2022,
The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. The BSSwF’s vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). Case studies from several of the program’s participants illustrate the diverse ways the BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as the BSSwF can help recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations, and ideas for a larger audience.
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters,
"UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0",
Lawrence Berkeley National Laboratory Tech Report,
September 30, 2022,
LBNL 2001479,
doi: 10.25344/S4QW26
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power,
"SoK: Limitations of Confidential Computing via TEEs for High-Performance Compute Systems",
Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED),
September 2022,
Mateusz Pusz, Gašper Ažman, Bengt Gustafsson, Colin MacLean, Corentin Jabot,
"Universal Template Parameters",
ISO C++ Standard Mailing,
September 2022,
This paper proposes a unified model for universal template parameters (UTPs) and dependent names, enabling more comprehensive and consistent template metaprogramming. Universal template parameters allow for a generic apply and other higher-order template metafunctions, including certain type traits.
The ability to discover new transient candidates via image differencing without direct human intervention is an important task in observational astronomy. For these kinds of image classification problems, machine learning techniques such as Convolutional Neural Networks (CNNs) have shown remarkable success. In this work, we present the results of an automated transient candidate identification on images with CNNs for an extant data set from the Dark Energy Survey Supernova program, whose main focus was on using Type Ia supernovae for cosmology. By performing an architecture search of CNNs, we identify networks that efficiently select non-artifacts (e.g., supernovae, variable stars, AGN, etc.) from artifacts (image defects, mis-subtractions, etc.), achieving the efficiency of previous work performed with random forests, without the need to expend any effort in feature identification. The CNNs also help us identify a subset of mislabeled images. Performing a relabeling of the images in this subset, the resulting classification with CNNs is significantly better than previous results, lowering the false positive rate by 27% at a fixed missed detection rate of 0.05.
Alvin Oliver Glova, Yukai Yang, Yiyao Wan, Zhizhou Zhang, George Michelogiannakis, Jonathan Balkind, Timothy Sherwood,
"Establishing Cooperative Computation with Hardware Embassies",
IEEE International Symposium on Secure and Private Execution Environment Design,
September 2022,
Liou J-Y, Awan M, Hofmeyr S, Forrest S, Wu C-J,
"Understanding the Power of Evolutionary Computation for GPU Code Optimization",
2022 IEEE International Symposium on Workload Characterization (IISWC),
August 11, 2022,
doi: 10.1109/IISWC55918.2022.00025
Ozge Surer, Filomena M. Nunes, Matthew Plumlee, Stefan M. Wild,
"Uncertainty Quantification in Breakup Reactions",
Physical Review C,
2022,
106:024607,
doi: 10.1103/PhysRevC.106.024607
M.F. Adams, D.P. Brennan, M.G. Knepley, P. Wang,
"Landau collision operator in the CUDA programming model applied to thermal quench plasmas",
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS),
July 15, 2022,
doi: 10.1109/IPDPS53621.2022.00020
V. Cirigliano, Z. Davoudi, J. Engel, R. J. Furnstahl, G. Hagen, U. Heinz, H. Hergert, M. Horoi, C. W. Johnson, A. Lovato, E. Mereghetti, W. Nazarewicz, A. Nicholson, T. Papenbrock, S. Pastore, M. Plumlee, D. R. Phillips, P. E. Shanahan, S. R. Stroberg, F. Viens, A. Walker-Loud, K. A. Wendt, S. M. Wild,
"Towards Precise and Accurate Calculations of Neutrinoless Double-Beta Decay: Project Scoping Workshop Report",
2022,
doi: 10.48550/ARXIV.2207.01085
Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son,
"Design and implementation of dynamic I/O control scheme for large scale distributed file systems",
Cluster Computing,
2022,
25(6):1--16,
doi: 10.1007/s10586-022-03640-0
R. Han, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, J. Balcas, H. Newman,
"Access Trends of In-network Cache for Scientific Data",
5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA), in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC),
2022,
doi: 10.1145/3526064.3534110
J. Bellavita, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila,
"Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches",
5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC),
2022,
doi: 10.1145/3526064.3534111
R. Shao, J. Kim, A. Sim, K. Wu,
"Predicting Slow Connections in Scientific Computing",
5th ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2022, in conjunction with The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC),
2022,
doi: 10.1145/3526064.3534112
J. Kim, M. Cafaro, J. Chou, A. Sim,
"SNTA’22: The 5th Workshop on Systems and Network Telemetry and Analytics",
In the proceedings of The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC'22),
2022,
doi: 10.1145/3502181.3535108
Bin Dong, Alex Popescu, Veronica Rodriguez Tribaldos, Suren Byna, Jonathan Ajo-Franklin, Kesheng Wu,
"Real-time and post-hoc compression for data from Distributed Acoustic Sensing",
Computers & Geosciences,
June 24, 2022,
105181,
Runzhou Han, Suren Byna, Houjun Tang, Bin Dong, and Mai Zheng,
"PROV-IO: An I/O-Centric Provenance Framework for Scientific Data on HPC Systems",
HPDC 2022,
June 23, 2022,
D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, W. Arndt, J. Blaschke, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, T. Lehman, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, L. Stephey, R. Thomas, G. Torok,
"LBNL Superfacility Project Report",
Lawrence Berkeley National Laboratory,
2022,
doi: 10.48550/arXiv.2206.11992
Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, Kylie Huch, George Michelogiannakis,
"Superconducting Digital DIT Butterfly Unit for Fast Fourier Transform Using Race Logic",
2022 20th IEEE Interregional NEWCAS Conference (NEWCAS),
IEEE,
June 2022,
441-445,
Dan Bonachea, Paul H. Hargrove,
An Introduction to GASNet-EX for Chapel Users,
9th Annual Chapel Implementers and Users Workshop (CHIUW 2022),
June 10, 2022,
Have you ever typed "export CHPL_COMM=gasnet"? If you’ve used Chapel with multi-locale support on a system without "Cray" in the model name, then you’ve probably used GASNet. Did you ever wonder what GASNet is? What GASNet should mean to you? This talk aims to answer those questions and more. Chapel has system-specific implementations of multi-locale communication for Cray-branded systems including the Cray XC and HPE Cray EX lines. On other systems, Chapel communication uses the GASNet communication library embedded in third-party/gasnet. In this talk, that third-party will introduce itself to you in the first person.
Yujing Ma, Florin Rusu, Kesheng Wu, Alexander Sim,
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW),
Pages: 1088--1097
2022,
doi: 10.1109/IPDPSW55747.2022.00177
Yang Liu,
"A comparative study of butterfly-enhanced direct integral and differential equation solvers for high-frequency electromagnetic analysis involving inhomogeneous dielectrics",
May 29, 2022,
Huihuo Zheng, Venkatram Vishwanath, Quincey Koziol, Houjun Tang, John Ravi, John Mainzer, Suren Byna,
"HDF5 Cache VOL: Efficient and scalable parallel I/O through caching data on node-local storage",
2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid),
May 16, 2022,
doi: 10.1109/CCGrid54584.2022.00015
K. Wang, S. Lee, J. Balewski, A. Sim, P. Nugent, A. Agrawal, A. Choudhary, K. Wu, W-K. Liao,
"Using Multi-resolution Data to Accelerate Neural Network Training in Scientific Applications",
22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2022),
2022,
doi: 10.1109/CCGrid54584.2022.00050
M. G. Amankwah, D. Camps, E. W. Bethel, R. Van Beeumen, T. Perciano,
"Quantum pixel representations and compression for N-dimensional images",
Nature Scientific Reports,
May 11, 2022,
12:7712,
doi: 10.1038/s41598-022-11024-y
Paul H. Hargrove, Dan Bonachea, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters,
"UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'22)",
Poster at Exascale Computing Project (ECP) Annual Meeting 2022,
May 5, 2022,
We present UPC++ and GASNet-EX, distributed libraries which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Procedure Call (RPC) and for Remote Memory Access (RMA) to host and GPU memories. The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
Yosep Kim, Alexis Morvan, Long B Nguyen, Ravi K Naik, Christian Jünger, Larry Chen, John Mark Kreikebaum, David I Santiago, Irfan Siddiqi,
"High-fidelity three-qubit iToffoli gate for fixed-frequency superconducting qubits",
Nature Physics,
2022,
1--6,
doi: 10.1038/s41567-022-01590-3
JaeHyuk Kwack,
ROOFLINE PERFORMANCE ANALYSIS W/ INTEL ADVISOR ON INTEL CPUS & GPUS,
ECP Annual Meeting,
May 2022,
Neil Mehta,
Roofline on NVIDIA at NERSC,
ECP Annual Meeting,
May 2022,
Samuel Williams,
Introduction to the Roofline Model,
ECP Annual Meeting,
May 2022,
Mark Adams, Satish Balay, Oana Marin, Lois Curfman McInnes, Richard Tran Mills, Todd Munson, Hong Zhang, Junchao Zhang, Jed Brown, Victor Eijkhout, Jacob Faibussowitsch, Matthew Knepley, Fande Kong, Scott Kruger, Patrick Sanan, Barry F. Smith, Hong Zhang,
"The PETSc Community as Infrastructure",
May 1, 2022,
24,
doi: 10.1109/MCSE.2022.3169974
The communities that develop and support open-source scientific software packages are crucial to the utility and success of such packages. Moreover, they form an important part of the human infrastructure that enables scientific progress. This article discusses aspects of the Portable Extensible Toolkit for Scientific Computation community, its organization, and technical approaches that enable community members to help each other efficiently and effectively.
B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu,
"Enhancing IoT Anomaly Detection Performance for Federated Learning",
Digital Communications and Networks, Special Issue on Edge Computation and Intelligence,
2022,
doi: 10.1016/j.dcan.2022.02.007
Lipeng Wan, Axel Huebl, Junmin Gu, Franz Poeschel, Ana Gainaru, Ruonan Wang, Jieyang Chen, Xin Liang, Dmitry Ganyushin, Todd Munson, Ian Foster, Jean-Luc Vay, Norbert Podhorszki, Kesheng Wu, Scott Klasky,
"Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization",
IEEE Transactions on Parallel and Distributed Systems,
2022,
33:878-890,
doi: 10.1109/TPDS.2021.3100784
Meyer F, Fritz A, Deng Z-L, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh H-J, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC,
"Critical Assessment of Metagenome Interpretation: the second round of challenges",
Nature Methods,
April 1, 2022,
doi: 10.1038/S41592-022-01431-4
M. Avaylon, R. Sadre, Z. Bai, T. Perciano,
"Adaptable Deep Learning and Probabilistic Graphical Model System for Semantic Segmentation",
Advances in Artificial Intelligence and Machine Learning,
March 31, 2022,
2:288--302,
doi: 10.54364/AAIML.2022.1119
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters,
"UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0",
Lawrence Berkeley National Laboratory Tech Report,
March 2022,
LBNL 2001453,
doi: 10.25344/S41C7Q
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
George Michelogiannakis, Madeleine Glick, John Shalf, Keren Bergman,
Photonics as a Means to Implement Intra-rack Resource Disaggregation,
SPIE photonics west,
March 2022,
Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Darren Lyles, George Michelogiannakis,
Temporal and SFQ Pulse-Streams Encoding for Area-Efficient Superconducting Accelerators,
27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22),
February 2022,
X. Zhu, Y. Liu, P. Ghysels, D. Bindal, X. S. Li,
"GPTuneBand: multi-task and multi-fidelity Bayesian optimization for autotuning large-scale high performance computing applications",
SIAM PP,
February 23, 2022,
George Michelogiannakis, Benjamin Klenk, Brandon Cook, Min Yee Teh, Madeleine Glick, Larry Dennison, Keren Bergman, John Shalf,
"A Case For Intra-Rack Resource Disaggregation in HPC",
ACM Transactions on Architecture and Code Optimization,
February 2022,
Aleksandra Ciprijanovic, Diana Kafkes, Gregory Snyder, F. Javier Sanchez, Gabriel Nathan Perdue, Kevin Pedro, Brian Nord, Sandeep Madireddy, Stefan M. Wild,
"DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification",
Machine Learning: Science and Technology,
2022,
3:035007,
doi: 10.1088/2632-2153/ac7f1a
Hannah Klion, Alexander Tchekhovskoy, Daniel Kasen, Adithan Kathirgamaraju, Eliot Quataert, Rodrigo Fernandez,
"The impact of r-process heating on the dynamics of neutron star merger accretion disc winds and their electromagnetic radiation",
Monthly Notices of the RAS,
2022,
510:2968-2979,
doi: 10.1093/mnras/stab3583
John Wu, Ben Brown, Paolo Calafiura, Quincey Koziol, Dongeun Lee, Alex Sim, Devesh Tiwari,
Support for In-Flight Data Analyses in Scientific Workflows,
DOE ASCR Workshop on the Management and Storage of Scientific Data,
2022,
doi: 10.2172/1843500
A. Pereira, A. Sim, K. Wu, S. Yoo, H. Ito,
"Data access pattern analysis for dCache storage system",
International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022),
2022,
J. V. Pusztay, M. G. Knepley, and M. F. Adams,
"Conservative Projection Between FEM and Particle Bases",
SIAM Journal on Scientific Computing,
January 1, 2022,
doi: 10.1137/21M145407
Stephen Hudson, Jeffrey Larson, John-Luke Navarro, Stefan M. Wild,
"libEnsemble: A Library to Coordinate the Concurrent Evaluation of Dynamic Ensembles of Calculations",
IEEE Transactions on Parallel and Distributed Systems,
2022,
33:977--988,
doi: 10.1109/TPDS.2021.3082815
Alina Lazar, others,
Accelerating the Inference of the Exa.TrkX Pipeline,
20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded - Towards Sustainable, Diverse, Performant and Effective Scientific Computing,
2022,
Chun-Yi Wang, others,
Reconstruction of Large Radius Tracks with the Exa.TrkX pipeline,
20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research: AI Decoded - Towards Sustainable, Diverse, Performant and Effective Scientific Computing,
2022,
Sunanda Banerjee, others,
Detector and Beamline Simulation for Next-Generation High Energy Physics Experiments,
2022 Snowmass Summer Study,
2022,
Meghna Bhattacharya, others,
Portability: A Necessary Approach for Future Scientific Software,
2022 Snowmass Summer Study,
2022,
Christopher D. Jones, Kyle Knoepfel, Paolo Calafiura, Charles Leggett, Vakhtang Tsulaia,
Evolution of HEP Processing Frameworks,
2022 Snowmass Summer Study,
2022,
Savannah Thais, Paolo Calafiura, Grigorios Chachamis, Gage DeZoort, Javier Duarte, Sanmay Ganguly, Michael Kagan, Daniel Murnane, Mark S. Neubauer, Kazuhiro Terao,
Graph Neural Networks in Particle Physics: Implementations, Innovations, and Challenges,
2022 Snowmass Summer Study,
2022,
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, Dave Pugmire, Silvio Rizzi, Thompson, Will Usher, Gunther H. Weber, Brad Whitlock, Wolf, Kesheng Wu,
"Proximity Portability and In Transit, M-to-N Data Partitioning and Movement in SENSEI",
In Situ Visualization for Computational Science,
2022,
doi: 10.1007/978-3-030-81627-8_20
E. Wes Bethel, Burlen Loring, Utkarsh Ayachit, David Camp, P. N. Duque, Nicola Ferrier, Joseph Insley, Junmin Gu, Kress, Patrick O’Leary, David Pugmire, Silvio Rizzi, Thompson, Gunther H. Weber, Brad Whitlock, Matthew Wolf, Kesheng Wu,
"The SENSEI Generic In Situ Interface: Tool and Processing Portability at Scale",
In Situ Visualization for Computational Science,
2022,
doi: 10.1007/978-3-030-81627-8_13
Sugeerth Murugesan, Mariam Kiran, Bernd Hamann, Gunther H. Weber,
"Netostat: Analyzing Dynamic Flow Patterns in High-Speed Networks",
Cluster Computing,
2022,
doi: 10.1007/s10586-022-03543-0
H Weierbach, AR Lima, JD Willard, VC Hendrix, DS Christianson, M Lubich, C Varadharajan,
Stream Temperature Predictions for River Basin Management in the Pacific Northwest and Mid-Atlantic Regions Using Machine Learning,
Water (Switzerland),
2022,
doi: 10.3390/w14071032
M Galloway, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei,
BeyondPlanck III. Commander3,
2022,
M Galloway, M Reinecke, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei,
BeyondPlanck VIII. Efficient Sidelobe Convolution and Correction through Spin Harmonics,
2022,
TL Svalheim, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, M Galloway, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei, A Zonca,
BeyondPlanck X. Bandpass and beam leakage corrections,
2022,
D Herman, B Hensley, KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, M Galloway, S Gerakakis, E Gjerløw, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, DJ Watts, IK Wehus, A Zacchei,
BeyondPlanck XVI. Limits on Large-Scale Polarized Anomalous Microwave Emission from Planck LFI and WMAP,
2022,
KJ Andersen, R Aurlien, R Banerji, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, MK Foss, C Franceschet, U Fuskeland, S Galeotta, M Galloway, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, HT Ihle, JB Jewell, A Karakci, E Keihänen, R Keskitalo, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, M Tomasi, DJ Watts, IK Wehus, A Zacchei,
BeyondPlanck XIV. Intensity foreground sampling, degeneracies and priors,
2022,
L Collaboration, E Allys, K Arnold, J Aumont, R Aurlien, S Azzoni, C Baccigalupi, AJ Banday, R Banerji, RB Barreiro, N Bartolo, L Bautista, D Beck, S Beckman, M Bersanelli, F Boulanger, M Brilenkov, M Bucher, E Calabrese, P Campeti, A Carones, FJ Casas, A Catalano, V Chan, K Cheung, Y Chinone, SE Clark, F Columbro, G D Alessandro, PD Bernardis, TD Haan, EDL Hoz, MD Petris, SD Torre, P Diego-Palazuelos, T Dotani, JM Duval, T Elleflot, HK Eriksen, J Errard, T Essinger-Hileman, F Finelli, R Flauger, C Franceschet, U Fuskeland, M Galloway, K Ganga, M Gerbino, M Gervasi, RT Génova-Santos, T Ghigna, S Giardiello, E Gjerløw, J Grain, F Grupp, A Gruppuso, JE Gudmundsson, NW Halverson, P Hargrave, T Hasebe, M Hasegawa, M Hazumi, S Henrot-Versillé, B Hensley, LT Hergt, D Herman, E Hivon, RA Hlozek, AL Hornsby, Y Hoshino, J Hubmayr, K Ichiki, T Iida, H Imada, H Ishino, G Jaehnig, N Katayama, A Kato, R Keskitalo, T Kisner, Y Kobayashi, A Kogut, K Kohri, E Komatsu, K Komatsu, K Konishi, N Krachmalnicoff, CL Kuo, L Lamagna, M Lattanzi, AT Lee, C Leloup, F Levrier, E Linder, G Luzzi, J Macias-Perez, B Maffei, D Maino, S Mandelli, E Martínez-González, S Masi, M Massa, S Matarrese, FT Matsuda, T Matsumura, L Mele, M Migliaccio, Y Minami, A Moggi, J Montgomery, L Montier, G Morgante, B Mot, Y Nagano, T Nagasaki, R Nagata, R Nakano, T Namikawa, F Nati, P Natoli, S Nerval, F Noviello, K Odagiri, S Oguri, H Ohsaki, L Pagano, A Paiella, D Paoletti, A Passerini, G Patanchon, F Piacentini, M Piat, G Polenta, D Poletti, T Prouvé, G Puglisi, D Rambaud, C Raum, S Realini, M Reinecke, M Remazeilles, A Ritacco, G Roudil, JA Rubino-Martin, M Russell, H Sakurai, Y Sakurai, M Sasaki, D Scott, Y Sekimoto, K Shinozaki, M Shiraishi, P Shirron, G Signorelli, F Spinella, S Stever, R Stompor, S Sugiyama, RM Sullivan, A Suzuki, TL Svalheim, E Switzer, R Takaku, H Takakura, Y Takase, A Tartari, Y Terao, J Thermeau, H Thommesen, KL Thompson, M Tomasi, M Tominaga, M Tristram, M Tsuji, M Tsujimoto, L Vacher, P Vielva, N Vittorio, W Wang, K Watanuki, IK Wehus, J Weller, B Westbrook, J Wilms, EJ Wollack, J Yumoto, M Zannoni,
Probing Cosmic Inflation with the LiteBIRD Cosmic Microwave Background Polarization Survey,
2022,
DJ Watts, M Galloway, HT Ihle, KJ Andersen, R Aurlien, R Banerji, A Basyrov, M Bersanelli, S Bertocco, M Brilenkov, M Carbone, LPL Colombo, HK Eriksen, JR Eskilt, MK Foss, C Franceschet, U Fuskeland, S Galeotta, S Gerakakis, E Gjerløw, B Hensley, D Herman, M Iacobellis, M Ieronymaki, JB Jewell, A Karakci, E Keihänen, R Keskitalo, JGS Lunde, G Maggio, D Maino, M Maris, S Paradiso, B Partridge, M Reinecke, M San, NO Stutzer, A-S Suur-Uski, TL Svalheim, D Tavagnacco, H Thommesen, IK Wehus, A Zacchei,
From BeyondPlanck to Cosmoglobe: Preliminary WMAP Q-band analysis,
2022,
P Diego-Palazuelos, JR Eskilt, Y Minami, M Tristram, RM Sullivan, AJ Banday, RB Barreiro, HK Eriksen, KM Górski, R Keskitalo, E Komatsu, E Martínez-González, D Scott, P Vielva, IK Wehus,
"Cosmic Birefringence from the Planck Data Release 4",
Physical Review Letters,
2022,
128:091302,
doi: 10.1103/physrevlett.128.091302
C Varadharajan, AP Appling, B Arora, DS Christianson, VC Hendrix, V Kumar, AR Lima, J Müller, S Oliver, M Ombadi, T Perciano, JM Sadler, H Weierbach, JD Willard, Z Xu, J Zwart,
"Can machine learning accelerate process understanding and decision-relevant predictions of river water quality?",
Hydrological Processes,
January 1, 2022,
36,
doi: 10.1002/hyp.14565
S. Dhawan, A. Goobar, M. Smith, J. Johansson, M. Rigault, J. Nordin, R. Biswas, D. Goldstein, P. Nugent, Y. -L. Kim, A. A. Miller, M. J. Graham, M. Medford, M. M. Kasliwal, S. R. Kulkarni, Dmitry A. Duev, E. Bellm, P. Rosnet, R. Riddle, J. Sollerman,
The Zwicky Transient Facility Type Ia supernova survey: first data release and results,
Monthly Notices of the RAS,
Pages: 2228-2241
2022,
doi: 10.1093/mnras/stab3093
Yuan Qi Ni, Dae-Sik Moon, Maria R. Drout, Abigail Polin, David J. Sand, Santiago González-Gaitán, Sang Chul Kim, Youngdae Lee, Hong Soo Park, D. Andrew Howell, Peter E. Nugent, Anthony L. Piro, Peter J. Brown, Lluís Galbany, Jamison Burke, Daichi Hiramatsu, Griffin Hosseinzadeh, Stefano Valenti, Niloufar Afsariardchi, Jennifer E. Andrews, John Antoniadis, Iair Arcavi, Rachael L. Beaton, K. Azalee Bostroem, Raymond G. Carlberg, S. Bradley Cenko, Sang-Mok Cha, Yize Dong, Avishay Gal-Yam, Joshua Haislip, Thomas W. -S. Holoien, Sean D. Johnson, Vladimir Kouprianov, Yongseok Lee, Christopher D. Matzner, Nidia Morrell, Curtis McCully, Giuliano Pignata, Daniel E. Reichart, Jeffrey Rich, Stuart D. Ryder, Nathan Smith, Samuel Wyatt, Sheng Yang,
Infant-phase reddening by surface Fe-peak elements in a normal type Ia supernova,
Nature Astronomy,
2022,
doi: 10.1038/s41550-022-01603-4
Melissa L. Graham, Christoffer Fremling, Daniel A. Perley, Rahul Biswas, Christopher A. Phillips, Jesper Sollerman, Peter E. Nugent, Sarafina Nance, Suhail Dhawan, Jakob Nordin, Ariel Goobar, Adam Miller, James D. Neill, Xander J. Hall, Matthew J. Hankins, Dmitry A. Duev, Mansi M. Kasliwal, Mickael Rigault, Eric C. Bellm, David Hale, Przemek Mróz, S. R. Kulkarni,
Supernova siblings and their parent galaxies in the Zwicky Transient Facility Bright Transient Survey,
Monthly Notices of the RAS,
Pages: 241-254
2022,
doi: 10.1093/mnras/stab3802
MB Simmonds, WJ Riley, DA Agarwal, X Chen, S Cholia, R Crystal-Ornelas, ET Coon, D Dwivedi, VC Hendrix, M Huang, A Jan, Z Kakalia, J Kumar, CD Koven, L Li, M Melara, L Ramakrishnan, DM Ricciuto, AP Walker, W Zhi, Q Zhu, C Varadharajan,
Guidelines for Publicly Archiving Terrestrial Model Data to Enhance Usability, Intercomparison, and Synthesis,
Data Science Journal,
2022,
doi: 10.5334/dsj-2022-003
C Varadharajan, VC Hendrix, DS Christianson, M Burrus, C Wong, SS Hubbard, DA Agarwal,
BASIN-3D: A brokering framework to integrate diverse environmental data,
Computers and Geosciences,
2022,
doi: 10.1016/j.cageo.2021.105024
B Faybishenko, R Versteeg, G Pastorello, D Dwivedi, C Varadharajan, D Agarwal,
Challenging problems of quality assurance and quality control (QA/QC) of meteorological time series data,
Stochastic Environmental Research and Risk Assessment,
Pages: 1049--1062
2022,
doi: 10.1007/s00477-021-02106-w
Sean Peisert,
Unsafe at Any Clock Speed: the Insecurity of Computer System Design, Implementation, and Operation [From the Editors],
IEEE Security & Privacy,
Pages: 4-9
January 2022,
doi: 10.1109/MSEC.2021.3127086
Hengjie Wang, Robert Planas, Aparna Chandramowlishwaran, Ramin Bostanabad,
"Mosaic flows: A transferable deep learning framework for solving PDEs on unseen domains",
Computer Methods in Applied Mechanics and Engineering,
2022,
389:114424,
F Molz, B Faybishenko, D Agarwal,
A broad exploration of nonlinear dynamics in microbial systems motivated by chemostat experiments producing deterministic chaos.,
2022,
2021
Y. Cho, J. W. Demmel, X. S. Li, Y. Liu, H. Luo,
"Enhancing autotuning capability with a history database",
IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC),
December 20, 2021,
Qiao Kang, Scot Breitenfeld, Kaiyuan Hou, Wei-keng Liao, Robert Ross, and Suren Byna,
"Optimizing Performance of Parallel I/O Accesses to Non-contiguous Blocks in Multiple Array Variables",
IEEE BigData 2021 conference,
December 19, 2021,
J. Bang, C. Kim, K. Wu, A. Sim, S. Byna, H. Sung, H. Eom,
"An In-Depth I/O Pattern Analysis in HPC Systems",
IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021),
2021,
doi: 10.1109/HiPC53243.2021.00056
S. Lee, Q. Kang, K. Wang, J. Balewski, A. Sim, A. Agrawal, A. Choudhary, P. Nugent, K. Wu, W-K. Liao,
"Asynchronous I/O Strategy for Large-Scale Deep Learning Applications",
IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021),
2021,
doi: 10.1109/HiPC53243.2021.00046
A. Lazar, L. Jin, C. Brown, C. A. Spurlock, A. Sim, K. Wu,
"Performance of the Gold Standard and Machine Learning in Predicting Vehicle Transactions",
the 3rd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD 2021),
2021,
doi: 10.1109/BigData52589.2021.9671286
James R. Clavin, Yue Huang, Xin Wang, Pradeep M. Prakash, Sisi Duan, Jianwu Wang, Sean Peisert,
"A Framework for Evaluating BFT",
Proceedings of the IEEE International Conference on Parallel and Distributed Systems (ICPADS),
IEEE,
December 2021,
R. Mills, M.F. Adams, S. Balay, J. Brown, A. Dener, M. Knepley, S. Kruger, H. Morgan, T. Munson, K. Rupp, B. Smith, S. Zampini, H. Zhang, J. Zhang,
"Toward performance-portable PETSc for GPU-based exascale systems",
Parallel Computing,
December 1, 2021,
108,
doi: 10.1016/j.parco.2021.102831
The Portable Extensible Toolkit for Scientific Computation (PETSc) library delivers scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization. The PETSc design for performance portability addresses fundamental GPU accelerator challenges and stresses flexibility and extensibility by separating the programming model used by the application from that used by the library, and it enables application developers to use their preferred programming model, such as Kokkos, RAJA, SYCL, HIP, CUDA, or OpenCL, on upcoming exascale systems. A blueprint for using GPUs from PETSc-based codes is provided, and case studies emphasize the flexibility and high performance achieved on current GPU-based systems.
Andrew Myers, Ann Almgren, Diana Amorim, John Bell, Luca Fedeli, Lixin Ge, Kevin Gott, David Grote, Mark Hogan, Axel Huebl, Revathi Jambunathan, Remi Lehe, Cho Ng, Michael Rowan, Olga Shapoval, Maxence Thevenet, Jean-Luc Vay, Henri Vincenti, Eloise Yang, Neil Zaim, Weiqun Zhang, Yin Zhao, Edoardo Zoni,
"Porting WarpX to GPU-accelerated platforms",
Parallel Computing,
December 1, 2021,
Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin, Sean Peisert, W. Bradley Holtz, Anil Aswani, Dipankar Dwivedi, Haruko Wainwright, Ghanshyam Pilania, Benjamin Nachman, Babetta L. Marrone, Nicola Falco, Prabhat, Daniel Arnold, Alejandro Wolf-Yadlin, Sarah Powers, Sharlee Climer, Quinn Jackson, Ty Carlson, Michael Sohn, Petrus Zwart, Neeraj Kumar, Amy Justice, Claire Tomlin, Daniel Jacobson, Gos Micklem, Georgios V. Gkoutos, Peter J. Bickel, Jean-Baptiste Cazier, Juliane Müller, Bobbie-Jo Webb-Robertson, Rick Stevens, Mark Anderson, Ken Kreutz-Delgado, Michael W. Mahoney, James B. Brown,
Learning from Learning Machines: a New Generation of AI Technology to Meet the Needs of Science,
arXiv preprint arXiv:2111.13786,
November 27, 2021,
André Ramos Carneiro, Jean Luca Bez, Carla Osthoff, Lucas Mello Schnorr, Phillipe Olivier Alexandre Navaux,
"HPC Data Storage at a Glance: The Santos Dumont Experience",
IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD),
IEEE,
November 26, 2021,
157-166,
doi: 10.1109/SBAC-PAD53543.2021.00027
Akel Hashim, Ravi K. Naik, Alexis Morvan, Jean-Loup Ville, Bradley Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin P. O'Brien, Ian Hincks, Joel J. Wallman, Joseph Emerson, Irfan Siddiqi,
"Randomized Compiling for Scalable Quantum Computing on a Noisy Superconducting Quantum Processor",
Physical Review X,
2021,
11:041039,
doi: 10.1103/PhysRevX.11.041039
Cong Xu, Suparna Bhattacharya, Martin Foltin, Suren Byna, and Paolo Faraboschi,
"Data-Aware Storage Tiering for Deep Learning",
6th International Parallel Data Systems Workshop (PDSW) 2021, held in conjunction with SC21,
November 21, 2021,
We describe the replacement of MPI with UPC++ in an existing Kokkos code that simulates heat conduction within a rectangular 3D object, as well as an analysis of the new code’s performance on CUDA accelerators. The key challenges were packing the halos in Kokkos data structures in a way that allowed for UPC++ remote memory access, and streamlining synchronization costs. Additional UPC++ abstractions used included global pointers, distributed objects, remote procedure calls, and futures. We also make use of the device allocator concept to facilitate data management in memory with unique properties, such as GPUs. Our results demonstrate that despite the algorithm’s good semantic match to message passing abstractions, straightforward modifications to use UPC++ communication deliver vastly improved performance and scalability in the common case. We find the one-sided UPC++ version written in a natural way exhibits good performance, whereas the message-passing version written in a straightforward way exhibits performance anomalies. We argue this represents a productivity benefit for one-sided communication models.
PAW-ATM'21
Kenneth Rudinger, Craig W Hogle, Ravi K Naik, Akel Hashim, Daniel Lobser, David I Santiago, Matthew D Grace, Erik Nielsen, Timothy Proctor, Stefan Seritan, others,
"Experimental Characterization of Crosstalk Errors with Simultaneous Gate Set Tomography",
PRX Quantum,
2021,
2:040338,
doi: 10.1103/PRXQuantum.2.040338
Franz Poeschel, Juncheng E, William F. Godoy, Norbert Podhorszki, Scott Klasky, Greg Eisenhauer, Philip E. Davis, Lipeng Wan, Ana Gainaru, Junmin Gu, Fabian Koller, René Widera, Michael Bussmann, Axel Huebl,
"Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2",
Smoky Mountains Computational Sciences and Engineering Conference (SMC2021),
2021,
UPC++ is a C++ library implementing the Asynchronous Partitioned Global Address Space (APGAS) model. We propose an enhancement to the completion mechanisms of UPC++ used to synchronize communication operations that is designed to reduce overhead for on-node operations. Our enhancement permits eager delivery of completion notification in cases where the data transfer semantics of an operation happen to complete synchronously, for example due to the use of shared-memory bypass. This semantic relaxation allows removing significant overhead from the critical path of the implementation in such cases. We evaluate our results on three different representative systems using a combination of microbenchmarks and five variations of the HPCChallenge RandomAccess benchmark implemented in UPC++ and run on a single node to accentuate the impact of locality. We find that in RMA versions of the benchmark written in a straightforward manner (without manually optimizing for locality), the new eager notification mode can provide up to a 25% speedup when synchronizing with promises and up to a 13.5x speedup when synchronizing with conjoined futures. We also evaluate our results using a graph matching application written with UPC++ RMA communication, where we measure overall speedups of as much as 11% in single-node runs of the unmodified application code, due to our transparent enhancements.
PAW-ATM'21
J. Cheung, A. Sim, J. Kim, K. Wu,
"Performance Prediction of Large Data Transfers",
ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), ACM Student Research Competition (SRC),
2021,
Paul H. Hargrove, Dan Bonachea, Colin A. MacLean, Daniel Waters,
"GASNet-EX Memory Kinds: Support for Device Memory in PGAS Programming Models",
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21) Research Poster,
November 2021,
doi: 10.25344/S4P306
Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work includes two major components: UPC++ (a C++ template library) and GASNet-EX (a portable, high-performance communication library). This poster describes recent advances in GASNet-EX to efficiently implement Remote Memory Access (RMA) operations to and from memory on accelerator devices such as GPUs. Performance is illustrated via benchmark results from UPC++ and the Legion programming system, both using GASNet-EX as their communications library.
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan,
"Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows",
2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS),
November 15, 2021,
doi: 10.1109/WORKS54523.2021.00014
Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create "workflow snapshots" that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.
Katherine A. Yelick, Amir Kamil, Damian Rouson, Dan Bonachea, Paul H. Hargrove,
UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (SC21),
Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21),
November 15, 2021,
UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. UPC++ offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between computation and asynchronous data movement. UPC++ supports simple/regular data structures as well as more elaborate distributed applications where communication is fine-grained and/or irregular. UPC++ provides a uniform abstraction for one-sided RMA between host and GPU/accelerator memories anywhere in the system. UPC++'s support for aggressive asynchrony enables applications to effectively overlap communication and reduce latency stalls, while the underlying GASNet-EX communication library delivers efficient low-overhead RMA/RPC on HPC networks.
This tutorial introduces UPC++, covering the memory and execution models and basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into application proxy examples. We examine a few UPC++ applications with irregular communication (metagenomic assembler and COVID-19 simulation) and describe how they utilize UPC++ to optimize communication performance.
Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams,
"Architectural Requirements for Deep Learning Workloads in HPC Environments",
(BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS),
November 2021,
Tan Nguyen, Erich Strohmaier, John Shalf,
"Facilitating CoDesign with Automatic Code Similarity Learning",
7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC),
November 14, 2021,
Bradley K. Mitchell, Ravi K. Naik, Alexis Morvan, Akel Hashim, John Mark Kreikebaum, Brian Marinelli, Wim Lavrijsen, Kasra Nowrouzi, David I. Santiago, Irfan Siddiqi,
"Hardware-Efficient Microwave-Activated Tunable Coupling between Superconducting Qubits",
Physical Review Letters,
2021,
127:200502,
doi: 10.1103/PhysRevLett.127.200502
Ran Cheng, Uday S. Goteti, Harrison Walker, Keith M. Krause, Luke Oeding, Michael C. Hamilton,
"Toward Learning in Neuromorphic Circuits Based on Quantum Phase Slip Junctions",
Frontiers in Neuroscience,
November 8, 2021,
A. Syal, A. Lazar, J. Kim, A. Sim, K. Wu,
"Network traffic performance analysis from passive measurements using gradient boosting machine learning",
International Journal of Big Data Intelligence,
2021,
8:13-30,
doi: 10.1504/IJBDI.2021.118741
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters,
"UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0",
Lawrence Berkeley National Laboratory Tech Report,
September 2021,
LBNL 2001424,
doi: 10.25344/S4SW2T
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
Yilun Xu, Gang Huang, Jan Balewski, Ravi Naik, Alexis Morvan, Bradley Mitchell, Kasra Nowrouzi, David I. Santiago, Irfan Siddiqi,
"QubiC: An Open-Source FPGA-Based Control and Measurement System for Superconducting Quantum Information Processors",
IEEE Transactions on Quantum Engineering,
2021,
2:1-11,
doi: 10.1109/TQE.2021.3116540
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
E. Copps, A. Sim (Advisor), K. Wu (Advisor),
"Analyzing scientific data sharing patterns with in-network data caching",
ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2021), ACM Student Research Competition (SRC),
2021,
Marco Siracusa, Emanuele Del Sozzo, Marco Rabozzi, Lorenzo Di Tucci, Samuel Williams, Donatella Sciuto, Marco Domenico Santambrogio,
"A Comprehensive Methodology to Optimize FPGA Designs via the Roofline Model",
Transactions on Computers (TC),
September 2021,
doi: 10.1109/TC.2021.3111761
Srivatsan Chakram, Andrew E. Oriani, Ravi K. Naik, Akash V. Dixit, Kevin He, Ankur Agrawal, Hyeokshin Kwon, David I. Schuster,
"Seamless High-Q Microwave Cavities for Multimode Circuit Quantum Electrodynamics",
Physical Review Letters,
2021,
127:107701,
doi: 10.1103/PhysRevLett.127.107701
Tan Nguyen, Colin MacLean, Marco Siracusa, Douglas Doerfler, Nicholas J. Wright, Samuel Williams,
"FPGA‐based HPC accelerators: An evaluation on performance and energy efficiency",
CCPE,
August 22, 2021,
doi: 10.1002/cpe.6570
This draft proposes an extension for a new future-based completion variant that can be more effectively streamlined for RMA and atomic access operations that happen to be satisfied at runtime using purely node-local resources. Many such operations are most efficiently performed synchronously using load/store instructions on shared-memory mappings, where the actual access may only require a few CPU instructions. In such cases we believe it’s critical to minimize the overheads imposed by the UPC++ runtime and completion queues, in order to enable efficient operation on hierarchical node hardware using shared-memory bypass.
The new upcxx::{source,operation}_cx::as_eager_future() completion variant accomplishes this goal by relaxing the current restriction that future-returning access operations must return a non-ready future whose completion is deferred until a subsequent explicit invocation of user-level progress. This relaxation allows access operations that are completed synchronously to instead return a ready future, thereby avoiding most or all of the runtime costs associated with deferment of future completion and subsequent mandatory entry into the progress engine.
We additionally propose to make this new as_eager_future() completion variant the new default completion for communication operations that currently default to returning a future. This should encourage use of the streamlined variant, and may provide performance improvements to some codes without source changes. A mechanism is proposed to restore the legacy behavior on-demand for codes that might happen to rely on deferred completion for correctness.
Finally, we propose a new as_eager_promise() completion variant that extends analogous improvements to promise-based completion, and corresponding changes to the default behavior of as_promise().
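The difference between deferred and eager completion described above can be illustrated with a small toy model. The sketch below is not UPC++ code and does not use its API: `Future`, `ToyRuntime`, `get_deferred`, `get_eager`, and `progress` are hypothetical names invented for illustration, and Python is used only to keep the example self-contained. It assumes every access can be satisfied synchronously from node-local memory.

```python
from collections import deque


class Future:
    """Minimal single-value future with continuation support (toy model)."""

    def __init__(self):
        self._ready = False
        self._value = None
        self._callbacks = []

    def is_ready(self):
        return self._ready

    def result(self):
        if not self._ready:
            raise RuntimeError("future is not ready")
        return self._value

    def then(self, fn):
        """Run fn(value) once ready; run it immediately if already ready."""
        if self._ready:
            fn(self._value)
        else:
            self._callbacks.append(fn)

    def _fulfill(self, value):
        self._ready = True
        self._value = value
        for fn in self._callbacks:
            fn(value)
        self._callbacks.clear()


class ToyRuntime:
    """Hypothetical runtime whose accesses all hit node-local memory."""

    def __init__(self, local_memory):
        self.local_memory = local_memory
        self.deferred = deque()  # completions waiting for progress()

    def get_deferred(self, addr):
        # Legacy behavior: the load is done now, but the future stays
        # non-ready until the caller enters the progress engine.
        fut = Future()
        self.deferred.append((fut, self.local_memory[addr]))
        return fut

    def get_eager(self, addr):
        # Eager behavior: a synchronously completed access returns a
        # ready future and never touches the completion queue.
        fut = Future()
        fut._fulfill(self.local_memory[addr])
        return fut

    def progress(self):
        """Complete all queued futures; return how many were completed."""
        completed = 0
        while self.deferred:
            fut, value = self.deferred.popleft()
            fut._fulfill(value)
            completed += 1
        return completed


rt = ToyRuntime({0: 42})
log = []

legacy = rt.get_deferred(0)
legacy.then(lambda v: log.append(("legacy", v)))
eager = rt.get_eager(0)
eager.then(lambda v: log.append(("eager", v)))

ready_before_progress = (legacy.is_ready(), eager.is_ready())
completed = rt.progress()
```

In this model the eager continuation runs before the deferred one even though it was registered later, which is the kind of ordering change that code relying on deferred completion would need to account for.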
Ran Cheng, Uday S. Goteti, Michael C. Hamilton,
"High-Speed and Low-Power Superconducting Neuromorphic Circuits Based on Quantum Phase-Slip Junctions",
IEEE Transactions on Applied Superconductivity,
August 2021,
Nan Ding, Samuel Williams, Yang Liu, Xiaoye S. Li,
A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver,
July 19, 2021,
Nan Ding, Yang Liu, Samuel Williams, Xiaoye S. Li,
"A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver",
SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21),
July 19, 2021,
Charlene Yang, Yunsong Wang, Thorsten Kurth, Steven Farrell, Samuel Williams,
"Hierarchical Roofline Performance Analysis for Deep Learning Applications",
Intelligent Computing, LNNS,
July 15, 2021,
doi: 10.1007/978-3-030-80126-7
M. Nakashima, A. Sim, Y. Kim, J. Kim, J. Kim,
"Automated Feature Selection for Anomaly Detection in Network Traffic Data",
ACM Transactions on Management Information Systems (TMIS),
2021,
12:1-28,
doi: 10.1145/3446636
Scientific exploration generates expanding volumes of data that commonly require High Performance Computing (HPC) systems to facilitate research. HPC systems are complex ecosystems of hardware and software that frequently are not user friendly. The Usable Data Abstractions (UDA) project set out to build usable software for scientific workflows in HPC environments by undertaking multiple rounds of qualitative user research. Qualitative research investigates how individuals accomplish their work, and our interview-based study surfaced a variety of insights about the experiences of working in and with HPC ecosystems. This report examines multiple facets of the experiences of scientists and developers using and supporting HPC systems. We discuss how stakeholders grasp the design and configuration of these systems, the impacts of abstraction layers on their ability to successfully do work, and the varied perceptions of time that shape this work. Examining the adoption of the Cori HPC system at NERSC, we explore the anticipations and lived experiences of users interacting with this system's novel storage feature, the Burst Buffer. We present lessons learned from across these insights to illustrate just some of the challenges HPC facilities and their stakeholders need to account for when procuring and supporting these essential scientific resources to ensure their usability and utility to a variety of scientific practices.
Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan,
"Experiences with Reproducibility: Case Studies from Scientific Workflows",
(P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems,
ACM,
June 21, 2021,
doi: 10.1145/3456287.3465478
Reproducible research is becoming essential for science to ensure transparency and for building trust. Additionally, reproducibility provides the cornerstone for sharing of methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding about the gaps, issues, and challenges for enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting the things that worked, that did not work, and that could have worked better for each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience who aim to integrate reproducibility in their everyday pipelines.
A. Lazar, A. Sim, K. Wu,
"GPU-based Classification for Wireless Intrusion Detection",
4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021),
2021,
doi: 10.1145/3452411.3464445
Y. Wang, K. Wu, A. Sim, S. Yoo, S. Misawa,
"Access Patterns of Disk Cache for Large Scientific Archive",
4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021),
2021,
doi: 10.1145/3452411.3464444
Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Michael Beach, Drew Paine, Lavanya Ramakrishnan,
"Science Capsule - Capturing the Data Life Cycle",
Journal of Open Source Software,
2021,
6:2484,
doi: 10.21105/joss.02484
Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power,
"Enabling Design Space Exploration for RISC-V Secure Compute Environments",
Proceedings of the Fifth Workshop on Computer Architecture Research with RISC-V (CARRV), (co-located with ISCA 2021),
June 17, 2021,
Ciaran Roberts, Sy-Toan Ngo, Alexandre Milesi, Anna Scaglione, Sean Peisert, Daniel Arnold,
"Deep Reinforcement Learning for Mitigating Cyber-Physical DER Voltage Unbalance Attacks",
Proceedings of the 2021 American Control Conference (ACC),
May 2021,
doi: 10.23919/ACC50511.2021.9482815
George Michelogiannakis,
SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC,
IEEE International Parallel and Distributed Processing Symposium,
May 2021,
Y. Ma, F. Ruso, A. Sim, K. Wu,
"Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures",
Heterogeneity in Computing Workshop (HCW 2021), in conjunction with the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS),
2021,
doi: 10.1109/IPDPSW52791.2021.00012
Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert,
"Performance Analysis of Scientific Computing Workloads on General Purpose TEEs",
Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS),
IEEE,
May 2021,
doi: 10.1109/IPDPS49936.2021.00115
Tamsin L. Edwards, Sophie Nowicki, Ben Marzeion, Regine Hock, Heiko Goelzer, Hélène Seroussi, Nicolas C. Jourdain, Donald A. Slater, Fiona E. Turner, Christopher J. Smith, Christine M. McKenna, Erika Simon, Ayako Abe-Ouchi, Jonathan M. Gregory, Eric Larour, William H. Lipscomb, Antony J. Payne, Andrew Shepherd, Cécile Agosta, Patrick Alexander, Torsten Albrecht, Brian Anderson, Xylar Asay-Davis, Andy Aschwanden, Alice Barthel, Andrew Bliss, Reinhard Calov, Christopher Chambers, Nicolas Champollion, Youngmin Choi, Richard Cullather, Joshua Cuzzone, Christophe Dumas, Denis Felikson, Xavier Fettweis, Koji Fujita, Benjamin K. Galton-Fenzi, Rupert Gladstone, Nicholas R. Golledge, Ralf Greve, Tore Hattermann, Matthew J. Hoffman, Angelika Humbert, Matthias Huss, Philippe Huybrechts, Walter Immerzeel, Thomas Kleiner, Philip Kraaijenbrink, Sébastien Le clec’h, Victoria Lee, Gunter R. Leguy, Christopher M. Little, Daniel P. Lowry, Jan-Hendrik Malles, Daniel F. Martin, Fabien Maussion, Mathieu Morlighem, James F. O’Neill, Isabel Nias, Frank Pattyn, Tyler Pelle, Stephen F. Price, Aurélien Quiquet, Valentina Radić, Ronja Reese, David R. Rounce, Martin Rückamp, Akiko Sakai, Courtney Shafer, Nicole-Jeanne Schlegel, Sarah Shannon, Robin S. Smith, Fiammetta Straneo, Sainan Sun, Lev Tarasov, Luke D. Trusel, Jonas Van Breedam, Roderik van de Wal, Michiel van den Broeke, Ricarda Winkelmann, Harry Zekollari, Chen Zhao, Tong Zhang, Thomas Zwinger,
"Projected land ice contributions to twenty-first-century sea level rise",
Nature,
May 5, 2021,
593:74-82,
doi: 10.1038/s41586-021-03302-y
Sean Peisert,
"Trustworthy Scientific Computing",
Communications of the ACM (CACM),
May 2021,
doi: 10.1145/3457191
T. Groves, N. Ravichandrasekaran, B. Cook, N. Keen, D. Trebotich, N. Wright, B. Alverson, D. Roweth, K. Underwood,
"Not All Applications Have Boring Communication Patterns: Profiling Message Matching with BMM",
Concurrency and Computation: Practice and Experience,
April 26, 2021,
doi: 10.1002/cpe.6380
Jordan Musser, Ann S Almgren, William D Fullmer, Oscar Antepara, John B Bell, Johannes Blaschke, Kevin Gott, Andrew Myers, Roberto Porcu, Deepak Rangarajan, Michele Rosso, Weiqun Zhang, and Madhava Syamlal,
"MFIX:Exa: A Path Towards Exascale CFD-DEM Simulations",
The International Journal of High Performance Computing Applications,
April 16, 2021,
Jonathan Madsen,
Roofline Instrumentation with TiMemory,
ECP Annual Meeting,
April 2021,
Khaled Ibrahim,
Roofline on GPUs (advanced topics),
ECP Annual Meeting,
April 2021,
Jonathan Madsen,
Roofline Model using NSight Compute,
ECP Annual Meeting,
April 2021,
Samuel Williams,
Roofline Analysis on NVIDIA GPUs,
ECP Annual Meeting,
April 2021,
Samuel Williams,
Introduction to the Roofline Model,
ECP Annual Meeting,
April 2021,
Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters,
"UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'21)",
Poster at Exascale Computing Project (ECP) Annual Meeting 2021,
April 2021,
We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.
UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC). The combination of these two features yields performant, scalable solutions to problems of interest within ECP.
GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.
Marco Pritoni, Drew Paine, Gabriel Fierro, Cory Mosiman, Michael Poplawski, Joel Bender, Jessica Granderson,
"Metadata Schemas and Ontologies for Building Energy Applications: A Critical Review and Use Case Analysis",
Energies,
April 6, 2021,
doi: 10.3390/en14072024
Digital and intelligent buildings are critical to realizing efficient building energy operations and a smart grid. With the increasing digitalization of processes throughout the life cycle of buildings, data exchanged between stakeholders and between building systems have grown significantly. However, a lack of semantic interoperability between data in different systems is still prevalent and hinders the development of energy-oriented applications that can be reused across buildings, limiting the scalability of innovative solutions. Addressing this challenge, our review paper systematically reviews metadata schemas and ontologies that are at the foundation of semantic interoperability necessary to move toward improved building energy operations. The review finds 40 schemas that span different phases of the building life cycle, most of which cover commercial building operations and, in particular, control and monitoring systems. The paper’s deeper review and analysis of five popular schemas identify several gaps in their ability to fully facilitate the work of a building modeler attempting to support three use cases: energy audits, automated fault detection and diagnosis, and optimal control. Our findings demonstrate that building modelers focused on energy use cases will find it difficult, labor intensive, and costly to create, sustain, and use semantic models with existing ontologies. This underscores the significant work still to be done to enable interoperable, usable, and maintainable building models. We make three recommendations for future work by the building modeling and energy communities: a centralized repository with a search engine for relevant schemas, the development of more use cases, and better harmonization and standardization of schemas in collaboration with industry to facilitate their adoption by stakeholders addressing varied energy-focused use cases.
Fabio Massacci, Trent Jaeger, Sean Peisert,
"SolarWinds and the Challenges of Patching: Can We Ever Stop Dancing With the Devil?",
IEEE Security & Privacy,
April 2021,
14-19,
doi: 10.1109/MSEC.2021.3050433
Sean Peisert, Bruce Schneier, Hamed Okhravi, Fabio Massacci, Terry Benzel, Carl Landwehr, Mohammad Mannan, Jelena Mirkovic, Atul Prakash, James Bret Michael,
"Perspectives on the SolarWinds Incident",
IEEE Security & Privacy,
April 2021,
7-13,
doi: 10.1109/MSEC.2021.3051235
Karol Kowalski, Raymond Bair, Nicholas P. Bauman, Jeffery S. Boschen, Eric J. Bylaska, Jeff Daily, Wibe A. de Jong, Thom Dunning, Niranjan Govind, Robert J. Harrison, Murat Keceli, Kristopher Keipert, Sriram Krishnamoorthy, Suraj Kumar, Erdal Mutlu, Bruce Palmer, Ajay Panyala, Bo Peng, Ryan M. Richard, T. P. Straatsma, Peter Sushko, Edward F. Valeev, Marat Valiev, Hubertus J. J. van Dam, Jonathan M. Waldrop, David B. Williams-Young, Chao Yang, Marcin Zalewski, Theresa L. Windus,
"From NWChem to NWChemEx: Evolving with the Computational Chemistry Landscape",
Chemical Reviews,
March 31, 2021,
doi: 10.1021/acs.chemrev.0c00998
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
We present QFAST, a quantum synthesis tool designed to produce short circuits and to scale well in practice. Our contributions are: 1) a novel representation of circuits able to encode placement and topology; 2) a hierarchical approach with an iterative refinement formulation that combines "coarse-grained" fast optimization during circuit structure search with a good, but slower, optimization stage only in the final circuit instantiation. When compared against state-of-the-art techniques, although not always optimal, QFAST can reduce circuits for "time-dependent evolution" algorithms, as used by domain scientists, by 60x in depth. On typical circuits, it provides 4x better depth reduction than the widely used Qiskit and UniversalQ compilers. We also show the composability and tunability of our formulation in terms of circuit depth and running time. For example, we show how to generate shorter circuits by plugging in the best available third party synthesis algorithm at a given hierarchy level. Composability enables portability across chip architectures, which is missing from similar approaches.
QFAST is integrated with Qiskit and available at github.com/bqskit.
Akel Hashim, Ravi Naik, Alexis Morvan, Jean-Loup Ville, Brad Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin O Brien, Ian Hincks, Joel Wallman, Joseph V Emerson, David Ivan Santiago, Irfan Siddiqi,
Scalable Quantum Computing on a Noisy Superconducting Quantum Processor via Randomized Compiling,
Bulletin of the American Physical Society,
2021,
Coherent errors in quantum hardware severely limit the performance of quantum algorithms in an unpredictable manner, and mitigating their impact is necessary for realizing reliable, large-scale quantum computations. Randomized compiling achieves this goal by converting coherent errors into stochastic noise, dramatically reducing unpredictable errors in quantum algorithms and enabling accurate predictions of aggregate performance via cycle benchmarking estimates. In this work, we demonstrate significant performance gains under randomized compiling for both the four-qubit quantum Fourier transform algorithm and for random circuits of variable depth on a superconducting quantum processor. We also validate solution accuracy using experimentally measured error rates. Our results demonstrate that randomized compiling can be utilized to maximally leverage and predict the capabilities of modern-day noisy quantum processors, paving the way forward for scalable quantum computing.
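The conversion of coherent errors into stochastic noise can be illustrated with a toy single-qubit calculation. The sketch below is not the experiment reported in the abstract: it assumes a hypothetical small Z over-rotation `rz(eps)` as the only error, applies it for a number of idle cycles to the |+> state, and compares the untouched coherent error with its exact average over Pauli conjugations (Pauli twirling, the averaging that randomized compiling relies on). All function names are invented for this example.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]


def rz(eps):
    """Coherent error model: a small rotation about the Z axis."""
    return [[cmath.exp(-0.5j * eps), 0], [0, cmath.exp(0.5j * eps)]]


def apply_unitary(u, rho):
    """Evolve a density matrix: rho -> u rho u^dagger."""
    return matmul(matmul(u, rho), dagger(u))


def apply_twirled(u, rho):
    """Exact average over the four Paulis P of (P^dagger u P) acting on rho."""
    out = [[0, 0], [0, 0]]
    for p in PAULIS:
        term = apply_unitary(matmul(matmul(dagger(p), u), p), rho)
        for i in range(2):
            for j in range(2):
                out[i][j] += term[i][j] / 4
    return out


def infidelity_plus(rho):
    """1 - <+|rho|+>, the error relative to the ideal |+> state."""
    overlap = 0.5 * (rho[0][0] + rho[0][1] + rho[1][0] + rho[1][1])
    return 1 - overlap.real


eps = 0.05    # assumed over-rotation angle per cycle (radians)
cycles = 20   # assumed number of noisy cycles
plus = [[0.5, 0.5], [0.5, 0.5]]

coherent = plus
twirled = plus
for _ in range(cycles):
    coherent = apply_unitary(rz(eps), coherent)
    twirled = apply_twirled(rz(eps), twirled)

coherent_err = infidelity_plus(coherent)
twirled_err = infidelity_plus(twirled)
```

In this model the coherent error adds up in amplitude, giving an infidelity of sin²(cycles·eps/2), while the twirled channel acts as dephasing with an infidelity of (1 − cos(eps)^cycles)/2, which grows roughly linearly in the number of cycles for small angles.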
Partitioned Global Address Space (PGAS) models, pioneered by languages such as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.
GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building on 20 years of lessons learned. We describe several features and enhancements that have been introduced to address the needs of modern runtimes and exploit the hardware capabilities of emerging systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI implementations on current systems. GASNet-EX provides communication services that help to deliver speedups in HPC applications written using the UPC++ library, enabling new science on pre-exascale systems.
George Michelogiannakis, Min Yeh Teh, Madeleine Glick, John Shalf, Keren Bergman,
Maximizing The Impact of Emerging Photonic Switches At The System Level,
SPIE photonics west,
March 2021,
In the article “A Rational QZ Method” by D. Camps, K. Meerbergen, and R. Vandebril [SIAM J. Matrix Anal. Appl., 40 (2019), pp. 943-972], we introduced rational QZ (RQZ) methods. Our theoretical examinations revealed that the convergence of the RQZ method is governed by rational subspace iteration, thereby generalizing the classical QZ method, whose convergence relies on polynomial subspace iteration. Moreover, the RQZ method operates on a pencil more general than Hessenberg-upper triangular, namely, a Hessenberg pencil, which is a pencil consisting of two Hessenberg matrices. However, the RQZ method can only be made competitive with advanced QZ implementations by using crucial add-ons such as small bulge multishift sweeps, aggressive early deflation, and optimal packing. In this paper we develop these techniques for the RQZ method. In the numerical experiments we compare the results with state-of-the-art routines for the generalized eigenvalue problem and show that the presented method is competitive in terms of speed and accuracy.
Tuowen Zhao, Mary Hall, Hans Johansen, Samuel Williams,
"Improving Communication by Optimizing On-Node Data Movement with Data Layout",
PPoPP,
February 2021,
Donghun Koo, Jaehwan Lee, Jialin Liu, Eun-Kyu Byun, Jae-Hyuck Kwak, Glenn K Lockwood, Soonwook Hwang, Katie Antypas, Kesheng Wu, Hyeonsang Eom,
"An empirical study of I/O separation for burst buffers in HPC systems",
Journal of Parallel and Distributed Computing,
2021,
148:96-108,
doi: 10.1016/j.jpdc.2020.10.007
Jed Brown, Yunhui He, Scott MacLachlan, Matt Menickelly, Stefan M. Wild,
"Tuning Multigrid Methods with Robust Optimization and Local Fourier Analysis",
SIAM Journal on Scientific Computing,
2021,
A109-A138,
doi: 10.1137/19m1308669
E Younis, K Sen, K Yelick, C Iancu,
QFAST: Conflating Search and Numerical Optimization for Scalable Quantum Circuit Synthesis,
Proceedings - 2021 IEEE International Conference on Quantum Computing and Engineering, QCE 2021,
Pages: 232-243
2021,
doi: 10.1109/QCE52317.2021.00041
M Ellis, A Buluç, K Yelick,
Asynchrony versus bulk-synchrony for a generalized N-body problem from genomics,
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP,
Pages: 465-466
2021,
doi: 10.1145/3437801.3441580
I Nisa, P Pandey, M Ellis, L Oliker, A Buluc, K Yelick,
Distributed-memory k-mer counting on GPUs,
Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021,
Pages: 527-536
2021,
doi: 10.1109/IPDPS49936.2021.00061
G Blelloch, W Dally, M Martonosi, U Vishkin, K Yelick,
SPAA 21 panel paper: Architecture-friendly algorithms versus algorithm-friendly architectures,
Annual ACM Symposium on Parallelism in Algorithms and Architectures,
Pages: 1-7
2021,
doi: 10.1145/3409964.3461780
M Norman, V Kellen, S Smallen, B Demeulle, S Strande, E Lazowska, N Alterman, R Fatland, S Stone, A Tan, K Yelick, E Van Dusen, J Mitchell,
CloudBank: Managed Services to Simplify Cloud Access for Computer Science Research and Education,
ACM International Conference Proceeding Series,
2021,
doi: 10.1145/3437359.3465586
M Ellis, A Buluc, K Yelick,
Scaling Generalized N-Body Problems, A Case Study from Genomics,
ACM International Conference Proceeding Series,
2021,
doi: 10.1145/3472456.3472517
Jeremy Hewes, others,
Graph Neural Network for Object Reconstruction in Liquid Argon Time Projection Chambers,
EPJ Web Conf.,
Pages: 03054
2021,
doi: 10.1051/epjconf/202125103054
Sabrina Amrouche, others,
The Tracking Machine Learning challenge : Throughput phase,
2021,
JE Damerow, C Varadharajan, K Boye, EL Brodie, M Burrus, KD Chadwick, R Crystal-Ornelas, H Elbashandy, RJ Eloy Alves, KS Ely, AE Goldman, T Haberman, V Hendrix, Z Kakalia, KM Kemner, AB Kersting, N Merino, F O Brien, Z Perzan, E Robles, P Sorensen, JC Stegen, RL Walls, P Weisenhorn, M Zavarin, D Agarwal,
Sample identifiers and metadata to support data management and reuse in multidisciplinary ecosystem sciences,
Data Science Journal,
2021,
doi: 10.5334/dsj-2021-011
R Crystal-Ornelas, C Varadharajan, B Bond-Lamberty, K Boye, M Burrus, S Cholia, M Crow, J Damerow, R Devarakonda, KS Ely, A Goldman, S Heinz, V Hendrix, Z Kakalia, SC Pennington, E Robles, A Rogers, M Simmonds, T Velliquette, H Weierbach, P Weisenhorn, JN Welch, DA Agarwal,
A Guide to Using GitHub for Developing and Versioning Data Standards and Reporting Formats,
Earth and Space Science,
2021,
doi: 10.1029/2021EA001797
Y Segawa, H Hirose, D Kaneko, M Hasegawa, S Adachi, P Ade, MAOA Faúndez, Y Akiba, K Arnold, J Avva, C Baccigalupi, D Barron, D Beck, S Beckman, F Bianchini, D Boettger, J Borrill, J Carron, S Chapman, K Cheung, Y Chinone, K Crowley, A Cukierman, T De Haan, M Dobbs, R Dunner, HE Bouhargani, T Elleflot, J Errard, G Fabbian, S Feeney, C Feng, T Fujino, N Galitzki, N Goeckner-Wald, J Groh, G Hall, N Halverson, T Hamada, M Hazumi, C Hill, L Howe, Y Inoue, J Ito, G Jaehnig, O Jeong, N Katayama, B Keating, R Keskitalo, S Kikuchi, T Kisner, N Krachmalnicoff, A Kusaka, AT Lee, D Leon, E Linder, LN Lowry, A Mangu, F Matsuda, Y Minami, J Montgomery, M Navaroli, H Nishino, J Peloton, ATP Pham, D Poletti, G Puglisi, C Raum, CL Reichardt, C Ross, M Silva-Feaver, P Siritanasak, R Stompor, A Suzuki, O Tajima, S Takakura, S Takatori, D Tanabe, GP Teply, C Tsai, C Verges, B Westbrook, Y Zhou,
"Method for rapid performance validation of large TES bolometer array for POLARBEAR-2A using a coherent millimeter-wave source",
AIP Conference Proceedings,
2021,
2319,
doi: 10.1063/5.0038197
M Tristram, AJ Banday, KM Górski, R Keskitalo, CR Lawrence, KJ Andersen, RB Barreiro, J Borrill, HK Eriksen, R Fernandez-Cobos, TS Kisner, E Martínez-González, B Partridge, D Scott, TL Svalheim, H Thommesen, IK Wehus,
"Planck constraints on the tensor-to-scalar ratio",
Astronomy and Astrophysics,
2021,
647,
doi: 10.1051/0004-6361/202039585
G Puglisi, R Keskitalo, T Kisner, JD Borrill,
Simulating Calibration and Beam Systematics for a Future CMB Space Mission with the TOAST Package,
Research Notes of the AAS,
Pages: 137
2021,
doi: 10.3847/2515-5172/ac0823
N Aghanim, Y Akrami, M Ashdown, J Aumont, C Baccigalupi, M Ballardini, AJ Banday, RB Barreiro, N Bartolo, S Basak, R Battye, K Benabed, JP Bernard, M Bersanelli, P Bielewicz, JJ Bock, JR Bond, J Borrill, FR Bouchet, F Boulanger, M Bucher, C Burigana, RC Butler, E Calabrese, JF Cardoso, J Carron, A Challinor, HC Chiang, J Chluba, LPL Colombo, C Combet, D Contreras, BP Crill, F Cuttaia, P De Bernardis, G De Zotti, J Delabrouille, JM Delouis, E Di Valentino, JM Diego, O Doré, M Douspis, A Ducout, X Dupac, S Dusini, G Efstathiou, F Elsner, TA Enßlin, HK Eriksen, Y Fantaye, M Farhang, J Fergusson, R Fernandez-Cobos, F Finelli, F Forastieri, M Frailis, AA Fraisse, E Franceschi, A Frolov, S Galeotta, S Galli, K Ganga, RT Génova-Santos, M Gerbino, T Ghosh, J González-Nuevo, KM Górski, S Gratton, A Gruppuso, JE Gudmundsson, J Hamann, W Handley, FK Hansen, D Herranz, SR Hildebrandt, E Hivon, Z Huang, AH Jaffe, WC Jones, A Karakci, E Keihänen, R Keskitalo, K Kiiveri, J Kim, TS Kisner, L Knox, N Krachmalnicoff, M Kunz, H Kurki-Suonio, G Lagache, JM Lamarre, A Lasenby, M Lattanzi, CR Lawrence, M Le Jeune, P Lemos, J Lesgourgues, F Levrier, A Lewis, M Liguori,
"Erratum: Planck 2018 results: VI. Cosmological parameters (Astronomy and Astrophysics (2020) 641 (A6) DOI: 10.1051/0004-6361/201833910)",
Astronomy and Astrophysics,
2021,
652,
doi: 10.1051/0004-6361/201833910e
M Tristram, AJ Banday, KM Górski, R Keskitalo, CR Lawrence, KJ Andersen, RB Barreiro, J Borrill, LPL Colombo, HK Eriksen, R Fernandez-Cobos, TS Kisner, E Martínez-González, B Partridge, D Scott, TL Svalheim, IK Wehus,
Improved limits on the tensor-to-scalar ratio using BICEP and Planck,
2021,
Abigail Polin, Peter Nugent, Daniel Kasen,
Nebular Models of Sub-Chandrasekhar Mass Type Ia Supernovae: Clues to the Origin of Ca-rich Transients,
Astrophysical Journal,
Pages: 65
2021,
doi: 10.3847/1538-4357/abcccc
C. Frohmaier, C. R. Angus, M. Vincenzi, M. Sullivan, M. Smith, P. E. Nugent, S. B. Cenko, A. Gal-Yam, S. R. Kulkarni, N. M. Law, R. M. Quimby,
From core collapse to superluminous: the rates of massive stellar explosions from the Palomar Transient Factory,
Monthly Notices of the RAS,
Pages: 5142-5158
2021,
doi: 10.1093/mnras/staa3607
S. Yang, J. Sollerman, T. -W. Chen, E. C. Kool, R. Lunnan, S. Schulze, N. Strotjohann, A. Horesh, M. Kasliwal, T. Kupfer, A. A. Mahabal, F. J. Masci, P. Nugent, D. A. Perley, R. Riddle, B. Rusholme, Y. Sharma,
Is supernova SN 2020faa an iPTF14hls look-alike?,
Astronomy and Astrophysics,
Pages: A22
2021,
doi: 10.1051/0004-6361/202039440
Nora L. Strotjohann, Eran O. Ofek, Avishay Gal-Yam, Rachel Bruch, Steve Schulze, Nir Shaviv, Jesper Sollerman, Alexei V. Filippenko, Ofer Yaron, Christoffer Fremling, Jakob Nordin, Erik C. Kool, Dan A. Perley, Anna Y. Q. Ho, Yi Yang, Yuhan Yao, Maayane T. Soumagnac, Melissa L. Graham, Cristina Barbarino, Leonardo Tartaglia, Kishalay De, Daniel A. Goldstein, David O. Cook, Thomas G. Brink, Kirsty Taggart, Lin Yan, Ragnhild Lunnan, Mansi Kasliwal, Shri R. Kulkarni, Peter E. Nugent, Frank J. Masci, Philippe Rosnet, Scott M. Adams, Igor Andreoni, Ashot Bagdasaryan, Eric C. Bellm, Kevin Burdge, Dmitry A. Duev, Alison Dugas, Sara Frederick, Samantha Goldwasser, Matthew Hankins, Ido Irani, Viraj Karambelkar, Thomas Kupfer, Jingyi Liang, James D. Neill, Michael Porter, Reed L. Riddle, Yashvi Sharma, Phil Short, Francesco Taddia, Anastasios Tzanidakis, Jan van Roestel, Richard Walters, Zhuyun Zhuang,
Bright, Months-long Stellar Outbursts Announce the Explosion of Interaction-powered Supernovae,
Astrophysical Journal,
Pages: 99
2021,
doi: 10.3847/1538-4357/abd032
J. Johansson, A. Goobar, S. H. Price, A. Sagués Carracedo, L. Della Bruna, P. E. Nugent, S. Dhawan, E. Mörtsell, S. Papadogiannakis, R. Amanullah, D. Goldstein, S. B. Cenko, K. De, A. Dugas, M. M. Kasliwal, S. R. Kulkarni, R. Lunnan,
Spectroscopy of the first resolved strongly lensed Type Ia supernova iPTF16geu,
Monthly Notices of the RAS,
Pages: 510-520
2021,
doi: 10.1093/mnras/staa3829
Chelsea E. Harris, Laura Chomiuk, Peter E. Nugent,
Tumbling Dice: Radio Constraints on the Presence of Circumstellar Shells around Type Ia Supernovae with Impact Near Maximum Light,
Astrophysical Journal,
Pages: 23
2021,
doi: 10.3847/1538-4357/abe940
Charlotte Ward, Suvi Gezari, Sara Frederick, Erica Hammerstein, Peter Nugent, Sjoert van Velzen, Andrew Drake, Abigail García-Pérez, Immaculate Oyoo, Eric C. Bellm, Dmitry A. Duev, Matthew J. Graham, Mansi M. Kasliwal, Stephen Kaye, Ashish A. Mahabal, Frank J. Masci, Ben Rusholme, Maayane T. Soumagnac, Lin Yan,
AGNs on the Move: A Search for Off-nuclear AGNs from Recoiling Supermassive Black Holes and Ongoing Galaxy Mergers with the Zwicky Transient Facility,
Astrophysical Journal,
Pages: 102
2021,
doi: 10.3847/1538-4357/abf246
Michael S. Medford, Peter Nugent, Danny Goldstein, Frank J. Masci, Igor Andreoni, Ron Beck, Michael W. Coughlin, Dmitry A. Duev, Ashish A. Mahabal, Reed L. Riddle,
Removing Atmospheric Fringes from Zwicky Transient Facility i-band Images using Principal Component Analysis,
Publications of the ASP,
Pages: 064503
2021,
doi: 10.1088/1538-3873/abfe9d
Steve Schulze, Ofer Yaron, Jesper Sollerman, Giorgos Leloudas, Amit Gal, Angus H. Wright, Ragnhild Lunnan, Avishay Gal-Yam, Eran O. Ofek, Daniel A. Perley, Alexei V. Filippenko, Mansi M. Kasliwal, Shrinivas R. Kulkarni, James D. Neill, Peter E. Nugent, Robert M. Quimby, Mark Sullivan, Nora Linn Strotjohann, Iair Arcavi, Sagi Ben-Ami, Federica Bianco, Joshua S. Bloom, Kishalay De, Morgan Fraser, Christoffer U. Fremling, Assaf Horesh, Joel Johansson, Patrick L. Kelly, Nikola Knežević, Sladjana Knežević, Kate Maguire, Anders Nyholm, Seméli Papadogiannakis, Tanja Petrushevska, Adam Rubin, Lin Yan, Yi Yang, Scott M. Adams, Filomena Bufano, Kelsey I. Clubb, Ryan J. Foley, Yoav Green, Jussi Harmanen, Anna Y. Q. Ho, Isobel M. Hook, Griffin Hosseinzadeh, D. Andrew Howell, Albert K. H. Kong, Rubina Kotak, Thomas Matheson, Curtis McCully, Dan Milisavljevic, Yen-Chen Pan, Dovi Poznanski, Isaac Shivvers, Sjoert van Velzen, Kars K. Verbeek,
The Palomar Transient Factory Core-collapse Supernova Host-galaxy Sample. I. Host-galaxy Distribution Functions and Environment Dependence of Core-collapse Supernovae,
Astrophysical Journal Supplement,
Pages: 29
2021,
doi: 10.3847/1538-4365/abff5e
C. Ashall, J. Lu, E. Y. Hsiao, P. Hoeflich, M. M. Phillips, L. Galbany, C. R. Burns, C. Contreras, K. Krisciunas, N. Morrell, M. D. Stritzinger, N. B. Suntzeff, F. Taddia, J. Anais, E. Baron, P. J. Brown, L. Busta, A. Campillay, S. Castellón, C. Corco, S. Davis, G. Folatelli, F. Förster, W. L. Freedman, C. González, M. Hamuy, S. Holmbo, R. P. Kirshner, S. Kumar, G. H. Marion, P. Mazzali, T. Morokuma, P. E. Nugent, S. E. Persson, A. L. Piro, M. Roth, F. Salgado, D. J. Sand, J. Seron, M. Shahbandeh, B. J. Shappee,
Carnegie Supernova Project: The First Homogeneous Sample of Super-Chandrasekhar-mass/2003fg-like Type Ia Supernovae,
Astrophysical Journal,
Pages: 205
2021,
doi: 10.3847/1538-4357/ac19ac
J. Johansson, S. B. Cenko, O. D. Fox, S. Dhawan, A. Goobar, V. Stanishev, N. Butler, W. H. Lee, A. M. Watson, U. C. Fremling, M. M. Kasliwal, P. E. Nugent, T. Petrushevska, J. Sollerman, L. Yan, J. Burke, G. Hosseinzadeh, D. A. Howell, C. McCully, S. Valenti,
Near-infrared Supernova Ia Distances: Host Galaxy Extinction and Mass-step Corrections Revisited,
Astrophysical Journal,
Pages: 237
2021,
doi: 10.3847/1538-4357/ac2f9e
J Müller, B Faybishenko, D Agarwal, S Bailey, C Jiang, Y Ryu, C Tull, L Ramakrishnan,
Assessing data change in scientific datasets,
Concurrency and Computation: Practice and Experience,
2021,
doi: 10.1002/cpe.6245
K Yelick, D Agarwal, D Bard, J Shalf, A Almgren, W Bhimji, B Brown, J Carter, B Jong, D Doerfler, D Donofrio, C Guok, C Iancu, M Kiran, S Li, P Nugent, M Prabhat, L Ramakrishnan, D Vasudevan, N Wright, H Cademartori, K Antypas, K Kincade,
2019 Computing Sciences Strategic Plan,
2021,
doi: 10.2172/1827673
SL Brantley, T Wen, DA Agarwal, JG Catalano, PA Schroeder, K Lehnert, C Varadharajan, J Pett-Ridge, M Engle, AM Castronova, RP Hooper, X Ma, L Jin, K McHenry, E Aronson, AR Shaughnessy, LA Derry, J Richardson, J Bales, EM Pierce,
The future low-temperature geochemical data-scape as envisioned by the U.S. geochemical community,
Computers and Geosciences,
2021,
doi: 10.1016/j.cageo.2021.104933
Hannah Klion, Paul C. Duffell, Daniel Kasen, Eliot Quataert,
"The effect of jet-ejecta interaction on the viewing angle dependence of kilonova light curves",
Monthly Notices of the RAS,
2021,
502:865-875,
doi: 10.1093/mnras/stab042
Jean Luca Bez, Houjun Tang, Bing Xie, David Williams-Young, Rob Latham, Rob Ross, Sarp Oral, Suren Byna,
"I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis",
2021 IEEE/ACM Sixth International Parallel Data Systems Workshop (PDSW),
January 1, 2021,
15-22,
doi: 10.1109/PDSW54622.2021.00008
Tonglin Li, Suren Byna, Quincey Koziol, Houjun Tang, Jean Luca Bez, Qiao Kang,
"h5bench: HDF5 I/O Kernel Suite for Exercising HPC I/O Patterns",
Cray User Group (CUG) 2021,
January 1, 2021,
Ankur K. Gupta, Benjamin C. Gamoke, Krishnan Raghavachari,
Interaction–Deletion: A Composite Energy Method for the Optimization of Molecular Systems Selectively Removing Specific Nonbonded Interactions,
The Journal of Physical Chemistry A,
Pages: 4668-4682
2021,
doi: 10.1021/acs.jpca.1c02918
MG Awan, S Hofmeyr, R Egan, N Ding, A Buluc, J Deslippe, L Oliker, K Yelick,
"Accelerating Large Scale de novo Metagenome Assembly Using GPUs",
International Conference for High Performance Computing, Networking, Storage and Analysis, SC,
January 1, 2021,
doi: 10.1145/3458817.3476212
Jan-Tobias Sohns, Gunther H. Weber, Christoph Garth,
"Distributed Task-Parallel Topology-Controlled Volume Rendering",
Topological Methods in Data Analysis and Visualization VI: Theory, Algorithms, and Applications,
(Springer International Publishing:
2021)
Pages: 55-69
doi: 10.1007/978-3-030-83500-2_4
Hamish A. Carr, Gunther H. Weber, Christopher M. Sewell, Oliver Rübel, Patricia Fasel, James P. Ahrens,
"Scalable Contour Tree Computation by Data Parallel Peak Pruning",
Transactions on Visualization and Computer Graphics,
2021,
27:2437-2454,
doi: 10.1109/TVCG.2019.2948616
Hamish Carr, Oliver Rübel, Gunther H. Weber, James Ahrens,
"Optimization and Augmentation for Data Parallel Contour Trees",
IEEE Transactions on Visualization and Computer Graphics,
2021,
doi: 10.1109/TVCG.2021.3064385
Robbie Sadre, Colin Ophus, Anastasiia Butko, Gunther H Weber,
"Deep Learning Segmentation of Complex Features in Atomic-Resolution Phase Contrast Transmission Electron Microscopy Images",
Microscopy and Microanalysis,
2021,
doi: 10.1017/S1431927621000167
Brad Mitchell, Ravi Naik, Alexis Morvan, Akel Hashim, John Mark Kreikebaum, David Santiago, Irfan Siddiqi,
Calibration of the Cross-Resonance Gate using Closed-Loop Optimal Control,
Bulletin of the American Physical Society,
2021,
Gerwin Koolstra, Noah Stevenson, Karthik Siva, William Livingston, Ravi Naik, John Steinmetz, Debmalya Das, Andrew Jordan, David Santiago, Irfan Siddiqi,
Diagnosing Gate Errors in Superconducting Qubits Using Continuous Measurements (Experiment),
Bulletin of the American Physical Society,
2021,
Ravi Naik, Brad Mitchell, Akel Hashim, John Mark Kreikebaum, David Santiago, Irfan Siddiqi,
Contextual Characterization of the Cross-Resonance Gate on a Multi-Qubit Superconducting Quantum Processor,
Bulletin of the American Physical Society,
2021,
Robin Blume-Kohout, Susan Clark, Akel Hashim, Craig Hogle, Daniel Lobser, Ravi Naik, Timothy Proctor, Kenneth Rudinger, David Santiago, Irfan Siddiqi, others,
Simultaneous Gate Set Tomography,
Bulletin of the American Physical Society,
2021,
Joachim Cohen, Agustin Di Paolo, Larry Chen, Trevor Chistolini, John Mark Kreikebaum, Long Nguyen, Ravi Naik, David Santiago, Irfan Siddiqi, Alexandre Blais,
Novel two-qubit gates for the light fluxonium qubit,
Bulletin of the American Physical Society,
2021,
Alexis Morvan, Vinay Ramasesh, Machiel Blok, John Mark Kreikebaum, Kevin O'Brien, Larry Chen, Ravi Naik, Brad Mitchell, David Santiago, Irfan Siddiqi,
Qutrit Randomized Benchmarking on a Transmon Quantum Processor,
Bulletin of the American Physical Society,
2021,
John Steinmetz, Debmalya Das, Gerwin Koolstra, Noah Stevenson, Karthik Siva, William Livingston, Ravi Naik, David Santiago, Irfan Siddiqi, Andrew Jordan,
Diagnosing Errors in Qubit Gates Using Continuous Measurements (Theory),
Bulletin of the American Physical Society,
2021,
Noah Stevenson, Gerwin Koolstra, Karthik Siva, Ravi Naik, William Livingston, Shiva Lotfallahzadeh Barzili, Justin Dressel, Irfan Siddiqi,
Tracking Non-Markovian Quantum Trajectories of a Superconducting Qubit from a Finite-Memory Bath,
Bulletin of the American Physical Society,
2021,
Yilun Xu, Gang Huang, Ravi Naik, Alexis Morvan, Kasra Nowrouzi, Brad Mitchell, David Santiago, Irfan Siddiqi,
Automatic two-qubit gate calibration with qubic,
Bulletin of the American Physical Society,
2021,
Kevin He, Srivatsan Chakram, Akash Dixit, Andrew Oriani, Ravi Naik, Nelson Leung, Hyeokshin Kwon, Riju Banerjee, Wen-Long Ma, Liang Jiang, others,
State preparation and tomography in 3D multimode circuit QED,
Bulletin of the American Physical Society,
2021,
Jean-Loup Ville, Alexis Morvan, Akel Hashim, Ravi K Naik, Bradley Mitchell, John-Mark Kreikebaum, Kevin P O'Brien, Joel J Wallman, Ian Hincks, Joseph Emerson, others,
Leveraging Randomized Compiling for the QITE Algorithm,
arXiv preprint arXiv:2104.08785,
2021,
Akash V Dixit, Srivatsan Chakram, Kevin He, Ankur Agrawal, Ravi K Naik, David I Schuster, Aaron Chou,
"Searching for dark matter with a superconducting qubit",
Physical Review Letters,
2021,
126:141302,
doi: 10.1103/PhysRevLett.126.141302
Alexis Morvan, VV Ramasesh, MS Blok, JM Kreikebaum, K O’Brien, L Chen, BK Mitchell, RK Naik, DI Santiago, I Siddiqi,
"Qutrit randomized benchmarking",
Physical Review Letters,
2021,
126:210504,
doi: 10.1103/PhysRevLett.126.210504
David Schuster, Ravi Naik, Srivatsan Chakram,
Technologies for long-lived 3d multimode microwave cavities,
2021,
O Selvitopi, B Brock, I Nisa, A Tripathy, K Yelick, A Buluç,
"Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication",
Proceedings of the International Conference on Supercomputing,
January 2021,
431-442,
doi: 10.1145/3447818.3461472
G Guidi, M Ellis, A Buluç, K Yelick, D Culler,
"10 years later: Cloud computing is closing the performance gap",
ICPE 2021 - Companion of the ACM/SPEC International Conference on Performance Engineering,
January 1, 2021,
41-48,
doi: 10.1145/3447545.3451183