Recent Publications

2022

L. Jin, A. Lazar, C. Brown, Q. Chen, A. Sim, K. Wu, S. Ravulaparthy, V. Garikapati, C. A. Spurlock, What Makes You Hold on to That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions, Transportation Research Board 101st Annual Meeting, 2022,

2021

J. Bang, C. Kim, K. Wu, A. Sim, S. Byna, H. Sung, H. Eom, "An In-Depth I/O Pattern Analysis in HPC Systems", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021,

S. Lee, Q. Kang, K. Wang, J. Balewski, A. Sim, A. Agrawal, A. Choudhary, P. Nugent, K. Wu, W-K. Liao, "Asynchronous I/O Strategy for Large-Scale Deep Learning Applications", IEEE International Conference on High Performance Computing, Data & Analytics (HiPC2021), 2021,

A. Lazar, L. Jin, C. Brown, C. A. Spurlock, A. Sim, K. Wu, "Performance of the Gold Standard and Machine Learning in Predicting Vehicle Transactions", the 3rd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD 2021), 2021,

James R. Clavin, Yue Huang, Xin Wang, Pradeep M. Prakash, Sisi Duan, Jianwu Wang, Sean Peisert, "A Framework for Evaluating BFT", Proceedings of the IEEE International Conference on Parallel and Distributed Systems (ICPADS), IEEE, December 2021,

Akel Hashim, Ravi K. Naik, Alexis Morvan, Jean-Loup Ville, Bradley Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin P. O'Brien, Ian Hincks, Joel J. Wallman, Joseph Emerson, Irfan Siddiqi, "Randomized Compiling for Scalable Quantum Computing on a Noisy Superconducting Quantum Processor", Physical Review X, 2021, 11:041039, doi: 10.1103/PhysRevX.11.041039

Daniel Waters, Colin A. MacLean, Dan Bonachea, Paul H. Hargrove, "Demonstrating UPC++/Kokkos Interoperability in a Heat Conduction Simulation (Extended Abstract)", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S4630V


We describe the replacement of MPI with UPC++ in an existing Kokkos code that simulates heat conduction within a rectangular 3D object, as well as an analysis of the new code’s performance on CUDA accelerators. The key challenges were packing the halos in Kokkos data structures in a way that allowed for UPC++ remote memory access, and streamlining synchronization costs. Additional UPC++ abstractions used included global pointers, distributed objects, remote procedure calls, and futures. We also make use of the device allocator concept to facilitate data management in memory with unique properties, such as GPUs. Our results demonstrate that despite the algorithm’s good semantic match to message passing abstractions, straightforward modifications to use UPC++ communication deliver vastly improved performance and scalability in the common case. We find the one-sided UPC++ version written in a natural way exhibits good performance, whereas the message-passing version written in a straightforward way exhibits performance anomalies. We argue this represents a productivity benefit for one-sided communication models.

Amir Kamil, Dan Bonachea, "Optimization of Asynchronous Communication Operations through Eager Notifications", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S42C71


UPC++ is a C++ library implementing the Asynchronous Partitioned Global Address Space (APGAS) model. We propose an enhancement to the completion mechanisms of UPC++ used to synchronize communication operations that is designed to reduce overhead for on-node operations. Our enhancement permits eager delivery of completion notification in cases where the data transfer semantics of an operation happen to complete synchronously, for example due to the use of shared-memory bypass. This semantic relaxation allows removing significant overhead from the critical path of the implementation in such cases. We evaluate our results on three different representative systems using a combination of microbenchmarks and five variations of the HPCChallenge RandomAccess benchmark implemented in UPC++ and run on a single node to accentuate the impact of locality. We find that in RMA versions of the benchmark written in a straightforward manner (without manually optimizing for locality), the new eager notification mode can provide up to a 25% speedup when synchronizing with promises and up to a 13.5x speedup when synchronizing with conjoined futures. We also evaluate our results using a graph matching application written with UPC++ RMA communication, where we measure overall speedups of as much as 11% in single-node runs of the unmodified application code, due to our transparent enhancements.

J. Cheung, A. Sim, J. Kim, K. Wu, "Performance Prediction of Large Data Transfers", ACM/IEEE The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), ACM Student Research Competition (SRC), 2021,

Paul H. Hargrove, Dan Bonachea, Colin A. MacLean, Daniel Waters, "GASNet-EX Memory Kinds: Support for Device Memory in PGAS Programming Models", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21) Research Poster, November 2021, doi: 10.25344/S4P306

Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work includes two major components: UPC++ (a C++ template library) and GASNet-EX (a portable, high-performance communication library). This poster describes recent advances in GASNet-EX to efficiently implement Remote Memory Access (RMA) operations to and from memory on accelerator devices such as GPUs. Performance is illustrated via benchmark results from UPC++ and the Legion programming system, both using GASNet-EX as their communications library.

Katherine A. Yelick, Amir Kamil, Damian Rouson, Dan Bonachea, Paul H. Hargrove, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications, Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), November 16, 2021,

UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. UPC++ offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between computation and asynchronous data movement. UPC++ supports simple/regular data structures as well as more elaborate distributed applications where communication is fine-grained and/or irregular. UPC++ provides a uniform abstraction for one-sided RMA between host and GPU/accelerator memories anywhere in the system. UPC++'s support for aggressive asynchrony enables applications to effectively overlap communication and reduce latency stalls, while the underlying GASNet-EX communication library delivers efficient low-overhead RMA/RPC on HPC networks.

This tutorial introduces UPC++, covering the memory and execution models and basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into application proxy examples. We examine a few UPC++ applications with irregular communication (metagenomic assembler and COVID-19 simulation) and describe how they utilize UPC++ to optimize communication performance.

Muaaz Gul Awan, Steven Hofmeyr, Rob Egan, Nan Ding, Aydin Buluc, Jack Deslippe, Leonid Oliker, Katherine Yelick, "Accelerating Large Scale de Novo Metagenome Assembly Using GPUs", The International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), November 16, 2021,

Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014

Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create 'workflow snapshots' that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.

Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments", (BEST PAPER), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021,

Tan Nguyen, Erich Strohmaier, John Shalf, "Facilitating CoDesign with Automatic Code Similarity Learning", 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), November 14, 2021,

Bradley K. Mitchell, Ravi K. Naik, Alexis Morvan, Akel Hashim, John Mark Kreikebaum, Brian Marinelli, Wim Lavrijsen, Kasra Nowrouzi, David I. Santiago, Irfan Siddiqi, "Hardware-Efficient Microwave-Activated Tunable Coupling between Superconducting Qubits", Physical Review Letters, 2021, 127:200502, doi: 10.1103/PhysRevLett.127.200502

A. Syal, A. Lazar, J. Kim, A. Sim, K. Wu, "Network traffic performance analysis from passive measurements using gradient boosting machine learning", International Journal of Big Data Intelligence, 2021, 8:13-30, doi: 10.1504/IJBDI.2021.118741

Y. Ma, F. Rusu, K. Wu, A. Sim, Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers, arXiv preprint arXiv:2110.07029, 2021,

Pietro Benedusi, Michael L Minion, Rolf Krause, "An experimental comparison of a space-time multigrid method with PFASST for a reaction-diffusion problem", Computers & Mathematics with Applications, October 1, 2021,

John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T


UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.

UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.

Yilun Xu, Gang Huang, Jan Balewski, Ravi Naik, Alexis Morvan, Bradley Mitchell, Kasra Nowrouzi, David I. Santiago, Irfan Siddiqi, "QubiC: An Open-Source FPGA-Based Control and Measurement System for Superconducting Quantum Information Processors", IEEE Transactions on Quantum Engineering, 2021, 2:1-11, doi: 10.1109/TQE.2021.3116540

Andrew Adams, Kay Avila, Elisa Heymann, Mark Krenz, Jason R. Lee, Barton Miller, Sean Peisert, "The State of the Scientific Software World: Findings of the 2021 Trusted CI Software Assurance Annual Challenge Interviews", Trusted CI Report, September 29, 2021,

Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001425, doi: 10.25344/S4XK53


UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.

E. Copps, A. Sim (Advisor), K. Wu (Advisor), "Analyzing scientific data sharing patterns with in-network data caching", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2021), ACM Student Research Competition (SRC), 2021,

Marco Siracusa, Emanuele Del Sozzo, Marco Rabozzi, Lorenzo Di Tucci, Samuel Williams, Donatella Sciuto, Marco Domenico Santambrogio, "A Comprehensive Methodology to Optimize FPGA Designs via the Roofline Model", Transactions on Computers (TC), September 2021,

Srivatsan Chakram, Andrew E. Oriani, Ravi K. Naik, Akash V. Dixit, Kevin He, Ankur Agrawal, Hyeokshin Kwon, David I. Schuster, "Seamless High-Q Microwave Cavities for Multimode Circuit Quantum Electrodynamics", Physical Review Letters, 2021, 127:107701, doi: 10.1103/PhysRevLett.127.107701

G Koolstra, N Stevenson, S Barzili, L Burns, K Siva, S Greenfield, W Livingston, A Hashim, RK Naik, JM Kreikebaum, KP O'Brien, DI Santiago, J Dressel, I Siddiqi, "Monitoring fast superconducting qubit dynamics using a neural network", Preprint, August 2021,

Tommaso Buvoli, Michael Minion, "IMEX Runge-Kutta Parareal for Non-diffusive Equations", Springer Proceedings in Mathematics & Statistics, August 25, 2021,

Sebastian Götschel, Michael Minion, Daniel Ruprecht, Robert Speck, "Twelve Ways To Fool The Masses When Giving Parallel-In-Time Results", Springer Proceedings in Mathematics & Statistics, August 25, 2021,

Tan Nguyen, Colin MacLean, Marco Siracusa, Douglas Doerfler, Nicholas J. Wright, Samuel Williams, "FPGA‐based HPC accelerators: An evaluation on performance and energy efficiency", CCPE, August 22, 2021, doi: 10.1002/cpe.6570

Meriam Gay Bautista, Patricia Gonzalez-Guerrero, Darren Lyles, George Michelogiannakis, "Superconducting Shuttle-flux Shift Buffer for Race Logic", 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), August 2021,

Dan Bonachea, "UPC++ as_eager Working Group Draft, Revision 2020.6.2", Lawrence Berkeley National Laboratory Tech Report, August 9, 2021, LBNL 2001416, doi: 10.25344/S4FK5R

This draft proposes an extension for a new future-based completion variant that can be more effectively streamlined for RMA and atomic access operations that happen to be satisfied at runtime using purely node-local resources. Many such operations are most efficiently performed synchronously using load/store instructions on shared-memory mappings, where the actual access may only require a few CPU instructions. In such cases we believe it’s critical to minimize the overheads imposed by the UPC++ runtime and completion queues, in order to enable efficient operation on hierarchical node hardware using shared-memory bypass.

The new upcxx::{source,operation}_cx::as_eager_future() completion variant accomplishes this goal by relaxing the current restriction that future-returning access operations must return a non-ready future whose completion is deferred until a subsequent explicit invocation of user-level progress. This relaxation allows access operations that are completed synchronously to instead return a ready future, thereby avoiding most or all of the runtime costs associated with deferment of future completion and subsequent mandatory entry into the progress engine.

We additionally propose to make this new as_eager_future() completion variant the new default completion for communication operations that currently default to returning a future. This should encourage use of the streamlined variant, and may provide performance improvements to some codes without source changes. A mechanism is proposed to restore the legacy behavior on-demand for codes that might happen to rely on deferred completion for correctness.

Finally, we propose a new as_eager_promise() completion variant that extends analogous improvements to promise-based completion, and corresponding changes to the default behavior of as_promise().

Nan Ding, Muaaz Awan, Samuel Williams, "Instruction Roofline: An insightful visual performance model for GPUs", CCPE, August 4, 2021, doi: 10.1002/cpe.6591

Nan Ding, Yang Liu, Samuel Williams, Xiaoye S. Li, "A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), July 19, 2021,

Charlene Yang, Yunsong Wang, Thorsten Kurth, Steven Farrell, Samuel Williams, "Hierarchical Roofline Performance Analysis for Deep Learning Applications", Intelligent Computing, LNNS, July 15, 2021, doi: 10.1007/978-3-030-80126-7

B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, "Enhancing IoT Anomaly Detection Performance for Federated Learning", Digital Communications and Networks, Special Issue on Edge Computation and Intelligence, 2021,

M. Nakashima, A. Sim, Y. Kim, J. Kim, J. Kim, "Automated Feature Selection for Anomaly Detection in Network Traffic Data", ACM Transactions on Management Information Systems (TMIS), 2021, 12:1-28, doi: 10.1145/3446636

Drew Paine, Sarah Poon, Lavanya Ramakrishnan, "Investigating User Experiences with Data Abstractions on High Performance Computing Systems", June 29, 2021, LBNL 2001374,

Scientific exploration generates expanding volumes of data that commonly require High Performance Computing (HPC) systems to facilitate research. HPC systems are complex ecosystems of hardware and software that frequently are not user friendly. The Usable Data Abstractions (UDA) project set out to build usable software for scientific workflows in HPC environments by undertaking multiple rounds of qualitative user research. Qualitative research investigates how individuals accomplish their work and our interview-based study surfaced a variety of insights about the experiences of working in and with HPC ecosystems. This report examines multiple facets to the experiences of scientists and developers using and supporting HPC systems. We discuss how stakeholders grasp the design and configuration of these systems, the impacts of abstraction layers on their ability to successfully do work, and the varied perceptions of time that shape this work. Examining the adoption of the Cori HPC at NERSC we explore the anticipations and lived experiences of users interacting with this system's novel storage feature, the Burst Buffer. We present lessons learned from across these insights to illustrate just some of the challenges HPC facilities and their stakeholders need to account for when procuring and supporting these essential scientific resources to ensure their usability and utility to a variety of scientific practices.

Thomas M Evans, Andrew Siegel, Erik W Draeger, Jack Deslippe, Marianne M Francois, Timothy C Germann, William E Hart, Daniel F Martin, "A survey of software implementations used by application codes in the Exascale Computing Project", The International Journal of High Performance Computing Applications, June 25, 2021, doi: 10.1177/10943420211028940

Élie Genois, Jonathan A. Gross, Agustin Di Paolo, Noah J. Stevenson, Gerwin Koolstra, Akel Hashim, Irfan Siddiqi, Alexandre Blais, "Quantum-tailored machine-learning characterization of a superconducting qubit", Preprint, June 24, 2021,

Robin J Dolleman, Debadi Chakraborty, Daniel R Ladiges, Herre SJ van der Zant, John E Sader, Peter G Steeneken, "Squeeze-film effect on atomically thin resonators in the high-pressure limit", Submitted to Nano Letters, June 24, 2021,

Melanie E. Moses, Steven Hofmeyr, Judy L Cannon, Akil Andrews, Rebekah Gridley, Monica Hinga, Kirtus Leyba, Abigail Pribisova, Vanessa Surjadidjaja, Humayra Tasnim, Stephanie Forrest, "Spatially distributed infection increases viral load in a computational model of SARS-CoV-2 lung infection", June 2021, doi: 10.1101/2021.05.19.444569

Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", (P-RECS'21) Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems, ACM, June 21, 2021, doi: 10.1145/3456287.3465478

Reproducible research is becoming essential for science to ensure transparency and for building trust. Additionally, reproducibility provides the cornerstone for sharing of methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding about the gaps, issues, and challenges for enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting the things that worked, that did not work, and that could have worked better for each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience who aim to integrate reproducibility in their everyday pipelines.

A. Lazar, A. Sim, K. Wu, "GPU-based Classification for Wireless Intrusion Detection", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464445

Y. Wang, K. Wu, A. Sim, S. Yoo, S. Misawa, "Access Patterns of Disk Cache for Large Scientific Archive", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464444

E. Copps, H. Zhang, A. Sim, K. Wu, I. Monga, C. Guok, F. Würthwein, D. Davila, E. Fajardo, "Analyzing scientific data sharing patterns with in-network data caching", 4th ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2021), 2021, doi: 10.1145/3452411.3464441

Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Michael Beach, Drew Paine, Lavanya Ramakrishnan, "Science Capsule - Capturing the Data Life Cycle", Journal of Open Source Software, 2021, 6:2484, doi: 10.21105/joss.02484

Ayaz Akram, Venkatesh Akella, Sean Peisert, Jason Lowe-Power, "Enabling Design Space Exploration for RISC-V Secure Compute Environments", Proceedings of the Fifth Workshop on Computer Architecture Research with RISC-V (CARRV), (co-located with ISCA 2021), June 17, 2021,

Oguz Selvitopi, Benjamin Brock, Israt Nisa, Alok Tripathy, Katherine Yelick, Aydın Buluç, "Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication", ICS '21: Proceedings of the ACM International Conference on Supercomputing, June 2021, 431-442, doi: 10.1145/3447818.3461472

Ciaran Roberts, Sy-Toan Ngo, Alexandre Milesi, Anna Scaglione, Sean Peisert, Daniel Arnold, "Deep Reinforcement Learning for Mitigating Cyber-Physical DER Voltage Unbalance Attacks", Proceedings of the 2021 American Control Conference (ACC), May 2021,

David McCallen, Houjun Tang, Suiwen Wu, Eric Eckert, Junfei Huang, N Anders Petersson, "Coupling of regional geophysics and local soil-structure models in the EQSIM fault-to-structure earthquake simulation framework", The International Journal of High Performance Computing Applications, May 25, 2021,

George Michelogiannakis, SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC, IEEE International Parallel and Distributed Processing Symposium, May 2021,

Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad, "Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale", 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021, doi: 10.1109/IPDPS49936.2021.00018

Y. Ma, F. Rusu, A. Sim, K. Wu, "Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures", Heterogeneity in Computing Workshop (HCW 2021), in conjunction with the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2021, doi: 10.1109/IPDPSW52791.2021.00012

Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert, "Performance Analysis of Scientific Computing Workloads on General Purpose TEEs", Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), IEEE, May 2021,

George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Bautista, Dilip Vasudevan, Anastasiia Butko, "SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC", May 2021,

Tamsin L. Edwards, Sophie Nowicki, Ben Marzeion, Regine Hock, Heiko Goelzer, Hélène Seroussi, Nicolas C. Jourdain, Donald A. Slater, Fiona E. Turner, Christopher J. Smith, Christine M. McKenna, Erika Simon, Ayako Abe-Ouchi, Jonathan M. Gregory, Eric Larour, William H. Lipscomb, Antony J. Payne, Andrew Shepherd, Cécile Agosta, Patrick Alexander, Torsten Albrecht, Brian Anderson, Xylar Asay-Davis, Andy Aschwanden, Alice Barthel, Andrew Bliss, Reinhard Calov, Christopher Chambers, Nicolas Champollion, Youngmin Choi, Richard Cullather, Joshua Cuzzone, Christophe Dumas, Denis Felikson, Xavier Fettweis, Koji Fujita, Benjamin K. Galton-Fenzi, Rupert Gladstone, Nicholas R. Golledge, Ralf Greve, Tore Hattermann, Matthew J. Hoffman, Angelika Humbert, Matthias Huss, Philippe Huybrechts, Walter Immerzeel, Thomas Kleiner, Philip Kraaijenbrink, Sébastien Le clec’h, Victoria Lee, Gunter R. Leguy, Christopher M. Little, Daniel P. Lowry, Jan-Hendrik Malles, Daniel F. Martin, Fabien Maussion, Mathieu Morlighem, James F. O’Neill, Isabel Nias, Frank Pattyn, Tyler Pelle, Stephen F. Price, Aurélien Quiquet, Valentina Radić, Ronja Reese, David R. Rounce, Martin Rückamp, Akiko Sakai, Courtney Shafer, Nicole-Jeanne Schlegel, Sarah Shannon, Robin S. Smith, Fiammetta Straneo, Sainan Sun, Lev Tarasov, Luke D. Trusel, Jonas Van Breedam, Roderik van de Wal, Michiel van den Broeke, Ricarda Winkelmann, Harry Zekollari, Chen Zhao, Tong Zhang, Thomas Zwinger, "Projected land ice contributions to twenty-first-century sea level rise", Nature, May 5, 2021, 593:74-82, doi: 10.1038/s41586-021-03302-y

D. A. Agarwal, J. Damerow, C. Varadharajan, D. S. Christianson, G. Z. Pastorello, Y.-W. Cheah, L. Ramakrishnan, "Balancing the needs of consumers and producers for scientific data collections", Ecological Informatics, 2021, 62:101251, doi: 10.1016/j.ecoinf.2021.101251

Sean Peisert, "Trustworthy Scientific Computing", Communications of the ACM (CACM), May 2021, doi: 10.1145/3457191

Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç, "BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), 2021, doi: 10.1101/464420

T. Groves, N. Ravichandrasekaran, B. Cook, N. Keen, D. Trebotich, N. Wright, B. Alverson, D. Roweth, K. Underwood, "Not All Applications Have Boring Communication Patterns: Profiling Message Matching with BMM", Concurrency and Computation: Practice and Experience, April 26, 2021, doi: 10.1002/cpe.6380

J. Kim, A. Sim, J. Kim, K. Wu, J. Hahm, Improving Botnet Detection with Recurrent Neural Network and Transfer Learning, arXiv preprint arXiv:2104.12602, 2021,

Douglas Doerfler, Farzad Fatollahi-Fard, Colin MacLean, Tan Nguyen, Samuel Williams, Nicholas J. Wright, Marco Siracusa, "Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs", International Workshop on OpenCL (iWOCL), April 2021,

Jonathan Madsen, Roofline Instrumentation with TiMemory, ECP Annual Meeting, April 2021,

Khaled Ibrahim, Roofline on GPUs (advanced topics), ECP Annual Meeting, April 2021,

Jonathan Madsen, Roofline Model using NSight Compute, ECP Annual Meeting, April 2021,

Samuel Williams, Roofline Analysis on NVIDIA GPUs, ECP Annual Meeting, April 2021,

Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, April 2021,

Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes", Poster at Exascale Computing Project (ECP) Annual Meeting 2021, April 2021,

We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.

UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC).  The combination of these two features yields performant, scalable solutions to problems of interest within ECP.

GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.

Marco Pritoni, Drew Paine, Gabriel Fierro, Cory Mosiman, Michael Poplawski, Joel Bender, Jessica Granderson, "Metadata Schemas and Ontologies for Building Energy Applications: A Critical Review and Use Case Analysis", Energies, April 6, 2021, doi: 10.3390/en14072024

Digital and intelligent buildings are critical to realizing efficient building energy operations and a smart grid. With the increasing digitalization of processes throughout the life cycle of buildings, data exchanged between stakeholders and between building systems have grown significantly. However, a lack of semantic interoperability between data in different systems is still prevalent and hinders the development of energy-oriented applications that can be reused across buildings, limiting the scalability of innovative solutions. Addressing this challenge, our review paper systematically reviews metadata schemas and ontologies that are at the foundation of semantic interoperability necessary to move toward improved building energy operations. The review finds 40 schemas that span different phases of the building life cycle, most of which cover commercial building operations and, in particular, control and monitoring systems. The paper’s deeper review and analysis of five popular schemas identify several gaps in their ability to fully facilitate the work of a building modeler attempting to support three use cases: energy audits, automated fault detection and diagnosis, and optimal control. Our findings demonstrate that building modelers focused on energy use cases will find it difficult, labor intensive, and costly to create, sustain, and use semantic models with existing ontologies. This underscores the significant work still to be done to enable interoperable, usable, and maintainable building models. We make three recommendations for future work by the building modeling and energy communities: a centralized repository with a search engine for relevant schemas, the development of more use cases, and better harmonization and standardization of schemas in collaboration with industry to facilitate their adoption by stakeholders addressing varied energy-focused use cases.

Fabio Massacci, Trent Jaeger, Sean Peisert, "SolarWinds and the Challenges of Patching: Can We Ever Stop Dancing With the Devil?", IEEE Security & Privacy, April 2021, 14-19, doi: 10.1109/MSEC.2021.3050433

Sean Peisert, Bruce Schneier, Hamed Okhravi, Fabio Massacci, Terry Benzel, Carl Landwehr, Mohammad Mannan, Jelena Mirkovic, Atul Prakash, James Bret Michael, "Perspectives on the SolarWinds Incident", IEEE Security & Privacy, April 2021, 7-13, doi: 10.1109/MSEC.2021.3051235

Daniel R. Ladiges, Sean P. Carney, Andrew Nonaka, Katherine Klymko, Guy C. Moore, Alejandro L. Garcia, Sachin R. Natesh, Aleksandar Donev, John B. Bell, "A Discrete Ion Stochastic Continuum Overdamped Solvent Algorithm for Modeling Electrolytes", Physical Review Fluids, April 1, 2021, 6(4):044309,

Karol Kowalski, Raymond Bair, Nicholas P. Bauman, Jeffery S. Boschen, Eric J. Bylaska, Jeff Daily, Wibe A. de Jong, Thom Dunning, Niranjan Govind, Robert J. Harrison, Murat Keceli, Kristopher Keipert, Sriram Krishnamoorthy, Suraj Kumar, Erdal Mutlu, Bruce Palmer, Ajay Panyala, Bo Peng, Ryan M. Richard, T. P. Straatsma, Peter Sushko, Edward F. Valeev, Marat Valiev, Hubertus J. J. van Dam, Jonathan M. Waldrop, David B. Williams-Young, Chao Yang, Marcin Zalewski, Theresa L. Windus, "From NWChem to NWChemEx: Evolving with the Computational Chemistry Landscape", Chemical Reviews, March 31, 2021, doi: 10.1021/acs.chemrev.0c00998

J. Goings, H. Hu, C. Yang, X. Li, "Reinforcement Learning Configuration Interaction", March 31, 2021,

Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2021.3.0", Lawrence Berkeley National Laboratory Tech Report, March 31, 2021, LBNL 2001388, doi: 10.25344/S4K881

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
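
As a rough illustration of the future and continuation-callback style described in this abstract, the sketch below chains callbacks onto a future using Python's standard concurrent.futures module. It is only an analogy and does not use the UPC++ API; the helper `then` and the function `slow_fetch` are hypothetical names introduced for this example.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def then(fut, fn):
    """Return a new Future that completes with fn(fut.result())."""
    out = Future()

    def _on_done(done):
        try:
            out.set_result(fn(done.result()))
        except Exception as exc:  # propagate failures down the chain
            out.set_exception(exc)

    fut.add_done_callback(_on_done)
    return out


def slow_fetch(x):
    """Hypothetical stand-in for a high-latency remote operation."""
    time.sleep(0.05)
    return x * 10


with ThreadPoolExecutor(max_workers=2) as pool:
    fetched = pool.submit(slow_fetch, 3)        # non-blocking launch
    plus_one = then(fetched, lambda v: v + 1)   # runs once the fetch completes
    doubled = then(plus_one, lambda v: v * 2)   # runs once plus_one completes
    result = doubled.result(timeout=5)          # block only at the very end
```

Each call to `then` adds one node to a small dependency graph, so the caller waits only on the final future rather than on every intermediate step.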

Georgios Tzimpragos, Jennifer Volk, Dilip Vasudevan, Nestan Tsiskaridze, George Michelogiannakis, Advait Madhavan, John Shalf, Timothy Sherwood, "Temporal Computing With Superconductors", IEEE Micro, March 2021, 41:71-79, doi: 10.1109/MM.2021.3066377

Dan Bonachea, GASNet-EX: A High-Performance, Portable Communication Library for Exascale, Berkeley Lab – CS Seminar, March 10, 2021,

Partitioned Global Address Space (PGAS) models, pioneered by languages such as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.

GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building on 20 years of lessons learned. We describe several features and enhancements that have been introduced to address the needs of modern runtimes and exploit the hardware capabilities of emerging systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI implementations on current systems. GASNet-EX provides communication services that help to deliver speedups in HPC applications written using the UPC++ library, enabling new science on pre-exascale systems. 

George Michelogiannakis, Min Yee Teh, Madeleine Glick, John Shalf, Keren Bergman, Maximizing The Impact of Emerging Photonic Switches At The System Level, SPIE photonics west, March 2021,

George Michelogiannakis, Min Yee Teh, Madeleine Glick, John Shalf, Keren Bergman, "Maximizing the impact of emerging photonic switches at the system level", SPIE 11692, Optical Interconnects XXI, 116920Z, March 2021,

Giulia Guidi, Marquita Ellis, Aydin Buluc, Katherine Yelick, David Culler, "10 Years Later: Cloud Computing is Closing the Performance Gap", 4th Workshop on Hot Topics in Cloud Computing Performance (HotCloudPerf 2021) at the International Conference on Performance Engineering (ICPE) 2021, February 10, 2021,

Tuowen Zhao, Mary Hall, Hans Johansen, Samuel Williams, "Improving Communication by Optimizing On-Node Data Movement with Data Layout", PPoPP, February 2021,

Donghun Koo, Jaehwan Lee, Jialin Liu, Eun-Kyu Byun, Jae-Hyuck Kwak, Glenn K Lockwood, Soonwook Hwang, Katie Antypas, Kesheng Wu, Hyeonsang Eom, "An empirical study of I/O separation for burst buffers in HPC systems", Journal of Parallel and Distributed Computing, 2021, 148:96-108, doi: 10.1016/j.jpdc.2020.10.007

R. Van Beeumen, L. Perisa, D. Kressner, C. Yang, "A Flexible Power Method for Solving Infinite Dimensional Tensor Eigenvalue Problems", January 30, 2021,

Ankur K. Gupta, Benjamin C. Gamoke, Krishnan Raghavachari, Interaction–Deletion: A Composite Energy Method for the Optimization of Molecular Systems Selectively Removing Specific Nonbonded Interactions, The Journal of Physical Chemistry A, 2021, 4668-4682, doi: 10.1021/acs.jpca.1c02918

Jan-Tobias Sohns, Gunther H. Weber, Christoph Garth, "Distributed Task-Parallel Topology-Controlled Volume Rendering", Mathematics and Visualization, (Springer International Publishing: 2021)

Hamish A. Carr, Gunther H. Weber, Christopher M. Sewell, Oliver Rübel, Patricia Fasel, James P. Ahrens, "Scalable Contour Tree Computation by Data Parallel Peak Pruning", Transactions on Visualization and Computer Graphics, 2021, 27:2437-2454, doi: 10.1109/TVCG.2019.2948616

Hamish Carr, Oliver Rübel, Gunther H. Weber, James Ahrens, "Optimization and Augmentation for Data Parallel Contour Trees", IEEE Transactions on Visualization and Computer Graphics, 2021, doi: 10.1109/TVCG.2021.3064385

Robbie Sadre, Colin Ophus, Anastasiia Butko, Gunther H Weber, "Deep Learning Segmentation of Complex Features in Atomic-Resolution Phase Contrast Transmission Electron Microscopy Images", Microscopy and Microanalysis, 2021, doi: 10.1017/S1431927621000167

Brad Mitchell, Ravi Naik, Alexis Morvan, Akel Hashim, John Mark Kreikebaum, David Santiago, Irfan Siddiqi, Calibration of the Cross-Resonance Gate using Closed-Loop Optimal Control, Bulletin of the American Physical Society, 2021,

Gerwin Koolstra, Noah Stevenson, Karthik Siva, William Livingston, Ravi Naik, John Steinmetz, Debmalya Das, Andrew Jordan, David Santiago, Irfan Siddiqi, Diagnosing Gate Errors in Superconducting Qubits Using Continuous Measurements (Experiment), Bulletin of the American Physical Society, 2021,

Ravi Naik, Brad Mitchell, Akel Hashim, John Mark Kreikebaum, David Santiago, Irfan Siddiqi, Contextual Characterization of the Cross-Resonance Gate on a Multi-Qubit Superconducting Quantum Processor, Bulletin of the American Physical Society, 2021,

Akel Hashim, Ravi Naik, Alexis Morvan, Jean-Loup Ville, Brad Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin O’Brien, et al., Scalable Quantum Computing on a Noisy Superconducting Quantum Processor via Randomized Compiling, Bulletin of the American Physical Society, 2021,

Robin Blume-Kohout, Susan Clark, Akel Hashim, Craig Hogle, Daniel Lobser, Ravi Naik, Timothy Proctor, Kenneth Rudinger, David Santiago, Irfan Siddiqi, et al., Simultaneous Gate Set Tomography, Bulletin of the American Physical Society, 2021,

Joachim Cohen, Agustin Di Paolo, Larry Chen, Trevor Chistolini, John Mark Kreikebaum, Long Nguyen, Ravi Naik, David Santiago, Irfan Siddiqi, Alexandre Blais, Novel two-qubit gates for the light fluxonium qubit, Bulletin of the American Physical Society, 2021,

Alexis Morvan, Vinay Ramasesh, Machiel Blok, John Mark Kreikebaum, Kevin O’Brien, Larry Chen, Ravi Naik, Brad Mitchell, David Santiago, Irfan Siddiqi, Qutrit Randomized Benchmarking on a Transmon Quantum Processor, Bulletin of the American Physical Society, 2021,

John Steinmetz, Debmalya Das, Gerwin Koolstra, Noah Stevenson, Karthik Siva, William Livingston, Ravi Naik, David Santiago, Irfan Siddiqi, Andrew Jordan, Diagnosing Errors in Qubit Gates Using Continuous Measurements (Theory), Bulletin of the American Physical Society, 2021,

Noah Stevenson, Gerwin Koolstra, Karthik Siva, Ravi Naik, William Livingston, Shiva Lotfallahzadeh Barzili, Justin Dressel, Irfan Siddiqi, Tracking Non-Markovian Quantum Trajectories of a Superconducting Qubit from a Finite-Memory Bath, Bulletin of the American Physical Society, 2021,

Yilun Xu, Gang Huang, Jan Balewski, Ravi K Naik, Alexis Morvan, Brad Mitchell, Kasra Nowrouzi, David I Santiago, Irfan Siddiqi, Automatic Qubit Characterization and Gate Optimization with QubiC, arXiv preprint arXiv:2104.10866, 2021,

Kenneth Rudinger, Craig W Hogle, Ravi K Naik, Akel Hashim, Daniel Lobser, David I Santiago, Matthew D Grace, Erik Nielsen, Timothy Proctor, Stefan Seritan, et al., Experimental Characterization of Crosstalk Errors with Simultaneous Gate Set Tomography, arXiv preprint arXiv:2103.09890, 2021,

Yilun Xu, Gang Huang, Ravi Naik, Alexis Morvan, Kasra Nowrouzi, Brad Mitchell, David Santiago, Irfan Siddiqi, Automatic two-qubit gate calibration with qubic, Bulletin of the American Physical Society, 2021,

Kevin He, Srivatsan Chakram, Akash Dixit, Andrew Oriani, Ravi Naik, Nelson Leung, Hyeokshin Kwon, Riju Banerjee, Wen-Long Ma, Liang Jiang, et al., State preparation and tomography in 3D multimode circuit QED, Bulletin of the American Physical Society, 2021,

Jean-Loup Ville, Alexis Morvan, Akel Hashim, Ravi K Naik, Bradley Mitchell, John-Mark Kreikebaum, Kevin P O’Brien, Joel J Wallman, Ian Hincks, Joseph Emerson, et al., Leveraging Randomized Compiling for the QITE Algorithm, arXiv preprint arXiv:2104.08785, 2021,

Akash V Dixit, Srivatsan Chakram, Kevin He, Ankur Agrawal, Ravi K Naik, David I Schuster, Aaron Chou, "Searching for dark matter with a superconducting qubit", Physical Review Letters, 2021, 126:141302, doi: 10.1103/PhysRevLett.126.141302

Alexis Morvan, VV Ramasesh, MS Blok, JM Kreikebaum, K O’Brien, L Chen, BK Mitchell, RK Naik, DI Santiago, I Siddiqi, "Qutrit randomized benchmarking", Physical Review Letters, 2021, 126:210504, doi: 10.1103/PhysRevLett.126.210504

David Schuster, Ravi Naik, Srivatsan Chakram, Technologies for long-lived 3d multimode microwave cavities, 2021,

Nazanin Jafari, Oguz Selvitopi, Cevdet Aykanat, "Fast shared-memory streaming multilevel graph partitioning", Journal of Parallel and Distributed Computing, January 2021, 147:140-151, doi: https://doi.org/10.1016/j.jpdc.2020.09.004

2020

Ling Jin, Alina Lazar, James Sears, Annika Todd, Alex Sim, Kesheng Wu, Hung-Chai Yang, C. Anna Spurlock, "Clustering Life Course to Understand the Heterogeneous Effects of Life Events, Gender and Generation on Habitual Travel Modes", IEEE Access, 2020, 1-17, doi: 10.1109/ACCESS.2020.3032328

B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, "Enhancing IoT Anomaly Detection Performance for Federated Learning", The 16th IEEE International Conference on Mobility, Sensing and Networking (IEEE MSN 2020), 2020, doi: 10.1109/MSN50589.2020.00045

D. B. Williams-Young, W. A. de Jong, H. J. J. van Dam and C. Yang, "On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters", Frontiers in Chemistry, December 10, 2020, 8:951, doi: 10.3389/fchem.2020.581058

C. Yang, J. Brabec, L. Veis, D. B. Williams-Young, K. Kowalski, "Solving Coupled Cluster Equations by the Newton Krylov Method", Frontiers in Chemistry, December 10, 2020, 8:987, doi: 10.3389/fchem.2020.590184

B. Cho, T. Dayrit, Y. Gao, Z. Wang, T. Hong, A. Sim, K. Wu, "Effective Missing Value Imputation Methods for Building Monitoring Data", The 2nd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD 2020) in conjunction with IEEE International Conference on Big Data (IEEE BigData 2020), 2020, doi: 10.1109/BigData50022.2020.9378230

Verónica Rodríguez Tribaldos, Nathaniel J Lindsey, Shan Dou, Craig Ulrich, Michelle Robertson, Bin Dong, Vincent Dumont, Kesheng Wu, Inder Monga, Chris Tracy, et al., Combining Ambient Noise and Distributed Acoustic Sensing (DAS) Deployed on Dark Fiber Networks for High-resolution Imaging at the Basin Scale, AGU Fall Meeting 2020, 2020,

V. Dumont, V. Rodriguez Tribaldos, J. Ajo-Franklin, K. Wu, "Deep Learning for Surface Wave Identification in Distributed Acoustic Sensing Data", IEEE BigData 2020, December 8, 2020,

J. Kim, A. Sim, J. Kim, K. Wu, "Botnets Detection Using Recurrent Variational Autoencoder", IEEE Global Communications Conference (Globecom 2020), 2020, doi: 10.1109/GLOBECOM42002.2020.9348169

David McCallen, Anders Petersson, Arthur Rodgers, Arben Pitarka, Mamun Miah, Floriana Petrone, Bjorn Sjogreen, Norman Abrahamson, Houjun Tang, "EQSIM-A multidisciplinary framework for fault-to-structure earthquake simulations on exascale computers part I: Computational models and workflow", Earthquake Spectra, December 1, 2020, 37:707-735,

Anastasiia Butko, George Michelogiannakis, Samuel Williams, Costin Iancu, David Donofrio, John Shalf, Jonathan Carter, Irfan Siddiqi, "Understanding Quantum Control Processor Capabilities and Limitations through Circuit Characterization", IEEE Conference on Rebooting Computing (ICRC), December 2020,

Roel Van Beeumen, Khaled Z. Ibrahim, Gregory D. Kahanamoku-Meyer, Norman Y. Yao, Chao Yang, "Enhancing Scalability of a Matrix-Free Eigensolver for Studying Many-Body Localization", December 1, 2020,

Dan Wang, Qiang Du, Tong Zhou, Bashir Mohammed, Mariam Kiran, Derun Li, Russell Wilcox, "Artificial Neural Networks Applied to Stabilization of 81-beam Coherent Combining", Advanced Solid State Lasers, Optical Society of America, December 1, 2020,

Bashir Mohammed, Mariam Kiran, Dan Wang, Qiang Du, Russell Wilcox, "Deep Reinforcement Learning based Control for two-dimensional Coherent Combining", Laser Applications Conference, Optical Society of America (OSA Publishing), December 1, 2020, JTu5A-7,

I. Monga, C. Guok, J. MacAuley, A. Sim, H. Newman, J. Balcas, P. DeMar, L. Winkler, T. Lehman, X. Yang, "SDN for End-to-end Networked Science at the Exascale", Future Generation Computer Systems, 2020, doi: 10.1016/j.future.2020.04.018

Madeleine Glick, Nathan C. Abrams, Qixiang Cheng, Min Yee Teh, Yu-Han Hung, Oscar Jimenez, Songtao Liu, Yoshitomo Okawachi, Xiang Meng, Leif Johansson, Manya Ghobadi, Larry Dennison, George Michelogiannakis, John Shalf, Alan Liu, John Bowers, Alex Gaeta, Michal Lipson, and Keren Bergman, "PINE: Photonic Integrated Networked Energy efficient datacenters (ENLITENED Program)", Journal of Optical Communications and Networking, 2020, 12:443-456,

B Mohammed, IU Awan, H Ugail, and Y Mohammad, "Failure Prediction using Machine Learning in a Virtualized HPC System and Application", Cluster Computing: The Journal of Networks, Software Tools and Applications, 2020, 471–485,

Chris Lawson, Jose Manuel Martí, Tijana Radivojevic, Sai Vamshi R. Jonnalagadda, Reinhard Gentz, Nathan J. Hillson, Sean Peisert, Joonhoon Kim, Blake A. Simons, Christopher J. Petzold, Steven W. Singer, Aindrila Mukhopadhyay, Deepti Tanjore, Josh Dunn, Héctor García Martín, "Machine Learning for Metabolic Engineering: A Review", Metabolic Engineering, November 19, 2020,

Min Yee Teh, Yu-Han Hung, George Michelogiannakis, Shijia Yan, Madeleine Glick, John Shalf, Keren Bergman, "TAGO: rethinking routing design in high performance reconfigurable networks", SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2020,

B. Enders, D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, "Cross-facility science with the Superfacility Project at LBNL", 2nd Workshop on Large-scale Experiment-in-the-Loop Computing (XLOOP 2020), in conjunction with the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 20), 2020, doi: 10.1109/XLOOP51963.2020.00006

D. Bard, C. Snavely, L. Gerhardt, J. Lee, B. Totzke, K. Antypas, S. Byna, R. Cheema, S. Cholia, M. Day, B. Enders, A. Gaur, A. Greiner, T. Groves, M. Kiran, Q. Koziol, K. Rowland, C. Samuel, A. Selvarajan, A. Sim, D. Skinner, R. Thomas, G. Torok, The Superfacility project: automated pipelines for experiments and HPC, International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), State of the Practice (SOP), 2020,

Brett Weinger, Alex Sim (Advisor), John Wu (Advisor), Jinoh Kim (Advisor), "Enhancing IoT Anomaly Detection Performance for Federated Learning", International Conference for High Performance Computing, Networking, Storage and Analysis (SC’20), ACM Student Research Competition (SRC), 2020,

Taylor Groves, Ben Brock, Yuxin Chen, Khaled Ibrahim, Lenny Oliker, Nicholas J. Wright, Samuel Williams, Katherine Yelick, "Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2020,

Tan Nguyen, Samuel Williams, Marco Siracusa, Colin MacLean, Douglas Doerfler, Nicholas J. Wright, "The Performance and Energy Efficiency Potential of FPGAs in Scientific Computing", (BEST PAPER) Performance Modeling, Benchmarking, and Simulation of High Performance Computer Systems (PMBS), November 2020,

Yunsong Wang, Charlene Yang, Steven Farrell, Yan Zhang, Thorsten Kurth, Samuel Williams, "Time-Based Roofline for Deep Learning Performance Analysis", Deep Learning on Supercomputing (DLonSC), November 2020,

Ciaran Roberts, Sy-Toan Ngo, Alexandre Milesi, Sean Peisert, Daniel Arnold, Shammya Saha, Anna Scaglione, Nathan Johnson, Anton Kocheturov, Dmitriy Fradkin, "Deep Reinforcement Learning for DER Cyber-Attack Mitigation", Proceedings of the IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), IEEE, November 2020,

Ignacio Losada Carreño, Raksha Ramakrishna, Anna Scaglione, Daniel Arnold, Ciaran Roberts, Sy-Toan Ngo, Sean Peisert, David Pinney, "SODA: An Irradiance-Based Tool to Generate Sub-Minute Solar Power Stochastic Time Series", Proceedings of the IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), IEEE, November 2020,

Katherine A. Yelick, Amir Kamil, Dan Bonachea, Paul H Hargrove, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications, Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), November 10, 2020,

UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between asynchronous computations and data movement. UPC++ supports simple, regular data structures as well as more elaborate distributed structures where communication is fine-grained, irregular, or both. UPC++'s support for aggressive asynchrony enables the application to overlap communication to reduce communication wait times, and the GASNet communication layer provides efficient low-overhead RMA/RPC on HPC networks.

This tutorial introduces basic concepts and advanced optimization techniques of UPC++. We discuss the UPC++ memory and execution models and examine basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into several application examples. We also examine two irregular applications (metagenomic assembler and multifrontal sparse solver) and describe how they leverage UPC++ features to optimize communication performance.

Charlene Yang, Accelerating Large-Scale Excited-State Studies in Materials Science, Supercomputing (SC), November 2020,

Samuel Williams, Introduction to the Roofline Model, Supercomputing (SC), November 2020,

N Krishnaswamy, M Kiran, B Mohammed, Singh Kunal, "Data-driven Learning to Predict WAN Network Traffic.", SNTA '20: Proceedings of the 3rd International Workshop on Systems and Network Telemetry and Analytics, November 3, 2020, 11-18, doi: 10.1145/3391812.3396268

B Mohammed, M Kiran, N Krishnaswamy, Kesheng Wu, "Predicting WAN Traffic Volumes using Fourier and Multivariate SARIMA Approach", International Journal of Big Data Intelligence, November 3, 2020,

M Hocine, M Kiran, A Mercian, and B Mohammed, "Using Machine Learning for Intent-based provisioning in High-Speed Science Networks.", SNTA '20: Proceedings of the 3rd International Workshop on Systems and Network Telemetry and Analytics, November 2, 2020, 27-30, doi: 10.1145/3391812.3396269

T Mallick, M Kiran, B Mohammed, Prasanna Balaprakash, "Dynamic Graph Neural Network for Traffic Forecasting in Wide Area Networks.", Machine Learning Big Data 2020, November 2, 2020,

Marco Siracusa, Marco Rabozzi, Emanuele Del Sozzo, Lorenzo Di Tucci, Samuel Williams, Marco D. Santambrogio, "A CAD-based methodology to optimize HLS code via the Roofline model", International Conference on Computer Aided Design (ICCAD), November 2020,

Jonathan Blair Ajo-Franklin, Verónica Rodríguez Tribaldos, Avinash Nayak, Nathaniel J Lindsey, Feng Cheng, Benxin Chi, Bin Dong, Kesheng Wu, Inder Monga, Distributed Acoustic Sensing (DAS) at the Plot to Basin Scale: Connecting Near-Surface Sensing and Seismology with a Common Observational Tool, AGU Fall Meeting 2020, 2020,

Drew Paine, Lavanya Ramakrishnan, "Understanding Interactive and Reproducible Computing With Jupyter Tools at Facilities", LBNL Technical Report, October 31, 2020, LBNL-2001355,

Increasingly, Jupyter tools are being adopted and incorporated into High Performance Computing (HPC) and scientific user facilities. Adopting Jupyter tools enables more interactive and reproducible computational work at facilities across data life cycles. As the volume, variety, and scope of data grow, scientists need to be able to analyze and share results in user-friendly ways. Human-centered research highlights design challenges around computational notebooks, and our qualitative user study shifts focus to better characterize how Jupyter tools are being used in HPC and science user facilities today. We conducted twenty-nine interviews, and obtained 103 survey responses from NERSC Jupyter users, to better understand the increasing role of interactive computing tools in DOE-sponsored scientific work. We examine a range of issues that emerge when using and supporting Jupyter in HPC ecosystems, including: how Jupyter is being used by scientists in HPC and user facility ecosystems; how facilities are purposefully supporting Jupyter in their ecosystems; feedback NERSC users have about the facility’s deployment; and features that NERSC indicated would be helpful. We offer a variety of takeaways for staff supporting Jupyter at facilities, Project Jupyter and related open source communities, and funding agencies supporting interactive computing work.

John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2020.10.0", Lawrence Berkeley National Laboratory Tech Report, October 2020, LBNL 2001368, doi: 10.25344/S4HG6Q

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores. 

V. Dumont, V. Rodriguez Tribaldos, J. Ajo-Franklin, K. Wu, "Deep Learning on Real Geophysical Data: A Case Study for Distributed Acoustic Sensing Research", NeurIPS "Machine Learning and the Physical Sciences" workshop, 2020,

Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2020.10.0", Lawrence Berkeley National Laboratory Tech Report, October 30, 2020, LBNL 2001367, doi: 10.25344/S4CS3F

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.

Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert, "Performance Analysis of Scientific Computing Workloads on Trusted Execution Environments", arXiv preprint arXiv:2010.13216, October 25, 2020,

J. Hu, J. K. Webb, T. R. Ayres, M. B. Bainbridge, J. D. Barrow, M. A. Barstow, J. C. Berengut, R. F. Carswell, V. Dumont, V. Dzuba, V. V. Flambaum, C. C. Lee, N. Reindl, S. P. Preval, W. -Ü. L. Tchang-Brillet, "Measuring the fine-structure constant on a white dwarf surface; a detailed analysis of Fe V absorption in G191−B2B", Monthly Notices of the Royal Astronomical Society, Volume 500, Issue 1, January 2021, Pages 1466–1475, October 23, 2020, doi: 10.1093/mnras/staa3066

Giulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine Yelick, Aydin Buluc, "Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly", Proceedings of the IPDPS, 2021, October 20, 2020,

T. Hernandez, R. Van Beeumen, M. Caprio, C. Yang, "A greedy algorithm for computing eigenvalues of a symmetric matrix with localized eigenvectors", Numerical Linear Algebra and Applications, October 9, 2020, 28:e2341, doi: https://doi.org/10.1002/nla.2341

A. Sim, Statistical Pattern Detection with Locally Exchangeable Measures, International Conference on Advanced Communications and Computation (INFOCOMP 2020), 2020,

Benjamin Nachman, Miroslav Urbanek, Wibe A. de Jong, Christian W. Bauer, "Unfolding quantum computer readout noise", npj Quantum Information, 2020, 6:84, doi: 10.1038/s41534-020-00309-7

Christopher Daley, Hadia Ahmed, Samuel Williams, Nicholas Wright, "A case study of porting HPGMG from CUDA to OpenMP target offload", The International Workshop on OpenMP (IWOMP), September 2020,

Muaaz G Awan, Jack Deslippe, Aydin Buluc, Oguz Selvitopi, Steven Hofmeyr, Leonid Oliker, Katherine Yelick, "ADEPT: a domain independent sequence alignment strategy for gpu architectures", BMC Bioinformatics, September 2020, 21, doi: https://doi.org/10.1186/s12859-020-03720-1

D. Camps, R. Van Beeumen, C. Yang, "Quantum Fourier Transform Revisited", Numerical Linear Algebra and Applications, September 15, 2020, 28:e2331, doi: https://doi.org/10.1002/nla.2331

Sun, S., Pattyn, F., Simon, E., Albrecht, T., Cornford, S., Calov, R., . . . Zhang, T., "Antarctic ice sheet response to sudden and sustained ice-shelf collapse (ABUMIP)", Journal of Glaciology, September 14, 2020, 1-14, doi: 10.1017/jog.2020.67

Li Zhou, Lihao Yan, Mark A. Caprio, Weiguo Gao, Chao Yang, "Solving the k-sparse Eigenvalue Problem with Reinforcement Learning", September 9, 2020,

Drew Paine, Devarshi Ghoshal, Lavanya Ramakrishnan, "Experiences with a Flexible User Research Process to Build Data Change Tools", Journal of Open Research Software, September 1, 2020, doi: 10.5334/jors.284

Scientific software development processes are understood to be distinct from commercial software development practices due to uncertain and evolving states of scientific knowledge. Sustaining these software products is a recognized challenge, but under-examined is the usability and usefulness of such tools to their scientific end users. User research is a well-established set of techniques (e.g., interviews, mockups, usability tests) applied in commercial software projects to develop foundational, generative, and evaluative insights about products and the people who use them. Currently these approaches are not commonly applied and discussed in scientific software development work. The use of user research techniques in scientific environments can be challenging due to the nascent, fluid problem spaces of scientific work, varying scope of projects and their user communities, and funding/economic constraints on projects.

In this paper, we reflect on our experiences undertaking a multi-method user research process in the Deduce project. The Deduce project is investigating data change to develop metrics, methods, and tools that will help scientists make decisions around data change. There is a lack of common terminology since the concept of systematically measuring and managing data change is underexplored in scientific environments. To bridge this gap, we conducted user research that focuses on user practices, needs, and motivations to help us design and develop metrics and tools for data change. This paper contributes reflections and the lessons we have learned from our experiences. We offer key takeaways for scientific software project teams to effectively and flexibly incorporate similar processes into their projects.

Miroslav Urbanek, Benjamin Nachman, Wibe A. de Jong, "Error detection on quantum computers improving the accuracy of chemical calculations", Physical Review A, 2020, 102:022427, doi: 10.1103/PhysRevA.102.022427

Oguz Selvitopi*, Saliya Ekanayake*, Giulia Guidi, Georgios Pavlopoulos, Ariful Azad, Aydın Buluç, "Distributed Many-to-Many Protein Sequence Alignment Using Sparse Matrices", Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’20), 2020,

(*:joint first authors)

Patricia Gonzalez-Guerrero, Tommy Tracy II, Xinfei Guo, Rahul Sreekumar, Marzieh Lenjani, Kevin Skadron, Mircea R Stan, "Towards on-node Machine Learning for Ultra-low-power Sensors Using Asynchronous ΣΔ Streams", Journal on Emerging Technologies in Computing Systems (JETC), August 26, 2020, doi: https://doi.org/10.1145/3404975

We propose a novel architecture to enable low-power, complex on-node data processing, for the next generation of sensors for the internet of things (IoT), smartdust, or edge intelligence. Our architecture combines near-analog-memory-computing (NAM) and asynchronous-computing-with-streams (ACS), eliminating the need for ADCs. ACS enables ultra-low power, massive computational resources required to execute on-node complex Machine Learning (ML) algorithms; while NAM addresses the memory-wall that represents a common bottleneck for ML and other complex functions. In ACS an analog value is mapped to an asynchronous stream that can take one of two logic levels (v_h, v_l). This stream-based data representation enables area/power-efficient computing units such as a multiplier implemented as an AND gate yielding savings in power of ∼90% compared to digital approaches. The generation of streams for NAM and ACS in a brute force manner, using analog-to-digital-converters (ADCs) and digital-to-streams-converters, would sky-rocket the power-latency-energy cost making the approach impractical. Our NAM-ACS architecture eliminates expensive conversions, enabling an end-to-end processing on asynchronous streams data-path. We tailor the NAM-ACS architecture for random forest (RaF), an ML algorithm, chosen for its ability to classify using a reduced number of features. Simulations show that our NAM-ACS architecture enables 75% of savings in power compared with a single ADC, obtaining a classification accuracy of 85% using an RaF-inspired algorithm.
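
As a rough illustration of why a stream representation lets a single AND gate act as a multiplier, the Python sketch below encodes two values in [0, 1] as random bit streams whose fraction of 1s equals the value. ANDing two independent streams gives a stream whose fraction of 1s approximates the product. This is a simplified, clocked (Bernoulli) model assumed here for illustration only; it is not the asynchronous stream encoding or the circuit from the paper, and all function names are hypothetical.

```python
import random

def to_stream(value, length, rng):
    """Encode value in [0, 1] as a bit stream with P(bit == 1) == value."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def from_stream(bits):
    """Decode a stream back to a value: the fraction of 1s."""
    return sum(bits) / len(bits)

def stream_multiply(a_bits, b_bits):
    """Multiply two independent streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]

rng = random.Random(0)      # fixed seed so the run is repeatable
n = 100_000                 # longer streams give a tighter estimate
a = to_stream(0.6, n, rng)
b = to_stream(0.5, n, rng)
product = from_stream(stream_multiply(a, b))
print(product)              # approximately 0.6 * 0.5 = 0.30
```

The estimate is statistical: its error shrinks roughly with the square root of the stream length, which is the usual trade-off between precision and latency in stream-based arithmetic.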

Miroslav Urbanek, Daan Camps, Roel Van Beeumen, Wibe A. de Jong, "Chemistry on quantum computers with virtual quantum subspace expansion", Journal of Chemical Theory and Computation, 2020, 16:5425–5431, doi: 10.1021/acs.jctc.0c00447

Drew Paine, Devarshi Ghoshal, Lavanya Ramakrishnan, "Investigating Scientific Data Change with User Research Methods", August 20, 2020, LBNL LBNL-2001347,

Scientific datasets are continually expanding and changing due to fluctuations with instruments, quality assessment and quality control processes, and modifications to software pipelines. Datasets include minimal information about these changes or their effects, requiring scientists to manually assess modifications through a number of labor-intensive and ad hoc steps. The Deduce project is investigating data change to develop metrics, methods, and tools that will help scientists systematically identify and make decisions around data changes. Currently, there is a lack of understanding, and common practices, for identifying and evaluating changes in datasets since systematically measuring and managing data change is underexplored in scientific work. We are conducting user research to address this need by exploring scientists' conceptualizations, behaviors, needs, and motivations when dealing with changing datasets. Our user research utilizes multiple methods to produce foundational, generative insights and evaluate research products produced by our team. In this paper, we detail our user research process and outline our findings about data change that emerge from our studies. Our work illustrates how scientific software teams can push beyond just usability testing user interfaces or tools to better probe the underlying ideas they are developing solutions to address.

George Michelogiannakis, Forecasting the future of HPC systems, RIPCON 2020, August 2020,

C. A. Spurlock, A. Gopal, J. Auld, P. Leiby, C. Sheppard, T. Wenzel, S. Belal, A. Duvall, A. Enam, S. Fujita, A. Henao, L. Jin, E. Kontou, A. Lazar, Z. Needell, C. Rames, T. Rashidi, J. Sears, A. Sim, M. Stinson, M. Taylor, A. Todd-Blick, O. Verbas, V. Walker, J. Ward, G. Wong-Parodi, K. Wu, H.-C. Yang, "SMART Mobility, Mobility Decision Science Capstone Report", Vehicle Technologies Office (VTO), Office of Energy Efficiency and Renewable Energy (EERE), US Department of Energy, 2020,

Samuel Williams, The Roofline Model: A Bridge between Computer Science, Applied Math, and Computational Science, SciDAC Meeting, July 2020,

John Shalf, George Michelogiannakis, Brian Austin, Taylor Groves, Manya Ghobadi, Larry Dennison, Tom Gray, Yiwen Shen, Min Yee Teh, Madeleine Glick, and Keren Bergman, "Photonic Memory Disaggregation in Datacenters", OSA Advanced Photonics Congress (AP), July 2020,

Gustavo Chavez, Elizaveta Rebrova, Yang Liu, Pieter Ghysels, Xiaoye Sherry Li, "Scalable and memory-efficient kernel ridge regression", 34th IEEE International Parallel and Distributed Processing Symposium, July 14, 2020,

Bin Dong, Verónica Rodríguez Tribaldos, Xin Xing, Suren Byna, Jonathan Ajo-Franklin, Kesheng Wu, "DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection", 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 14, 2020, 254--263,

G. Z. Pastorello, C. Trotta, E. Canfora, H. Chu, D. Christianson, Y.-W. Cheah, C. Poindexter, J. Chen, A. Elbashandy, M. Humphrey, P. Isaac, D. Polidori, M. Reichstein, A. Ribeca, C. van Ingen, N. Vuichard, L. Zhang, B. Amiro, C. Ammann, M. A. Arain, J. Ardö, T. Arkebauer, S. K. Arndt, N. Arriga, M. Aubinet, M. Aurela, D. Baldocchi, A. Barr, E. Beamesderfer, L. B. Marchesini, O. Bergeron, J. Beringer, C. Bernhofer, D. Berveiller, D. Billesbach, T. A. Black, P. D. Blanken, G. Bohrer, J. Boike, P. V. Bolstad, D. Bonal, J.-M. Bonnefond, D. R. Bowling, R. Bracho, J. Brodeur, C. Brümmer, N. Buchmann, B. Burban, S. P. Burns, P. Buysse, P. Cale, M. Cavagna, P. Cellier, S. Chen, I. Chini, T. R. Christensen, J. Cleverly, A. Collalti, C. Consalvo, B. D. Cook, D. Cook, C. Coursolle, E. Cremonese, P. S. Curtis, E. D’Andrea, H. da Rocha, X. Dai, K. J. Davis, B. D. Cinti, A. de Grandcourt, A. D. Ligne, R. C. D. Oliveira, N. Delpierre, A. R. Desai, C. M. D. Bella, P. di Tommasi, H. Dolman, F. Domingo, G. Dong, S. Dore, P. Duce, E. Dufrêne, A. Dunn, J. Dušek, D. Eamus, U. Eichelmann, H. A. M. ElKhidir, W. Eugster, C. M. Ewenz, B. Ewers, D. Famulari, S. Fares, I. Feigenwinter, A. Feitz, R. Fensholt, G. Filippa, M. Fischer, J. Frank, M. Galvagno, M. Gharun, D. Gianelle, B. Gielen, B. Gioli, A. Gitelson, I. Goded, M. Goeckede, A. H. Goldstein, C. M. Gough, M. L. Goulden, A. Graf, A. Griebel, C. Gruening, T. Grünwald, A. Hammerle, S. Han, X. Han, B. U. Hansen, C. Hanson, J. Hatakka, Y. He, M. Hehn, B. Heinesch, N. Hinko-Najera, L. Hörtnagl, L. Hutley, A. Ibrom, H. Ikawa, M. Jackowicz-Korczynski, D. Janouš, W. Jans, R. Jassal, S. Jiang, T. Kato, M. Khomik, J. Klatt, A. Knohl, S. Knox, H. Kobayashi, G. Koerber, O. Kolle, Y. Kosugi, A. Kotani, A. Kowalski, B. Kruijt, J. Kurbatova, W. L. Kutsch, H. Kwon, S. Launiainen, T. Laurila, B. Law, R. Leuning, Y. Li, M. Liddell, J.-M. Limousin, M. Lion, A. J. Liska, A. Lohila, A. López-Ballesteros, E. López-Blanco, B. Loubet, D. Loustau, A. Lucas-Moffat, J. Lüers, S. Ma, C. Macfarlane, V. Magliulo, R. Maier, I. Mammarella, G. Manca, B. Marcolla, H. A. Margolis, S. Marras, W. Massman, M. Mastepanov, R. Matamala, J. H. Matthes, F. Mazzenga, H. McCaughey, I. McHugh, A. M. S. McMillan, L. Merbold, W. Meyer, T. Meyers, S. D. Miller, S. Minerbi, U. Moderow, R. K. Monson, L. Montagnani, C. E. Moore, E. Moors, V. Moreaux, C. Moureaux, J. W. Munger, T. Nakai, J. Neirynck, Z. Nesic, G. Nicolini, A. Noormets, M. Northwood, M. Nosetto, Y. Nouvellon, K. Novick, W. Oechel, J. E. Olesen, J.-M. Ourcival, S. A. Papuga, F.-J. Parmentier, E. Paul-Limoges, M. Pavelka, M. Peichl, E. Pendall, R. P. Phillips, K. Pilegaard, N. Pirk, G. Posse, T. Powell, H. Prasse, S. M. Prober, S. Rambal, U. Rannik, N. Raz-Yaseef, D. Reed, V. R. de Dios, N. Restrepo-Coupe, B. R. Reverter, M. Roland, S. Sabbatini, T. Sachs, S. R. Saleska, E. P. Sánchez-Cañete, Z. M. Sanchez-Mejia, H. P. Schmid, M. Schmidt, K. Schneider, F. Schrader, I. Schroder, R. L. Scott, P. Sedlák, P. Serrano-Ortíz, C. Shao, P. Shi, I. Shironya, L. Siebicke, L. Šigut, R. Silberstein, C. Sirca, D. Spano, R. Steinbrecher, R. M. Stevens, C. Sturtevant, A. Suyker, T. Tagesson, S. Takanashi, Y. Tang, N. Tapper, J. Thom, F. Tiedemann, M. Tomassucci, J.-P. Tuovinen, S. Urbanski, R. Valentini, M. van der Molen, E. van Gorsel, K. van Huissteden, A. Varlagin, J. Verfaillie, T. Vesala, C. Vincke, D. Vitale, N. Vygodskaya, J. P. Walker, E. Walter-Shea, H. Wang, R. Weber, S. Westermann, C. Wille, S. Wofsy, G. Wohlfahrt, S. Wolf, W. Woodgate, Y. Li, R. Zampedri, J. Zhang, G. Zhou, D. Zona, D. Agarwal, S. Biraud, M. Torn, D. Papale, "The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data", Scientific Data, 2020, 7:225, doi: 10.1038/s41597-020-0534-3

Samuel Williams, Introduction to the Roofline Model, NERSC NVIDIA Roofline Hackathon, July 2020,

Oguz Selvitopi, Seher Acer, Murat Manguoğlu, Cevdet Aykanat, "The Effect of Various Sparsity Structures on Parallelism and Algorithms to Reveal Those Structures", Parallel Algorithms in Computational Science and Engineering, (Birkhäuser, Cham: July 2020) Pages: 35-62 doi: https://doi.org/10.1007/978-3-030-43736-7_2

Yang Liu, Eric Michielssen, "Parallel fast time-domain integral-equation methods for transient electromagnetic analysis", Parallel Algorithms in Computational Science and Engineering, (July 7, 2020)

Samuel Williams, Introduction to the Roofline Model, NERSC GPU For Science Workshop, July 2020,

Yang Liu, Pieter Ghysels, Lisa Claus, Xiaoye Sherry Li, "Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations", arxiv-preprint, July 1, 2020,

Leen Alawieh, Jonathan Goodman, John B. Bell, "Iterative construction of Gaussian process surrogate models for Bayesian inference", Journal of Statistical Planning and Inference, 2020,

J. Galen Wang, Qi Li, Xiaoguang Peng, Gregory B. McKenna, Roseanna N. Zia, "“Dense diffusion” in colloidal glasses: short-ranged long-time self-diffusion as a mechanistic model for relaxation dynamics", Soft Matter, June 30, 2020,

Gaurav R Ghosal, Dipak Ghosal, Alex Sim, Aditya V Thakur, Kesheng Wu, "A Deep Deterministic Policy Gradient Based Network Scheduler For Deadline-Driven Data Transfers", Proceedings of International Federation for Information Processing (IFIP) Networking Conference (NETWORKING 2020), 2020, 253--261,

Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Yongseok Son, Hyeonsang Eom, "Towards HPC I/O Performance Prediction through Large-scale Log Analysis", Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 77--88, doi: 10.1145/3369583.3392678

Nan Ding, Victor W. Lee, Wei Xue, Weimin Zheng, "APMT: an automatic hardware counter-based performance modeling tool for HPC applications", CCF Transactions on High Performance Computing, June 24, 2020,

Jiwoo Bang, Chungyong Kim, Kesheng Wu, Alex Sim, Suren Byna, Sunggon Kim, Hyeonsang Eom, "HPC Workload Characterization Using Feature Selection and Clustering", ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2020), in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 33--40, doi: 10.1145/3391812.3396270

Jeeyung Kim, Alex Sim, Jinoh Kim, Kesheng Wu, Jaegyoon Hahm, "Transfer Learning Approach for Botnet Detection Based on Recurrent Variational Autoencoder", ACM International Workshop on System and Network Telemetry and Analysis (SNTA 2020), in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020), 2020, 41--47, doi: 10.1145/3391812.3396273

S. Bhandari, A. K. Kukreja, A. Lazar, A. Sim, K. Wu, "Feature Selection and Tree-based Classification for Wireless Intrusion Detection", the 3rd ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2020, in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2020, doi: 10.1145/3391812.3396274

M. Nakashima, A. Sim, J. Kim, "Evaluation of Deep Learning Models for Network Performance Prediction for Scientific Facilities", the 3rd ACM International Workshop on System and Network Telemetry and Analysis (SNTA) 2020, in conjunction with The 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2020, doi: 10.1145/3391812.3396272

E. Motheau, J. Wakefield, "Investigation of finite-volume methods to capture shocks and turbulence spectra in compressible flows", Commun. in Appl. Math. and Comput. Sci, 15-1 (2020), 1--36, June 3, 2020,

Jonathan R Madsen, Muaaz G Awan, Hugo Brunie, Jack Deslippe, Rahul Gayatri, Leonid Oliker, Yunsong Wang, Charlene Yang, Samuel Williams, "TiMemory: Modular Performance Analysis for HPC", International Supercomputing Conference (ISC), June 2020,

Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications, Argonne Leadership Computing Facility (ALCF) Webinar Series, May 27, 2020,

UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided RMA communication and Remote Procedure Calls (RPC), along with futures and promises. These constructs enable the programmer to express dependencies between asynchronous computations and data movement. UPC++ supports the implementation of simple, regular data structures as well as more elaborate distributed data structures where communication is fine-grained, irregular, or both. The library’s support for asynchrony enables the application to aggressively overlap and schedule communication and computation to reduce wait times.

UPC++ is highly portable and runs on platforms from laptops to supercomputers, with native implementations for HPC interconnects. As a C++ library, it interoperates smoothly with existing numerical libraries and on-node programming models (e.g., OpenMP, CUDA).

In this webinar, hosted by DOE’s Exascale Computing Project and the ALCF, we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through basic algorithm implementations. We will also look at irregular applications and show how they can take advantage of UPC++ features to optimize their performance.

Event page

Video recording

Yu-Hang Tang, Oguz Selvitopi, Doru Thom Popovici, Aydın Buluç, "A high-throughput solver for marginalized graph kernels on GPU", IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, May 2020, doi: 10.1109/IPDPS47924.2020.00080

Oguz Selvitopi, Md Taufique Hussain, Ariful Azad, Aydın Buluç, "Optimizing high performance markov clustering for pre-exascale architectures", IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, May 2020, doi: 10.1109/IPDPS47924.2020.00022

Benjamin Brock, Aydin Buluç, Timothy G Mattson, Scott McMillan, José E Moreira, Roger Pearce, Oguz Selvitopi, Trevor Steil, "Considerations for a Distributed GraphBLAS API", IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, May 2020, doi: 10.1109/IPDPSW50202.2020.00048

Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Parallel Query Service for Object-centric Data Management Systems", 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, May 18, 2020, 406-415,

H. Sung, J. Bang, C. Kim, H. Kim, A. Sim, G. K. Lockwood, H. Eom, "BBOS: Efficient HPC Storage Management via Burst Buffer Over-Subscription", the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2020), 2020, doi: 10.1109/CCGrid49817.2020.00-79

Qiao Kang, Alex Sim, Peter Nugent, Sunwoo Lee, Wei-keng Liao, Ankit Agrawal, Alok Choudhary, Kesheng Wu, "Predicting Resource Requirement in Intermediate Palomar Transient Factory Workflow", 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), 2020, 619--628, doi: 10.1109/CCGrid49817.2020.00-31

Shao-Jun Dong, Chao Wang, Yong-Jian Han, Chao Yang and Lixin He, "Stable diagonal stripes in the t–J model at n̄_h = 1/8 doping from fPEPS calculations", npj Quantum Materials, May 8, 2020, 5:28, doi: https://doi.org/10.1038/s41535-020-0226-4

S. B. Kachuck, D. F. Martin, J. N. Bassis, S. F. Price, "Rapid viscoelastic deformation slows marine ice sheet instability at Pine Island Glacier", Geophysical Research Letters, May 7, 2020, 47, doi: 10.1029/2019GL086446

Bogdan Copos, Sean Peisert, "Catch Me If You Can: Using Power Analysis to Identify HPC Activity", arXiv:2005.03135 [cs.CR], May 6, 2020,

Daniel F. Martin, Stephen L. Cornford, Esmond G Ng, Effect of Improved Bedrock Geometry on Antarctic Vulnerability to Regional Ice Shelf Collapse, European Geosciences Union 2020 General Assembly, May 5, 2020,

Andrew Wells, James Parkinson, Daniel F Martin, Three-dimensional convection, phase change, and solute transport in mushy sea ice, European Geosciences Union 2020 General Assembly, May 4, 2020,

Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D. Santambrogio, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, Katherine Yelick, "LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment", 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS20), 2020,

H. Masia-Roig, J. A. Smiga, D. Budker, V. Dumont, Z. Grujic, D. Kim, D. F. Jackson Kimball, V. Lebedev, M. Monroy, S. Pustelny, T. Scholtes, P. C. Segura, Y. K. Semertzidis, Y. Chang Shin, J. E. Stalnaker, I. Sulai, A. Weis, A. Wickenbrock, "Analysis method for detecting topological defect dark matter with a global magnetometer network", Physics of the Dark Universe, Volume 28, 100494, May 2020, doi: 10.1016/j.dark.2020.100494

C. T. Kelley, J. Bernholc, E. L. Briggs, S. Hamilton, L. Lin and C. Yang, "Mesh Independence of the Generalized Davidson Algorithm", Journal of Computational Physics, May 1, 2020, 409:109322, doi: https://doi.org/10.1016/j.jcp.2020.109322

Kai-Hsin Liou, Chao Yang and James R. Chelikowsky, "Scalable Implementation of Polynomial Filtering for Density Functional Theory Calculation in PARSEC", Computer Physics Communications, April 28, 2020, In press, doi: https://doi.org/10.1016/j.cpc.2020.107330

M. R. Wilczynska, J. K. Webb, M. Bainbridge, S. E. I. Bosman, J. D. Barrow, R. F. Carswell, M. P. Dabrowski, V. Dumont, A. C. Leite, C. Lee, K. Leszczynska, J. Liske, K. Marosek, C. J. A. P. Martins, D. Milakovic, P. Molaro, L. Pasquini, "Four direct measurements of the fine-structure constant 13 billion years ago", Science Advances, Volume 6, No. 17, eaay9672, April 24, 2020, doi: 10.1126/sciadv.aay9672

Christine T. Wolf, Drew Paine, "Sensemaking Practices in the Everyday Work of AI/ML Software Engineering", IEEE/ACM 42nd International Conference on Software Engineering Workshops (ICSEW’20), ACM, April 5, 2020, doi: 10.1145/3387940.3391496

Li Zhou, Chao Yang, Weiguo Gao, Talita Perciano, Karen M. Davies, Nicholas K. Sauter, "Subcellular structure segmentation from cryo-electron tomograms via machine learning", PLOS Journal of Computational Biology, April 2, 2020, submitted, doi: https://doi.org/10.1101/2020.04.09.034025

Georgios Tzimpragos, Dilip Vasudevan, Nestan Tsiskaridze, George Michelogiannakis, Advait Madhavan, Jennifer Volk, John Shalf, Timothy Sherwood, "A Computational Temporal Logic for Superconducting Accelerators", ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020,

F. Henneke, L. Lin, C. Vorwerk, C. Draxl, R. Klein and C. Yang, "Fast optical absorption spectra calculations for periodic solid state systems", Communications in Applied Mathematics and Computational Science, March 16, 2020, in press,

John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2020.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2020, LBNL 2001269, doi: 10.25344/S4P88Z

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2020.3.0", Lawrence Berkeley National Laboratory Tech Report, March 12, 2020, LBNL 2001268, doi: 10.25344/S4T01S

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
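
The future-and-continuation style described above can be illustrated outside UPC++. The sketch below uses Python's standard `concurrent.futures` module as a stand-in: two high-latency operations are issued without blocking, local work overlaps with them, and a callback runs once its input is ready. It does not use or mirror the UPC++ API; `fetch_remote` and its data are hypothetical placeholders for a remote read.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_remote(key):
    """Hypothetical stand-in for a high-latency remote read."""
    time.sleep(0.05)
    return {"a": 2, "b": 3}[key]

log = []

with ThreadPoolExecutor(max_workers=2) as pool:
    # Issue both operations; neither call blocks the caller.
    fut_a = pool.submit(fetch_remote, "a")
    fut_b = pool.submit(fetch_remote, "b")

    # Continuation: runs when fut_a completes, without the caller waiting.
    fut_a.add_done_callback(lambda f: log.append(f.result() * 10))

    # Independent local work overlaps with the outstanding operations.
    local = sum(range(10))

    # Synchronize only where the results are actually needed.
    total = fut_a.result() + fut_b.result() + local

# Leaving the with-block joins the workers, so the callback has finished.
print(total, log)   # 50 [20]
```

The point of the pattern is that waiting happens at a single, explicit place, so everything issued before that point can proceed concurrently.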

Samuel Williams, Charlene Yang, Yunsong Wang, Roofline Performance Modeling for HPC and Deep Learning Applications, NVIDIA GPU Technology Conference (GTC), March 2020,

Muammar El Khatib, Wibe De Jong, Feature Extraction Using Semi-Supervised Deep Learning, APS March 2020, March 5, 2020,

Features are defined as measurable properties that characterize observed phenomena and represent a key part of machine learning (ML) algorithms. In materials sciences, ML has successfully accelerated atomistic simulations using man-engineered features for tasks such as energy or atomic forces predictions. These features fulfill physics constraints such as rotational and translational invariance, uniqueness, and locality (the sum of local contributions reconstructs a global quantity). However, these ML models are known to perform poorly when operating out of the training set regime because features are not representative of the underlying structure of the data. This could be improved if features are extracted with advanced hybrid architectures, e.g., a variational autoencoder that is trained with physics constraints introduced with an external task and a loss function. We will explore how the use of semi-supervised learning techniques can be a powerful tool for the extraction of features for atomistic simulations. All results shown herein can be reproduced with ML4Chem: a free software package for machine learning in chemistry and materials sciences.

Marzieh Lenjani, Patricia Gonzalez, Elaheh Sadredini, Shuangchen Li, Yuan Xie, Ameen Akel, Sean Eilert, Mircea R Stan, Kevin Skadron, "Fulcrum: a simplified control and access mechanism toward flexible and practical in-situ accelerators", International Symposium on High Performance Computer Architecture (HPCA), San Diego, CA, USA, IEEE, February 22, 2020, doi: 10.1109/HPCA47549.2020.00052

In-situ approaches process data very close to the memory cells, in the row buffer of each subarray. This minimizes data movement costs and affords parallelism across subarrays. However, current in-situ approaches are limited to only row-wide bitwise (or few-bit) operations applied uniformly across the row buffer. They impose a significant overhead of multiple row activations for emulating 32-bit addition and multiplications using bitwise operations and cannot support operations with data dependencies or based on predicates. Moreover, with current peripheral logic, communication among subarrays is inefficient, and with typical data layouts, bits in a word are not physically adjacent. The key insight of this work is that in-situ, single-word ALUs outperform in-situ, parallel, row-wide, bitwise ALUs by reducing the number of row activations and enabling new operations and optimizations. Our proposed lightweight access and control mechanism, Fulcrum, sequentially feeds data into the single-word ALU and enables operations with data dependencies and operations based on a predicate. For algorithms that require communication among subarrays, we augment the peripheral logic with broadcasting capabilities and a previously-proposed method for low-cost inter-subarray data movement. The sequential processor also enables overlapping of broadcasting and computation, and reuniting bits that are physically adjacent. In order to realize true subarray-level parallelism, we introduce a lightweight column-selection mechanism through shifting one-hot encoded values. This technique enables independent column selection in each subarray. We integrate Fulcrum with Compute Express Link (CXL), a new interconnect standard. Fulcrum with one memory stack delivers on average (up to) (i) 23.4 (76) times speedup over a server-class GPU, NVIDIA P100, with three stacks of HBM2 memory, (ii) 70 (228) times speedup per memory stack over the GPU, and (iii) 19 (178.9) times speedup per memory stack over an ideal model of the GPU, which only accounts for the overhead of data movement.
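
The column-selection idea mentioned in the abstract, shifting a one-hot encoded value so that one column is enabled at a time, can be sketched in software. The Python below is a toy model under assumed details (a four-word row buffer, wrap-around shifting, a running sum as the scalar operation); it is not the Fulcrum hardware design, and every name in it is hypothetical.

```python
def one_hot(index, width):
    """Return a list of `width` bits with only bit `index` set."""
    return [1 if i == index else 0 for i in range(width)]

def shift(mask):
    """Move the single 1 to the next column, wrapping at the end."""
    return mask[-1:] + mask[:-1]

def select(row, mask):
    """Return the word whose column is enabled by the one-hot mask."""
    return row[mask.index(1)]

row_buffer = [7, 3, 9, 4]            # hypothetical words in one row buffer
mask = one_hot(0, len(row_buffer))   # start with column 0 selected

total = 0
for _ in row_buffer:
    total += select(row_buffer, mask)  # feed one word to a scalar "ALU"
    mask = shift(mask)                 # advance the selection by one column

print(total)   # 23
```

Because each subarray would hold its own mask in such a scheme, different subarrays could start from different columns and advance independently, which is the property the abstract highlights.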

Ross Gegan, Christina Mao, Dipak Ghosal, Matt Bishop, Sean Peisert, "Anomaly Detection for Science DMZ Using System Performance Data", Proceedings of the 2020 IEEE International Conference on Computing, Networking and Communications (ICNC 2020), Big Island, HI, February 2020, doi: 10.1109/ICNC47757.2020.9049695

Levermann, A., Winkelmann, R., Albrecht, T., Goelzer, H., Golledge, N. R., Greve, R., Huybrechts, P., Jordan, J., Leguy, G., Martin, D., Morlighem, M., Pattyn, F., Pollard, D., Quiquet, A., Rodehacke, C., Seroussi, H., Sutter, J., Zhang, T., Van Breedam, J., Calov, R., DeConto, R., Dumas, C., Garbe, J., Gudmundsson, G. H., Hoffman, M. J., Humbert, A., Kleiner, T., Lipscomb, W. H., Meinshausen, M., Ng, E., Nowicki, S. M. J., Perego, M., Price, S. F., Saito, F., Schlegel, N.-J., Sun, S., van de Wal, R. S. W., "Projecting Antarctica’s contribution to future sea level rise from basal ice shelf melt using linear response functions of 16 ice sheet models (LARMIP-2)", Earth System Dynamics, February 14, 2020, 11:35–76, doi: 10.5194/esd-11-35-2020

Nan Ding, Samuel Williams, Yang Liu, Xiaoye S. Li, "Leveraging One-Sided Communication for Sparse Triangular Solvers", 2020 SIAM Conference on Parallel Processing for Scientific Computing, February 14, 2020,

Drew Paine, Charlotte P. Lee, "Coordinative Entities: Forms of Organizing in Data Intensive Science", Journal of Computer Supported Cooperative Work, February 11, 2020, doi: 10.1007/s10606-020-09372-2

Yang Liu, Xin Xing, Han Guo, Eric Michielssen, Pieter Ghysels, Xiaoye Sherry Li, "Butterfly factorization via randomized matrix-vector multiplications", arxiv e-preprint, February 9, 2020,

Revathi Jambunathan, Deborah Levin, "A Self-Consistent Open Boundary Condition for Fully Kinetic Plasma Thruster Plume Simulations", IEEE Transactions on Plasma Science, February 7, 2020, doi: 10.1109/TPS.2020.2968887

Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick, UPC++: A PGAS/RPC Library for Asynchronous Exascale Communication in C++, Tutorial at Exascale Computing Project (ECP) Annual Meeting 2020, February 6, 2020,

UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided RMA communication and Remote Procedure Calls (RPC), along with futures and promises. These constructs enable the programmer to express dependencies between asynchronous computations and data movement. UPC++ supports the implementation of simple, regular data structures as well as more elaborate distributed data structures where communication is fine-grained, irregular, or both. The library’s support for asynchrony enables the application to aggressively overlap and schedule communication and computation to reduce wait times.

UPC++ is highly portable and runs on platforms from laptops to supercomputers, with native implementations for HPC interconnects. As a C++ library, it interoperates smoothly with existing numerical libraries and on-node programming models (e.g., OpenMP, CUDA).

In this tutorial we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through basic algorithm implementations. We will also look at irregular applications and show how they can take advantage of UPC++ features to optimize their performance.

Event page

Jack Deslippe, Guiding Optimization with the Roofline Model, ECP Annual Meeting, February 2020,

Charlene Yang, Hierarchical Roofline Analysis on CPUs, ECP Annual Meeting, February 2020,

Samuel Williams, Roofline on GPUs (Advanced Topics), ECP Annual Meeting, February 2020,

Charlene Yang, Hierarchical Roofline Analysis on GPUs, ECP Annual Meeting, February 2020,

Samuel Williams, Introduction to the Roofline Model, ECP Annual Meeting, February 2020,

Brandon Runnels, Vinamra Agrawal, Weiqun Zhang, Ann Almgren, "Massively parallel finite difference elasticity using a block-structured adaptive mesh refinement with a geometric multigrid solver", submitted for publication, 2020,

Nicholas Z. Liu, Daniel R. Ladiges, Jason Nassios, John E. Sader, "Acoustic flows in a slightly rarefied gas", Physical Review Fluids, February 4, 2020, 5:043401,

Amir Kamil, John Bachan, Dan Bonachea, Paul H. Hargrove, Erich Strohmaier and Daniel Waters, "UPC++: Asynchronous RMA and RPC Communication for Exascale Applications", Poster at Exascale Computing Project (ECP) Annual Meeting 2020, February 2020,

Paul H. Hargrove, Dan Bonachea, "GASNet-EX: RMA and Active Message Communication for Exascale Programming Models", Poster at Exascale Computing Project (ECP) Annual Meeting 2020, February 2020,

Suren Byna, M. Scot Breitenfeld, Bin Dong, Quincey Koziol, Elena Pourmal, Dana Robinson, Jerome Soumagne, Houjun Tang, Venkatram Vishwanath, and Richard Warren, "ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems", Journal of Computer Science and Technology 2020, 35(1): 145-160, February 2, 2020, doi: 10.1007/s11390-020-9822-9

S Abimbola, B Mohammed, M Sibusiso, IU Awan, and Jules Pagna Disso, "A Framework for Distributed Denial of Service Attack Detection and Reactive Countermeasure in Software Defined Network", 2019 7th IEEE International Conference on Future Internet of Things and Cloud (FiCloud), January 30, 2020, doi: 10.1109/FiCloud.2019.00019

Sergi Molins, Cyprien Soulaine, Nikolaos I. Prasianakis, Aida Abbasi, Philippe Poncet, Anthony J. C. Ladd, Vitalii Starchenko, Sophie Roman, David Trebotich, Hamdi Tchelepi, Carl I. Steefel, "Simulation of mineral dissolution at the pore scale with evolving fluid-solid interfaces: review of approaches and benchmark problem set", Computational Geosciences, January 23, 2020, doi: 10.1007/s10596-019-09903-x

Katherine Yelick, Aydın Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, Leonid Oliker, "The parallelism motifs of genomic data analysis", Philosophical Transactions of The Royal Society A: Mathematical, Physical and Engineering Sciences, 2020,

F. Alexander, A. Almgren, J. Bell, A. Bhattacharjee, J. Chen, P. Colella, D. Daniel, J. DeSlippe, L. Diachin, E. Draeger, A. Dubey, T. Dunning, T. Evans, I. Foster, M. Francois, T. Germann, M. Gordon, S. Habib, M. Halappanavar, S. Hamilton, W. Hart, Z. Huang, A. Hungerford, D. Kasen, P. Kent, T. Kolev, D. Kothe, A. Kronfeld, Y. Luo, P. Mackenzie, D. McCallen, B. Messer, S. Mniszewski, C. Oehmen, A. Perazzo, D. Perez, D. Richard, W. Rider, R. Rieben, K. Roche, A. Siegel, M. Sprague, C. Steefel, R. Stevens, M. Syamlal, M. Taylor, J. Turner, J.-L. Vay, A. Voter, T. Windus and K. Yelick, "Exascale applications: skin in the game", Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2020,

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, "Life Course as a Contextual System to Investigate the Effects of Life Events, Gender, and Generation on Travel Mode Use", Transportation Research Board (TRB) 99th Annual Meeting, 2020,

R. Van Beeumen, G. D. Kahanamoku-Meyer, N. Y. Yao and C. Yang, "A scalable matrix-free iterative eigensolver for studying many-body localization", HPCAsia2020: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, ACM, January 7, 2020, 179-187, doi: 10.1145/3368474.3368497

James R.G. Parkinson, Daniel F. Martin, Andrew J. Wells, Richard F. Katz, "Modelling binary alloy solidification with adaptive mesh refinement", Journal of Computational Physics: X, January 7, 2020, 5, doi: 10.1016/j.jcpx.2019.100043

Petar Hristov, Gunther H. Weber, Hamish A. Carr, Oliver Rübel, James P. Ahrens, "Data Parallel Hypersweeps for In Situ Topological Analysis", Proceedings of the 10th IEEE Symposium on Large Data Analysis and Visualization (LDAV), 2020, 12--21, doi: 10.1109/LDAV51489.2020.00008

Hamish A. Carr, Julien Tierney, Gunther H. Weber, "Pathological and Test Cases For Reeb Analysis", Mathematics and Visualization, (Springer International Publishing: 2020) Pages: 103--120 doi: 10.1007/978-3-030-43036-8_7

Jonas Lukasczyk, Christoph Garth, Gunther H. Weber, Tim Biedert, Ross Maciejewski, Heike Leitte, "Dynamic Nested Tracking Graphs", IEEE Transactions on Visualization and Computer Graphics (Proceedings IEEE VIS 2019), 2020, 26:249--258, doi: 10.1109/TVCG.2019.2934368

Sugeerth Murugesan, Kristofer Bouchard, Jesse Brown, Mariam Kiran, Dan Lurie, Bernd Hamann, Gunther H. Weber, "State-based Network Similarity Visualization", Information Visualization, 2020, 19:96--113, doi: 10.1177/1473871619882019

Anna-Pia Lohfink, Florian Wetzels, Jonas Lukasczyk, Gunther H. Weber, Christoph Garth, "Fuzzy Contour Trees: Alignment and Joint Layout of Multiple Contour Trees", Computer Graphics Forum (Special Issue, Proceedings Eurographics/IEEE Symposium on Visualization), 2020, 39:343--355, doi: 10.1111/cgf.13985

H. Childs, S. Ahern, J. Ahrens, A. C. Bauer, J. Bennett, E. W. Bethel, P.-T. Bremer, E. Brugger, J. Cottam, M. Dorier, S. Dutta, J. Favre, T. Fogal, S. Frey, C. Garth, B. Geveci, W. F. Godoy, C. D. Hansen, C. Harrison, B. Hentschel, J. Insley, C. Johnson, S. Klasky, A. Knoll, J. Kress, M. Larsen, J. Lofstead, K.-L. Ma, P. Malakar, J. Meredith, K. Moreland, P. Navratil, P. O'Leary, M. Parashar, V. Pascucci, J. Patchett, T. Peterka, S. Petruzza, N. Podhorszki, D. Pugmire, M. Rasquin, S. Rizzi, D. H. Rogers, S. Sane, F. Sauer, R. Sisneros, H.-W. Shen, W. Usher, R. Vickery, V. Vishwanath, I. Wald, R. Wang, G. H. Weber, B. Whitlock, M. Wolf, H. Yu, S. B. Ziegler, "A Terminology for In Situ Visualization and Analysis Systems", International Journal of High Performance Computing Applications, 2020, 34:676--691, doi: 10.1177/1094342020935991

Gang Huang, Yilun Xu, Ravi Naik, Bradley Mitchell, David Santiago, Irfan Siddiqi, Qubit fast reset with QubiC, Bulletin of the American Physical Society, 2020,

Yilun Xu, Gang Huang, Jan Balewski, Ravi Naik, Alexis Morvan, Bradley Mitchell, Kasra Nowrouzi, David I Santiago, Irfan Siddiqi, QubiC: An open source FPGA-based control and measurement system for superconducting quantum information processors, arXiv preprint arXiv:2101.00071, 2020,

Srivatsan Chakram, Andrew E Oriani, Ravi K Naik, Akash V Dixit, Kevin He, Ankur Agrawal, Hyeokshin Kwon, David I Schuster, Seamless high-Q microwave cavities for multimode circuit QED, arXiv preprint arXiv:2010.16382, 2020,

Srivatsan Chakram, Kevin He, Akash V Dixit, Andrew E Oriani, Ravi K Naik, Nelson Leung, Hyeokshin Kwon, Wen-Long Ma, Liang Jiang, David I Schuster, Multimode photon blockade, arXiv preprint arXiv:2010.15292, 2020,

Yilun Xu, Gang Huang, Ravi Naik, Bradley Mitchell, David Santiago, Irfan Siddiqi, Automatic single qubit characterization with QubiC, Bulletin of the American Physical Society, 2020,

Bradley Mitchell, Ravi Naik, Akel Hashim, John Mark Kreikebaum, Irfan Siddiqi, Cross-resonance Dynamics with Tunable Transmon Qubits, Bulletin of the American Physical Society, 2020,

Ravi Naik, Bradley Mitchell, Akel Hashim, John Mark Kreikebaum, Irfan Siddiqi, Fidelity Optimization of the Cross-resonance Gate on a Multi-qubit Quantum Processor, Bulletin of the American Physical Society, 2020,

Akel Hashim, Kasra Nowrouzi, Alexis Morvan, Ravi Naik, John Mark Kreikebaum, Irfan Siddiqi, Experimental Realization of Randomized Compiling for in-situ Error Reduction, Bulletin of the American Physical Society, 2020,

Gang Huang, Yilun Xu, Ravi Naik, Bradley Mitchell, David Santiago, Irfan Siddiqi, QubiC-An open FPGA based Qubit Control system, Bulletin of the American Physical Society, 2020,

Akash Dixit, Srivatsan Chakram, Ankur Agrawal, Ravi Naik, David Schuster, Aaron Chou, Using Superconducting Qubits for Axion Dark Matter Detection, Bulletin of the American Physical Society, 2020,

Aziza Suleymanzade, Alexander Anferov, Mark Stone, Ravi K Naik, Andrew Oriani, Jonathan Simon, David Schuster, "A tunable high-Q millimeter wave cavity for hybrid circuit and cavity QED experiments", Applied Physics Letters, 2020, 116:104001, doi: 10.1063/1.5137900

H Chang, J J Donatelli, P Enfedaque, G Freychet, M Haranczyk, A Hexemer, Z Hu, O Jain, H Krishnan, D Kumar, X Li, L Lin, M MacNeil, S Marchesini, X Mo, M Noack, K Pande, R Pandolfi, D Parkinson, D M Pelt, T Perciano, D A Shapiro, D Ushizima, C Yang, P H Zwart, J A Sethian, "Building Mathematics, Algorithms, and Software for Experimental Facilities", Handbook on Big Data and Machine Learning in the Physical Sciences, (2020) Pages: 189--240 doi: 10.1142/9789811204579_0012

Talita Perciano, Colleen Heinemann, David Camp, Brenton Lessley, E Wes Bethel, "Shared-Memory Parallel Probabilistic Graphical Modeling Optimization: Comparison of Threads, OpenMP, and Data-Parallel Primitives", High Performance Computing, Cham, Springer International Publishing, 2020, 127--145, doi: 10.1007/978-3-030-50743-5_7

Stefano Marchesini, Anuradha Trivedi, Pablo Enfedaque, Talita Perciano, Dilworth Parkinson, "Sparse Matrix-Based HPC Tomography", Computational Science -- ICCS 2020, Cham, Springer International Publishing, 2020, 248--261, doi: 10.1007/978-3-030-50371-0_18

E. Wes Bethel, David Camp, Talita Perciano, Colleen Heinemann, "Performance Analysis of Traditional and Data-Parallel Primitive Implementations of Visualization and Analysis Kernels", Berkeley, CA, USA, 94720, 2020,

A Sangodoyin, B Mohammed, IU Awan, "Data driven Machine Learning approach to detect DDoS attack in Software Defined Network", Journal of Concurrency and Computation: Practice and Experience, January 1, 2020,

William F Godoy, Norbert Podhorszki, Ruonan Wang, Chuck Atkins, Greg Eisenhauer, Junmin Gu, Philip Davis, Jong Choi, Kai Germaschewski, Kevin Huck, et al., ADIOS 2: The Adaptable Input Output System. A framework for high-performance data management, SoftwareX, Pages: 100561 2020,

Jeremy Logan, Mark Ainsworth, Chuck Atkins, Jieyang Chen, Jong Choi, Junmin Gu, James Kress, Greg Eisenhauer, Berk Geveci, William Godoy, et al., Extending the Publish/Subscribe Abstraction for High-Performance I/O and Data Management at Extreme Scale, Data Engineering, 2020,

Jeeyung Kim, Alex Sim, Jinoh Kim, Kesheng Wu, Botnet Detection Using Recurrent Variational Autoencoder, arXiv preprint arXiv:2004.00234, 2020,

JM Kreikebaum, KP O'Brien, A Morvan, I Siddiqi, "Improving wafer-scale Josephson junction resistance variation in superconducting quantum coherent circuits", Superconductor Science and Technology, 2020, 33, doi: 10.1088/1361-6668/ab8617

LS Martin, WP Livingston, S Hacohen-Gourgy, HM Wiseman, I Siddiqi, "Implementation of a canonical phase measurement with quantum feedback", Nature Physics, 2020, 16:1046--1049, doi: 10.1038/s41567-020-0939-0

E Flurin, LS Martin, S Hacohen-Gourgy, I Siddiqi, "Using a Recurrent Neural Network to Reconstruct Quantum Dynamics of a Superconducting Qubit from Physical Observations", Physical Review X, 2020, 10, doi: 10.1103/PhysRevX.10.011006

S Schaal, I Ahmed, JA Haigh, L Hutin, B Bertrand, S Barraud, M Vinet, C-M Lee, N Stelmashenko, JWA Robinson, JY Qiu, S Hacohen-Gourgy, I Siddiqi, MF Gonzalez-Zalba, JJL Morton, "Fast Gate-Based Readout of Silicon Quantum Dots Using Josephson Parametric Amplification", Physical Review Letters, 2020, 124:067701, doi: 10.1103/physrevlett.124.067701

Anna Giannakou, Dipankar Dwivedi, Sean Peisert, "A Machine Learning Approach for Packet Loss Prediction in ScienceFlows", Future Generation Computer Systems, January 2020, 102:190-197, doi: 10.1016/j.future.2019.07.053

2019

Reinhard Gentz, Sean Peisert, "An Examination and Survey of Random Bit Flips and Scientific Computing", Trusted CI Report, December 20, 2019,

D. Fan, A. Nonaka, A. S. Almgren, D. E. Willcox, A. Harpole, and M. Zingale, "MAESTROeX: A Massively Parallel Low Mach Number Astrophysical Solver", Journal of Open Source Software, December 19, 2019,

Timothy T Duignan, Gregory Schenter, John L Fulton, Thomas Huthwelker, Mahalingam Balasubramanian, Mirza Galib, Marcel Baer, Jan Wilhelm, Jürg Hutter, Mauro Del Ben, Xiu Song Zhao, Christopher Jay Mundy, "Quantifying the Hydration Structure of Sodium and Potassium Ions: Taking Additional Steps on Jacob's Ladder", Physical Chemistry Chemical Physics, December 19, 2019, doi: 10.1039/C9CP06161D

D. Fan, A. Nonaka, A.S. Almgren, A. Harpole, M. Zingale, "MAESTROeX: A Massively Parallel Low Mach Number Astrophysical Solver", The Astrophysical Journal, December 19, 2019,

Houjun Tang, Suren Byna, Stephen Bailey, Zarija Lukic, Jialin Liu, Quincey Koziol, Bin Dong, "Tuning Object-centric Data Management Systems for Large Scale Scientific Applications", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Richard Warren, Jerome Soumagne, Jingqing Mu, Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Analysis in the Data Path of an Object-centric Data Management System", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Wei Zhang, Suren Byna, Chenxu Niu, Yong Chen, "Exploring Metadata Search Essentials for Scientific Data Management", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 17, 2019,

Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Kathy Yelick, UPC++ Tutorial, National Energy Research Scientific Computing Center (NERSC), December 16, 2019,

This event was a repeat of the tutorial delivered on November 1, but with the restoration of the hands-on component which was omitted due to uncertainty surrounding the power outage at NERSC.

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. UPC++ provides mechanisms for low-overhead one-sided communication, moving computation to data through remote-procedure calls, and expressing dependencies between asynchronous computations and data movement. It is particularly well-suited for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces are designed to be composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds.

In this tutorial we introduced basic concepts and advanced optimization techniques of UPC++. We discussed the UPC++ memory and execution models and walked through implementing basic algorithms in UPC++. We also discussed irregular applications and how to take advantage of UPC++ features to optimize their performance. The tutorial included hands-on exercises with basic UPC++ constructs. Registrants were given access to run their UPC++ exercises on NERSC’s Cori (currently the #14 fastest computer in the world).

Event page


Schneider, Joseph D.; Domann, John P.; Panduranga, M. K.; Tiwari, Sidhant; Shirazi, Paymon; Yao, Zhi Jackie; Sennott, Casey; Shahan, David; Selvin, Skyler; McKnight, Geoff, et al., "Experimental demonstration and operating principles of a multiferroic antenna", Journal, December 14, 2019, 126:224104, doi: 10.1063/1.5126047

Hans Johansen, Daniel Martin, Esmond Ng, "High-resolution Treatment of Topography and Grounding Line Dynamics in BISICLES", AGU 2019 Fall Meeting, December 13, 2019,

Yao, Zhi Jackie; Tiwari, Sidhant; Lu, Ting; Rivera, Jesse; Luong, Kevin Q. T.; Candler, Robert N.; Carman, Gregory P.; Wang, Yuanxun Ethan, "Modeling of multiple dynamics in the radiation of bulk acoustic wave (BAW) antennas", Journal, December 13, 2019, 5:7-20, doi: 10.1109/JMMCT.2019.2959596

Daniel F. Martin, James Parkinson, Andrew Wells, Richard Katz, "3D convection, phase change, and solute transport in mushy sea ice", AGU 2019 Fall Meeting, December 12, 2019,

Samuel Kachuck, Daniel Martin, Jeremy Bassis, Stephen Price, "Rapid viscoelastic deformation slows marine ice sheet instability at Pine Island Glacier", AGU 2019 Fall Meeting, December 10, 2019,

A. Lazar, A. Ballow, L. Jin, C. A. Spurlock, A. Sim, K. Wu, "Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019, doi: 10.1109/BigData47090.2019.9006411

Chaincy Kuo, Daniel Feldman, Daniel Martin, "Quantification of seasonal heat retention by sea-ice: calculations from analytic surface-energy balance", AGU Fall Meeting 2019, December 9, 2019,

M Kiran, B Mohammed and N. Krishnaswamy, "DeepRoute: Herding Elephant and Mice Flows with Reinforcement Learning", 2nd IFIP International Conference on Machine Learning for Networking (MLN'2019), December 2, 2019, doi: 10.1007/978-3-030-45778-5_20

Amir Teshome Wonjiga, Louis Rilling, Christine Morin, Sean Peisert, "Blockchain as a Trusted Component in Cloud SLA Verification", Proceedings of the International Workshop on Cloud, IoT and Fog Security (CIFS), co-located with the 12th IEEE/ACM International Conference on Utility and Cloud Computing (UCC), Auckland, New Zealand, December 2019, 93-100, doi: 10.1145/3368235.3368872

Yong Shi, Daniel R. Ladiges, John E. Sader, "Origin of spurious oscillations in lattice Boltzmann simulations of oscillatory noncontinuum gas flows", Physical Review E, November 25, 2019, 100,

Tirthak Patel, Suren Byna, Glenn K. Lockwood, Devesh Tiwari, "Revisiting I/O Behavior in Large-Scale Storage Systems: The Expected and the Unexpected", Supercomputing 2019 (SC19), November 24, 2019, doi: 10.1145/3295500.3356183

Donghe Kang, Oliver Rübel, Suren Byna, Spyros Blanas, "Comparison of Array Management Library Performance - A Neuroscience Use Case", SC19 Poster, November 20, 2019,

George Michelogiannakis, Bandwidth Steering in HPC Using Silicon Nanophotonics, SC19: The International Conference for High Performance Computing, Networking, Storage, and Analysis, November 20, 2019,

George Michelogiannakis, Yiwen Shen, Min Yeh Teh, Xian Meng, Benjamin Aivazi, Taylor Groves, John Shalf, Madeleine Glick, Manya Ghobadi, Larry Dennison, Keren Bergman, "Bandwidth Steering in HPC Using Silicon Nanophotonics", SC19: The International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2019,

Tuowen Zhao, Mary Hall, Samuel Williams, Hans Johansen, "Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs", Supercomputing (SC), November 2019,

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Glenn K. Lockwood, Shane Snyder, Suren Byna, Philip Carns, Nicholas J. Wright, "Understanding Data Motion in the Modern HPC Data Center", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), November 19, 2019, doi: 10.1109/PDSW49588.2019.00012

Megha Agarwal, Divyansh Singhvi, Preeti Malakar, Suren Byna, "Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), November 19, 2019, doi: 10.1109/PDSW49588.2019.00007

Houjun Tang, Quincey Koziol, Suren Byna, John Mainzer, Tonglin Li, "Enabling Transparent Asynchronous I/O using Background Threads", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW 2019), November 19, 2019, doi: 10.1109/PDSW49588.2019.00006

Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen, "MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 19, 2019,

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, Life course as a contextual system to investigate the effects of life events, gender and generation on travel mode usage, The Behavior, Energy & Climate Change Conference (BECC), 2019,

Benjamin A. Brock, Yuxin Chen, Jiakun Yan, John Owens, Aydın Buluç, Katherine Yelick, "RDMA vs. RPC for Implementing Distributed Data Structures", Proceedings of the 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3), Denver, CO, USA, IEEE, November 18, 2019, doi: 10.1109/IA349570.2019.00009

Distributed data structures are key to implementing scalable applications for scientific simulations and data analysis. In this paper we look at two implementation styles for distributed data structures: remote direct memory access (RDMA) and remote procedure call (RPC). We focus on operations that require individual accesses to remote portions of a distributed data structure, e.g., accessing a hash table bucket or distributed queue, rather than global operations in which all processors collectively exchange information. We look at the trade-offs between the two styles through microbenchmarks and a performance model that approximates the cost of each. The RDMA operations have direct hardware support in the network and therefore lower latency and overhead, while the RPC operations are more expressive but higher cost and can suffer from lack of attentiveness from the remote side. We also run experiments to compare the real-world performance of RDMA- and RPC-based data structure operations with the predicted performance to evaluate the accuracy of our model, and show that while the model does not always precisely predict running time, it allows us to choose the best implementation in the examples shown. We believe this analysis will assist developers in designing data structures that will perform well on current network architectures, as well as network architects in providing better support for this class of distributed data structures.

Nan Ding, Samuel Williams, "An Instruction Roofline Model for GPUs", Performance Modeling, Benchmarking, and Simulation (PMBS), BEST PAPER AWARD, November 18, 2019,

Oguz Selvitopi, Cevdet Aykanat, "Regularizing irregularly sparse point-to-point communications", SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, November 2019, doi: 10.1145/3295500

Paul H. Hargrove, Dan Bonachea, "Efficient Active Message RMA in GASNet Using a Target-Side Reassembly Protocol (Extended Abstract)", IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), Lawrence Berkeley National Laboratory Technical Report, November 17, 2019, LBNL 2001238, doi: 10.25344/S4PC7M

GASNet is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models on future exascale machines. This paper investigates strategies for efficient implementation of GASNet’s “AM Long” API that couples an RMA (Remote Memory Access) transfer with an Active Message (AM) delivery.
We discuss several network-level protocols for AM Long and propose a new target-side reassembly protocol. We present a microbenchmark evaluation on the Cray XC Aries network hardware. The target-side reassembly protocol on this network improves AM Long end-to-end latency by up to 33%, and the effective bandwidth by up to 49%, while also enabling asynchronous source completion that drastically reduces injection overheads.
The improved AM Long implementation for Aries is available in GASNet-EX release v2019.9.0 and later.

George Papadimitriou, Mariam Kiran, Cong Wang, Anirban Mandal, Ewa Deelman, "Training Classifiers to Identify TCP Signatures in Scientific Workflows", INDIS, SC19, November 14, 2019,

B Mohammed, M Kiran, N Krishnaswamy, "DeepRoute on Chameleon: Experimenting with Large-scale Reinforcement Learning and SDN on Chameleon Testbed", IEEE 27th International Conference on Network Protocols (ICNP), IEEE, November 14, 2019, 1-2, doi: 10.1109/ICNP.2019.8888090

Khaled Ibrahim, Samuel Williams, Leonid Oliker, "Performance Analysis of GPU Programming Models using the Roofline Scaling Trajectories", International Symposium on Benchmarking, Measuring and Optimizing (Bench), BEST PAPER AWARD, November 2019,

Dan Bonachea, Paul H. Hargrove, "GASNet-EX: A High-Performance, Portable Communication Library for Exascale", LNCS 11882: Proceedings of Languages and Compilers for Parallel Computing (LCPC'18), edited by Hall M., Sundar H., November 2019, 11882:138-158, doi: 10.1007/978-3-030-34627-0_11

Partitioned Global Address Space (PGAS) models, typified by such languages as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.

GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building upon over 15 years of lessons learned. We describe and evaluate several features and enhancements that have been introduced to address the needs of modern client systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI-3 implementations on current HPC systems.

Christine T. Wolf, Julia Bullard, Stacy Wood, Amelia Acker, Drew Paine, Charlotte P. Lee, "Mapping the “How” of Collaborative Action: Research Methods for Studying Contemporary Sociotechnical Processes", CSCW '19: Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing, November 10, 2019, doi: 10.1145/3311957.3359441

Process has been a concern since the beginning of CSCW. Developments in sociotechnical landscapes raise new challenges for studying processes (e.g., massive online communities bringing together vast crowds; Big Data technologies connecting many through the flow of data). This re-opens questions about how we study, document, conceptualize, and design to support processes in complex, contemporary sociotechnical systems. This one-day workshop will bring together researchers to discuss the CSCW community’s unique focus and methodological toolkit for studying process and workflow; provide a collaborative space for the improvement and extension of research projects within this space; and catalyze a network of scholars with expertise and interest in addressing challenging methodological questions around studying process in contemporary, sociotechnical systems.

Mahdi Jamei, Raksha Ramakrishna, Teklemariam Tesfay, Reinhard Gentz, Ciaran Roberts, Anna Scaglione, Sean Peisert, "Phasor Measurement Units Optimal Placement and Performance Limits for Fault Localization", IEEE Journal on Selected Areas in Communications (J-SAC), Special Issue on Communications and Data Analytics in Smart Grid, November 6, 2019, 38(1):180-192, doi: 10.1109/jsac.2019.2951971

J. Balcas, H. Newman, M. Spiropulu, X. Yang, T. Lehman, I. Monga, C. Guok, J. MacAuley, A. Sim, P. Demar, "SDN for End-to-End Networking at Exascale", the 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP2019), 2019,

Patricia Gonzalez-Guerrero, Mircea R. Stan, "Asynchronous Stochastic Computing", 53rd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, IEEE, November 3, 2019, doi: 10.1109/IEEECONF44664.2019.9049011

Asynchronous Stochastic Computing (ASC) leverages Synchronous Stochastic Computing (SSC) advantages and addresses its drawbacks. In SSC a multiplier is a single AND gate, saving ~90% of power and area compared with a typical 8-bit binary multiplier. The key to SSC power-area efficiency is mapping numbers to streams of 1s and 0s. Despite the power-area efficiency, SSC drawbacks such as long latency, a costly clock distribution network (CDN), and expensive stream generation cause the energy consumption to grow prohibitively large. In this work, we introduce the foundations for ASC using continuous-time Markov chains, and analyze the computing error due to random fluctuations. In ASC, data is mapped to asynchronous continuous-time streams, which yields two advantages over the synchronous counterpart: (1) CDN elimination, and (2) better accuracy. We compare ASC with SSC for three applications: (1) multiplication, (2) an image processing algorithm, gamma correction, and (3) a single layer of a fully-connected artificial neural network (ANN) using a FinFET1X technology. Our Matlab and Spice-level simulations and post-place&route (P&R) reports demonstrate that ASC yields savings of 10%-55%, 33%-44%, and 50% in latency, power, and energy respectively. These savings make ASC a good candidate to address the ultra-low-power requirements of machine learning for the IoT.
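The claim that an SSC multiplier is a single AND gate can be illustrated with a small software sketch. This is not the authors' implementation: it only models the synchronous (SSC) case, assumes values in [0, 1] encoded as independent random bit streams, and uses hypothetical helper names (`to_stream`, `from_stream`, `sc_multiply`).

```python
import random

def to_stream(p, n, rng):
    """Encode a value p in [0, 1] as n random bits, each 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def from_stream(bits):
    """Decode a stream back to a value: the fraction of 1s."""
    return sum(bits) / len(bits)

def sc_multiply(a, b, n=100_000, seed=0):
    """Estimate a * b by ANDing two independent bit streams."""
    rng = random.Random(seed)
    stream_a = to_stream(a, n, rng)
    stream_b = to_stream(b, n, rng)
    # For independent streams, P(bit_a AND bit_b = 1) = a * b.
    return from_stream([x & y for x, y in zip(stream_a, stream_b)])

estimate = sc_multiply(0.5, 0.4)  # close to 0.2, with random error
```

The estimate carries random error that shrinks as the stream gets longer, which is one way to see the latency drawback the abstract attributes to SSC.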

Marzieh Lenjani, Patricia Gonzalez, Elaheh Sadredini, M Arif Rahman, Mircea R Stan, "An overflow-free quantized memory hierarchy in general-purpose processors", International Symposium on Workload Characterization (IISWC), Orlando, FL, USA, IEEE, November 3, 2019, doi: 10.1109/IISWC47752.2019.9042035

Data movement comprises a significant portion of energy consumption and execution time in modern applications. Accelerator designers exploit quantization to reduce the bitwidth of values and reduce the cost of data movement. However, any value that does not fit in the reduced bitwidth results in an overflow (we refer to these values as outliers). Therefore accelerators use quantization for applications that are tolerant to overflows. We observe that in most applications the rate of outliers is low and values are often within a narrow range, providing the opportunity to exploit quantization in general-purpose processors. However, a software implementation of quantization in general-purpose processors has three problems. First, the programmer has to manually implement conversions and the additional instructions that quantize and dequantize values, imposing programmer effort and performance overhead. Second, to cover outliers, the bitwidth of the quantized values often becomes greater than or equal to that of the original values. Third, the programmer has to use standard bitwidths; otherwise, extracting non-standard bitwidths (i.e., 1-7, 9-15, and 17-31) for representing narrow integers exacerbates the overhead of software-based quantization. The key idea of this paper is to propose hardware support in the memory hierarchy of general-purpose processors for quantization, which represents values by few and flexible numbers of bits and stores outliers in their original format in a separate space, preventing any overflow. We minimize metadata and the overhead of locating quantized values using a software-hardware interaction that transfers quantization parameters and data layout to hardware. As a result, our approach has three advantages over cache compression techniques: (i) less metadata, (ii) higher compression ratio for floating-point values and cache blocks with multiple data types, and (iii) lower overhead for locating the compressed blocks.
It delivers on average 1.40/1.45/1.56× speedup and 24/26/30% energy reduction compared to a baseline that uses full-length variables in a 4/8/16-core system. Our approach also provides 1.23× speedup, in a 4-core system, compared to state-of-the-art cache compression techniques and adds only 0.25% area overhead to the baseline processor.
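The idea of storing in-range values in a few bits while keeping outliers in their original format in a separate space can be sketched in software. The paper proposes hardware support; the code below is only an illustration under assumptions of our own: integer data, a caller-chosen `base`, one reserved escape code, and hypothetical function names.

```python
def quantize(values, base, bits):
    """Store each value as a narrow offset from base; divert outliers."""
    escape = (1 << bits) - 1  # reserved code meaning "look in the outlier table"
    codes, outliers = [], {}
    for index, value in enumerate(values):
        offset = value - base
        if 0 <= offset < escape:
            codes.append(offset)       # fits in the reduced bitwidth
        else:
            codes.append(escape)
            outliers[index] = value    # kept in original format, so no overflow
    return codes, outliers

def dequantize(codes, outliers, base, bits):
    """Rebuild the original values from narrow codes and the outlier table."""
    escape = (1 << bits) - 1
    return [outliers[i] if code == escape else base + code
            for i, code in enumerate(codes)]

data = [100, 103, 97, 5000, 110, -3]
codes, outliers = quantize(data, base=96, bits=4)
restored = dequantize(codes, outliers, base=96, bits=4)
```

Here four of the six values fit in 4-bit codes and the two outliers (5000 and -3) are stored unchanged, so the round trip is lossless.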

Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick, UPC++ Tutorial, National Energy Research Scientific Computing Center (NERSC), November 1, 2019,

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. UPC++ provides mechanisms for low-overhead one-sided communication, moving computation to data through remote-procedure calls, and expressing dependencies between asynchronous computations and data movement. It is particularly well-suited for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces are designed to be composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds.

In this tutorial we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through implementing basic algorithms in UPC++. We will also look at irregular applications and how to take advantage of UPC++ features to optimize their performance.

Event Page

 

L. Yang, Z. Wen, C. Yang and Y. Zhang, "Block Algorithms with Augmented Rayleigh-Ritz Projections for Large-Scale Eigenpair Computation", Journal of Computational Mathematics, November 1, 2019, 37:889-915, doi: 10.4208/jcm.1910-m2019-0034

Pooria Mohammadiyaghni, George Michelogiannakis, Paul V. Gratz, "SpecLock: Speculative Lock Forwarding", International Conference on Computer Design (ICCD), November 2019,

Mark Adams, Stephen Cornford, Daniel Martin, Peter McCorquodale, "Composite matrix construction for structured grid adaptive mesh refinement", Computer Physics Communications, November 2019, 244:35-39, doi: 10.1016/j.cpc.2019.07.006

Alexandra Ballow, Alina Lazar (Advisor), Alex Sim (Advisor), Kesheng Wu (Advisor), "Handling Missing Values in Joint Sequence Analysis", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2019), ACM Student Research Competition (SRC), First place winner, 2019,

Thomas W. Edgar, Aditya Ashok, Garret E. Seppala, K.M. Arthur-Durrett, M. Engels, Reinhard Gentz, Sean Peisert, "An Automated Disruption-Tolerant Key Management Framework for Critical Systems", Journal of Information Warfare, October 8, 2019, 18(4):85-124, doi: https://www.jinfowar.com/journal/volume-18-issue-4/automated-disruption-tolerant-device-authentication-key-management-framework-critical-systems

Brandon Krull, Michael Minion, "Parallel-In-Time Magnus Integrators", SIAM Journal on Scientific Computing, October 1, 2019,

Patricia Gonzalez-Guerrero, Tommy Tracy II, Xinfei Guo, Mircea R Stan, "Towards low-power random forest using asynchronous computing with streams", Tenth International Green and Sustainable Computing Conference (IGSC), IEEE Computer Society, October 1, 2019, doi: 10.1109/IGSC48788.2019.8957193

We propose a sensor architecture for the internet of things (IoT), smartdust or edge-intelligence (EI) that combines near-analog-memory (NAM) processing and asynchronous computing with streams (ACS), addressing the need for machine learning (ML) capabilities at low power budgets. In ACS, an analog value is mapped to an asynchronous stream that can take one of two values (vh, vl). This stream-based data representation enables area- and power-efficient computing units, such as a multiplier implemented as an AND gate, yielding power savings of 90% compared with digital approaches. However, a major bottleneck for computing on streams, vision sensors, and NAM approaches is the cost of analog-to-digital (ADC) and digital-to-stream-to-digital converters. Our NAM-ACS architecture simplifies the sensor and eliminates the need for the expensive conversions. The architecture is tailored for random forest (Raf), an ML algorithm chosen for its ability to classify using a reduced number of features. Our simulations show that using an analog-memory array of 256 × 512, the power consumption of the ACS-core combined with the memory interface is comparable with the consumption of an ADC-based memory interface, obtaining an accuracy of 83%.
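
The AND-gate multiplier mentioned in this abstract can be illustrated with a minimal sketch. It assumes a discrete-time, clocked simplification in which a value in [0, 1] is encoded as the fraction of high samples in a random two-valued stream; the paper's streams are asynchronous and continuous-time, so this is only an analogy. The function names `to_stream`, `stream_mean`, and `and_multiply` are hypothetical and do not come from the paper.

```python
import random


def to_stream(value, length, rng):
    """Encode a value in [0, 1] as a 0/1 stream whose mean approximates it."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def stream_mean(stream):
    """Decode a stream back to a number: the fraction of high samples."""
    return sum(stream) / len(stream)


def and_multiply(a, b):
    """Sample-wise AND. For independent streams the mean of the output
    approximates the product of the two input means."""
    return [x & y for x, y in zip(a, b)]


rng = random.Random(0)  # fixed seed so the run is repeatable
a = to_stream(0.5, 100_000, rng)
b = to_stream(0.4, 100_000, rng)
product = stream_mean(and_multiply(a, b))
print(f"estimated product = {product:.3f} (exact 0.200)")
```

The estimate is only approximate: its error shrinks as the streams get longer, and the two input streams must be statistically independent for the AND gate to act as a multiplier.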

Reinhard Gentz, Héctor García Martin, Edward Baidoo, Sean Peisert, "Workflow Automation in Liquid Chromatography Mass Spectrometry", Proceedings of the 15th IEEE International Conference on e-Science (eScience), San Diego, CA, IEEE, September 2019, doi: 10.1109/eScience.2019.00095

Reinhard Gentz, Sean Peisert, Joshua Boverhof, Daniel Gunter, "SPARCS: Stream-Processing Architecture applied in Real-time Cyber-physical Security", Proceedings of the 15th IEEE International Conference on e-Science (eScience), San Diego, CA, IEEE, September 2019, doi: 10.1109/eScience.2019.00028

Bin Wang, Rongxin Yin, Doug Black and Cy Chan, "Multistage and decentralized operations of electric vehicles within the California demand response markets", Decision Making Applications in Modern Power Systems, (Academic Press, Elsevier: September 21, 2019) Pages: 411-439 doi: https://doi.org/10.1016/B978-0-12-816445-7.00016-5

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2019, LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
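
The idea of chaining futures and continuations into a DAG can be sketched in plain Python. This is not UPC++ code and does not use the UPC++ API: it relies only on the standard `concurrent.futures` module, and the helpers `then`, `when_all`, and `fetch` are hypothetical names written for this illustration, with `fetch` standing in for a high-latency remote read.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


def then(fut, fn):
    """Attach a continuation: return a future for fn(result of fut)."""
    out = Future()

    def on_ready(done):
        try:
            out.set_result(fn(done.result()))
        except Exception as exc:
            out.set_exception(exc)

    fut.add_done_callback(on_ready)
    return out


def when_all(*futs):
    """Return a future that becomes ready with a tuple of all results."""
    out = Future()
    if not futs:
        out.set_result(())
        return out
    results = [None] * len(futs)
    pending = len(futs)
    lock = threading.Lock()

    def make_callback(index):
        def on_ready(done):
            nonlocal pending
            try:
                value = done.result()
            except Exception as exc:
                with lock:
                    if not out.done():
                        out.set_exception(exc)  # report the first failure
                return
            with lock:
                results[index] = value
                pending -= 1
                if pending == 0 and not out.done():
                    out.set_result(tuple(results))
        return on_ready

    for i, f in enumerate(futs):
        f.add_done_callback(make_callback(i))
    return out


def fetch(key):
    """Stand-in for a high-latency operation such as a remote read."""
    time.sleep(0.01)
    return {"a": 2, "b": 3}[key]


with ThreadPoolExecutor(max_workers=2) as pool:
    fa = pool.submit(fetch, "a")   # two independent operations start at once
    fb = pool.submit(fetch, "b")
    total = then(when_all(fa, fb), lambda pair: pair[0] + pair[1])
    doubled = then(total, lambda s: 2 * s)   # runs once total is ready
    result = doubled.result(timeout=5)

print(result)
```

Nothing in the chain blocks until the final `result()` call; each continuation runs as soon as the futures it depends on are ready, which is the scheduling pattern the abstract describes.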

S Werner, P Fotouhi, X Xiao, M Fariborz, SJB Yoo, G Michelogiannakis, D Vasudevan, "3D photonics as enabling technology for deep 3D DRAM stacking", Proceedings of the International Symposium on Memory Systems - MEMSYS '19, ACM Press, September 2019, doi: 10.1145/3357526.3357559

Bin Wang, Cy Chan, Divya Somasi, Jane Macfarlane, Eric Rask, "Data-Driven Energy Use Estimation in Large Scale Transportation Networks", Proceedings of the 2nd ACM/EIGSCC Symposium on Smart Cities and Communities - SCC '19, ACM Press, September 10, 2019,

Revathi Jambunathan, Andrew Myers, Donald Willcox, Jean-Luc Vay, Ann Almgren, Diana Amorim, John Bell, Kevin Gott, Axel Huebl, Remi Lehe, Michael Rowan, Olga Shapoval, Maxence Thevenet, Weiqun Zhang, "WarpX: Towards exascale modeling of pulsar magnetospheres", Connecting Micro and Macro Scales: Acceleration, Reconnection, and Dissipation in Astrophysical Plasmas, September 9, 2019,

Doru Thom Popovici, Devangi N. Parikh, Daniele G. Spampinato, Tze Meng Low, "Exploiting Symmetries of Small Prime-Sized DFTs", PPAM 2019, 2019,

Patricia Gonzalez-Guerrero, Stephen G Wilson, Mircea R Stan, "Error-latency Trade-off for Asynchronous Stochastic Computing with ΣΔ Streams for the IoT", International System-on-Chip Conference (SOCC), Singapore, IEEE, September 3, 2019, doi: 10.1109/SOCC46988.2019.1570548453

Asynchronous stochastic computing (ASC) using continuous-time-asynchronous ΣΔ modulators (SC-AΣΔM) has the potential to enable ultra-low-power, on-node machine learning algorithms for the next generation of sensors for the Internet of Things (IoT). Similar to synchronous stochastic computing (SSC), in SC-AΣΔM complex processing units can be implemented with simple gates because numbers are represented with streams. For example, a multiplier is implemented with an XNOR gate, yielding savings in power and area of 90% compared with the typical binary approach. Previous work demonstrated that SC-AΣΔM leverages SSC advantages and addresses its drawbacks, achieving significant savings in energy, power and latency. In this work, we study a theoretical model to determine the fundamental limits of accuracy and computing time for SC-AΣΔM. Since the ΣΔ streams are periodic, the final computing error is non-zero and depends on the period of the input streams. We validate our theoretical model with Spice-level simulations and evaluate the power and energy consumption using a standard FinFET1X technology for two cases: 1) multiplication and 2) gamma correction, an image processing algorithm. Our work determines circuit design guidelines for SC-AΣΔM and shows that multiplication with SC-AΣΔM requires at least 6X less time than SSC. The latency reduction and novel architecture positively impact the overall energy consumption in the IoT node, enabling savings in energy of 79% compared with the binary approach.
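
The XNOR-gate multiplier mentioned in this abstract can be sketched for the clocked (synchronous) case. The sketch assumes the standard bipolar encoding of stochastic computing, in which a value x in [-1, 1] is carried by a random bit stream with P(1) = (x + 1) / 2; the abstract does not state the encoding, and the ΣΔ streams studied in the paper are periodic and continuous-time, which this does not model. All function names are hypothetical.

```python
import random


def encode_bipolar(value, length, rng):
    """Encode a value in [-1, 1] as a bit stream with P(1) = (value + 1) / 2."""
    p_one = (value + 1.0) / 2.0
    return [1 if rng.random() < p_one else 0 for _ in range(length)]


def decode_bipolar(stream):
    """Invert the encoding: value = 2 * (fraction of ones) - 1."""
    return 2.0 * sum(stream) / len(stream) - 1.0


def xnor_multiply(a, b):
    """Bit-wise XNOR. For independent bipolar streams the output stream
    encodes approximately the product of the two input values."""
    return [1 - (x ^ y) for x, y in zip(a, b)]


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 100_000
x_stream = encode_bipolar(0.6, n, rng)
y_stream = encode_bipolar(-0.5, n, rng)
estimate = decode_bipolar(xnor_multiply(x_stream, y_stream))
print(f"estimate = {estimate:.3f}, exact = {0.6 * -0.5:.3f}")
```

The gate works because P(XNOR = 1) = pq + (1 - p)(1 - q), which equals (1 + xy) / 2 when p = (x + 1) / 2 and q = (y + 1) / 2. With random streams the residual error is statistical and shrinks with stream length.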

Timur Takhtaganov, Zarija Lukić, Juliane Mueller, Dmitriy Morozov, "Cosmic Inference: Constraining Parameters With Observations and Highly Limited Number of Simulations", Astrophysical Journal (in review), 2019,

Melissa Stockman, Dipankar Dwivedi, Reinhard Gentz, Sean Peisert, "Detecting Programmable Logic Controller Code Using Machine Learning", International Journal of Critical Infrastructure Protection, September 2019, vol. 26, doi: 10.1016/j.ijcip.2019.100306

Charlene Yang, Thorsten Kurth, Samuel Williams, "Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC-9 Perlmutter system", Concurrency and Computation: Practice and Experience (CCPE), August 2019, doi: 10.1002/cpe.5547

Antoine Bambade, Kesheng Wu, "An Assessment of the Prediction Quality of VPIN", Advanced Analytics and Artificial Intelligence Applications, (IntechOpen: 2019)

M. Zingale, M.P. Katz, J.B. Bell, M.L. Minion, A.J. Nonaka, W. Zhang, "Improved Coupling of Hydrodynamics and Nuclear Reactions via Spectral Deferred Corrections", August 14, 2019,

J. Bell, M. Day, J. Goodman, R. Grout, M. Morzfeld, "A Bayesian approach to calibrating hydrogen flame kinetics using many experiments and parameters", Combustion and Flame, 2019,

A. J. Aspden, M. S. Day, J. B. Bell, "Towards the Distributed Burning Regime in Turbulent Premixed Flames", Journal of Fluid Mechanics, 2019, 871:1-21,

M. Zingale, K. Eiden, Y. Cavecchi, A. Harpole, J. B. Bell, M. Chang, I. Hawke, M. P. Katz, C.M. Malone, A. J. Nonaka, D. E. Willcox, W. Zhang, "Toward resolved simulations of burning fronts in thermonuclear X-ray bursts", Journal of Physics: Conference Series, 2019, 1225,

D. Dasgupta, W. Sun, M. Day, A. Aspden, T. Lieuwen, "Analysis of chemical pathways for n-dodecane/air turbulent premixed flames", August 10, 2019,

M. T. Henry de Frahan, S. Yellapantula, R. King, M. S. Day, R. W. Grout, "Deep learning for presumed probability density function models", August 10, 2019,

N. T. Wimer, M. S. Day, C. Lapointe, A. S. Makowiecki, J. F. Glusman, J. W. Daily, G. B. Rieker, P. E. Hamlington, "High-resolution numerical simulations of a large-scale helium plume using adaptive mesh refinement", August 10, 2019,

A. Donev, A. J. Nonaka, C. Kim, A. L. Garcia, J. B. Bell, "Fluctuating hydrodynamics of electrolytes at electroneutral scales", August 10, 2019,

D. R. Ladiges, A. J. Nonaka, J. B. Bell, A. L. Garcia, "On the Suppression and Distortion of Non-Equilibrium Fluctuations by Transpiration", Physics of Fluids, August 10, 2019,

L. Esclapez, V. Ricchiuti, J.B. Bell, M.S. Day, "A spectral deferred correction strategy for low Mach number flows subject to electric fields", August 10, 2019,

Knut Sverdrup, Ann S. Almgren, Nikolaos Nikiforakis, "An embedded boundary approach for efficient simulations of viscoplastic fluids in three dimensions", August 10, 2019,

Benjamin Brock, Aydın Buluç, Katherine Yelick, "BCL: A Cross-Platform Distributed Data Structures Library", ICPP 2019: Proceedings of the 48th International Conference on Parallel Processing, Kyoto, Japan, Association for Computing Machinery, August 2019, doi: 10.1145/3337821.3337912

One-sided communication is a useful paradigm for irregular parallel applications, but most one-sided programming environments, including MPI's one-sided interface and PGAS programming languages, lack application-level libraries to support these applications. We present the Berkeley Container Library, a set of generic, cross-platform, high-performance data structures for irregular applications, including queues, hash tables, Bloom filters and more. BCL is written in C++ using an internal DSL called the BCL Core that provides one-sided communication primitives such as remote get and remote put operations. The BCL Core has backends for MPI, OpenSHMEM, GASNet-EX, and UPC++, allowing BCL data structures to be used natively in programs written using any of these programming environments. Along with our internal DSL, we present the BCL ObjectContainer abstraction, which allows BCL data structures to transparently serialize complex data types while maintaining efficiency for primitive types. We also introduce the set of BCL data structures and evaluate their performance across a number of high-performance computing systems, demonstrating that BCL programs are competitive with hand-optimized code, even while hiding many of the underlying details of message aggregation, serialization, and synchronization.

Mauro Del Ben, Osni Marques, Andrew Canning, "Improved Unconstrained Energy Functional Method for Eigensolvers in Electronic Structure Calculations", Proceedings of the 48th International Conference on Parallel Processing, ACM, 2019, 73, doi: 10.1145/3337821.3337914

Patricia Gonzalez-Guerrero, Mircea R Stan, "Asynchronous Stream Computing for Low Power IoT", International Midwest Symposium on Circuits and Systems (MWSCAS), Dallas, TX, USA, IEEE, August 4, 2019, doi: 10.1109/MWSCAS.2019.8885388

Asynchronous circuits have many advantages over their synchronous counterparts in terms of robustness to parameter variations, wide supply voltage ranges, and potentially low power by not needing a clock, yet their promise has not translated into commercial success due to several issues related to design methodologies and the need for handshake signals. Stochastic computing is another processing paradigm that has shown promise of low power and extremely compact circuits but has yet to become a commercial success, mainly because of the need for a fast clock to generate the random streams. The Asynchronous Stream Processing circuits described in this paper combine the best features of asynchronous circuits (lack of clock, robustness) with the best features of stochastic computing (processing on streams) to enable extremely compact and low power IoT sensing nodes that can finally fulfill the promise of smart dust, another concept that was ahead of its time and has yet to achieve commercial success.

B Mohammed, N Krishnaswamy, M Kiran, "Multivariate Time-Series Prediction for Traffic in Large WAN Topology", ACM/IEEE Symposium on Architectures for Networking and Communications, August 1, 2019, doi: 10.1109/ANCS.2019.8901870

Ciaran Roberts, Anna Scaglione, Mahdi Jamei, Reinhard Gentz, Sean Peisert, Emma M. Stewart, Chuck McParland, Alex McEachern, Daniel Arnold, "Learning Behavior of Distribution System Discrete Control Devices for Cyber-Physical Security", IEEE Transactions on Smart Grid, August 1, 2019, 11(1):749-761, doi: 10.1109/TSG.2019.2936016

Andrew Adams, Kay Avila, Jim Basney, Dana Brunson, Robert Cowles, Jeannette Dopheide, Terry Fleury, Elisa Heymann, Florence Hudson, Craig Jackson, Ryan Kiser, Mark Krenz, Jim Marsteller, Barton P. Miller, Sean Peisert, Scott Russell, Susan Sons, Von Welch, John Zage, "Trusted CI Experiences in Cybersecurity and Service to Open Science", Proceedings of the Conference on Practice and Experience in Advanced Research Computing (PEARC), ACM, July 2019, doi: 10.1145/3332186.3340601

Peiyun Li, Sergii Gridin, K. Burak Ucer, Richard T. Williams, Mauro Del Ben, Andrew Canning, Federico Moretti, Edith Bourret, "Picosecond Absorption Spectroscopy of Excited States in BaBrCl with and without Eu Dopant and Au Codopant", Physical Review Applied, 2019, 12 (1):014035, doi: 10.1103/PhysRevApplied.12.014035

Nan Ding, Samuel Williams, Sherry Li, Yang Liu, "Leveraging One-Sided Communication for Sparse Triangular Solvers", SciDAC19, July 18, 2019,

Samuel Williams, Charlene Yang, Khaled Ibrahim, Thorsten Kurth, Nan Ding, Jack Deslippe, Leonid Oliker, "Performance Analysis using the Roofline Model", SciDAC PI Meeting, July 2019,

A. Harpole, D. Fan, M. P. Katz, A. J. Nonaka, D. E. Willcox, and M. Zingale, "Modelling low Mach number stellar hydrodynamics with MAESTROeX", Proceedings of Astronum 2019, July 1, 2019,

J. Onorbe, F. B. Davies, Z. Lukić, J. F. Hennawi, D. Sorini, "Inhomogeneous Reionization Models in Cosmological Hydrodynamical Simulations", Monthly Notices of the Royal Astronomical Society, 2019, 486:4075, doi: 10.1093/mnras/stz984

Hannah E. Ross, Keri L. Dixon, Raghunath Ghara, Ilian T. Iliev, Garrelt Mellema, "Evaluating the QSO contribution to the 21-cm signal from the Cosmic Dawn", Monthly Notices of the Royal Astronomical Society, July 2019, 487:1101-1119, doi: 10.1093/mnras/stz1220

J. Choi, A. Sim, Data reduction methods, systems and devices, U.S. Patent No. 10,366,078, 2019,

U.S. Patent No. 10,366,078, “DATA REDUCTION METHODS, SYSTEMS, AND DEVICES”, LBNL IB2013-133.

Marquita Ellis, Giulia Guidi, Aydın Buluç, Leonid Oliker, Katherine Yelick, "diBELLA: Distributed Long Read to Long Read Alignment", 48th International Conference on Parallel Processing (ICPP), June 25, 2019,

B Mohammed, IU Awan, H Ugail, and Y Mohammad, "Failure Prediction using Machine Learning in a Virtualized HPC System and Application", Cluster Computing: The Journal of Networks, Software Tools and Applications, June 3, 2019, 471–485, doi: 10.1007/s10586-019-02917-1

Vikram Khaire, Michael Walther, Joseph F. Hennawi, Jose Oñorbe, Zarija Lukić, Xavier J. Prochaska, Todd M. Tripp, Joseph N. Burchett, Christian Rodriguez, "The power spectrum of the Lyman-α Forest at z < 0.5", Monthly Notices of the Royal Astronomical Society, 2019, 486:769, doi: 10.1093/mnras/stz344

Revathi Jambunathan, Deborah Levin, "Kinetic, three-dimensional, PIC-DSMC simulations of ion thruster plumes and the backflow region, Part 1: A colocated ion-electron source", (under review), May 20, 2019,

Elliott Binder, Tze Meng Low, Doru Thom Popovici, "Portable GPU Framework for SNP Comparisons", HiCOMB 2019, 2019,

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.

Teng Wang, Suren Byna, Glenn Lockwood, Philip Carns, Shane Snyder, Sunggon Kim, Nicholas Wright, "A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks", IEEE/ACM CCGrid 2019, May 14, 2019,

S. Kim, A. Sim, K. Wu, S. Byna, T. Wang, Y. Son, H. Eom, "DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File System", 19th Annual IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGrid 2019), 2019, doi: 10.1109/CCGRID.2019.00049

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

Tonglin Li, Quincey Koziol, Houjun Tang, Jialin Liu, Suren Byna, "I/O Performance Analysis of Science Applications Using HDF5 File-level Provenance", Cray User Group (CUG) 2019, May 10, 2019,

T. Bowen, E. Zhivun, A. Wickenbrock, V. Dumont, S. D. Bale, C. Pankow, G. Dobler, J. S. Wurtele, D. Budker, "A Network of Magnetometers for Multi-Scale Urban Science and Informatics", Geosci. Instrum. Method. Data Syst., Volume 8, Issue 1, Pages 129-138, May 8, 2019, doi: 10.5194/gi-8-129-2019

Jingqing Mu, Jerome Soumagne, Suren Byna, Quincey Koziol, Houjun Tang, Richard Warren, "Interfacing HDF5 with A Scalable Object-centric Storage System on Hierarchical Storage", Cray User Group (CUG) 2019, May 7, 2019,

Charlene Yang, Thorsten Kurth, Samuel Williams, "Hierarchical Roofline Analysis for GPUs: Accelerating Performance Optimization for the NERSC-9 Perlmutter System", Cray User Group (CUG), May 2019,

Wenjing Ma, Yulong Ao, Chao Yang, Samuel Williams, "Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight", Cluster Computing, May 2019, doi: 10.1007/s10586-019-02938-w

M. Mustafa, D. Bard, W. Bhimji, Z. Lukić, R. Al-Rfou, J. Kratochvil, "CosmoGAN: creating high-fidelity weak lensing convergence maps using Generative Adversarial Networks", Computational Astrophysics and Cosmology, 2019, 6:1, doi: 10.1186/s40668-019-0029-9

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "TIGER: topology-aware task assignment approach using ising machines", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "Extending classical processors to support future large scale quantum accelerators", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

Boris Lo, Phillip Colella, "An Adaptive Local Discrete Convolution Method for the Numerical Solution of Maxwell's Equations", Communications in Applied Mathematics and Computational Science, April 26, 2019, 14:105-119, doi: 10.2140/camcos.2019.14.105

Doru Thom Popovici, Martin D. Schatz, Franz Franchetti, Tze Meng Low, "A Flexible Framework for Parallel Multi-Dimensional DFTs", April 23, 2019,

D.L. Brown, S. Crivelli, M. A. Leung, "Sustainable Research Pathways: Building Connections across Communities to Diversify the National Laboratory Workforce", CoNECD 2019 - Collaborative Network for Engineering and Computing Diversity, April 14, 2019,

G Tzimpragos, A Madhavan, D Vasudevan, D Strukov and T Sherwood, "Boosted Race Trees for Low Energy Classification", (Best Paper Award), ASPLOS 2019, April 2019, doi: 10.1145/3297858.3304036

Francois P. Hamon, Martin Schreiber, Michael L. Minion, "Parallel-in-Time Multi-Level Integration of the Shallow-Water Equations on the Rotating Sphere", April 12, 2019,

Submitted to Journal of Computational Physics

D.F. Martin, H.S. Johansen, P.O. Schwartz, E.G. Ng, "Improved Discretization of Grounding Lines and Calving Fronts using an Embedded-Boundary Approach in BISICLES", European Geosciences Union General Assembly, April 10, 2019,

B. Peng, R. Van Beeumen, D.B. Williams-Young, K. Kowalski, C. Yang, "Approximate Green’s function coupled cluster method employing effective dimension reduction", Journal of Chemical Theory and Computation, 2019, 15:3185-3196, doi: 10.1021/acs.jctc.9b00172

P. Benner, V. Khoromskaia, B. N. Khoromskij and C. Yang, "Computing the density of states for optical spectra of molecules by low-rank and QTT tensor approximation", Journal of Computational Physics, April 1, 2019, 382:221-239, doi: https://doi.org/10.1016/j.jcp.2019.01.011

J. Kim, A. Sim, B. Tierney, S. Suh, I. Kim, "Multivariate Network Traffic Analysis using Clustered Patterns", Journal of Computing, April 2019, 101(4):339-361, doi: 10.1007/s00607-018-0619-4

J. Kim, A. Sim, "A new approach to multivariate network traffic analysis", Journal of Computer Science and Technology, 2019, 34(2):388–402, doi: 10.1007/s11390-019-1915-y

Drew Paine, Lavanya Ramakrishnan, "Surfacing Data Change in Scientific Work", iConference 2019, Springer Verlag, March 19, 2019, 15-26, doi: 10.1007/978-3-030-15742-5_2

Charlene Yang, Samuel Williams, Performance Analysis of GPU-Accelerated Applications using the Roofline Model, GPU Technology Conference (GTC), March 2019,

Sean Peisert, Brooks Evans, Michael Liang, Barclay Osborn, David Rusting, David Thurston, Security Without Moats and Walls: Zero-Trust Networking for Enhancing Security in R&E Environments, CENIC Annual Conference, March 19, 2019,

Mauro Del Ben, H Felipe, Gabriel Antonius, Tonatiuh Rangel, Steven G Louie, Jack Deslippe, Andrew Canning, "Static Subspace Approximation for the Evaluation of G0W0 Quasiparticle Energies within a Sum-Over-Bands Approach", Physical Review B, 2019, 99 (12):125128, doi: 10.1103/PhysRevB.99.125128

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2019, LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Babak Behzad, Suren Byna, Prabhat, and Marc Snir, "Optimizing I/O Performance of HPC Applications with Autotuning", ACM Transactions on Parallel Computing (TOPC), February 28, 2019,

Daniel Martin, Modeling Antarctic Ice Sheet Dynamics using Adaptive Mesh Refinement, 2019 SIAM Conference on Computational Science and Engineering, February 26, 2019,

Tyler Leibengood, Alina Lazar, Alex Sim, Kesheng Wu, "Network Traffic Performance Prediction with Multivariate Clusters in Time Windows", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Alexandra Ballow, Alina Lazar, Alex Sim, Kesheng Wu, "Joint Sequence Analysis Challenges: How to Handle Missing Values and Mixed Variable Types", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Patricia Gonzalez-Guerrero, Xinfei Guo, Mircea R Stan, "ASC-FFT: Area-efficient low-latency FFT design based on asynchronous stochastic computing", 10th Latin American Symposium on Circuits & Systems (LASCAS), Armenia, Colombia, IEEE, February 24, 2019, doi: 10.1109/LASCAS.2019.8667599

Asynchronous Stochastic Computing (ASC) is a new paradigm that addresses the drawbacks of Synchronous Stochastic Computing (SSC), namely expensive stochastic number generation (SNG) and long latency, by using continuous time streams (CTS). To go beyond the basic operations of addition and multiplication in ASC we need to incorporate a memory element. Although for SSC the natural memory element is a clocked flip-flop, using the same approach with non-synchronized data leads to unacceptably large error. In this paper, we propose to use a capacitor embedded in a feedback loop as the ASC memory element. Based on this idea, we design a low-error asynchronous adder that stores the carry information in the capacitor. Our adder enables the implementation of more complex computation logic. As an example, we implement an asynchronous stochastic Fast Fourier Transform (ASC-FFT) using a FinFET1X technology. The proposed adder requires 76% and 24% less hardware cost than conventional and SSC adders, respectively. In addition, the ASC-FFT shows 3X less latency when compared with SSC-FFT approaches and significant improvements in latency and area over conventional FFT architectures, with no degradation of the computation accuracy measured by the FFT Signal to Noise Ratio (SNR).

Samuel Williams, Performance Modeling and Analysis, CS267 Lecture, University of California at Berkeley, February 14, 2019,

Sean Peisert, Experiences in Building a Mission-Driven Security R&D Program for Science and Energy, Computer Science Colloquium Seminar, University of California, Davis, February 7, 2019,

Sean Peisert, Daniel Arnold, Using Physics to Improve Cybersecurity for the Distribution Grid and Distributed Energy Resources, Naval Postgraduate School, February 5, 2019,

George Michelogiannakis, Jeremiah Wilke, Min Yee Teh, Madeleine Glick, John Shalf, Keren Bergman, "Challenges and opportunities in system-level evaluation of photonics", Proceedings Volume 10946, Metro and Data Center Optical Networks and Short-Reach Links II, February 2019, doi: https://doi.org/10.1117/12.2510443

Mauro Del Ben, H Felipe, Andrew Canning, Nathan Wichmann, Karthik Raman, Ruchira Sasanka, Chao Yang, Steven G Louie, Jack Deslippe, "Large-Scale GW Calculations on Pre-Exascale HPC Systems", Computer Physics Communications, 2019, 235:187-195, doi: 10.1016/j.cpc.2018.09.003

M Kiran, A Chhabra, "Understanding flows in high-speed scientific networks: A Netflow data study", Future Generation Computer Systems, February 1, 2019, 94:72-79,

M. Walther, J. Onorbe, J. F. Hennawi, Z. Lukić, "New Constraints on IGM Thermal Evolution from the Ly-alpha Forest Power Spectrum", The Astrophysical Journal, 2019, 872:13, doi: 10.3847/1538-4357/aafad1

Aleksandar Donev, Alejandro L. Garcia, Jean-Philippe Péraud, Andrew J. Nonaka, John B. Bell, "Fluctuating Hydrodynamics and Debye-Hückel-Onsager Theory for Electrolytes", Current Opinion in Electrochemistry, 2019, 13:1 - 10, doi: https://doi.org/10.1016/j.coelec.2018.09.004

Yu-Hang Tang, Wibe A. de Jong, "Prediction of atomization energy using graph kernel and active learning", The Journal of Chemical Physics, January 25, 2019, 150:044107, doi: 10.1063/1.5078640

Sean Peisert, Building a Mission-Driven, Applied Cybersecurity R&D Program from Scratch, VISA Research, January 23, 2019,

Sebastian Götschel, Michael Minion, "An Efficient Parallel-in-Time Method for Optimization with Parabolic PDEs", SIAM Journal on Scientific Computing, January 21, 2019,

In submission

Stefano Marchesini, Anne Sakdinawat, "Shaping Coherent X-rays with Binary Optics", Optics Express Vol. 27, Issue 2, pp. 907-917 (2019), January 21, 2019,

Oliver Rübel, Andrew Tritt, Benjamin Dichter, Thomas Braun, Nicholas Cain, Nathan Clack, Thomas J. Davidson, Max Dougherty, Jean-Christophe Fillion-Robin, Nile Graddis, Michael Grauer, Justin T. Kiggins, Lawrence Niu, Doruk Ozturk, William Schroeder, Ivan Soltesz, Friedrich T. Sommer, Karel Svoboda, Lydia Ng, Loren M. Frank, Kristofer Bouchard, "NWB:N 2.0: An Accessible Data Standard for Neurophysiology", bioRxiv, January 17, 2019, doi: https://doi.org/10.1101/523035

George Michelogiannakis, Computation and Communication in a Post Moore’s Law Era, Post Exascale workshop part of HiPEAC conference, January 2019,

Charlene Yang, Performance Analysis with Roofline on GPUs, Roofline Tutorial, ECP Annual Meeting, January 2019,

Jack Deslippe, Optimization Use Cases with the Roofline Model, Roofline Tutorial, ECP Annual Meeting, January 2019,

Samuel Williams, Roofline on CPU-based Systems, Roofline Tutorial, ECP Annual Meeting, January 2019,

Samuel Williams, Introduction to the Roofline Model, Roofline Tutorial, ECP Annual Meeting, January 2019,

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Scott B. Baden, Paul H. Hargrove, Dan Bonachea, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - GASNet-EX", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Daniel F. Martin, Stephen L. Cornford, Antony J. Payne, "Millennial‐scale Vulnerability of the Antarctic Ice Sheet to Regional Ice Shelf Collapse", Geophysical Research Letters, January 9, 2019, doi: 10.1029/2018gl081229

Abstract: 

The Antarctic Ice Sheet (AIS) remains the largest uncertainty in projections of future sea level rise. A likely climate‐driven vulnerability of the AIS is thinning of floating ice shelves resulting from surface‐melt‐driven hydrofracture or incursion of relatively warm water into subshelf ocean cavities. The resulting melting, weakening, and potential ice‐shelf collapse reduces shelf buttressing effects. Upstream ice flow accelerates, causing thinning, grounding‐line retreat, and potential ice sheet collapse. While high‐resolution projections have been performed for localized Antarctic regions, full‐continent simulations have typically been limited to low‐resolution models. Here we quantify the vulnerability of the entire present‐day AIS to regional ice‐shelf collapse on millennial timescales treating relevant ice flow dynamics at the necessary ∼1km resolution. Collapse of any of the ice shelves dynamically connected to the West Antarctic Ice Sheet (WAIS) is sufficient to trigger ice sheet collapse in marine‐grounded portions of the WAIS. Vulnerability elsewhere appears limited to localized responses.

Plain Language Summary:

The biggest uncertainty in near‐future sea level rise (SLR) comes from the Antarctic Ice Sheet. Antarctic ice flows in relatively fast‐moving ice streams. At the ocean, ice flows into enormous floating ice shelves which push back on their feeder ice streams, buttressing them and slowing their flow. Melting and loss of ice shelves due to climate changes can result in faster‐flowing, thinning and retreating ice leading to accelerated rates of global sea level rise. To learn where Antarctica is vulnerable to ice‐shelf loss, we divided it into 14 sectors, applied extreme melting to each sector's floating ice shelves in turn, then ran our ice flow model 1000 years into the future for each case. We found three levels of vulnerability. The greatest vulnerability came from attacking any of the three ice shelves connected to West Antarctica, where much of the ice sits on bedrock lying below sea level. Those dramatic responses contributed around 2m of sea level rise. The second level came from four other sectors, each with a contribution between 0.5‐1m. The remaining sectors produced little to no contribution. We examined combinations of sectors, determining that sectors behave independently of each other for at least a century.

M. Emmett, E. Motheau, W. Zhang, M. Minion, J. B. Bell, "A Fourth-Order Adaptive Mesh Refinement Algorithm for the Multicomponent, Reacting Compressible Navier-Stokes Equations", Combustion Theory and Modeling, 2019,

C. Varadharajan, S. Cholia, C. Snavely, V. Hendrix, C. Procopiou, D. Swantek, W. J. Riley, and D. A. Agarwal, "Launching an accessible archive of environmental data", Eos, 100, January 8, 2019, doi: https://doi.org/10.1029/2019EO111263

Mariam Kiran, Anshuman Chhabra, "Understanding flows in high-speed scientific networks: A Netflow data study", Future Generation Computer Systems, 2019,

Bradley Mitchell, Ravi Naik, Unpil Baek, Dar Dahlen, John Mark Kreikebaum, Kevin O'Brien, Vinay Ramasesh, Machiel Blok, Wim Lavrijsen, Costin Iancu, others, Experimental Methods for Improving Heuristic Quantum Algorithms on NISQ Devices, APS March Meeting Abstracts, Pages: C42--012 2019,

Akash Dixit, David Schuster, Aaron Chou, Ankur Agrawal, Srivatsan Chakram, Ravi Naik, Axion Dark Matter Detection with Superconducting Qubits, APS March Meeting Abstracts, Pages: K28--010 2019,

Srivatsan Chakram, Ravi Naik, Akash Dixit, Yao Lu, Alexander Anferov, Nelson Leung, Andrew Oriani, David Schuster, Quantum information processing using 3D multimode circuit QED, APS March Meeting Abstracts, Pages: C29--008 2019,

Ravi Naik, Bradley Mitchell, Unpil Baek, Dar Dahlen, John Mark Kreikebaum, Vinay Ramasesh, Machiel Blok, Irfan Siddiqi, Limitations and improvements of two qubit gates in superconducting circuit QED, APS March Meeting Abstracts, Pages: L29--006 2019,

N Leung, Y Lu, S Chakram, RK Naik, N Earnest, R Ma, K Jacobs, AN Cleland, DI Schuster, "Deterministic bidirectional communication and remote entanglement generation between superconducting qubits", npj Quantum Information, 2019, 5:1--5, doi: 10.1038/s41534-019-0128-0

Alina Lazar, Alexandra Ballow, Ling Jin, C Anna Spurlock, Alexander Sim, Kesheng Wu, Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use, 2019 IEEE International Conference on Big Data (Big Data), Pages: 4520--4524 2019,

Payton Linton, William Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Gilberto Pastorello, Lavanya Ramakrishnan, Kesheng Wu, Understanding Data Similarity in Large-Scale Scientific Datasets, 2019 IEEE International Conference on Big Data (Big Data), Pages: 4525--4531 2019,

Sunggon Kim, Alex Sim, Kesheng Wu, Suren Byna, Teng Wang, Yongseok Son, Hyeonsang Eom, DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File Systems., CCGRID, Pages: 351--360 2019,

Kesheng Wu, Florin Rusu, Special issue on scientific and statistical data management, Distributed and Parallel Databases, Pages: 1--3 2019,

J Atalaya, S Hacohen-Gourgy, I Siddiqi, AN Korotkov, "Correlators Exceeding One in Continuous Measurements of Superconducting Qubits.", Physical review letters, 2019, 122:223603, doi: 10.1103/physrevlett.122.223603

A Eddins, JM Kreikebaum, DM Toyli, EM Levenson-Falk, A Dove, WP Livingston, BA Levitan, LCG Govia, AA Clerk, I Siddiqi, "High-Efficiency Measurement of an Artificial Atom Embedded in a Parametric Amplifier", Physical Review X, 2019, 9, doi: 10.1103/PhysRevX.9.011004

Devarshi Ghoshal, Kesheng Wu, Eric Pouyoul, Erich Strohmaier, "Analysis and Prediction of Data Transfer Throughput for Data-Intensive Workloads", 2019 IEEE International Conference on Big Data (Big Data), 2019, 3648--3657,

Kesheng Wu, Alex Sim, Jonathan Wang, Seongwook Hwangbo, Methods, systems, and devices for accurate signal timing of power component events, 2019,

US Patent app no. 20190138371, “Methods, systems, and devices for accurate signal timing of power component events”

Payton A Linton, William M Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Kesheng Wu, Gilberto Pastorello, Lavanya Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", 2019,

Beytullah Yildiz, Kesheng Wu, Suren Byna, Arie Shoshani, "Parallel membership queries on very large scientific data sets using bitmap indexes", Concurrency and Computation: Practice and Experience, January 1, 2019, 31:e5157,

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating‐point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word‐Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.

Junmin Gu, Burlen Loring, Kesheng Wu, E Wes Bethel, HDF5 as a vehicle for in transit data movement, Proceedings of the Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, Pages: 39--43 2019,

Jung Heon Song, Marcos López de Prado, Horst D Simon, Kesheng Wu, Extracting Signals from High-Frequency Trading with Digital Signal Processing Tools, The Journal of Financial Data Science, Pages: 124--138 2019,

Jongbeen Han, Heemin Kim, Hyeonsang Eom, Jonathan Coignard, Kesheng Wu, Yongseok Son, "Enabling SQL-Query Processing for Ethereum-based Blockchain Systems", Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, 2019, 1--7,

Daan Camps, Nicola Mastronardi, Raf Vandebril, Paul Van Dooren, "Swapping 2 × 2 blocks in the Schur and generalized Schur form", Journal of Computational and Applied Mathematics, 2019, doi: https://doi.org/10.1016/j.cam.2019.05.022

Pole swapping methods for the eigenvalue problem - Rational QR algorithms, Daan Camps, 2019,

Daan Camps, Karl Meerbergen, Raf Vandebril, "An implicit filter for rational Krylov using core transformations", Linear Algebra Appl., 2019, 561:113--140, doi: 10.1016/j.laa.2018.09.021

Daan Camps, Karl Meerbergen, Raf Vandebril, "A rational QZ method", SIAM J. Matrix Anal. Appl., 2019, 40:943--972, doi: 10.1137/18M1170480

Burak Cetin, Alina Lazar, Jinoh Kim, Alex Sim, Kesheng Wu, "Federated Wireless Network Intrusion Detection", 2019 IEEE International Conference on Big Data (Big Data), Pages: 6004--6006 2019,

Qiao Kang, Ankit Agrawal, Alok Choudhary, Alex Sim, Kesheng Wu, Rajkumar Kettimuthu, Peter H Beckman, Zhengchun Liu, Wei-keng Liao, "Spatiotemporal Real-Time Anomaly Detection for Supercomputing Systems", 2019 IEEE International Conference on Big Data (Big Data), 2019, 4381--4389,

W Cui, G Tzimpragos, Y Tao, J Mcmahan, D Dangwal, N Tsiskaridze, G Michelogiannakis, DP Vasudevan, T Sherwood, "Language Support for Navigating Architecture Design in Closed Form", ACM Journal on Emerging Technologies in Computing Systems, January 2019, 16:1--28, doi: 10.1145/3360047

Dipak Ghosal, Sambit Shukla, Alex Sim, Aditya V Thakur, Kesheng Wu, "A Reinforcement Learning Based Network Scheduler For Deadline-Driven Data Transfers", 2019 IEEE Global Communications Conference (GLOBECOM), 2019, 1--6,

Bin Dong, Patrick Kilian, Xiaocan Li, Fan Guo, Suren Byna, Kesheng Wu, "Terabyte-scale Particle Data Analysis: An ArrayUDF Case Study", Proceedings of the 31st International Conference on Scientific and Statistical Database Management, January 1, 2019, 202--205,

D Vasudevan, G Michelogiannakis, D Donofrio, J Shalf, "PARADISE - Post-Moore Architecture and Accelerator Design Space Exploration Using Device Level Simulation and Experiments", 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, January 2019, doi: 10.1109/ispass.2019.00022

Astha Syal, Alina Lazar, Jinoh Kim, Alex Sim, Kesheng Wu, "Automatic detection of network traffic anomalies and changes", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 3--10,

Olivia Del Guercio, Rafael Orozco, Alex Sim, Kesheng Wu, "Similarity-based Compression with Multidimensional Pattern Matching", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 19--24,

Mengtian Jin, Youkow Homma, Alex Sim, Wilko Kroeger, Kesheng Wu, "Performance prediction for data transfers in LCLS workflow", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 37--44,

Hanul Sung, Jiwoo Bang, Alexander Sim, Kesheng Wu, Hyeonsang Eom, "Understanding Parallel I/O Performance Trends Under Various HPC Configurations", Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019, 29--36,

Francois P. Hamon, Martin Schreiber, Michael L. Minion, "Multi-Level Spectral Deferred Corrections Scheme for the Shallow Water Equations on the Rotating Sphere", Journal of Computational Physics, January 1, 2019, 376:435-454,

Sambit Shukla, Dipak Ghosal, Kesheng Wu, Alex Sim, Matthew Farrens, "Co-optimizing Latency and Energy for IoT services using HMP servers in Fog Clusters", 2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC), 2019, 121--128,

Alina Lazar, Ling Jin, C Anna Spurlock, Kesheng Wu, Alex Sim, Annika Todd, "Evaluating the effects of missing values and mixed data types on social sequence clustering using t-SNE visualization", Journal of Data and Information Quality (JDIQ), 2019, 11:1--22,

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis", International Conference on High Performance Computing, January 1, 2019, 61--80,

M. Del Ben, F.H. da Jornada, A. Canning, N. Wichmann, K. Raman, R. Sasanka, C. Yang, S.G. Louie, J. Deslippe, "Large-scale GW calculations on pre-exascale HPC systems", Computer Physics Communications, 2019, 235:187-195, doi: 10.1016/j.cpc.2018.09.003

Victor Yu, William Dawson, Alberto Garcia, Ville Havu, Ben Hourahine, William Huhn, Mathias Jacquelin, Weile Jia, Murat Keceli, Raul Laasner, others, Large-Scale Benchmark of Electronic Structure Solvers with the ELSI Infrastructure, Bulletin of the American Physical Society, 2019,

R. Oguz Selvitopi, Gunduz Vehbi Demirci, Ata Turk, Cevdet Aykanat, "Locality-aware and load-balanced static task scheduling for MapReduce", Future Generation Computer Systems (FGCS), January 2019, 90:49-61, doi: https://doi.org/10.1016/j.future.2018.06.035

Y. Liu, W. Sid-Lakhdar, E. Rebrova, P. Ghysels, X. Sherry Li, "A parallel hierarchical blocked adaptive cross approximation algorithm", The International Journal of High Performance Computing Applications, January 1, 2019,

Olivia Del Guercio, Rafael Orozco, Alex Sim, Kesheng Wu, "Multidimensional Compression with Pattern Matching", 2019 Data Compression Conference (DCC), Pages: 567--567 2019,

Catherine A Watkinson, Sambit K. Giri, Hannah E. Ross, Keri L. Dixon, Ilian T. Iliev, Garrelt Mellema, Jonathan R. Pritchard, "The 21-cm bispectrum as a probe of non-Gaussianities due to X-ray heating", Monthly Notices of the Royal Astronomical Society, January 2019, 482:2653-2669, doi: 10.1093/mnras/sty2740

W. Langhans, J. Mueller, W.D. Collins, "Optimization of the Eddy-Diffusivity/Mass-Flux shallow cumulus and boundary-layer parametrization using surrogate models", Journal of Advances in Modeling Earth Systems (JAMES), Vol 11, Issue 2, 2019,

J. Müller, M. Day, "Surrogate Optimization of Computationally Expensive Black-Box Problems with Hidden Constraints", INFORMS Journal on Computing, 2019, 31:633-845,

O Karslıoğlu, M Gehlmann, J Müller, S Nemšák, JA Sethian, A Kaduwela, H Bluhm, C Fadley, "An Efficient Algorithm for Automatic Structure Optimization in X-ray Standing-Wave Experiments", Journal of Electron Spectroscopy and Related Phenomena, January 1, 2019,

M. M. Phillips, C. Contreras, E. Y. Hsiao, N., C. R. Burns, M. Stritzinger, C. Ashall, W. L., P. Hoeflich, S. E. Persson, A. L., N. B. Suntzeff, S. A. Uddin, J. Anais, E., L. Busta, A. Campillay, S. Castellón, C., T. Diamond, C. Gall, C. Gonzalez, S., K. Krisciunas, M. Roth, J. Serón, F., S. Torres, J. P. Anderson, C. Baltay, G., L. Galbany, A. Goobar, E. Hadjiyska, M., M. Kasliwal, C. Lidman, P. E. Nugent, S., D. Rabinowitz, S. D. Ryder, B. P. Schmidt, B. J. Shappee, E. S. Walker, "Carnegie Supernova Project-II: Extending the Near-infrared Hubble Diagram for Type Ia Supernovae to z ∼ 0.1", Publications of the ASP, 2019, 131:014001, doi: 10.1088/1538-3873/aae8bd

E. Y. Hsiao, M. M. Phillips, G. H. Marion, R. P., N. Morrell, D. J. Sand, C. R. Burns, C., P. Hoeflich, M. D. Stritzinger, S., J. P. Anderson, C. Ashall, C. Baltay, E., D. P. K. Banerjee, S. Davis, T. R. Diamond, G., W. L. Freedman, F. Foerster, L., C. Gall, S. Gonzalez-Gaitan, A., M. Hamuy, S. Holmbo, M. M. Kasliwal, K., S. Kumar, C. Lidman, J. Lu, P. E., S. Perlmutter, S. E. Persson, A. L., D. Rabinowitz, M. Roth, S. D. Ryder, B. P., M. Shahbandeh, N. B. Suntzeff, F. Taddia, S. Uddin, L. Wang, Carnegie Supernova Project-II: The Near-infrared Spectroscopy Program, Publications of the ASP, Pages: 014002 2019, doi: 10.1088/1538-3873/aae961