2019 Publications

Mark F Adams

2019

Mark Adams, Stephen Cornford, Daniel Martin, Peter McCorquodale, "Composite matrix construction for structured grid adaptive mesh refinement", Computer Physics Communications, November 2019, 244:35-39, doi: 10.1016/j.cpc.2019.07.006

Deb Agarwal

2019

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019,

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

C. Varadharajan, S. Cholia, C. Snavely, V. Hendrix, C. Procopiou, D. Swantek, W. J. Riley, and D. A. Agarwal, "Launching an accessible archive of environmental data", Eos, 100, January 8, 2019, doi: 10.1029/2019EO111263

Hadia Ahmed

2019

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.
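
To make the distributed hash table motif above concrete, the following is a minimal illustrative sketch in the same spirit, built on UPC++ distributed objects and RPC. It is not code from the paper; the shard layout, the owner_of mapping, and all identifiers are assumptions, while dist_object, rpc, barrier, and futures are standard UPC++ facilities.

    // Toy distributed hash table: each rank owns one shard, and inserts and
    // lookups are RPCs executed on the owning rank (illustrative only).
    #include <upcxx/upcxx.hpp>
    #include <string>
    #include <unordered_map>

    using map_t  = std::unordered_map<std::string, std::string>;
    using dmap_t = upcxx::dist_object<map_t>;

    // Hypothetical key-to-owner mapping: hash the key onto a rank.
    static int owner_of(const std::string &key) {
      return int(std::hash<std::string>{}(key) % upcxx::rank_n());
    }

    int main() {
      upcxx::init();
      dmap_t shard(map_t{});  // this rank's shard of the table

      // Insert: run an RPC on the owning rank; the returned future completes
      // once the remote update has been applied.
      std::string key = "key-" + std::to_string(upcxx::rank_me());
      upcxx::rpc(owner_of(key),
                 [](dmap_t &s, const std::string &k, const std::string &v) {
                   (*s)[k] = v;
                 },
                 shard, key, std::string("value")).wait();

      upcxx::barrier();

      // Lookup: fetch the value back from whichever rank owns the key.
      std::string got = upcxx::rpc(owner_of(key),
                                   [](dmap_t &s, const std::string &k) {
                                     return (*s)[k];
                                   },
                                   shard, key).wait();

      upcxx::barrier();
      upcxx::finalize();
      return got == "value" ? 0 : 1;
    }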

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Ann S. Almgren

2019

D. Fan, A. Nonaka, A.S. Almgren, A. Harpole, M. Zingale, "MAESTROeX: A Massively Parallel Low Mach Number Astrophysical Solver", August 14, 2019,

Knut Sverdrup, Ann S. Almgren, Nikolaos Nikiforakis, "An embedded boundary approach for efficient simulations of viscoplastic fluids in three dimensions", August 10, 2019,

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

John Bachan

2019

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
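
As a concrete illustration of the futures and continuations mentioned in this abstract, the hedged sketch below chains a remote get with two local continuations into a small asynchronous DAG. It is not taken from the specification; the allocation pattern and variable names are assumptions, while global_ptr, new_, broadcast, rget, then, and wait are facilities the specification defines.

    // Chaining asynchronous operations with futures and continuations
    // (illustrative sketch only).
    #include <upcxx/upcxx.hpp>
    #include <iostream>

    int main() {
      upcxx::init();

      // Rank 0 allocates an integer in its shared segment; the global pointer
      // is broadcast so every rank can address that remote location.
      upcxx::global_ptr<int> gp;
      if (upcxx::rank_me() == 0) gp = upcxx::new_<int>(42);
      gp = upcxx::broadcast(gp, 0).wait();

      // A small DAG: fetch the remote value, then double it, then report it.
      upcxx::future<> done =
          upcxx::rget(gp)
              .then([](int v) { return 2 * v; })   // continuation #1
              .then([](int v) {                    // continuation #2
                std::cout << "rank " << upcxx::rank_me()
                          << " computed " << v << std::endl;
              });
      done.wait();  // block until the whole chain has completed

      upcxx::barrier();
      if (upcxx::rank_me() == 0) upcxx::delete_(gp);
      upcxx::finalize();
    }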

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
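
The explicit, asynchronous remote-memory access described in this abstract looks roughly like the sketch below, in which every rank performs a non-blocking rput into its right neighbor's shared segment. This is an illustrative example rather than material from the guide; the ring pattern and names are assumptions, but new_, broadcast, rput, futures, and barrier are standard UPC++ calls.

    // Explicit one-sided, asynchronous remote memory access (illustrative).
    #include <upcxx/upcxx.hpp>
    #include <vector>

    int main() {
      upcxx::init();
      const int me = upcxx::rank_me(), n = upcxx::rank_n();

      // Each rank allocates one slot in the global address space and gathers
      // everyone's global pointer so remote slots can be addressed.
      upcxx::global_ptr<int> my_slot = upcxx::new_<int>(-1);
      std::vector<upcxx::global_ptr<int>> slots(n);
      for (int r = 0; r < n; ++r)
        slots[r] = upcxx::broadcast(my_slot, r).wait();

      // Explicit put into the right neighbor's memory; asynchronous by
      // default, so we keep the returned future and wait for completion.
      upcxx::future<> f = upcxx::rput(me, slots[(me + 1) % n]);
      f.wait();

      upcxx::barrier();                 // all puts have landed
      int left = *my_slot.local();      // local() is valid: we own this slot
      upcxx::barrier();

      upcxx::delete_(my_slot);
      upcxx::finalize();
      return left == (me + n - 1) % n ? 0 : 1;
    }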

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Scott Baden

2019

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Scott B. Baden, Paul H. Hargrove, Dan Bonachea, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - GASNet-EX", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Vincent E. Beckner

2019

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

John B. Bell

2019

M. Zingale, M.P. Katz, J.B. Bell, M.L. Minion, A.J. Nonaka, W. Zhang, "Improved Coupling of Hydrodynamics and Nuclear Reactions via Spectral Deferred Corrections", August 14, 2019,

L. Esclapez, V. Ricchiuti, J.B. Bell, M.S. Day, "A spectral deferred correction strategy for low Mach number flows subject to electric fields", August 10, 2019,

D. R. Ladiges, A. J. Nonaka, J. B. Bell, A. L. Garcia, "On the Suppression and Distortion of Non-Equilibrium Fluctuations by Transpiration", August 10, 2019, doi: 10.1063/1.5093922

A. Donev, A. J. Nonaka, C. Kim, A. L. Garcia, J. B. Bell, "Fluctuating hydrodynamics of electrolytes at electroneutral scales", August 10, 2019,

M. Zingale, K. Eiden, Y. Cavecchi, A. Harpole, J. B. Bell, M. Chang, I. Hawke, M. P. Katz, C.M. Malone, A. J. Nonaka, D. E. Willcox, W. Zhang, "Toward resolved simulations of burning fronts in thermonuclear X-ray bursts", Journal of Physics: Conference Series, 2019, 1225,

A. J. Aspden, M. S. Day, J. B. Bell, "Towards the Distributed Burning Regime in Turbulent Premixed Flames", Journal of Fluid Mechanics, 2019, 871:1-21,

J. Bell, M. Day, J. Goodman, R. Grout, M. Morzfeld, "A Bayesian approach to calibrating hydrogen flame kinetics using many experiments and parameters", Combustion and Flame, 2019,

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

Aleksandar Donev, Alejandro L. Garcia, Jean-Philippe Péraud, Andrew J. Nonaka, John B. Bell, "Fluctuating Hydrodynamics and Debye-Hückel-Onsager Theory for Electrolytes", Current Opinion in Electrochemistry, 2019, 13:1-10, doi: 10.1016/j.coelec.2018.09.004

M. Emmett, E. Motheau, W. Zhang, M. Minion, J. B. Bell, "A Fourth-Order Adaptive Mesh Refinement Algorithm for the Multicomponent, Reacting Compressible Navier-Stokes Equations", Combustion Theory and Modeling, 2019,

Johannes Blaschke

2019

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

Dan Bonachea

2019

Paul H. Hargrove, Dan Bonachea, "Efficient Active Message RMA in GASNet Using a Target-Side Reassembly Protocol (Extended Abstract)", IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), Lawrence Berkeley National Laboratory Technical Report, November 17, 2019, LBNL 2001238, doi: 10.25344/S4PC7M

GASNet is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models on future exascale machines. This paper investigates strategies for efficient implementation of GASNet’s “AM Long” API that couples an RMA (Remote Memory Access) transfer with an Active Message (AM) delivery.
We discuss several network-level protocols for AM Long and propose a new target-side reassembly protocol. We present a microbenchmark evaluation on the Cray XC Aries network hardware. The target-side reassembly protocol on this network improves AM Long end-to-end latency by up to 33%, and the effective bandwidth by up to 49%, while also enabling asynchronous source completion that drastically reduces injection overheads.
The improved AM Long implementation for Aries is available in GASNet-EX release v2019.9.0 and later.
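
The target-side reassembly idea can be pictured with the toy, single-process sketch below: the payload of an "AM Long" arrives as independent fragments, the target copies each into its offset in the destination buffer, and the Active Message handler fires exactly once after the final byte lands. This is not the GASNet-EX API or implementation; every type and name here is invented for illustration.

    // Toy model of target-side reassembly for an RMA payload coupled with an
    // Active Message handler (illustrative only; not GASNet-EX code).
    #include <cstdint>
    #include <cstring>
    #include <functional>
    #include <vector>

    struct Packet {                  // one fragment of the long payload
      size_t offset;                 // where it lands in the destination buffer
      std::vector<uint8_t> bytes;
    };

    struct Reassembly {              // target-side state for one AM Long
      std::vector<uint8_t> dest;     // destination RMA buffer
      size_t received = 0;           // bytes landed so far
      std::function<void(const uint8_t *, size_t)> handler;  // AM handler

      void on_packet(const Packet &p) {
        std::memcpy(dest.data() + p.offset, p.bytes.data(), p.bytes.size());
        received += p.bytes.size();
        if (received == dest.size())            // payload complete:
          handler(dest.data(), dest.size());    // run the handler exactly once
      }
    };

    int main() {
      Reassembly r;
      r.dest.resize(8);
      int runs = 0;
      r.handler = [&](const uint8_t *, size_t) { ++runs; };

      // Fragments may arrive in any order; the handler fires after the last.
      r.on_packet({4, {4, 5, 6, 7}});
      r.on_packet({0, {0, 1, 2, 3}});
      return runs == 1 ? 0 : 1;
    }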

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Scott B. Baden, Paul H. Hargrove, Dan Bonachea, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - GASNet-EX", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Kristofer Bouchard

2019

Oliver Rübel, Andrew Tritt, Benjamin Dichter, Thomas Braun, Nicholas Cain, Nathan Clack, Thomas J. Davidson, Max Dougherty, Jean-Christophe Fillion-Robin, Nile Graddis, Michael Grauer, Justin T. Kiggins, Lawrence Niu, Doruk Ozturk, William Schroeder, Ivan Soltesz, Friedrich T. Sommer, Karel Svoboda, Lydia Ng, Loren M. Frank, Kristofer Bouchard, "NWB:N 2.0: An Accessible Data Standard for Neurophysiology", bioRxiv, January 17, 2019, doi: 10.1101/523035

Joshua Boverhof

2019

Reinhard Gentz, Sean Peisert, Joshua Boverhof, Daniel Gunter, "SPARCS: Stream-Processing Architecture applied in Real-time Cyber-physical Security", Proceedings of the 15th IEEE International Conference on e-Science (eScience), San Diego, CA, IEEE, September 2019,

Aydin Buluç

2019

Marquita Ellis, Giulia Guidi, Aydın Buluç, Leonid Oliker, Katherine Yelick, "diBELLA: Distributed Long Read to Long Read Alignment", 48th International Conference on Parallel Processing (ICPP), June 25, 2019,

Anastasiia Butko

2019

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "Extending classical processors to support future large scale quantum accelerators", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "TIGER: topology-aware task assignment approach using Ising machines", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

Surendra Byna

2019

Richard Warren, Jerome Soumagne, Jingqing Mu, Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Analysis in the Data Path of an Object-centric Data Management System", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Houjun Tang, Suren Byna, Stephen Bailey, Zarija Lukic, Jialin Liu, Quincey Koziol, Bin Dong, "Tuning Object-centric Data Management Systems for Large Scale Scientific Applications", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Wei Zhang, Suren Byna, Chenxu Niu, Yong Chen, "Exploring Metadata Search Essentials for Scientific Data Management", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 17, 2019,

Tirthak Patel, Suren Byna, Glenn K. Lockwood, Devesh Tiwari, "Revisiting I/O Behavior in Large-Scale Storage Systems: The Expected and the Unexpected", Supercomputing 2019 (SC19), November 24, 2019, doi: 10.1145/3295500.3356183

Donghe Kang, Oliver Rübel, Suren Byna, Spyros Blanas, "Comparison of Array Management Library Performance - A Neuroscience Use Case", SC19 Poster, November 20, 2019,

Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen, "MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 19, 2019,

Houjun Tang, Quincey Koziol, Suren Byna, John Mainzer, Tonglin Li, "Enabling Transparent Asynchronous I/O using Background Threads", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW 2019), November 19, 2019, doi: 10.1109/PDSW49588.2019.00006

Megha Agarwal, Divyansh Singhvi, Preeti Malakar, Suren Byna, "Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), November 19, 2019, doi: 10.1109/PDSW49588.2019.00007

Glenn K. Lockwood, Shane Snyder, Suren Byna, Philip Carns, Nicholas J. Wright, "Understanding Data Motion in the Modern HPC Data Center", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), November 19, 2019, doi: 10.1109/PDSW49588.2019.00012

Bin Dong, Patrick Frank Heiner Kilian, Xiaocan Li, Fan Guo, Suren Byna and Kesheng Wu, "Terabyte-scale Particle Data Analysis: An ArrayUDF Case Study", SSDBM 2019, July 23, 2019,

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-aware Programming Model for Composing Array Data Analysis", ISC 2019 (acceptance rate: 24%), June 16, 2019,

S. Kim, A. Sim, K. Wu, S. Byna, T. Wang, Y. Son, H. Eom, "DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File System", 19th Annual IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2019), 2019, doi: 10.1109/CCGRID.2019.00049

Teng Wang, Suren Byna, Glenn Lockwood, Philip Carns, Shane Snyder, Sunggon Kim, Nicholas Wright, "A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks", IEEE/ACM CCGrid 2019, May 14, 2019,

Tonglin Li, Quincey Koziol, Houjun Tang, Jialin Liu, Suren Byna, "I/O Performance Analysis of Science Applications Using HDF5 File-level Provenance", Cray User Group (CUG) 2019, May 10, 2019,

Jingqing Mu, Jerome Soumagne, Suren Byna, Quincey Koziol, Houjun Tang, Richard Warren, "Interfacing HDF5 with A Scalable Object-centric Storage System on Hierarchical Storage", Cray User Group (CUG) 2019, May 7, 2019,

Babak Behzad, Suren Byna, Prabhat, and Marc Snir, "Optimizing I/O Performance of HPC Applications with Autotuning", ACM Transactions on Parallel Computing (TOPC), February 28, 2019,

Beytullah Yildiz, Kesheng Wu, Suren Byna, Arie Shoshani, "Parallel membership queries on very large scientific data sets using bitmap indexes", Concurrency and Computation: Practice and Experience, January 28, 2019, 31, doi: 10.1002/cpe.5157

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating‐point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word‐Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.
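
To illustrate the bitmap-index membership query described above, the sketch below builds one (uncompressed) bitmap per distinct attribute value and answers a membership query over several values by OR-ing their bitmaps. It is a simplified illustration, not the authors' code; a production system would store compressed bitmaps (e.g., the Word-Aligned Hybrid scheme mentioned in the abstract) and evaluate partitions of the data in parallel.

    // Bitmap-index membership query (simplified illustration).
    #include <cstdint>
    #include <map>
    #include <vector>

    using Bitmap = std::vector<uint64_t>;   // one bit per record, packed in words

    static void set_bit(Bitmap &b, size_t i) {
      b[i / 64] |= uint64_t(1) << (i % 64);
    }

    int main() {
      // Example categorical attribute, one value per record.
      std::vector<int> attribute = {2, 7, 2, 5, 7, 7, 1, 5};
      const size_t n = attribute.size(), words = (n + 63) / 64;

      // Build the index: one bitmap per distinct attribute value.
      std::map<int, Bitmap> index;
      for (size_t i = 0; i < n; ++i) {
        Bitmap &bm = index.try_emplace(attribute[i], Bitmap(words, 0)).first->second;
        set_bit(bm, i);
      }

      // Membership query: which records have a value in {5, 7}?
      Bitmap result(words, 0);
      for (int v : {5, 7}) {
        auto it = index.find(v);
        if (it == index.end()) continue;
        for (size_t w = 0; w < words; ++w) result[w] |= it->second[w];
      }
      // result now has bits set for records 1, 3, 4, 5, and 7.
      return 0;
    }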

Daan Camps

2019

Daan Camps, Karl Meerbergen, Raf Vandebril, "A rational QZ method", SIAM J. Matrix Anal. Appl., 2019, 40:943--972, doi: 10.1137/18M1170480

Daan Camps, Karl Meerbergen, Raf Vandebril, "An implicit filter for rational Krylov using core transformations", Linear Algebra Appl., 2019, 561:113--140, doi: 10.1016/j.laa.2018.09.021

Daan Camps, "Pole swapping methods for the eigenvalue problem - Rational QR algorithms", 2019,

Daan Camps, Nicola Mastronardi, Raf Vandebril, Paul Van Dooren, "Swapping 2 × 2 blocks in the Schur and generalized Schur form", Journal of Computational and Applied Mathematics, 2019, doi: 10.1016/j.cam.2019.05.022

Andrew Canning

2019

M. Del Ben, F.H. da Jornada, A. Canning, N. Wichmann, K. Raman, R. Sasanka, C. Yang, S.G. Louie, J. Deslippe, "Large-scale GW calculations on pre-exascale HPC systems", Computer Physics Communications, 2019, 235:187-195, doi: 10.1016/j.cpc.2018.09.003

Cy Chan

2019

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

Shreyas Cholia

2019

C. Varadharajan, S. Cholia, C. Snavely, V. Hendrix, C. Procopiou, D. Swantek, W. J. Riley, and D. A. Agarwal, "Launching an accessible archive of environmental data", Eos, 100, January 8, 2019, doi: 10.1029/2019EO111263

Phillip Colella

2019

Boris Lo, Phillip Colella, "An Adaptive Local Discrete Convolution Method for the Numerical Solution of Maxwell's Equations", Communications in Applied Mathematics and Computational Science, April 26, 2019, 14:105-119, doi: 10.2140/camcos.2019.14.105

Marcus S. Day

2019

L. Esclapez, V. Ricchiuti, J.B. Bell, M.S. Day, "A spectral deferred correction strategy for low Mach number flows subject to electric fields", August 10, 2019,

N. T. Wimer, M. S. Day, C. Lapointe, A. S. Makowiecki, J. F. Glusman, J. W. Daily, G. B. Rieker, P. E. Hamlington, "High-resolution numerical simulations of a large-scale helium plume using adaptive mesh refinement", August 10, 2019,

M. T. Henry de Frahan, S. Yellapantula, R. King, M. S. Day, R. W. Grout, "Deep learning for presumed probability density function models", August 10, 2019,

D. Dasgupta, W. Sun, M. Day, A. Aspden, T. Lieuwen, "Analysis of chemical pathways for n-dodecane/air turbulent premixed flames", August 10, 2019,

A. J. Aspden, M. S. Day, J. B. Bell, "Towards the Distributed Burning Regime in Turbulent Premixed Flames", Journal of Fluid Mechanics, 2019, 871:1-21,

J. Bell, M. Day, J. Goodman, R. Grout, M. Morzfeld, "A Bayesian approach to calibrating hydrogen flame kinetics using many experiments and parameters", Combustion and Flame, 2019,

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

J. Muller, M. Day, "Surrogate Optimization of Computationally Expensive Black-box Problems with Hidden Constraints", INFORMS Journal on Computing, 2019,

Nan Ding

2019

Nan Ding, Samuel Williams, "An Instruction Roofline Model for GPUs", Performance Modeling, Benchmarking, and Simulation (PMBS), BEST PAPER AWARD, November 18, 2019,

Nan Ding, Samuel Williams, Sherry Li, Yang Liu, "Leveraging One-Sided Communication for Sparse Triangular Solvers", SciDAC19, July 18, 2019,

Samuel Williams, Charlene Yang, Khaled Ibrahim, Thorsten Kurth, Nan Ding, Jack Deslippe, Leonid Oliker, "Performance Analysis using the Roofline Model", SciDAC PI Meeting, July 2019,

Bin Dong

2019

Richard Warren, Jerome Soumagne, Jingqing Mu, Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Analysis in the Data Path of an Object-centric Data Management System", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Houjun Tang, Suren Byna, Stephen Bailey, Zarija Lukic, Jialin Liu, Quincey Koziol, Bin Dong, "Tuning Object-centric Data Management Systems for Large Scale Scientific Applications", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Bin Dong, Patrick Frank Heiner Kilian, Xiaocan Li, Fan Guo, Suren Byna and Kesheng Wu, "Terabyte-scale Particle Data Analysis: An ArrayUDF Case Study", SSDBM 2019, July 23, 2019,

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-aware Programming Model for Composing Array Data Analysis", ISC 2019 (acceptance rate: 24%), June 16, 2019,

David Donofrio

2019

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "Extending classical processors to support future large scale quantum accelerators", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "TIGER: topology-aware task assignment approach using Ising machines", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

D. Vasudevan, G. Michelogiannakis, D. Donofrio, J. Shalf, "PARADISE - Post-Moore Architecture and Accelerator Design Space Exploration Using Device Level Simulation and Experiments", 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, January 2019, doi: 10.1109/ispass.2019.00022

Marquita Ellis

2019

Marquita Ellis, Giulia Guidi, Aydın Buluç, Leonid Oliker, Katherine Yelick, "diBELLA: Distributed Long Read to Long Read Alignment", 48th International Conference on Parallel Processing (ICPP), June 25, 2019,

Lucas Esclapez

2019

L. Esclapez, V. Ricchiuti, J.B. Bell, M.S. Day, "A spectral deferred correction strategy for low Mach number flows subject to electric fields", August 10, 2019,

Doreen Fan

2019

D. Fan, A. Nonaka, A.S. Almgren, A. Harpole, M. Zingale, "MAESTROeX: A Massively Parallel Low Mach Number Astrophysical Solver", August 14, 2019,

Reinhard Gentz

2019

Thomas W. Edgar, Aditya Ashok, Garret E. Seppala, K.M. Arthur-Durrett, M. Engels, Reinhard Gentz, Sean Peisert, "An Automated Disruption-Tolerant Key Management Framework for Critical Systems", Journal of Information Warfare, October 8, 2019,

Mahdi Jamei, Raksha Ramakrishna, Teklemariam Tesfay, Reinhard Gentz, Ciaran Roberts, Anna Scaglione, Sean Peisert, "Phasor Measurement Units Optimal Placement and Performance Limits for Fault Localization", IEEE Journal on Selected Areas in Communications (J-SAC), Special Issue on Communications and Data Analytics in Smart Grid, October 2, 2019, doi: 10.1109/jsac.2019.2951971

Reinhard Gentz, Sean Peisert, Joshua Boverhof, Daniel Gunter, "SPARCS: Stream-Processing Architecture applied in Real-time Cyber-physical Security", Proceedings of the 15th IEEE International Conference on e-Science (eScience), San Diego, CA, IEEE, September 2019,

Reinhard Gentz, Héctor García Martin, Edward Baidoo, Sean Peisert, "Workflow Automation in Liquid Chromatography Mass Spectrometry", Proceedings of the 15th IEEE International Conference on e-Science (eScience), San Diego, CA, IEEE, September 2019,

Ciaran Roberts, Anna Scaglione, Mahdi Jamei, Reinhard Gentz, Sean Peisert, Emma M. Stewart, Chuck McParland, Alex McEachern, Daniel Arnold, "Learning Behavior of Distribution System Discrete Control Devices for Cyber-Physical Security", IEEE Transactions on Smart Grid, July 31, 2019, doi: 10.1109/TSG.2019.2936016

Melissa Stockman, Dipankar Dwivedi, Reinhard Gentz, Sean Peisert, "Detecting Programmable Logic Controller Code Using Machine Learning", International Journal of Critical Infrastructure Protection, July 2019, doi: 10.1016/j.ijcip.2019.100306

Devarshi Ghoshal

2019

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019,

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Pieter Ghysels

2019

Y. Liu, W. Sid-Lakhdar, E. Rebrova, P. Ghysels, X. Sherry Li, "A Hierarchical Low-Rank Decomposition Algorithm Based on Blocked Adaptive Cross Approximation Algorithms", arXiv e-prints, January 1, 2019,

Anna Giannakou

2019

Anna Giannakou, Dipankar Dwivedi, Sean Peisert, "A Machine Learning Approach for Packet Loss Prediction in Science Flows", Future Generation Computer Systems, July 2019, doi: 10.1016/j.future.2019.07.053

Dan Graves

2019

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

Giulia Guidi

2019

Marquita Ellis, Giulia Guidi, Aydın Buluç, Leonid Oliker, Katherine Yelick, "diBELLA: Distributed Long Read to Long Read Alignment", 48th International Conference on Parallel Processing (ICPP), June 25, 2019,

Daniel Gunter

2019

Reinhard Gentz, Sean Peisert, Joshua Boverhof, Daniel Gunter, "SPARCS: Stream-Processing Architecture applied in Real-time Cyber-physical Security", Proceedings of the 15th IEEE International Conference on e-Science (eScience), San Diego, CA, IEEE, September 2019,

Francois Hamon

2019

Francois P. Hamon, Martin Schreiber, Michael L. Minion, "Parallel-in-Time Multi-Level Integration of the Shallow-Water Equations on the Rotating Sphere", April 12, 2019,

Submitted to Journal of Computational Physics

Francois P. Hamon, Martin Schreiber, Michael L. Minion, "Multi-Level Spectral Deferred Corrections Scheme for the Shallow Water Equations on the Rotating Sphere", Journal of Computational Physics, January 1, 2019, 376:435-454,

Paul H. Hargrove

2019

Paul H. Hargrove, Dan Bonachea, "Efficient Active Message RMA in GASNet Using a Target-Side Reassembly Protocol (Extended Abstract)", IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), Lawrence Berkeley National Laboratory Technical Report, November 17, 2019, LBNL 2001238, doi: 10.25344/S4PC7M

GASNet is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models on future exascale machines. This paper investigates strategies for efficient implementation of GASNet’s “AM Long” API that couples an RMA (Remote Memory Access) transfer with an Active Message (AM) delivery.
We discuss several network-level protocols for AM Long and propose a new target-side reassembly protocol. We present a microbenchmark evaluation on the Cray XC Aries network hardware. The target-side reassembly protocol on this network improves AM Long end-to-end latency by up to 33%, and the effective bandwidth by up to 49%, while also enabling asynchronous source completion that drastically reduces injection overheads.
The improved AM Long implementation for Aries is available in GASNet-EX release v2019.9.0 and later.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Scott B. Baden, Paul H. Hargrove, Dan Bonachea, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - GASNet-EX", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Valerie Hendrix

2019

C. Varadharajan, S. Cholia, C. Snavely, V. Hendrix, C. Procopiou, D. Swantek, W. J. Riley, and D. A. Agarwal, "Launching an accessible archive of environmental data", Eos, 100, January 8, 2019, doi: https://doi.org/10.1029/2019EO111263

Steven Hofmeyr

2019

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Khaled Ibrahim

2019

Khaled Ibrahim, Samuel Williams, Leonid Oliker, "Performance Analysis of GPU Programming Models using the Roofline Scaling Trajectories", International Symposium on Benchmarking, Measuring and Optimizing (Bench), BEST PAPER AWARD, November 2019,

Samuel Williams, Charlene Yang, Khaled Ibrahim, Thorsten Kurth, Nan Ding, Jack Deslippe, Leonid Oliker, "Performance Analysis using the Roofline Model", SciDAC PI Meeting, July 2019,

Mathias Jacquelin

2019

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Victor Yu, William Dawson, Alberto Garcia, Ville Havu, Ben Hourahine, William Huhn, Mathias Jacquelin, Weile Jia, Murat Keceli, Raul Laasner, et al., Large-Scale Benchmark of Electronic Structure Solvers with the ELSI Infrastructure, Bulletin of the American Physical Society, 2019,

Hans Johansen

2019

Tuowen Zhao, Mary Hall, Samuel Williams, Hans Johansen, "Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs", Supercomputing (SC), November 2019,

D.F. Martin, H.S. Johansen, P.O. Schwartz, E.G. Ng, "Improved Discretization of Grounding Lines and Calving Fronts using an Embedded-Boundary Approach in BISICLES", European Geosciences Union General Assembly, April 10, 2019,

Amir Kamil

2019

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Mariam Kiran

2019

Mariam Kiran, Anshuman Chhabra, "Understanding flows in high-speed scientific networks: A Netflow data study", Future Generation Computer Systems, 2019,

Daniel Ladiges

2019

D. R. Ladiges, A. J. Nonaka, J. B. Bell, A. L. Garcia, "On the Suppression and Distortion of Non-Equilibrium Fluctuations by Transpiration", August 10, 2019, doi: 10.1063/1.5093922

Xiaoye Li

2019

Y. Liu, W. Sid-Lakhdar, E. Rebrova, P. Ghysels, X. Sherry Li, "A Hierarchical Low-Rank Decomposition Algorithm Based on Blocked Adaptive Cross Approximation Algorithms", arXiv e-prints, January 1, 2019,

Lin Lin

2019

Victor Yu, William Dawson, Alberto Garcia, Ville Havu, Ben Hourahine, William Huhn, Mathias Jacquelin, Weile Jia, Murat Keceli, Raul Laasner, et al., Large-Scale Benchmark of Electronic Structure Solvers with the ELSI Infrastructure, Bulletin of the American Physical Society, 2019,

Yang Liu

2019

Y. Liu, W. Sid-Lakhdar, E. Rebrova, P. Ghysels, X. Sherry Li, "A Hierarchical Low-Rank Decomposition Algorithm Based on Blocked Adaptive Cross Approximation Algorithms", arXiv e-prints, January 1, 2019,

Boris Lo

2019

Boris Lo, Phillip Colella, "An Adaptive Local Discrete Convolution Method for the Numerical Solution of Maxwell's Equations", Communications in Applied Mathematics and Computational Science, April 26, 2019, 14:105-119, doi: 10.2140/camcos.2019.14.105

Zarija Lukic

2019

Timur Takhtaganov, Zarija Lukić, Juliane Mueller, Dmitriy Morozov, "Cosmic Inference: Constraining Parameters With Observations and Highly Limited Number of Simulations", Astrophysical Journal (in review), 2019,

J. Onorbe, F. B. Davies, Z. Lukić, J. F. Hennawi, D. Sorini, "Inhomogeneous Reionization Models in Cosmological Hydrodynamical Simulations", Monthly Notices of the Royal Astronomical Society, 2019, 486:4075, doi: 10.1093/mnras/stz984

Vikram Khaire, Michael Walther, Joseph F. Hennawi, Jose Oñorbe, Zarija Lukić, Xavier J. Prochaska, Todd M. Tripp, Joseph N. Burchett, Christian Rodriguez, "The power spectrum of the Lyman-α Forest at z < 0.5", Monthly Notices of the Royal Astronomical Society, 2019, 486:769, doi: 10.1093/mnras/stz344

M. Mustafa, D. Bard, W. Bhimji, Z. Lukić, R. Al-Rfou, J. Kratochvil, "CosmoGAN: creating high-fidelity weak lensing convergence maps using Generative Adversarial Networks", Computational Astrophysics and Cosmology, 2019, 6:1, doi: 10.1186/s40668-019-0029-9

M. Walther, J. Onorbe, J. F. Hennawi, Z. Lukić, "New Constraints on IGM Thermal Evolution from the Ly-alpha Forest Power Spectrum", The Astrophysical Journal, 2019, 872:13, doi: 10.3847/1538-4357/aafad1

Stefano Marchesini

2019

Stefano Marchesini, Anne Sakdinawat, "Shaping Coherent X-rays with Binary Optics", Optics Express, January 21, 2019, 27(2):907-917,

Daniel F. Martin

2019

Mark Adams, Stephen Cornford, Daniel Martin, Peter McCorquodale, "Composite matrix construction for structured grid adaptive mesh refinement", Computer Physics Communications, November 2019, 244:35-39, doi: 10.1016/j.cpc.2019.07.006

D.F. Martin, H.S. Johansen, P.O. Schwartz, E.G. Ng, "Improved Discretization of Grounding Lines and Calving Fronts using an Embedded-Boundary Approach in BISICLES", European Geosciences Union General Assembly, April 10, 2019,

Daniel Martin, Modeling Antarctic Ice Sheet Dynamics using Adaptive Mesh Refinement, 2019 SIAM Conference on Computational Science and Engineering, February 26, 2019,

Daniel F. Martin, Stephen L. Cornford, Antony J. Payne, "Millennial‐scale Vulnerability of the Antarctic Ice Sheet to Regional Ice Shelf Collapse", Geophysical Research Letters, January 9, 2019, doi: 10.1029/2018gl081229

Abstract: 

The Antarctic Ice Sheet (AIS) remains the largest uncertainty in projections of future sea level rise. A likely climate‐driven vulnerability of the AIS is thinning of floating ice shelves resulting from surface‐melt‐driven hydrofracture or incursion of relatively warm water into subshelf ocean cavities. The resulting melting, weakening, and potential ice‐shelf collapse reduces shelf buttressing effects. Upstream ice flow accelerates, causing thinning, grounding‐line retreat, and potential ice sheet collapse. While high‐resolution projections have been performed for localized Antarctic regions, full‐continent simulations have typically been limited to low‐resolution models. Here we quantify the vulnerability of the entire present‐day AIS to regional ice‐shelf collapse on millennial timescales treating relevant ice flow dynamics at the necessary ∼1km resolution. Collapse of any of the ice shelves dynamically connected to the West Antarctic Ice Sheet (WAIS) is sufficient to trigger ice sheet collapse in marine‐grounded portions of the WAIS. Vulnerability elsewhere appears limited to localized responses.

Plain Language Summary:

The biggest uncertainty in near‐future sea level rise (SLR) comes from the Antarctic Ice Sheet. Antarctic ice flows in relatively fast‐moving ice streams. At the ocean, ice flows into enormous floating ice shelves which push back on their feeder ice streams, buttressing them and slowing their flow. Melting and loss of ice shelves due to climate changes can result in faster‐flowing, thinning, and retreating ice, leading to accelerated rates of global sea level rise. To learn where Antarctica is vulnerable to ice‐shelf loss, we divided it into 14 sectors, applied extreme melting to each sector's floating ice shelves in turn, then ran our ice flow model 1000 years into the future for each case. We found three levels of vulnerability. The greatest vulnerability came from attacking any of the three ice shelves connected to West Antarctica, where much of the ice sits on bedrock lying below sea level. Those dramatic responses contributed around 2 m of sea level rise. The second level came from four other sectors, each with a contribution between 0.5 and 1 m. The remaining sectors produced little to no contribution. We examined combinations of sectors, determining that sectors behave independently of each other for at least a century.

Peter McCorquodale

2019

Mark Adams, Stephen Cornford, Daniel Martin, Peter McCorquodale, "Composite matrix construction for structured grid adaptive mesh refinement", Computer Physics Communications, November 2019, 244:35-39, doi: 10.1016/j.cpc.2019.07.006

Charles McParland

2019

Ciaran Roberts, Anna Scaglione, Mahdi Jamei, Reinhard Gentz, Sean Peisert, Emma M. Stewart, Chuck McParland, Alex McEachern, Daniel Arnold, "Learning Behavior of Distribution System Discrete Control Devices for Cyber-Physical Security", IEEE Transactions on Smart Grid, July 31, 2019, doi: 10.1109/TSG.2019.2936016

George Michelogiannakis

2019

George Michelogiannakis, Yiwen Shen, Min Yee Teh, Xian Meng, Benjamin Aivazi, Taylor Groves, John Shalf, Madeleine Glick, Manya Ghobadi, Larry Dennison, Keren Bergman, "Bandwidth Steering in HPC Using Silicon Nanophotonics", SC19: The International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2019,

George Michelogiannakis, Bandwidth Steering in HPC Using Silicon Nanophotonics, SC19: The International Conference for High Performance Computing, Networking, Storage, and Analysis, November 20, 2019,

Pooria Mohammadiyaghni, George Michelogiannakis, Paul V. Gratz, "SpecLock: Speculative Lock Forwarding", International Conference on Computer Design (ICCD), November 2019,

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "Extending classical processors to support future large scale quantum accelerators", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "TIGER: topology-aware task assignment approach using Ising machines", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

George Michelogiannakis, Jeremiah Wilke, Min Yee Teh, Madeleine Glick, John Shalf, Keren Bergman, "Challenges and opportunities in system-level evaluation of photonics", Proceedings Volume 10946, Metro and Data Center Optical Networks and Short-Reach Links II, February 2019, doi: https://doi.org/10.1117/12.2510443

George Michelogiannakis, Computation and Communication in a Post Moore’s Law Era, Post Exascale workshop part of HiPEAC conference, January 2019,

D Vasudevan, G Michelogiannakis, D Donofrio, J Shalf, "PARADISE - Post-Moore Architecture and Accelerator Design Space Exploration Using Device Level Simulation and Experiments", 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, January 2019, doi: 10.1109/ispass.2019.00022

S Werner, P Fotouhi, X Xiao, M Fariborz, SJB Yoo, G Michelogiannakis, D Vasudevan, "3D photonics as enabling technology for deep 3D DRAM stacking", Proceedings of the International Symposium on Memory Systems - MEMSYS 19, ACM Press, January 2019, doi: 10.1145/3357526.3357559

W Cui, G Tzimpragos, Y Tao, J Mcmahan, D Dangwal, N Tsiskaridze, G Michelogiannakis, DP Vasudevan, T Sherwood, "Language Support for Navigating Architecture Design in Closed Form", ACM Journal on Emerging Technologies in Computing Systems, January 2019, 16:1-28, doi: 10.1145/3360047

Michael Minion

2019

M. Zingale, M.P. Katz, J.B. Bell, M.L. Minion, A.J. Nonaka, W. Zhang, "Improved Coupling of Hydrodynamics and Nuclear Reactions via Spectral Deferred Corrections", August 14, 2019,

Francois P. Hamon, Martin Schreiber, Michael L. Minion, "Parallel-in-Time Multi-Level Integration of the Shallow-Water Equations on the Rotating Sphere", April 12, 2019,

Submitted to Journal of Computational Physics

Sebastian Götschel, Michael Minion, "An Efficient Parallel-in-Time Method for Optimization with Parabolic PDEs", SIAM Journal on Scientific Computing, January 21, 2019,

In submission

M. Emmett, E. Motheau, W. Zhang, M. Minion, J. B. Bell, "A Fourth-Order Adaptive Mesh Refinement Algorithm for the Multicomponent, Reacting Compressible Navier-Stokes Equations", Combustion Theory and Modeling, 2019,

Francois P. Hamon, Martin Schreiber, Michael L. Minion, "Multi-Level Spectral Deferred Corrections Scheme for the Shallow Water Equations on the Rotating Sphere", Journal of Computational Physics, January 1, 2019, 376:435-454,

Emmanuel Motheau

2019

M. Emmett, E. Motheau, W. Zhang, M. Minion, J. B. Bell, "A Fourth-Order Adaptive Mesh Refinement Algorithm for the Multicomponent, Reacting Compressible Navier-Stokes Equations", Combustion Theory and Modeling, 2019,

Juliane Mueller

2019

O Karslıoğlu, M Gehlmann, J Müller, S Nemšák, JA Sethian, A Kaduwela, H Bluhm, C Fadley, "An Efficient Algorithm for Automatic Structure Optimization in X-ray Standing-Wave Experiments", Journal of Electron Spectroscopy and Related Phenomena, January 1, 2019,

J Muller, M Day, "Surrogate Optimization of Computationally Expensive Black-box Problems with Hidden Constraints", INFORMS Journal on Computing, 2019,

W. Langhans, J. Mueller, W.D. Collins, "Optimization of the Eddy-Diffusivity/Mass-Flux shallow cumulus and boundary-layer parametrization using surrogate models", Journal of Advances in Modeling Earth Systems (JAMES), 2019,

Andrew Myers

2019

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

Esmond G. Ng

2019

D.F. Martin, H.S. Johansen, P.O. Schwartz, E.G. Ng, "Improved Discretization of Grounding Lines and Calving Fronts using an Embedded-Boundary Approach in BISICLES", European Geosciences Union General Assembly, April 10, 2019,

Tan Thanh Nhat Nguyen

2019

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

Andy Nonaka

2019

D. Fan, A. Nonaka, A.S. Almgren, A. Harpole, M. Zingale, "MAESTROeX: A Massively Parallel Low Mach Number Astrophysical Solver", August 14, 2019,

M. Zingale, M.P. Katz, J.B. Bell, M.L. Minion, A.J. Nonaka, W. Zhang, "Improved Coupling of Hydrodynamics and Nuclear Reactions via Spectral Deferred Corrections", August 14, 2019,

D. R. Ladiges, A. J. Nonaka, J. B. Bell, A. L. Garcia, "On the Suppression and Distortion of Non-Equilibrium Fluctuations by Transpiration", August 10, 2019, doi: 10.1063/1.5093922

A. Donev, A. J. Nonaka, C. Kim, A. L. Garcia, J. B. Bell, "Fluctuating hydrodynamics of electrolytes at electroneutral scales", August 10, 2019,

M. Zingale, K. Eiden, Y. Cavecchi, A. Harpole, J. B. Bell, M. Chang, I. Hawke, M. P. Katz, C.M. Malone, A. J. Nonaka, D. E. Willcox, W. Zhang, "Toward resolved simulations of burning fronts in thermonuclear X-ray bursts", Journal of Physics: Conference Series, 2019, 1225,

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

Aleksandar Donev, Alejandro L. Garcia, Jean-Philippe Péraud, Andrew J. Nonaka, John B. Bell, "Fluctuating Hydrodynamics and Debye-Hückel-Onsager Theory for Electrolytes", Current Opinion in Electrochemistry, 2019, 13:1 - 10, doi: https://doi.org/10.1016/j.coelec.2018.09.004

Peter Nugent

2019

M. M. Phillips, C. Contreras, E. Y. Hsiao, N., C. R. Burns, M. Stritzinger, C. Ashall, W. L., P. Hoeflich, S. E. Persson, A. L., N. B. Suntzeff, S. A. Uddin, J. Anais, E., L. Busta, A. Campillay, S. Castellón, C., T. Diamond, C. Gall, C. Gonzalez, S., K. Krisciunas, M. Roth, J. Serón, F., S. Torres, J. P. Anderson, C. Baltay, G., L. Galbany, A. Goobar, E. Hadjiyska, M., M. Kasliwal, C. Lidman, P. E. Nugent, S., D. Rabinowitz, S. D. Ryder, B. P. Schmidt, B. J. Shappee, E. S. Walker, "Carnegie Supernova Project-II: Extending the Near-infrared Hubble Diagram for Type Ia Supernovae to z ∼ 0.1", Publications of the ASP, 2019, 131:014001, doi: 10.1088/1538-3873/aae8bd

Leonid Oliker

2019

Khaled Ibrahim, Samuel Williams, Leonid Oliker, "Performance Analysis of GPU Programming Models using the Roofline Scaling Trajectories", International Symposium on Benchmarking, Measuring and Optimizing (Bench), BEST PAPER AWARD, November 2019,

Samuel Williams, Charlene Yang, Khaled Ibrahim, Thorsten Kurth, Nan Ding, Jack Deslippe, Leonid Oliker, "Performance Analysis using the Roofline Model", SciDAC PI Meeting, July 2019,

Marquita Ellis, Giulia Guidi, Aydın Buluç, Leonid Oliker, Katherine Yelick, "diBELLA: Distributed Long Read to Long Read Alignment", 48th International Conference on Parallel Processing (ICPP), June 25, 2019,

Gilberto Pastorello

2019

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019,

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Sean Peisert

2020

Ross Gegan, Christina Mao, Dipak Ghosal, Matt Bishop, Sean Peisert, "Anomaly Detection for Science DMZ Using System Performance Data", Proceedings of the 2020 IEEE International Conference on Computing, Networking and Communications (ICNC 2020), Big Island, HI, February 2020,

2019

Amir Teshome Wonjiga, Louis Rilling, Christine Morin, Sean Peisert, "Blockchain as a Trusted Component in Cloud SLA Verification", Proceedings of the International Workshop on Cloud, IoT and Fog Security (CIFS), co-located with the 12th IEEE/ACM International Conference on Utility and Cloud Computing (UCC), Auckland, New Zealand, December 2019,

Thomas W. Edgar, Aditya Ashok, Garret E. Seppala, K.M. Arthur-Durrett, M. Engels, Reinhard Gentz, Sean Peisert, "An Automated Disruption-Tolerant Key Management Framework for Critical Systems", Journal of Information Warfare, October 8, 2019,

Mahdi Jamei, Raksha Ramakrishna, Teklemariam Tesfay, Reinhard Gentz, Ciaran Roberts, Anna Scaglione, Sean Peisert, "Phasor Measurement Units Optimal Placement and Performance Limits for Fault Localization", IEEE Journal on Selected Areas in Communications (J-SAC), Special Issue on Communications and Data Analytics in Smart Grid, October 2, 2019, doi: 10.1109/jsac.2019.2951971

Reinhard Gentz, Sean Peisert, Joshua Boverhof, Daniel Gunter, "SPARCS: Stream-Processing Architecture applied in Real-time Cyber-physical Security", Proceedings of the 15th IEEE International Conference on e-Science (eScience), San Diego, CA, IEEE, September 2019,

Reinhard Gentz, Héctor García Martin, Edward Baidoo, Sean Peisert, "Workflow Automation in Liquid Chromatography Mass Spectrometry", Proceedings of the 15th IEEE International Conference on e-Science (eScience), San Diego, CA, IEEE, September 2019,

Ciaran Roberts, Anna Scaglione, Mahdi Jamei, Reinhard Gentz, Sean Peisert, Emma M. Stewart, Chuck McParland, Alex McEachern, Daniel Arnold, "Learning Behavior of Distribution System Discrete Control Devices for Cyber-Physical Security", IEEE Transactions on Smart Grid, July 31, 2019, doi: 10.1109/TSG.2019.2936016

Andrew Adams, Kay Avila, Jim Basney, Dana Brunson, Robert Cowles, Jeannette Dopheide, Terry Fleury, Elisa Heymann, Florence Hudson, Craig Jackson, Ryan Kiser, Mark Krenz, Jim Marsteller, Barton P. Miller, Sean Peisert, Scott Russell, Susan Sons, Von Welch, John Zage, "Trusted CI Experiences in Cybersecurity and Service to Open Science", Proceedings of the Conference on Practice and Experience in Advanced Research Computing (PEARC), ACM, July 2019,

Anna Giannakou, Dipankar Dwivedi, Sean Peisert, "A Machine Learning Approach for Packet Loss Prediction in ScienceFlows", Future Generation Computer Systems, July 2019, doi: 10.1016/j.future.2019.07.053

Melissa Stockman, Dipankar Dwivedi, Reinhard Gentz, Sean Peisert, "Detecting Programmable Logic Controller Code Using Machine Learning", International Journal of Critical Infrastructure Protection, July 2019, doi: 10.1016/j.ijcip.2019.100306

Sean Peisert, Brooks Evans, Michael Liang, Barclay Osborn, David Rusting, David Thurston, Security Without Moats and Walls: Zero-Trust Networking for Enhancing Security in R&E Environments, CENIC Annual Conference, March 19, 2019,

Sean Peisert, Experiences in Building a Mission-Driven Security R&D Program for Science and Energy, Computer Science Colloquium Seminar, University of California, Davis, February 7, 2019,

Sean Peisert, Daniel Arnold, Using Physics to Improve Cybersecurity for the Distribution Grid and Distributed Energy Resources, Naval Postgraduate School, February 5, 2019,

Sean Peisert, Building a Mission-Driven, Applied Cybersecurity R&D Program from Scratch, VISA Research, January 23, 2019,

Doru Thom Popovici

2019

Doru Thom Popovici, Devangi N. Parikh, Daniele G. Spampinato, Tze Meng Low, "Exploiting Symmetries of Small Prime-Sized DFTs", PPAM 2019, 2019,

Elliott Binder, Tze Meng Low, Doru Thom Popovici, "Portable GPU Framework for SNP Comparisons", HiCOMB 2019, 2019,

Doru Thom Popovici, Martin D. Schatz, Franz Franchetti, Tze Meng Low, "A Flexible Framework for Parallel Multi-Dimensional DFTs", April 23, 2019,

Prabhat

2019

Babak Behzad, Suren Byna, Prabhat, and Marc Snir, "Optimizing I/O Performance of HPC Applications with Autotuning", ACM Transactions on Parallel Computing (TOPC), February 28, 2019,

Lavanya Ramakrishnan

2019

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019,

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Hannah Ross

2019

Hannah E. Ross, Keri L. Dixon, Raghunath Ghara, Ilian T. Iliev, Garrelt Mellema, "Evaluating the QSO contribution to the 21-cm signal from the Cosmic Dawn", Monthly Notices of the Royal Astronomical Society, July 2019, 487:1101-1119, doi: 10.1093/mnras/stz1220

Catherine A Watkinson, Sambit K. Giri, Hannah E. Ross, Keri L. Dixon, Ilian T. Iliev, Garrelt Mellema, Jonathan R. Pritchard, "The 21-cm bispectrum as a probe of non-Gaussianities due to X-ray heating", Monthly Notices of the Royal Astronomical Society, January 2019, 482:2653-2669, doi: 10.1093/mnras/sty2740

Oliver Rübel

2019

Donghe Kang, Oliver Rübel, Suren Byna, Spyros Blanas, "Comparison of Array Management Library Performance - A Neuroscience Use Case", SC19 Poster, November 20, 2019,

Oliver Rübel, Andrew Tritt, Benjamin Dichter, Thomas Braun, Nicholas Cain, Nathan Clack, Thomas J. Davidson, Max Dougherty, Jean-Christophe Fillion-Robin, Nile Graddis, Michael Grauer, Justin T. Kiggins, Lawrence Niu, Doruk Ozturk, William Schroeder, Ivan Soltesz, Friedrich T. Sommer, Karel Svoboda, Lydia Ng, Loren M. Frank, Kristofer Bouchard, "NWB:N 2.0: An Accessible Data Standard for Neurophysiology", bioRxiv, January 17, 2019, doi: https://doi.org/10.1101/523035

Anna Scaglione

2019

Mahdi Jamei, Raksha Ramakrishna, Teklemariam Tesfay, Reinhard Gentz, Ciaran Roberts, Anna Scaglione, Sean Peisert, "Phasor Measurement Units Optimal Placement and Performance Limits for Fault Localization", IEEE Journal on Selected Areas in Communications (J-SAC), Special Issue on Communications and Data Analytics in Smart Grid, October 2, 2019, doi: 10.1109/jsac.2019.2951971

Peter O. Schwartz

2019

D.F. Martin, H.S. Johansen, P.O. Schwartz, E.G. Ng, "Improved Discretization of Grounding Lines and Calving Fronts using an Embedded-Boundary Approach in BISICLES", European Geosciences Union General Assembly, April 10, 2019,

Oguz Selvitopi

2019

R. Oguz Selvitopi, Gunduz Vehbi Demirci, Ata Turk, Cevdet Aykanat, "Locality-aware and load-balanced static task scheduling for MapReduce", Future Generation Computer Systems (FGCS), January 2019, 90:49-61, doi: https://doi.org/10.1016/j.future.2018.06.035

John M. Shalf

2019

George Michelogiannakis, Yiwen Shen, Min Yee Teh, Xian Meng, Benjamin Aivazi, Taylor Groves, John Shalf, Madeleine Glick, Manya Ghobadi, Larry Dennison, Keren Bergman, "Bandwidth Steering in HPC Using Silicon Nanophotonics", SC19: The International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2019,

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "Extending classical processors to support future large scale quantum accelerators", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

Anastasiia Butko, George Michelogiannakis, David Donofrio, John Shalf, "TIGER: topology-aware task assignment approach using Ising machines", Proceedings of the 16th ACM International Conference on Computing Frontiers, April 2019,

George Michelogiannakis, Jeremiah Wilke, Min Yee Teh, Madeleine Glick, John Shalf, Keren Bergman, "Challenges and opportunities in system-level evaluation of photonics", Proceedings Volume 10946, Metro and Data Center Optical Networks and Short-Reach Links II, February 2019, doi: https://doi.org/10.1117/12.2510443

D Vasudevan, G Michelogiannakis, D Donofrio, J Shalf, "PARADISE - Post-Moore Architecture and Accelerator Design Space Exploration Using Device Level Simulation and Experiments", 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, January 2019, doi: 10.1109/ispass.2019.00022

Arie Shoshani

2019

Beytullah Yildiz, Kesheng Wu, Suren Byna, Arie Shoshani, "Parallel membership queries on very large scientific data sets using bitmap indexes", Concurrency and Computation: Practice and Experience, January 28, 2019, 31, doi: https://doi.org/10.1002/cpe.5157

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating‐point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word‐Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.
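
As a rough, sequential illustration of the membership-query idea (plain C++ with invented data; it omits the Word-Aligned Hybrid compression and the parallelization studied in the paper), one bitmap is kept per distinct value of the classification attribute, and a query such as category IN {e, tau} reduces to OR-ing the selected bitmaps:

    // Rough illustration of a bitmap-index membership query in plain C++;
    // the attribute name and data are made up, and this omits the WAH
    // compression and parallelization used in the paper.
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    typedef std::vector<uint64_t> Bitmap;   // uncompressed bit vector

    static void set_bit(Bitmap &b, size_t i) { b[i / 64] |= (uint64_t(1) << (i % 64)); }
    static bool get_bit(const Bitmap &b, size_t i) { return (b[i / 64] >> (i % 64)) & 1; }

    int main() {
      // Classification attribute with a small number of distinct values.
      std::vector<std::string> category = {"e", "mu", "e", "tau", "mu", "e"};
      size_t words = (category.size() + 63) / 64;

      // Index construction: one bitmap per distinct attribute value.
      std::unordered_map<std::string, Bitmap> index;
      for (size_t i = 0; i < category.size(); ++i) {
        Bitmap &bm = index[category[i]];
        if (bm.empty()) bm.assign(words, 0);
        set_bit(bm, i);
      }

      // Membership query: category IN {"e", "tau"} is the OR of two bitmaps.
      std::vector<std::string> wanted = {"e", "tau"};
      Bitmap hits(words, 0);
      for (size_t k = 0; k < wanted.size(); ++k) {
        std::unordered_map<std::string, Bitmap>::const_iterator it = index.find(wanted[k]);
        if (it == index.end()) continue;
        for (size_t w = 0; w < words; ++w) hits[w] |= it->second[w];
      }

      for (size_t i = 0; i < category.size(); ++i)
        if (get_bit(hits, i)) std::cout << "row " << i << " matches\n";
      return 0;
    }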

Wissam M. Sid-Lakhdar

2019

Y. Liu, W. Sid-Lakhdar, E. Rebrova, P. Ghysels, X. Sherry Li, "A Hierarchical Low-Rank Decomposition Algorithm Based on Blocked Adaptive Cross Approximation Algorithms", arXiv e-prints, January 1, 2019,

Alex Sim

2020

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, "Life Course as a Contextual System to Investigate the Effects of Life Events, Gender, and Generation on Travel Mode Use", Transportation Research Board (TRB) 99th Annual Meeting, 2020,

2019

D. Ghosal, S. Shukla, A. Sim, A. V. Thakur, K. Wu, "A Reinforcement Learning Based Network Scheduler For Deadline-Driven Data Transfers", IEEE Global Communications Conference (GLOBECOM 2019), 2019,

Q. Kang, A. Agrawal, A. Choudhary, A. Sim, K. Wu, R. Kettimuthu, P. Beckman, Z. Liu, W-K Liao, "Spatiotemporal Real-Time Anomaly Detection for Supercomputing Systems", Workshop on Big Data Predictive Maintenance using Artificial Intelligence, in conjunction with IEEE International Conference on Big Data (Big Data), 2019,

A. Lazar, A. Ballow, L. Jin, C. A. Spurlock, A. Sim, K. Wu, "Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019,

B. Cetin, A. Lazar, J. Kim, A. Sim, K. Wu, "Federated Wireless Network Intrusion Detection", IEEE International Conference on Big Data (Big Data), 2019,

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, Life course as a contextual system to investigate the effects of life events, gender and generation on travel mode usage, The Behavior, Energy & Climate Change Conference (BECC), 2019,

J. Balcas, H. Newman, M. Spiropulu, X. Yang, T. Lehman, I. Monga, C. Guok, J. MacAuley, A. Sim, P. Demar, "SDN for End-to-End Networking at Exascale", the 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP2019), 2019,

Alexandra Ballow, Alina Lazar (Advisor), Alex Sim (Advisor), Kesheng Wu (Advisor), "Handling Missing Values in Joint Sequence Analysis", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2019), ACM Student Research Competition (SRC), First place winner, 2019,

J. Choi, A. Sim, Data reduction methods, systems and devices, U.S. Patent No. 10,366,078, 2019,

U.S. Patent No. 10,366,078, “DATA REDUCTION METHODS, SYSTEMS, AND DEVICES”, LBNL IB2013-133.

H. Sung, J. Bang, A. Sim, K. Wu, H. Eom, "Understanding Parallel I/O Performance Trends Under Various HPC Configurations", the 2nd International Workshop on Systems and Network Telemetry and Analytics (SNTA 2019), in conjunction with ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2019), 2019, doi: 10.1145/3322798.3329258

M. Jin, Y. Homma, A. Sim, W. Kroeger, K. Wu, "Performance Prediction for Data Transfers in LCLS Workflow", the 2nd International Workshop on Systems and Network Telemetry and Analytics (SNTA 2019), in conjunction with ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2019), 2019, doi: 10.1145/3322798.3329254

O. Del Guercio, R. Orozco, A. Sim, K. Wu, "Similarity-based Compression with Multidimensional Pattern Matching", the 2nd International Workshop on Systems and Network Telemetry and Analytics (SNTA 2019), in conjunction with ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2019), 2019, doi: 10.1145/3322798.3329252

A. Syal, A. Lazar, J. Kim, K. Wu, A. Sim, "Automatic Detection of Network Traffic Anomalies and Changes", the 2nd International Workshop on Systems and Network Telemetry and Analytics (SNTA 2019), in conjunction with ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2019), 2019, doi: 10.1145/3322798.3329255

S. Kim, A. Sim, K. Wu, S. Byna, T. Wang, Y. Son, H. Eom, "DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File System", 19th Annual IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGrid 2019), 2019, doi: 10.1109/CCGRID.2019.00049

U.S. Patent Application No. 20190138371, "Methods, systems, and devices for accurate signal timing of power component events"

J. Kim, A. Sim, B. Tierney, S. Suh, I. Kim, "Multivariate Network Traffic Analysis using Clustered Patterns", Journal of Computing, April 2019, 101(4):339-361, doi: 10.1007/s00607-018-0619-4

J. Kim, A. Sim, "A new approach to multivariate network traffic analysis", Journal of Computer Science and Technology, 2019, 34(2):388–402, doi: 10.1007/s11390-019-1915-y

Olivia Del Guercio, Rafael Orozco, Alex Sim, Kesheng Wu, "Multidimensional Compression with Pattern Matching", Data Compression Conference (DCC), 2019, doi: 10.1109/DCC.2019.00079

Alexandra Ballow, Alina Lazar, Alex Sim, Kesheng Wu, "Joint Sequence Analysis Challenges: How to Handle Missing Values and Mixed Variable Types", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Tyler Leibengood, Alina Lazar, Alex Sim, Kesheng Wu, "Network Traffic Performance Prediction with Multivariate Clusters in Time Windows", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Alina Lazar, Ling Jin, C Anna Spurlock, Kesheng Wu, Alex Sim, Annika Todd, "Evaluating the Effects of Missing Values and Mixed Data Types on Social Sequence Clustering Using t-SNE Visualization", Journal of Data and Information Quality (JDIQ), 2019, 11:7,

Sambit Shukla, Dipak Ghosal, Kesheng Wu, Alex Sim, Matthew Farrens, "Co-optimizing Latency and Energy for IoT services using HMP servers in Fog Clusters", 2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC), 2019, 121--128, doi: 10.1109/FMEC.2019.8795353

Horst D. Simon

2019

Jung Heon Song, Marcos López de Prado, Horst D. Simon, Kesheng Wu, "Extracting Signals from High-Frequency Trading with Digital Signal Processing Tools", The Journal of Financial Data Science, pages 124-138, 2019, doi: 10.3905/jfds.2019.1.4.124

Houjun Tang

2019

Richard Warren, Jerome Soumagne, Jingqing Mu, Houjun Tang, Suren Byna, Bin Dong, Quincey Koziol, "Analysis in the Data Path of an Object-centric Data Management System", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Houjun Tang, Suren Byna, Stephen Bailey, Zarija Lukic, Jialin Liu, Quincey Koziol, Bin Dong, "Tuning Object-centric Data Management Systems for Large Scale Scientific Applications", 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2019), December 18, 2019,

Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen, "MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), November 19, 2019,

Houjun Tang, Quincey Koziol, Suren Byna, John Mainzer, Tonglin Li, "Enabling Transparent Asynchronous I/O using Background Threads", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW 2019), November 19, 2019, doi: 10.1109/PDSW49588.2019.00006

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-aware Programming Model for Composing Array Data Analysis", ISC 2019 (acceptance rate: 24%), June 16, 2019,

Tonglin Li, Quincey Koziol, Houjun Tang, Jialin Liu, Suren Byna, "I/O Performance Analysis of Science Applications Using HDF5 File-level Provenance", Cray User Group (CUG) 2019, May 10, 2019,

Jingqing Mu, Jerome Soumagne, Suren Byna, Quincey Koziol, Houjun Tang, Richard Warren, "Interfacing HDF5 with A Scalable Object-centric Storage System on Hierarchical Storage", Cray User Group (CUG) 2019, May 7, 2019,

Yu-Hang Tang

2019

Yu-Hang Tang, Wibe A. de Jong, "Prediction of atomization energy using graph kernel and active learning", The Journal of Chemical Physics, January 25, 2019, 150:044107, doi: 10.1063/1.5078640

David Trebotich

2019

Sergi Molins, David Trebotich, Bhavna Arora, Carl Steefel, Hang Deng, "Multi-scale Model of Reactive Transport in Fractured Media: Diffusion Limitations on Rates", Transport in Porous Media, March 20, 2019, 128:701-721, doi: 10.1007/s11242-019-01266-2

Andrew Tritt

2019

Oliver Rübel, Andrew Tritt, Benjamin Dichter, Thomas Braun, Nicholas Cain, Nathan Clack, Thomas J. Davidson, Max Dougherty, Jean-Christophe Fillion-Robin, Nile Graddis, Michael Grauer, Justin T. Kiggins, Lawrence Niu, Doruk Ozturk, William Schroeder, Ivan Soltesz, Friedrich T. Sommer, Karel Svoboda, Lydia Ng, Loren M. Frank, Kristofer Bouchard, "NWB:N 2.0: An Accessible Data Standard for Neurophysiology", bioRxiv, January 17, 2019, doi: https://doi.org/10.1101/523035

Brian Van Straalen

2019

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
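
The facilities named above compose naturally in code. The following is a minimal, hypothetical sketch (not taken from the specification) showing a global pointer allocated in the shared segment, a one-sided rput chained to a completion continuation, and a remote procedure call; it assumes a standard UPC++ installation providing the upcxx:: API.

#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
  upcxx::init();
  int me = upcxx::rank_me();
  int n  = upcxx::rank_n();

  // Allocate one integer in this rank's shared segment and share
  // rank 0's global pointer with everyone.
  upcxx::global_ptr<int> gp = upcxx::new_<int>(0);
  upcxx::global_ptr<int> root_gp = upcxx::broadcast(gp, 0).wait();

  // One-sided RMA put into rank 0's memory, with a continuation
  // that runs once the put has completed.
  upcxx::future<> f = upcxx::rput(me, root_gp)
      .then([]() { /* completion callback */ });
  f.wait();
  upcxx::barrier();

  // Generalized RPC: ask rank 0 to read back the value it now holds.
  if (me == n - 1) {
    int seen = upcxx::rpc(0, [=]() { return *root_gp.local(); }).wait();
    std::cout << "rank 0 holds " << seen << std::endl;
  }

  upcxx::barrier();
  upcxx::delete_(gp);   // release the shared-segment allocation
  upcxx::finalize();
  return 0;
}

Building and launching such a program is typically done with the upcxx compiler wrapper and upcxx-run, though the exact commands depend on the installation.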

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
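
As a concrete illustration of the explicit, asynchronous remote access described above, the hypothetical sketch below uses a upcxx::dist_object so that each process contributes one value to the global address space and then fetches a neighbor's copy without blocking immediately; it assumes a standard UPC++ installation and is not drawn from the guide itself.

#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
  upcxx::init();
  int me = upcxx::rank_me();
  int n  = upcxx::rank_n();

  // Each process registers its own instance of a distributed object;
  // together the per-rank instances form the global view.
  upcxx::dist_object<int> local_val(100 * me);
  upcxx::barrier();   // make sure every rank has constructed its instance

  // Remote access is explicit and asynchronous: fetch() returns a
  // future for the right-hand neighbor's copy instead of blocking.
  int right = (me + 1) % n;
  upcxx::future<int> f = local_val.fetch(right);

  // Independent local work could be overlapped here before wait().
  int neighbor_val = f.wait();
  std::cout << "rank " << me << " fetched " << neighbor_val
            << " from rank " << right << std::endl;

  upcxx::barrier();   // keep the dist_object alive until all fetches land
  upcxx::finalize();
  return 0;
}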

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Dilip Vasudevan

2019

D Vasudevan, G Tzimpragos, T Sherwood, A Madhavan, D Strukov, "Boosted Race Trees for Low Energy Classification", ("Best Paper Award"), ASPLOS 2019, April 2019, doi: 10.1145/3297858.3304036

D Vasudevan, G Michelogiannakis, D Donofrio, J Shalf, "PARADISE - Post-Moore Architecture and Accelerator Design Space Exploration Using Device Level Simulation and Experiments", 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, January 2019, doi: 10.1109/ispass.2019.00022

S Werner, P Fotouhi, X Xiao, M Fariborz, SJB Yoo, G Michelogiannakis, D Vasudevan, "3D photonics as enabling technology for deep 3D DRAM stacking", Proceedings of the International Symposium on Memory Systems - MEMSYS 19, ACM Press, January 2019, doi: 10.1145/3357526.3357559

W Cui, G Tzimpragos, Y Tao, J Mcmahan, D Dangwal, N Tsiskaridze, G Michelogiannakis, DP Vasudevan, T Sherwood, "Language Support for Navigating Architecture Design in Closed Form", ACM Journal on Emerging Technologies in Computing Systems, January 2019, 16:1--28, doi: 10.1145/3360047

Teng Wang

2019

S. Kim, A. Sim, K. Wu, S. Byna, T. Wang, Y. Son, H. Eom, "DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File System", 19th Annual IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2019), 2019, doi: 10.1109/CCGRID.2019.00049

Teng Wang, Suren Byna, Glenn Lockwood, Philip Carns, Shane Snyder, Sunggon Kim, Nicholas Wright, "A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks", IEEE/ACM CCGrid 2019, May 14, 2019,

Donald Willcox

2019

M. Zingale, K. Eiden, Y. Cavecchi, A. Harpole, J. B. Bell, M. Chang, I. Hawke, M. P. Katz, C.M. Malone, A. J. Nonaka, D. E. Willcox, W. Zhang, "Toward resolved simulations of burning fronts in thermonuclear X-ray bursts", Journal of Physics: Conference Series, 2019, 1225,

Samuel W. Williams

2019

Tuowen Zhao, Mary Hall, Samuel Williams, Hans Johansen, "Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs", Supercomputing (SC), November 2019,

Nan Ding, Samuel Williams, "An Instruction Roofline Model for GPUs", Performance Modeling, Benchmarking, and Simulation (PMBS), BEST PAPER AWARD, November 18, 2019,

Khaled Ibrahim, Samuel Williams, Leonid Oliker, "Performance Analysis of GPU Programming Models using the Roofline Scaling Trajectories", International Symposium on Benchmarking, Measuring and Optimizing (Bench), BEST PAPER AWARD, November 2019,

Charlene Yang, Thorsten Kurth, Samuel Williams, "Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC-9 Perlmutter system", Concurrency and Computation: Practice and Experience (CCPE), August 2019, doi: 10.1002/cpe.5547

Nan Ding, Samuel Williams, Sherry Li, Yang Liu, "Leveraging One-Sided Communication for Sparse Triangular Solvers", SciDAC19, July 18, 2019,

Samuel Williams, Charlene Yang, Khaled Ibrahim, Thorsten Kurth, Nan Ding, Jack Deslippe, Leonid Oliker, "Performance Analysis using the Roofline Model", SciDAC PI Meeting, July 2019,

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

Charlene Yang, Thorsten Kurth, Samuel Williams, "Hierarchical Roofline Analysis for GPUs: Accelerating Performance Optimization for the NERSC-9 Perlmutter System", Cray User Group (CUG), May 2019,

Wenjing Ma, Yulong Ao, Chao Yang, Samuel Williams, "Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight", Cluster Computing, May 2019, doi: 10.1007/s10586-019-02938-w

Charlene Yang, Samuel Williams, Performance Analysis of GPU-Accelerated Applications using the Roofline Model, GPU Technology Conference (GTC), March 2019,

Samuel Williams, Performance Modeling and Analysis, CS267 Lecture, University of California at Berkeley, February 14, 2019,

Samuel Williams, Introduction to the Roofline Model, Roofline Tutorial, ECP Annual Meeting, January 2019,

Samuel Williams, Roofline on CPU-based Systems, Roofline Tutorial, ECP Annual Meeting, January 2019,

Nicholas J. Wright

2019

Glenn K. Lockwood, Shane Snyder, Suren Byna, Philip Carns, Nicholas J. Wright, "Understanding Data Motion in the Modern HPC Data Center", 2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), November 19, 2019, doi: 10.1109/PDSW49588.2019.00012

Teng Wang, Suren Byna, Glenn Lockwood, Philip Carns, Shane Snyder, Sunggon Kim, Nicholas Wright, "A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks", IEEE/ACM CCGrid 2019, May 14, 2019,

Kesheng Wu

2020

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, "Life Course as a Contextual System to Investigate the Effects of Life Events, Gender, and Generation on Travel Mode Use", Transportation Research Board (TRB) 99th Annual Meeting, 2020,

2019

D. Ghosal, S. Shukla, A. Sim, A. V. Thakur, K. Wu, "A Reinforcement Learning Based Network Scheduler For Deadline-Driven Data Transfers", IEEE Global Communications Conference (GLOBECOM 2019), 2019,

Q. Kang, A. Agrawal, A. Choudhary, A. Sim, K. Wu, R. Kettimuthu, P. Beckman, Z. Liu, W-K Liao, "Spatiotemporal Real-Time Anomaly Detection for Supercomputing Systems", Workshop on Big Data Predictive Maintenance using Artificial Intelligence, in conjunction with IEEE International Conference on Big Data (Big Data), 2019,

A. Lazar, A. Ballow, L. Jin, C. A. Spurlock, A. Sim, K. Wu, "Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019,

B. Cetin, A. Lazar, J. Kim, A. Sim, K. Wu, "Federated Wireless Network Intrusion Detection", IEEE International Conference on Big Data (Big Data), 2019,

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD), in conjunction with the IEEE International Conference on Big Data (Big Data), 2019,

L. Jin, A. Lazar, J. Sears, A. Todd, A. Sim, K. Wu, C. A. Spurlock, Life course as a contextual system to investigate the effects of life events, gender and generation on travel mode usage, The Behavior, Energy & Climate Change Conference (BECC), 2019,

P. Linton, W. Melodia, A. Lazar, D. Agarwal, L. Bianchi, D. Ghoshal, K. Wu, G. Pastorello, L. Ramakrishnan, "Identifying Time Series Similarity in Large-Scale Earth System Datasets", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19), 2019,

Alexandra Ballow, Alina Lazar (Advisor), Alex Sim (Advisor), Kesheng Wu (Advisor), "Handling Missing Values in Joint Sequence Analysis", ACM Richard Tapia Celebration of Diversity in Computing (TAPIA 2019), ACM Student Research Competition (SRC), First place winner, 2019,

Jung Heon Song, Marcos López de Prado, Horst D. Simon, Kesheng Wu, "Extracting Signals from High-Frequency Trading with Digital Signal Processing Tools", The Journal of Financial Data Science, 2019, 124-138, doi: 10.3905/jfds.2019.1.4.124

Bin Dong, Patrick Frank Heiner Kilian, Xiaocan Li, Fan Guo, Suren Byna and Kesheng Wu, "Terabyte-scale Particle Data Analysis: An ArrayUDF Case Study", SSDBM 2019, July 23, 2019,

Jongbeen Han, Heemin Kim, Hyeonsang Eom, Jonathan Coignard, Kesheng Wu, Yongseok Son, "Enabling SQL-Query Processing for Ethereum-based Blockchain Systems", WIMS2019, New York, NY, USA, ACM, 2019, 9:1--9:7, doi: 10.1145/3326467.3326479

H. Sung, J. Bang, A. Sim, K. Wu, H. Eom, "Understanding Parallel I/O Performance Trends Under Various HPC Configurations", the 2nd International Workshop on Systems and Network Telemetry and Analytics (SNTA 2019), in conjunction with ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2019), 2019, doi: 10.1145/3322798.3329258

M. Jin, Y. Homma, A. Sim, W. Kroeger, K. Wu, "Performance Prediction for Data Transfers in LCLS Workflow", the 2nd International Workshop on Systems and Network Telemetry and Analytics (SNTA 2019), in conjunction with ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2019), 2019, doi: 10.1145/3322798.3329254

O. Del Guercio, R. Orozco, A. Sim, K. Wu, "Similarity-based Compression with Multidimensional Pattern Matching", the 2nd International Workshop on Systems and Network Telemetry and Analytics (SNTA 2019), in conjunction with ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2019), 2019, doi: 10.1145/3322798.3329252

A. Syal, A. Lazar, J. Kim, K. Wu, A. Sim, "Automatic Detection of Network Traffic Anomalies and Changes", the 2nd International Workshop on Systems and Network Telemetry and Analytics (SNTA 2019), in conjunction with ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2019), 2019, doi: 10.1145/3322798.3329255

Bin Dong, Kesheng Wu, Suren Byna, Houjun Tang, "SLOPE: Structural Locality-aware Programming Model for Composing Array Data Analysis", ISC 2019 (acceptance rate: 24%), June 16, 2019,

S. Kim, A. Sim, K. Wu, S. Byna, T. Wang, Y. Son, H. Eom, "DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File System", 19th Annual IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2019), 2019, doi: 10.1109/CCGRID.2019.00049

US Patent application no. 20190138371, "Methods, systems, and devices for accurate signal timing of power component events"

Olivia Del Guercio, Rafael Orozco, Alex Sim, Kesheng Wu, "Multidimensional Compression with Pattern Matching", Data Compression Conference (DCC), 2019, doi: 10.1109/DCC.2019.00079

Alexandra Ballow, Alina Lazar, Alex Sim, Kesheng Wu, "Joint Sequence Analysis Challenges: How to Handle Missing Values and Mixed Variable Types", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Tyler Leibengood, Alina Lazar, Alex Sim, Kesheng Wu, "Network Traffic Performance Prediction with Multivariate Clusters in Time Windows", SIAM Conference on Computational Science and Engineering (CSE19), 2019,

Beytullah Yildiz, Kesheng Wu, Suren Byna, Arie Shoshani, "Parallel membership queries on very large scientific data sets using bitmap indexes", Concurrency and Computation: Practice and Experience, January 28, 2019, 31, doi: https://doi.org/10.1002/cpe.5157

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging because the data are often stored in specially formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, which identifies whether queried elements of a set are members of an attribute. Attributes with naturally occurring classification values, such as category and object type, appear frequently in scientific domains, as do zip code and occupation in daily life. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set is challenging. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Thanks to compression algorithms such as Word-Aligned Hybrid, bitmap indexing provides high performance not only for low-cardinality attributes but also for high-cardinality attributes, such as floating-point variables, electric charge, or momentum in a particle physics data set. We conducted experiments in a highly parallelized environment on data obtained from a particle accelerator model and on a synthetic data set.
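
To make the membership-query idea concrete, the toy C++ sketch below builds one bitmap per classification value and answers a membership query by OR-ing the bitmaps of the queried values. It is only an illustration of the principle: the study itself relies on compressed (Word-Aligned Hybrid) bitmap indexes and parallel query evaluation, neither of which this uncompressed, single-threaded example attempts, and all names in the code are hypothetical.

#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal uncompressed bitmap index over one categorical attribute.
struct BitmapIndex {
  std::size_t n_rows = 0;
  std::unordered_map<std::string, std::vector<std::uint64_t>> bitmaps;

  // Append one row: set the row's bit in the bitmap of its value.
  void append(const std::string& value) {
    std::size_t row = n_rows++;
    auto& bm = bitmaps[value];
    bm.resize((n_rows + 63) / 64, 0);
    bm[row / 64] |= (std::uint64_t{1} << (row % 64));
  }

  // Membership query: rows whose value is in the queried set are the
  // bitwise OR of the per-value bitmaps.
  std::vector<std::uint64_t> query(const std::vector<std::string>& values) const {
    std::vector<std::uint64_t> result((n_rows + 63) / 64, 0);
    for (const auto& v : values) {
      auto it = bitmaps.find(v);
      if (it == bitmaps.end()) continue;
      for (std::size_t w = 0; w < it->second.size(); ++w)
        result[w] |= it->second[w];
    }
    return result;
  }
};

int main() {
  BitmapIndex idx;
  for (std::string occ : {"physicist", "chemist", "engineer", "physicist"})
    idx.append(occ);

  // Which rows have an occupation in {physicist, engineer}?  (rows 0, 2, 3)
  auto hits = idx.query({"physicist", "engineer"});
  for (std::size_t row = 0; row < idx.n_rows; ++row)
    if ((hits[row / 64] >> (row % 64)) & 1)
      std::cout << "row " << row << " matches" << std::endl;
  return 0;
}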

Alina Lazar, Ling Jin, C Anna Spurlock, Kesheng Wu, Alex Sim, Annika Todd, "Evaluating the Effects of Missing Values and Mixed Data Types on Social Sequence Clustering Using t-SNE Visualization", Journal of Data and Information Quality (JDIQ), 2019, 11:7,

Sambit Shukla, Dipak Ghosal, Kesheng Wu, Alex Sim, Matthew Farrens, "Co-optimizing Latency and Energy for IoT services using HMP servers in Fog Clusters", 2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC), 2019, 121--128, doi: 10.1109/FMEC.2019.8795353

Chao Yang

2019

Victor Yu, William Dawson, Alberto Garcia, Ville Havu, Ben Hourahine, William Huhn, Mathias Jacquelin, Weile Jia, Murat Keceli, Raul Laasner, et al., "Large-Scale Benchmark of Electronic Structure Solvers with the ELSI Infrastructure", Bulletin of the American Physical Society, 2019,

Katherine Yelick

2019

Marquita Ellis, Giulia Guidi, Aydın Buluç, Leonid Oliker, Katherine Yelick, "diBELLA: Distributed Long Read to Long Read Alignment", 48th International Conference on Parallel Processing (ICPP), June 25, 2019,

Weiqun Zhang

2019

M. Zingale, M.P. Katz, J.B. Bell, M.L. Minion, A.J. Nonaka, W. Zhang, "Improved Coupling of Hydrodynamics and Nuclear Reactions via Spectral Deferred Corrections", August 14, 2019,

M. Zingale, K. Eiden, Y. Cavecchi, A. Harpole, J. B. Bell, M. Chang, I. Hawke, M. P. Katz, C.M. Malone, A. J. Nonaka, D. E. Willcox, W. Zhang, "Toward resolved simulations of burning fronts in thermonuclear X-ray bursts", Journal of Physics: Conference Series, 2019, 1225,

Weiqun Zhang, Ann Almgren, Vince Beckner, John Bell, Johannes Blaschke, Cy Chan, Marcus Day, Brian Friesen, Kevin Gott, Daniel Graves, Max P. Katz, Andrew Myers, Tan Nguyen, Andrew Nonaka, Michele Rosso, Samuel Williams, Michael Zingale, "AMReX: a framework for block-structured adaptive mesh refinement", Journal of Open Source Software, May 2019, doi: 10.21105/joss.01370

M. Emmett, E. Motheau, W. Zhang, M. Minion, J. B. Bell, "A Fourth-Order Adaptive Mesh Refinement Algorithm for the Multicomponent, Reacting Compressible Navier-Stokes Equations", Combustion Theory and Modeling, 2019,

Wibe Albert de Jong

2019

Yu-Hang Tang, Wibe A. de Jong, "Prediction of atomization energy using graph kernel and active learning", The Journal of Chemical Physics, January 25, 2019, 150:044107, doi: 10.1063/1.5078640

Other

2019

B. Peng, R. Van Beeumen, D. B. Williams-Young, K. Kowalski and C. Yang, "Approximate Green's Function Coupled Cluster Method Employing Effective Dimension Reduction", Journal of Chemical Theory and Computation, April 15, 2019, 15:3185-3196, doi: https://doi.org/10.1021/acs.jctc.9b00172

Jack Deslippe, Optimization Use Cases with the Roofline Model, Roofline Tutorial, ECP Annual Meeting, January 2019,

Charlene Yang, Performance Analysis with Roofline on GPUs, Roofline Tutorial, ECP Annual Meeting, January 2019,

E. Y. Hsiao, M. M. Phillips, G. H. Marion, R. P., N. Morrell, D. J. Sand, C. R. Burns, C., P. Hoeflich, M. D. Stritzinger, S., J. P. Anderson, C. Ashall, C. Baltay, E., D. P. K. Banerjee, S. Davis, T. R. Diamond, G., W. L. Freedman, F. Foerster, L., C. Gall, S. Gonzalez-Gaitan, A., M. Hamuy, S. Holmbo, M. M. Kasliwal, K., S. Kumar, C. Lidman, J. Lu, P. E., S. Perlmutter, S. E. Persson, A. L., D. Rabinowitz, M. Roth, S. D. Ryder, B. P., M. Shahbandeh, N. B. Suntzeff, F. Taddia, S. Uddin, L. Wang, "Carnegie Supernova Project-II: The Near-infrared Spectroscopy Program", Publications of the ASP, Pages: 014002, 2019, doi: 10.1088/1538-3873/aae961

Junmin Gu, Burlen Loring, Kesheng Wu, E. Wes Bethel, "HDF5 As a Vehicle for In Transit Data Movement", ISAV 19, Pages: 39-43, 2019, doi: 10.1145/3364228.3364237