Paul H. Hargrove

Paul H. Hargrove, Ph.D.
Computer Systems Engineer IV
Computer Languages and Systems Software
Phone: +1 510 495 2352
Fax: +1 510 486 6900
Lawrence Berkeley National Laboratory
One Cyclotron Rd MS59R4104
Berkeley, CA 94720 US

Education

Stanford University, Stanford, CA, Ph.D., 2003.
Scientific Computing – Computational Mathematics Program

Cornell University, Ithaca, NY, B.A., 1994.
Triple major: Computer Science, Physics (magna cum laude), and Mathematics

Biographical Sketch

Since September 2000, Paul has been a full-time PI at Lawrence Berkeley National Laboratory (LBNL). His current research interests include resilience and network communications, both in the context of High-Performance Computing (HPC). Current software projects include Berkeley Lab Checkpoint/Restart (BLCR) for Linux, Global Address Space Networking (GASNet), and Berkeley Unified Parallel C (UPC).

BLCR began as a DOE-funded effort to produce a production-quality, system-level checkpointing implementation suitable for use in preemptive scheduling, migration, and fault tolerance. The BLCR implementation work is now part of the larger DEGAS project at LBNL. Paul was the LBNL PI for two five-year multi-institution projects that funded the initial BLCR efforts, and is still the lead developer of the BLCR implementation.

GASNet is a joint project between LBNL and the University of California at Berkeley, funded by the DOE Office of Science and the Department of Defense. GASNet provides an abstraction of a network interconnect suitable for implementation of Partitioned Global Address Space (PGAS) languages such as UPC, Titanium and Co-Array Fortran; it is the network runtime layer for the UPC and Titanium implementations at LBNL and UCB, plus several external PGAS compilers. GASNet is intended for use by library writers and as a compilation target, unlike MPI, which is an application programmer’s interface. To support PGAS languages, GASNet is oriented toward one-sided memory-to-memory transfers with a rich set of non-blocking primitives. Paul’s most significant contributions to GASNet are the InfiniBand and PAMI ports, the implementation of a library abstracting hardware atomic operations, and ongoing work to add collective operations support to the GASNet specification and implementation. Paul maintains GASNet’s network-specific code for InfiniBand, Myrinet GM, Cray GNI, Cray Portals, and IBM LAPI. Paul is also an active developer and the lead maintainer of the network-independent portion of the Berkeley UPC runtime library.
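
As a concrete illustration of this one-sided, non-blocking model, the sketch below uses the explicit-handle put/get calls from the GASNet-1 specification (gasnet_put_nb, gasnet_get_nb, gasnet_wait_syncnb); the function, peer, and buffer names are hypothetical, and initialization via gasnet_init()/gasnet_attach() is assumed to have happened elsewhere.

```cpp
// Sketch only: one-sided, non-blocking transfers in the GASNet-1 API.
// Assumes gasnet_init()/gasnet_attach() have already run; 'peer' and the
// buffer/address arguments are hypothetical placeholders.
#include <gasnet.h>

void exchange_with_peer(gasnet_node_t peer,
                        void *send_buf, void *remote_dst,
                        void *recv_buf, void *remote_src, size_t nbytes) {
  // Initiate a one-sided put: no matching receive is posted on 'peer'.
  gasnet_handle_t h_put = gasnet_put_nb(peer, remote_dst, send_buf, nbytes);

  // Initiate an independent one-sided get; both transfers are now in flight.
  gasnet_handle_t h_get = gasnet_get_nb(recv_buf, peer, remote_src, nbytes);

  // ... computation can overlap the outstanding transfers here ...

  gasnet_wait_syncnb(h_put);  // block until the put completes
  gasnet_wait_syncnb(h_get);  // block until the get completes
}
```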

Journal Articles

Rajesh Nishtala, Yili Zheng, Paul Hargrove, Katherine A. Yelick, "Tuning collective communication for Partitioned Global Address Space programming models", Parallel Computing, 2011, 37(9):576-591,

Hongzhang Shan, Filip Blagojevic, Seung-Jai Min, Paul Hargrove, Haoqiang Jin, Karl Fuerlinger, Alice Koniges, Nicholas J. Wright, "A Programming Model Performance Study Using the NAS Parallel Benchmarks", Scientific Programming - Exploring Languages for Expressing Medium to Massive On-Chip Parallelism, August 1, 2010, vol. 18, doi: 10.3233/SPR-2010-0306

B Bode, R Bradshaw, E DeBenedictus, N Desai, J Duell, G A Geist, P Hargrove, D Jackson, S Jackson, J Laros, C Lowe, E Lusk, W McLendon, J Mugler, T Naughton, J P Navarro, R Oldfield, N Pundit, S L Scott, M Showerman, C Steffen, K Walker, "Scalable system software: a component-based approach", Journal of Physics: Conference Series, January 2005, 16:546-550, doi: 10.1088/1742-6596/16/1/075

Conference Papers

Paul H. Hargrove, Dan Bonachea, "GASNet-EX Performance Improvements Due to Specialization for the Cray Aries Network", Parallel Applications Workshop, Alternatives To MPI (PAW-ATM), Dallas, Texas, USA, November 16, 2018, doi: 10.25344/S44S38

GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models on future exascale machines. This paper reports on the improvements in performance observed on Cray XC-series systems due to enhancements made to the GASNet-EX software. These enhancements, known as "specializations", primarily consist of replacing network-independent implementations of several recently added features with implementations tailored to the Cray Aries network. Performance gains from specialization include (1) Negotiated-Payload Active Messages improve bandwidth of a ping-pong test by up to 14%, (2) Immediate Operations reduce running time of a synthetic benchmark by up to 93%, (3) non-bulk RMA Put bandwidth is increased by up to 32%, (4) Remote Atomic performance is 70% faster than the reference on a point-to-point test and allows a hot-spot test to scale robustly, and (5) non-contiguous RMA interfaces see up to 8.6x speedups for an intra-node benchmark and 26% for inter-node. These improvements are all available in GASNet-EX version 2018.3.0 and later.
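
For a flavor of what "Immediate Operations" look like at the API level, here is a minimal sketch based on the GASNet-EX specification; the function and argument names are hypothetical. Passing GEX_FLAG_IMMEDIATE lets an RMA injection return GEX_EVENT_NO_OP instead of stalling when the network cannot accept it, which is the behavior the synthetic benchmark above exercises.

```cpp
// Sketch only: an "immediate" GASNet-EX put that never stalls on
// back-pressure. Based on the GASNet-EX spec; names are hypothetical.
#include <gasnet_ex.h>

bool try_put_immediate(gex_TM_t tm, gex_Rank_t rank,
                       void *remote_dst, void *src, size_t nbytes) {
  gex_Event_t ev = gex_RMA_PutNB(tm, rank, remote_dst, src, nbytes,
                                 GEX_EVENT_DEFER,     // defer local completion
                                 GEX_FLAG_IMMEDIATE); // do not block to inject
  if (ev == GEX_EVENT_NO_OP)
    return false;       // network busy: caller can do other work and retry
  gex_Event_Wait(ev);   // wait for completion of the put
  return true;
}
```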

Dan Bonachea, Paul H. Hargrove, "GASNet-EX: A High-Performance, Portable Communication Library for Exascale", Languages and Compilers for Parallel Computing (LCPC'18), Salt Lake City, Utah, USA, October 11, 2018, LBNL 2001174, doi: 10.25344/S4QP4W

Partitioned Global Address Space (PGAS) models, typified by such languages as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.

GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building upon over 15 years of lessons learned. We describe and evaluate several features and enhancements that have been introduced to address the needs of modern client systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI-3 implementations on current HPC systems.

John Bachan, Dan Bonachea, Paul H Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian Van Straalen, Scott Baden, "The UPC++ PGAS library for exascale computing", PAW 2017: 2nd Annual PGAS Applications Workshop - Held in conjunction with SC 2017, November 12, 2017, doi: 10.1145/3144779.3169108

We describe UPC++ V1.0, a C++11 library that supports APGAS programming. UPC++ targets distributed data structures where communication is irregular or fine-grained. The key abstractions are global pointers, asynchronous programming via RPC, and futures. Global pointers incorporate ownership information useful in optimizing for locality. Futures capture data readiness state, are useful for scheduling and also enable the programmer to chain operations to execute asynchronously as high-latency dependencies become satisfied, via continuations. The interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and closely resemble those used in modern C++. Communication in UPC++ runs at close to hardware speeds by utilizing the low-overhead GASNet-EX communication library.
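
A minimal sketch of these abstractions, using the public UPC++ interface (the variable names and the toy computation are illustrative, not taken from the paper):

```cpp
// Sketch only: UPC++ global pointers, futures/continuations, and RPC.
// Uses the public UPC++ interface; names and values are illustrative.
#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
  upcxx::init();

  // Rank 0 allocates an int in its shared segment; the resulting global
  // pointer (which carries ownership/locality information) is broadcast.
  upcxx::global_ptr<int> gp;
  if (upcxx::rank_me() == 0) gp = upcxx::new_<int>(42);
  gp = upcxx::broadcast(gp, 0).wait();

  if (upcxx::rank_me() == upcxx::rank_n() - 1) {
    // One-sided get returns a future; chain a continuation that runs
    // when the high-latency transfer completes.
    upcxx::rget(gp).then([](int value) {
      std::cout << "fetched " << value << "\n";
    }).wait();

    // Remote procedure call: execute a lambda in rank 0's process.
    int r = upcxx::rpc(0, [](int x) { return x + 1; }, 7).wait();
    std::cout << "rpc returned " << r << "\n";
  }

  upcxx::barrier();
  upcxx::finalize();
}
```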

D. Ozog, A. Kamil, Y. Zheng, P. Hargrove, J. Hammond, A. Malony, W.A. de Jong, K. Yelick, "A Hartree-Fock Application using UPC++ and the New DArray Library", 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 23, 2016, doi: 10.1109/IPDPS.2016.108

George Almasi, Paul Hargrove, Gabriel Tanase and Yili Zheng, "UPC Collectives Library 2.0", Fifth Conference on Partitioned Global Address Space Programming Models (PGAS11), October 17, 2011,

Chang-Seo Park, Koushik Sen, Paul Hargrove, Costin Iancu, "Efficient data race detection for distributed memory parallel programs", Supercomputing (SC), 2011,

Keren Bergman, Gilbert Hendry, Paul Hargrove, John Shalf, Bruce Jacob, K. Scott Hemmert, Arun Rodrigues, David Resnick, "Let there be light!: the future of memory systems is photonics and 3D stacking", Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC'11), San Jose, California, June 5, 2011, 43-48,

Filip Blagojevic, Paul Hargrove, Costin Iancu, and Katherine Yelick, "Hybrid PGAS Runtime Support for Multicore Nodes", Fourth Conference on Partitioned Global Address Space Programming Model (PGAS10), October 2010,

Joshua Hursey, Chris January, Mark O'Connor, Paul Hargrove, David Lecomber, Jeffrey M. Squyres, Andrew Lumsdaine, "Checkpoint/Restart-Enabled Parallel Debugging", Recent Advances in the Message Passing Interface (Lecture Notes in Computer Science Volume 6305), Springer, 2010, 219-228, doi: 10.1007/978-3-642-15646-5_23

Rinku Gupta, Peter H. Beckman, Byung-Hoon Park, Ewing L. Lusk, Paul Hargrove, Al Geist, Dhabaleswar K. Panda, Andrew Lumsdaine, Jack Dongarra, "CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems", International Conference on Parallel Processing (ICPP 2009), Vienna, Austria, September 2009, 237-245,

Rajesh Nishtala, Paul Hargrove, Dan Bonachea, Katherine Yelick, "Scaling Communication-Intensive Applications on BlueGene/P Using One-Sided Communication and Overlap", 23rd International Parallel & Distributed Processing Symposium (IPDPS), Rome, May 2009, doi: 10.1109/IPDPS.2009.5161076

In earlier work, we showed that the one-sided communication model found in PGAS languages (such as UPC) offers significant advantages in communication efficiency by decoupling data transfer from processor synchronization. We explore the use of the PGAS model on IBM BlueGene/P, an architecture that combines low-power, quad-core processors with extreme scalability. We demonstrate that the PGAS model, using a new port of the Berkeley UPC compiler and GASNet one-sided communication layer, outperforms two-sided (MPI) communication in both microbenchmarks and a case study of the communication-limited benchmark, NAS FT. We scale the benchmark up to 16,384 cores of the BlueGene/P and demonstrate that UPC consistently outperforms MPI by as much as 66% for some processor configurations and an average of 32%. In addition, the results demonstrate the scalability of the PGAS model and the Berkeley implementation of UPC, the viability of using it on machines with multicore nodes, and the effectiveness of the BG/P communication layer for supporting one-sided communication and PGAS languages.

Dan Bonachea, Paul Hargrove, Mike Welcome, Katherine Yelick, "Porting GASNet to Portals: Partitioned Global Address Space (PGAS) Language Support for the Cray XT", Cray Users Group (CUG), May 2009, doi: 10.25344/S4RP46

Partitioned Global Address Space (PGAS) languages are an emerging alternative to MPI for HPC applications development. The GASNet library from Lawrence Berkeley National Lab and the University of California at Berkeley provides the network runtime for multiple implementations of four PGAS languages: Unified Parallel C (UPC), Co-Array Fortran (CAF), Titanium and Chapel. GASNet provides a low-overhead one-sided communication layer that has enabled portability and high performance of PGAS languages. This paper describes our experiences porting GASNet to the Portals network API on the Cray XT series.

Katherine Yelick, Dan Bonachea, Wei-Yu Chen, Phillip Colella, Kaushik Datta, Jason Duell, Susan L. Graham, Paul Hargrove, Paul Hilfinger, Parry Husbands, Costin Iancu, Amir Kamil, Rajesh Nishtala, Jimmy Su, Michael Welcome, Tong Wen, "Productivity and Performance Using Partitioned Global Address Space Languages", Parallel Symbolic Computation (PASCO'07), July 2007, doi: 10.1145/1278177.1278183

Partitioned Global Address Space (PGAS) languages combine the programming convenience of shared memory with the locality and performance control of message passing. One such language, Unified Parallel C (UPC), is an extension of ISO C defined by a consortium that boasts multiple proprietary and open source compilers. Another PGAS language, Titanium, is a dialect of Java designed for high performance scientific computation. In this paper we describe some of the highlights of two related projects, the Titanium project centered at U.C. Berkeley and the UPC project centered at Lawrence Berkeley National Laboratory. Both compilers use a source-to-source strategy that translates the parallel languages to C with calls to a communication layer called GASNet. The result is portable, high-performance compilers that run on a large variety of shared and distributed memory multiprocessors. Both projects combine compiler, runtime, and application efforts to demonstrate some of the performance and productivity advantages of these languages.

Paul Hargrove, Jason Duell, Eric Roman, "Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters", Proceedings of SciDAC 2006, June 27, 2006,

Costin Iancu, Parry Husbands, Paul Hargrove, "HUNTing the Overlap", IEEE Parallel Architectures and Compilation Techniques (PACT), 2005,

S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, E. Roman, "The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing", Los Alamos Computer Science Institute Symposium Proceedings (LACSI'03), Santa Fe, NM, October 2003,

Christian Bell, Dan Bonachea, Yannick Cote, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Michael L. Welcome, Katherine A. Yelick, "An Evaluation of Current High-Performance Networks", IPDPS - IEEE International Parallel & Distributed Processing Symposium, 2003, doi: 10.1109/IPDPS.2003.1213106

High-end supercomputers are increasingly built out of commodity components, and lack tight integration between the processor and network. This often results in inefficiencies in the communication subsystem, such as high software overheads and/or message latencies. In this paper we use a set of microbenchmarks to quantify the cost of this commoditization, measuring software overhead, latency, and bandwidth on five contemporary supercomputing networks. We compare the performance of the ubiquitous MPI layer to that of lower-level communication layers, and quantify the advantages of the latter for small message performance. We also provide data on the potential for various communication-related optimizations, such as overlapping communication with computation or other communication. Finally, we determine the minimum size needed for a message to be considered 'large' (i.e., bandwidth-bound) on these platforms, and provide historical data on the software overheads of a number of supercomputers over the past decade.

Presentation/Talks

Erik Paulson, Dan Bonachea, Paul Hargrove, GASNet ofi-conduit, Presentation at the Open Fabrics Interface BoF at Supercomputing 2017, November 2017,

Paul H. Hargrove, UPC Language Full-day Tutorial, Workshop at UC Berkeley, July 12, 2012,

Paul H. Hargrove, UPC Language Half-day Tutorial, Workshop at UC Berkeley, June 15, 2011,

Paul H. Hargrove, Introduction to UPC, CScADS Workshop, July 21, 2010,

Yili Zheng, Filip Blagojevic, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Costin Iancu, Seung-Jai Min, Katherine Yelick, Getting Multicore Performance with UPC, SIAM Conference on Parallel Processing for Scientific Computing, February 2010,

Rajesh Nishtala, Yili Zheng, Paul H. Hargrove, Katherine Yelick, UPC at Scale, SIAM Conference on Parallel Processing for Scientific Computing, February 25, 2010,

Yili Zheng, Costin Iancu, Paul H. Hargrove, Seung-Jai Min, Katherine Yelick, Extending Unified Parallel C for GPU Computing, SIAM Conference on Parallel Processing for Scientific Computing, February 24, 2010,

Paul H. Hargrove, A Brief Introduction to BLCR (Berkeley Lab Checkpoint/Restart), SIAM Conference on Parallel Processing for Scientific Computing, February 24, 2010,

Paul Hargrove, Jason Duell, Eric Roman, Berkeley Lab Checkpoint/Restart (BLCR): Status and Future Plans, Dagstuhl Seminar: Fault Tolerance in High-Performance Computing and Grids, May 2009,

Paul Hargrove, Jason Duell, Eric Roman, System-level Checkpoint/Restart with BLCR, TeraGrid 2009 Fault Tolerance Workshop, March 19, 2009,

Paul Hargrove, Jason Duell, Eric Roman, System-level Checkpoint/Restart with BLCR, Los Alamos Computer Science Symposium (LACSS08), October 15, 2008,

Paul Hargrove, Jason Duell, Eric Roman, Advanced Checkpoint Fault Tolerance Solutions for HPC, Workshop on Trends, Technologies and Collaborative Opportunities in High Performance and Grid Computing, Bangkok and Phuket Thailand, June 9, 2008,

Paul H. Hargrove, Dan Bonachea, Christian Bell, Experiences Implementing Partitioned Global Address Space (PGAS) Languages on InfiniBand, OpenFabrics Alliance 2008 International Sonoma Workshop, April 2008,

Paul Hargrove, Jason Duell and Eric Roman, An Overview of Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters, Presentation to ParLab group at UC Berkeley, March 18, 2008,

Paul Hargrove, Eric Roman, Jason Duell, Job Preemption with BLCR, Urgent Computing Workshop, April 25, 2007,

Dan Bonachea, Rajesh Nishtala, Paul Hargrove, Katherine Yelick, Efficient Point-to-Point Synchronization in UPC, 2nd Conf. on Partitioned Global Address Space Programming Models (PGAS06), October 4, 2006,

J. Duell, P. Hargrove, E. Roman, An Overview of Berkeley Lab's Linux Checkpoint/Restart, Presentation at LLNL, January 2004,

Reports

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 8", Lawrence Berkeley National Laboratory Tech Report, September 26, 2018, LBNL 2001179, doi: 10.25344/S45P4X

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2018.9.0", Lawrence Berkeley National Laboratory Tech Report, September 26, 2018, LBNL 2001180, doi: 10.25344/S49G6V

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
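
The two properties the guide emphasizes, explicit remote access and default asynchrony, can be seen in miniature in the following sketch (based on the public UPC++ interface; the function and variable names are illustrative): every remote write is a visible rput call, and the call returns immediately with a future the caller must explicitly complete.

```cpp
// Sketch only: explicit, asynchronous remote access in UPC++.
// Based on the public UPC++ interface; names are illustrative.
#include <upcxx/upcxx.hpp>

// Write 'value' into a slot in another process's shared segment.
void publish(upcxx::global_ptr<double> remote_slot, double value) {
  // rput returns immediately; the future tracks operation completion.
  upcxx::future<> f = upcxx::rput(value, remote_slot);

  // ... overlap the in-flight communication with local work here ...

  f.wait();  // completion is explicit; nothing blocks behind the scenes
}
```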

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer’s Guide, v1.0-2018.3.0", Lawrence Berkeley National Laboratory Tech Report, March 31, 2018, LBNL 2001136, doi: 10.2172/1430693

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

Dan Bonachea, Paul Hargrove, "GASNet-EX Performance Improvements Due to Specialization for the Cray Aries Network", Lawrence Berkeley National Laboratory Tech Report, March 27, 2018, LBNL 2001134, doi: 10.2172/1430690

This document is a deliverable for milestone STPM17-6 of the Exascale Computing Project, delivered by WBS 2.3.1.14. It reports on the improvements in performance observed on Cray XC-series systems due to enhancements made to the GASNet-EX software. These enhancements, known as “specializations”, primarily consist of replacing network-independent implementations of several recently added features with implementations tailored to the Cray Aries network. Performance gains from specialization include (1) Negotiated-Payload Active Messages improve bandwidth of a ping-pong test by up to 14%, (2) Immediate Operations reduce running time of a synthetic benchmark by up to 93%, (3) non-bulk RMA Put bandwidth is increased by up to 32%, (4) Remote Atomic performance is 70% faster than the reference on a point-to-point test and allows a hot-spot test to scale robustly, and (5) non-contiguous RMA interfaces see up to 8.6x speedups for an intra-node benchmark and 26% for inter-node. These improvements are available in the GASNet-EX 2018.3.0 release.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Bryce Lelbach, Brian van Straalen, "UPC++ Specification v1.0, Draft 6", Lawrence Berkeley National Laboratory Tech Report, March 26, 2018, LBNL 2001135, doi: 10.2172/1430689

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer’s Guide, v1.0-2017.9", Lawrence Berkeley National Laboratory Tech Report, September 29, 2017, LBNL 2001065, doi: 10.2172/1398522

This document has been superseded by: UPC++ Programmer’s Guide, v1.0-2018.3.0 (LBNL-2001136)

UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Bryce Lelbach, Brian van Straalen, "UPC++ Specification v1.0, Draft 4", Lawrence Berkeley National Laboratory Tech Report, September 27, 2017, LBNL 2001066, doi: 10.2172/1398521

This document has been superseded by: UPC++ Specification v1.0, Draft 6 (LBNL-2001135)

UPC++ is a C++11 library providing classes and functions that support Asynchronous Partitioned Global Address Space (APGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Dan Bonachea, Paul Hargrove, "GASNet Specification, v1.8.1", Lawrence Berkeley National Laboratory Tech Report, August 31, 2017, LBNL 2001064, doi: 10.2172/1398512

GASNet is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages and libraries such as UPC, UPC++, Co-Array Fortran, Legion, Chapel, and many others. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet stands for "Global-Address Space Networking".
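
Alongside the one-sided RMA calls, GASNet's other core primitive is Active Messages. The sketch below, based on the AM interface in the GASNet-1 specification, shows how a runtime writer registers a handler and sends a short request to it; the handler index, names, and handler body are hypothetical.

```cpp
// Sketch only: registering and invoking a GASNet-1 Active Message handler.
// Based on the GASNet-1 spec; handler index and names are hypothetical.
#include <gasnet.h>

#define HIDX_PING 200  // hypothetical index in the client handler range

// Request handler: runs on the destination node when the message arrives.
static void ping_handler(gasnet_token_t token, gasnet_handlerarg_t arg) {
  // ... a real runtime would update its state using 'arg' here ...
}

static gasnet_handlerentry_t htable[] = {
  { HIDX_PING, (void (*)()) ping_handler },
};

static void runtime_init(int *argc, char ***argv) {
  gasnet_init(argc, argv);
  gasnet_attach(htable, 1, gasnet_getMaxLocalSegmentSize(), 0);
}

static void send_ping(gasnet_node_t node, int value) {
  gasnet_AMRequestShort1(node, HIDX_PING, value);
}
```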

W. Kramer, J. Carter, D. Skinner, L. Oliker, P. Husbands, P. Hargrove, J. Shalf, O. Marques, E. Ng, A. Drummond, K. Yelick, "Software Roadmap to Plug and Play Petaflop/s", 2006,

J. Duell, P. Hargrove, E. Roman, "The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart", LBNL Technical Report, December 2002, LBNL 54941,

J. Duell, P. Hargrove, E. Roman, "Requirements for Linux Checkpoint/Restart", LBNL Technical Report, May 2002, LBNL 49659,

Posters

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet-EX: PGAS Support for Exascale Applications and Runtimes", The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'18), November 13, 2018,

Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work is driven by the emerging need for adaptive, lightweight communication in irregular applications at exascale. We present an overview of UPC++ and GASNet-EX, including examples and performance results.

GASNet-EX is a portable, high-performance communication library, leveraging hardware support to efficiently implement Active Messages and Remote Memory Access (RMA). UPC++ provides higher-level abstractions appropriate for PGAS programming such as: one-sided communication (RMA), remote procedure call, locality-aware APIs for user-defined distributed objects, and robust support for asynchronous execution to hide latency. Both libraries have been redesigned relative to their predecessors to meet the needs of exascale computing. While both libraries continue to evolve, the system already demonstrates improvements in microbenchmarks and application proxies.

Scott Baden, Dan Bonachea, Paul Hargrove, "GASNet-EX: PGAS Support for Exascale Apps and Runtimes", ECP Annual Meeting 2018, February 2018,

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes", Poster at Exascale Computing Project (ECP) Annual Meeting 2018., February 2018,

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian Van Straalen, "UPC++: a PGAS C++ Library", ACM/IEEE Conference on Supercomputing, SC'17, November 2017,

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes", Poster at Exascale Computing Project (ECP) Annual Meeting 2017., January 2017,

Dan Bonachea, Rajesh Nishtala, Paul Hargrove, Mike Welcome, Kathy Yelick, "Optimized Collectives for PGAS Languages with One-Sided Communication", Poster Session at SuperComputing 2006, November 2006, doi: 10.1145/1188455.1188604

Optimized collective operations are a crucial performance factor for many scientific applications. This work investigates the design and optimization of collectives in the context of Partitioned Global Address Space (PGAS) languages such as Unified Parallel C (UPC). Languages with one-sided communication permit a more flexible and expressive collective interface with application code, in turn enabling more aggressive optimization and more effective utilization of system resources. We investigate the design tradeoffs in a collectives implementation for UPC, ranging from resource management to synchronization mechanisms and target-dependent selection of optimal communication patterns. Our collectives are implemented in the Berkeley UPC compiler using the GASNet communication system, tuned across a wide variety of supercomputing platforms, and benchmarked against MPI collectives. Special emphasis is placed on the newly added Cray XT3 backend for UPC, whose characteristics are benchmarked in detail.

P.H. Hargrove, J.C. Duell, "Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters", Proceedings of SciDAC 2006, June 27, 2006,

Dan O Bonachea, Christian Bell, Rajesh Nishtala, Kaushik Datta, Parry Husbands, Paul Hargrove, Katherine Yelick, "The Performance and Productivity Benefits of Global Address Space Languages", Poster Session at SuperComputing 2005, November 2005,

Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Wei Tu, Mike Welcome, Kathy Yelick, "GASNet 2 - An Alternative High-Performance Communication Interface", Poster Session at SuperComputing 2004, November 9, 2004,

Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome, Kathy Yelick, "GASNet: Project Overview", SuperComputing 2003, November 2003,

Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome, Kathy Yelick, "GASNet: Project Overview", SuperComputing 2002, November 2002,

Others

Paul Hargrove, Brock Palen, Jeff Squyres, RCE 12: BLCR, RCE Podcast (interview), June 19, 2009,

Brock Palen and Jeff Squyres speak with Paul Hargrove about the Berkeley Lab Checkpoint/Restart (BLCR) project.