
Mathias Jacquelin

Mathias Jacquelin
Research Scientist
Computational Research Division
Phone: +1 510 495 2605
Lawrence Berkeley National Laboratory
1 Cyclotron Rd, M/S 50A3111
Berkeley, CA 94720 US

Position


Research Scientist, Computational Research Division, November 2015 - present.

Education


  • Ph.D., Ecole Normale Superieure de Lyon, July 2011.
  • M.Sc., Institut National des Sciences Appliquees de Lyon, July 2008.

Short Bio


Mathias Jacquelin has been a research scientist in the Scalable Solvers Group of the Computational Research Division at Lawrence Berkeley National Laboratory since November 2015. He is currently working on highly scalable algorithms for solving large sparse symmetric linear systems.

He obtained a Master’s degree in Computer Science from INSA Lyon (National Institute of Applied Sciences), France. He went on to the Ecole Normale Superieure de Lyon to complete his Ph.D. in Computer Science, focusing on algorithms, scheduling, and how hierarchical memory architectures should be handled. He then spent 18 months in Laura Grigori's team at INRIA Saclay, France, working on communication-avoiding algorithms for dense linear algebra. He joined LBNL in 2013 as a postdoctoral fellow, first in the Scientific Computing Group and then in the Scalable Solvers Group, focusing on various sparse linear algebra computations.

He is the developer of the symPACK solver, a direct linear solver for sparse symmetric matrices that uses a task-based approach and relies on the UPC++ communication framework.

He is also a member of the development team of UPC++, a high-performance communication framework based on the PGAS approach. Please refer to the Pagoda project for more details.

He is a member of the Performance team of the E3SM climate modeling project, where he works on code performance analysis and optimization for current and upcoming hardware platforms.

Research interests


  • Algorithm design, task scheduling
  • Sparse linear algebra
  • Communication-avoiding algorithms, communication frameworks
  • Multicore architecture
  • Parallel computing
  • Code optimization, code profiling

For more information, please visit his personal website:

http://graal.ens-lyon.fr/~mjacquel/index

Journal Articles

Mathias Jacquelin, Lin Lin, Chao Yang, "PSelInv--A distributed memory parallel algorithm for selected inversion: The non-symmetric case", Parallel Computing, 2018, 74:84--98,

Victor Wen-zhe Yu, Fabiano Corsetti, Alberto Garcia, William P Huhn, Mathias Jacquelin, Weile Jia, Bjorn Lange, Lin Lin, Jianfeng Lu, Wenhui Mi, others, "ELSI: A unified software interface for Kohn–Sham electronic structure solvers", Computer Physics Communications, 2018, 222:267-285,

V. Yu, F. Corsetti, A. García, W. P. Huhn, M. Jacquelin, W. Jia, B. Lange, L. Lin, J. Lu, W. Mi, A. Seifitokaldan, Á. Vazquez-Mayagoitia, C. Yang, H. Yang, V. Blum, "ELSI: A unified software interface for Kohn–Sham electronic structure solvers", Computer Physics Communications, September 7, 2017, 222:267-285, doi: 10.1016/j.cpc.2017.09.007

Mathias Jacquelin, Lin Lin, Chao Yang, "PSelInv—A distributed memory parallel algorithm for selected inversion: The symmetric case", ACM Transactions on Mathematical Software (TOMS), 2017, 43:21,

M. Jacquelin, L. Lin and C. Yang, "A Distributed Memory Parallel Algorithm for Selected Inversion: the non-symmetric case", PMAA, December 30, 2016,

M. Jacquelin, L. Lin, W. Jia, Y. Zhao and C. Yang, "A Left-looking selected inversion algorithm and task parallelism on shared memory systems", April 9, 2016,

Mathias Jacquelin, Lin Lin, Chao Yang, "A Distributed Memory Parallel Algorithm for Selected Inversion : the Symmetric Case", To appear in ACM Transactions on Mathematical Software (TOMS), May 28, 2015,

Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Hong Diep Nguyen, Edgar Solomonik, "Reconstructing Householder vectors from tall-skinny QR", Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, January 1, 2014, 1159-1170,

Jack Dongarra, Mathieu Faverge, Thomas Herault, Mathias Jacquelin, Julien Langou, Yves Robert, "Hierarchical QR factorization algorithms for multi-core clusters", Parallel Computing, 2013, 39:212-232,

Tudor David, Mathias Jacquelin, Loris Marchal, "Scheduling streaming applications on a complex multicore platform", Concurrency and Computation: Practice and Experience, 2012, 24:1726-1750,

Conference Papers

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.
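
The programming style described above can be illustrated with a minimal, self-contained sketch (not code from the paper; the array size, values, and neighbor pattern are arbitrary illustrative choices). It issues one-sided RMA puts and gets on a global pointer and a remote procedure call, each returning a future that the caller synchronizes on explicitly:

    // Minimal UPC++ sketch of future-based asynchrony: one-sided RMA plus an RPC.
    // Illustrative only; error handling and performance tuning are omitted.
    #include <upcxx/upcxx.hpp>
    #include <iostream>

    int main() {
      upcxx::init();
      int me = upcxx::rank_me();
      int n  = upcxx::rank_n();

      // Rank 0 allocates an array in its shared segment and broadcasts the global pointer.
      upcxx::global_ptr<int> table;
      if (me == 0) table = upcxx::new_array<int>(n);
      table = upcxx::broadcast(table, 0).wait();

      // One-sided RMA: each rank writes into its own slot, then reads it back.
      // Both calls return futures; wait() is used here only for brevity.
      upcxx::rput(me * me, table + me).wait();
      int mine = upcxx::rget(table + me).wait();

      // RPC: run a function on the next rank and get the result back through a future.
      upcxx::future<int> f = upcxx::rpc((me + 1) % n,
                                        [](int x) { return x + 1; }, mine);
      std::cout << "rank " << me << " -> " << f.wait() << std::endl;

      upcxx::barrier();
      if (me == 0) upcxx::delete_array(table);
      upcxx::finalize();
    }

With a typical UPC++ installation, a program like this is compiled with the upcxx compiler wrapper and launched with upcxx-run (for example, upcxx-run -n 4 ./a.out); see the Pagoda project pages for the authoritative build and run instructions.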

Mathias Jacquelin, Lin Lin, Weile Jia, Yonghua Zhao, Chao Yang, "A Left-Looking Selected Inversion Algorithm and Task Parallelism on Shared Memory Systems", Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, January 1, 2018, 54-63,

Yang Liu, Mathias Jacquelin, Pieter Ghysels, Xiaoye S Li, "Highly scalable distributed-memory sparse triangular solution algorithms", 2018 Proceedings of the Seventh SIAM Workshop on Combinatorial Scientific Computing, 2018, 87-96,

Mathias Jacquelin, Esmond G Ng, Barry W Peyton, "Fast and effective reordering of columns within supernodes using partition refinement", 2018 Proceedings of the Seventh SIAM Workshop on Combinatorial Scientific Computing, 2018, 76-86,

Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Nicholas Knight, "A 3D Parallel Algorithm for QR Decomposition", SPAA '18, 2018,

William Huhn, Alberto Garcia, Luigi Genovese, Ville Havu, Mathias Jacquelin, Weile Jia, Murat Keceli, Raul Laasner, Yingzhou Li, Lin Lin, others, "Unified Access To Kohn-Sham DFT Solvers for Different Scales and HPC: The ELSI Project", Bulletin of the American Physical Society, American Physical Society, 2018,

John Bachan, Dan Bonachea, Paul H Hargrove, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Scott B Baden, "The UPC++ PGAS library for Exascale Computing", Proceedings of the Second Annual PGAS Applications Workshop (PAW17), November 13, 2017, 7, doi: 10.1145/3144779.3169108

We describe UPC++ V1.0, a C++11 library that supports APGAS programming. UPC++ targets distributed data structures where communication is irregular or fine-grained. The key abstractions are global pointers, asynchronous programming via RPC, and futures. Global pointers incorporate ownership information useful in optimizing for locality. Futures capture data readiness state, are useful for scheduling and also enable the programmer to chain operations to execute asynchronously as high-latency dependencies become satisfied, via continuations. The interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and closely resemble those used in modern C++. Communication in UPC++ runs at close to hardware speeds by utilizing the low-overhead GASNet-EX communication library.
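
As a rough illustration of the futures and continuations mentioned above (a sketch only, not code from the paper; the doubling step and the choice of rank 0 as the target are arbitrary), a remote get can be chained through local work into a dependent put using .then():

    // Sketch of continuation chaining with UPC++ futures. Illustrative only.
    #include <upcxx/upcxx.hpp>

    int main() {
      upcxx::init();
      int me = upcxx::rank_me();

      // Each rank keeps one value in its shared segment; rank 0's pointer is
      // broadcast so that everyone can operate on the same remote location.
      upcxx::global_ptr<double> mine = upcxx::new_<double>(1.0 + me);
      upcxx::global_ptr<double> root = upcxx::broadcast(mine, 0).wait();

      // Chain operations: fetch rank 0's value, transform it locally, then write the
      // result back. Each step runs only when its dependency has completed.
      // (All ranks write to the same location here, so the final value is whichever
      // write lands last; that is fine for this toy example.)
      upcxx::future<> done =
          upcxx::rget(root)
              .then([](double v) { return v * 2.0; })                  // local continuation
              .then([=](double v) { return upcxx::rput(v, root); });   // chained RMA

      done.wait();
      upcxx::barrier();          // make sure all remote writes finish before deallocation
      upcxx::delete_(mine);
      upcxx::finalize();
    }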

E.J. Bylaska, J. Hammond, M. Jacquelin, W.A. de Jong, M. Klemm, "Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel® Xeon Phi™ Processor", High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, Springer, Cham, October 21, 2017, 404-418, doi: 10.1007/978-3-319-67630-2_30

Ariful Azad, Mathias Jacquelin, Aydin Buluç, Esmond G Ng, "The reverse Cuthill-McKee algorithm in distributed-memory", Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International, January 2017, 22-31,

Mathias Jacquelin, Wibe De Jong, Eric Bylaska, "Towards highly scalable Ab initio molecular dynamics (AIMD) simulations on the Intel Knights Landing manycore processor", Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International, January 1, 2017, 234-243,

W.A. de Jong, M. Jacquelin, E.J. Bylaska, "Advancing Algorithms to Increase Performance of Correlated and Dynamical Electronic Structure Simulation", CMMSE 2016: Proceedings of the 16th International Conference on Mathematical Methods in Science and Engineering, September 1, 2016, 5:1342-1346,

Mathias Jacquelin, Yili Zheng, Esmond Ng, Katherine Yelick, "An Asynchronous Task-based Fan-Both Sparse Cholesky Solver", Submitted to SuperComputing'16, May 10, 2016,

M. Jacquelin, L. Lin, N. Wichmann and C. Yang, "Enhancing the scalability of tree-based asynchronous communication", accepted at IPDPS'16, November 25, 2015,

G. Ballard, J. Demmel, L. Grigori, M. Jacquelin, Hong Diep Nguyen, E. Solomonik, "Reconstructing Householder Vectors from Tall-Skinny QR", Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, 2014, 1159-1170, doi: 10.1109/IPDPS.2014.120

Laura Grigori, Mathias Jacquelin, Amal Khabou, "Performance predictions of multilevel communication optimal LU and QR factorizations on hierarchical platforms", International Supercomputing Conference, 2014, 76-92,

Mathias Jacquelin, Loris Marchal, Yves Robert, Bora Uçar, "On optimal tree traversals for sparse matrix factorization", Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International, 2011, 556-567,

Henricus Bouwmeester, Mathias Jacquelin, Julien Langou, Yves Robert, "Tiled QR factorization algorithms", Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, 2011, 7,

Franck Cappello, Mathias Jacquelin, Loris Marchal, Yves Robert, Marc Snir, "Comparing archival policies for Blue Waters", High Performance Computing (HiPC), 2011 18th International Conference on, 2011, 1-10,

Matthieu Gallet, Mathias Jacquelin, Loris Marchal, "Scheduling complex streaming applications on the Cell processor", Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010 IEEE International Symposium on, 2010, 1-8,

Mathias Jacquelin, Loris Marchal, Yves Robert, "Complexity analysis and performance evaluation of matrix product on multicore architectures", Parallel Processing, 2009. ICPP 09. International Conference on, 2009, 196--203,

Book Chapters

E. J. Bylaska, E. Apra, K. Kowalski, M. Jacquelin, W.A. de Jong, A. Vishnu, B. Palmer, J. Daily, T.P. Straatsma, J.R. Hammond, M. Klemm, "Transitioning NWChem to the Next Generation of Many Core Machines", Exascale Scientific Applications: Scalability and Performance Portability, edited by Tjerk P. Straatsma, Katerina B. Antypas, Timothy J. Williams, (Taylor & Francis: November 9, 2017)

Presentation/Talks

Victor Yu, William Dawson, Alberto Garcia, Ville Havu, Ben Hourahine, William Huhn, Mathias Jacquelin, Weile Jia, Murat Keceli, Raul Laasner, others, Large-Scale Benchmark of Electronic Structure Solvers with the ELSI Infrastructure, Bulletin of the American Physical Society, 2019,

Mathias Jacquelin, Scheduling Sparse Symmetric Fan-Both Cholesky Factorization, The 11th Scheduling for Large Scale Systems Workshop, May 18, 2016,

Mathias Jacquelin, Scheduling Sparse Symmetric Fan-Both Cholesky Factorization, SIAM PP'16, April 15, 2016,

Mathias Jacquelin, Lin Lin, Nathan Wichmann, Chao Yang, Enhancing scalability and load balancing of Parallel Selected Inversion via tree-based asynchronous communication, Parallel and Distributed Processing Symposium, 2016 IEEE International, Pages: 192-201, January 1, 2016,

Reports

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
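
As a hedged illustration of how futures can be conjoined into a small dependency DAG (a sketch only, not taken from the specification; the two integers and their values are arbitrary), two independent remote gets can be joined with upcxx::when_all so that the continuation runs once both high-latency dependencies are satisfied:

    // Sketch: joining futures into a small DAG with upcxx::when_all. Illustrative only.
    #include <upcxx/upcxx.hpp>
    #include <iostream>

    int main() {
      upcxx::init();
      int me = upcxx::rank_me();

      // Rank 0 publishes two integers in its shared segment; everyone gets the pointers.
      upcxx::global_ptr<int> a, b;
      if (me == 0) { a = upcxx::new_<int>(40); b = upcxx::new_<int>(2); }
      a = upcxx::broadcast(a, 0).wait();
      b = upcxx::broadcast(b, 0).wait();

      // Two independent gets form the leaves of the DAG; when_all joins them, and the
      // continuation executes only after both remote values have arrived.
      upcxx::future<int> fa = upcxx::rget(a);
      upcxx::future<int> fb = upcxx::rget(b);
      upcxx::future<int> sum = upcxx::when_all(fa, fb)
                                   .then([](int x, int y) { return x + y; });

      std::cout << "rank " << me << " computed " << sum.wait() << std::endl;

      upcxx::barrier();
      if (me == 0) { upcxx::delete_(a); upcxx::delete_(b); }
      upcxx::finalize();
    }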

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
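
The split between ordinary local memory and the globally addressable shared segments can be sketched as follows (illustrative only, not code from the guide; the slot values are arbitrary). A pointer with affinity to the calling process can be downcast and used like a regular C++ pointer, while remote memory is reached only through explicit, asynchronous calls:

    // Sketch of the PGAS model in UPC++: private local access vs. explicit remote access.
    // Illustrative only.
    #include <upcxx/upcxx.hpp>
    #include <cassert>
    #include <iostream>

    int main() {
      upcxx::init();
      int me = upcxx::rank_me();
      int n  = upcxx::rank_n();

      // Each rank allocates one integer in its own shared segment, initialized at allocation.
      upcxx::global_ptr<int> slot = upcxx::new_<int>(100 + me);

      // The pointer has affinity to this rank, so it can be downcast to an ordinary
      // C++ pointer and read like local memory.
      assert(slot.is_local());
      int local_view = *slot.local();

      // Remote memory, by contrast, is reached only through explicit, asynchronous
      // communication. Everyone reads rank 0's slot through the global address space.
      upcxx::global_ptr<int> root = upcxx::broadcast(slot, 0).wait();
      int remote_view = upcxx::rget(root).wait();

      std::cout << "rank " << me << "/" << n << ": local " << local_view
                << ", from rank 0 " << remote_view << std::endl;

      upcxx::barrier();          // keep shared slots alive until all remote reads finish
      upcxx::delete_(slot);
      upcxx::finalize();
    }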

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 8", Lawrence Berkeley National Laboratory Tech Report, September 26, 2018, LBNL 2001179, doi: 10.25344/S45P4X

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2018.9.0", Lawrence Berkeley National Laboratory Tech Report, September 26, 2018, LBNL 2001180, doi: 10.25344/S49G6V

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

J Bachan, S Baden, D Bonachea, PH Hargrove, S Hofmeyr, K Ibrahim, M Jacquelin, A Kamil, B van Straalen, "UPC++ Programmer’s Guide, v1.0-2018.3.0", March 31, 2018, LBNL 2001136, doi: 10.2172/1430693

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

J Bachan, S Baden, D Bonachea, P Hargrove, S Hofmeyr, K Ibrahim, M Jacquelin, A Kamil, B Lelbach, B van Straalen, "UPC++ Specification v1.0, Draft 6", March 26, 2018, LBNL 2001135, doi: 10.2172/1430689

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer’s Guide, v1.0-2017.9", Lawrence Berkeley National Laboratory Tech Report, September 29, 2017, LBNL 2001065, doi: 10.2172/1398522

This document has been superseded by: UPC++ Programmer’s Guide, v1.0-2018.3.0 (LBNL-2001136)

UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

J Bachan, S Baden, D Bonachea, P Hargrove, S Hofmeyr, K Ibrahim, M Jacquelin, A Kamil, B Lelbach, B van Straalen, "UPC++ Specification v1.0, Draft 4", September 27, 2017, LBNL 2001066, doi: 10.2172/1398521

This document has been superseded by: UPC++ Specification v1.0, Draft 6 (LBNL-2001135)

UPC++ is a C++11 library providing classes and functions that support Asynchronous Partitioned Global Address Space (APGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

Posters

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet-EX: PGAS Support for Exascale Applications and Runtimes", The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'18), November 13, 2018,

Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work is driven by the emerging need for adaptive, lightweight communication in irregular applications at exascale. We present an overview of UPC++ and GASNet-EX, including examples and performance results.

GASNet-EX is a portable, high-performance communication library, leveraging hardware support to efficiently implement Active Messages and Remote Memory Access (RMA). UPC++ provides higher-level abstractions appropriate for PGAS programming such as: one-sided communication (RMA), remote procedure call, locality-aware APIs for user-defined distributed objects, and robust support for asynchronous execution to hide latency. Both libraries have been redesigned relative to their predecessors to meet the needs of exascale computing. While both libraries continue to evolve, the system already demonstrates improvements in microbenchmarks and application proxies.
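
The locality-aware distributed-object API mentioned above can be sketched as follows (illustrative only, not code from the poster; the values and the neighbor choice are arbitrary). Each process contributes one piece to a upcxx::dist_object, and other processes retrieve it asynchronously with fetch():

    // Sketch of a user-defined distributed object in UPC++. Illustrative only.
    #include <upcxx/upcxx.hpp>
    #include <iostream>

    int main() {
      upcxx::init();
      int me = upcxx::rank_me();
      int n  = upcxx::rank_n();

      // Every rank contributes one value to a logically shared, physically distributed object.
      upcxx::dist_object<int> local_piece(100 * me);

      // fetch() retrieves another rank's piece; the call is one-sided and asynchronous,
      // returning a future that completes when the remote value has arrived.
      upcxx::future<int> f = local_piece.fetch((me + 1) % n);
      std::cout << "rank " << me << " fetched " << f.wait()
                << " from rank " << (me + 1) % n << std::endl;

      upcxx::barrier();   // keep every dist_object alive until all fetches have completed
      upcxx::finalize();
    }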

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes", Poster at Exascale Computing Project (ECP) Annual Meeting 2018., February 2018,

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian Van Straalen, "UPC++: a PGAS C++ Library", ACM/IEEE Conference on Supercomputing, SC'17, November 2017,

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes", Poster at Exascale Computing Project (ECP) Annual Meeting 2017., January 2017,

Mathias Jacquelin, "Memory-aware algorithms and scheduling techniques: from multicore processors to petascale supercomputers", Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on, Pages: 2038--2041 2011,