Brian Van Straalen
Brian Van Straalen received his BASc in Mechanical Engineering (1993) and his MMath in Applied Mathematics (1995) from the University of Waterloo. He has worked in scientific computing since he was an undergraduate. He worked with Advanced Scientific Computing Ltd. developing CFD codes, written largely in Fortran 77, that ran on VAX and UNIX workstations, and then as part of the thermal modeling group at Bell Northern Research. His Master's thesis concerned a posteriori error estimation for the Navier-Stokes equations, an area still relevant to Department of Energy scientific computing. He worked for Beam Technologies developing the PDESolve package, a combined symbolic manipulation package and finite element solver that ran in parallel on some of the earliest NSF and DOE MPP parallel computers. He came to LBNL in 1998 to work with Phil Colella and start up the Chombo Project, now in its 13th year of development. He is currently working on his Ph.D. in the Computer Science department at UC Berkeley.
Journal Articles
Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Phillip Colella, Mary Hall, "Compiler-Based Code Generation and Autotuning for Geometric Multigrid on GPU-Accelerated Supercomputers", Parallel Computing (PARCO), April 2017, doi: 10.1016/j.parco.2017.04.002
Andrew Myers, Phillip Colella, Brian Van Straalen, "A 4th-Order Particle-in-Cell Method with Phase-Space Remapping for the Vlasov-Poisson Equation", submitted to SISC, February 1, 2016,
Andrew Myers, Phillip Colella, Brian Van Straalen, "The Convergence of Particle-in-Cell Schemes for Cosmological Dark Matter Simulations", The Astrophysical Journal, Volume 816, Issue 2, article id. 56, 2016,
A. Chien, P. Balaji, P. Beckman, N. Dun, A. Fang, H. Fujita, K. Iskra, Z. Rubenstein, Z. Zheng, R. Schreiber, et al., "Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience", Journal of Computational Science, 2015,
Anshu Dubey, Ann Almgren, John Bell, Martin Berzins, Steve Brandt, Greg Bryan, Phillip Colella, Daniel Graves, Michael Lijewski, Frank Löffler, et al., "A survey of high level frameworks in block-structured adaptive mesh refinement packages", Journal of Parallel and Distributed Computing, 2014, 74:3217-3227, doi: 10.1016/j.jpdc.2014.07.001
A. Dubey, B. Van Straalen, "Experiences from software engineering of large scale AMR multiphysics code frameworks", arXiv preprint arXiv:1309.1781, January 1, 2013, doi: 10.5334/jors.am
Vay, J.L., Colella, P., McCorquodale, P., Van Straalen, B., Friedman, A., Grote, D.P., "Mesh Refinement for Particle-in-Cell Plasma Simulations: Applications to and Benefits for Heavy Ion Fusion", Laser and Particle Beams, Vol. 20, No. 4, pp. 569-575, 2002,
- Download File: A151.pdf (pdf: 624 KB)
Conference Papers
John Bachan, Dan Bonachea, Paul H. Hargrove, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Scott B. Baden, "The UPC++ PGAS library for Exascale Computing", Proceedings of the Second Annual PGAS Applications Workshop (PAW17), November 13, 2017, doi: 10.1145/3144779.3169108
We describe UPC++ V1.0, a C++11 library that supports APGAS programming. UPC++ targets distributed data structures where communication is irregular or fine-grained. The key abstractions are global pointers, asynchronous programming via RPC, and futures. Global pointers incorporate ownership information useful in optimizing for locality. Futures capture data readiness state, are useful for scheduling and also enable the programmer to chain operations to execute asynchronously as high-latency dependencies become satisfied, via continuations. The interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and closely resemble those used in modern C++. Communication in UPC++ runs at close to hardware speeds by utilizing the low-overhead GASNet-EX communication library.
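Below is a minimal sketch of the three abstractions the abstract names (global pointers, RPC, and future chaining). It is illustrative, not code from the paper: the ring-exchange pattern and the variable names are assumptions, though the calls themselves are standard UPC++ API.

```cpp
#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
  upcxx::init();
  int me = upcxx::rank_me(), n = upcxx::rank_n();

  // Each rank allocates one int in its shared segment; a dist_object
  // lets the neighbor fetch the resulting global pointer by rank.
  upcxx::dist_object<upcxx::global_ptr<int>> dptr(upcxx::new_<int>(me));
  upcxx::global_ptr<int> right = dptr.fetch((me + 1) % n).wait();

  // One-sided get with a chained continuation: the lambda runs only
  // once the remote value has arrived, without blocking in between.
  upcxx::future<> done = upcxx::rget(right).then(
      [me](int v) { std::cout << "rank " << me << " read " << v << "\n"; });
  done.wait();

  // RPC moves computation to the data's owner rather than moving data.
  int doubled = upcxx::rpc((me + 1) % n,
                           [](int x) { return 2 * x; }, me).wait();
  (void)doubled;

  upcxx::barrier();       // make sure no rank is still reading
  upcxx::delete_(*dptr);  // free the shared-segment allocation
  upcxx::finalize();
}
```

With a UPC++ installation this kind of program is built with the upcxx compiler wrapper and launched with upcxx-run.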
Dharshi Devendran, Suren Byna, Bin Dong, Brian van Straalen, Hans Johansen, Noel Keen, and Nagiza Samatova, "Collective I/O Optimizations for Adaptive Mesh Refinement Data Writes on Lustre File System", Cray User Group (CUG) 2016, May 10, 2016,
Andrey Ovsyannikov, Melissa Romanus, Brian Van Straalen, Gunther H. Weber, David Trebotich, "Scientific Workflows at DataWarp-Speed: Accelerated Data-Intensive Science using NERSC's Burst Buffer", Proceedings of the 1st Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, IEEE Press, 2016, 1-6, doi: 10.1109/PDSW-DISCS.2016.005
Protonu Basu, Samuel Williams, Brian Van Straalen, Mary Hall, Leonid Oliker, Phillip Colella, "Compiler-Directed Transformation for Higher-Order Stencils", International Parallel and Distributed Processing Symposium (IPDPS), May 2015,
- Download File: ipdps15CHiLL.pdf (pdf: 1.8 MB)
Yu Jung Lo, Samuel Williams, Brian Van Straalen, Terry J. Ligocki, Matthew J. Cordery, Leonid Oliker, Mary W. Hall, "Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2014, doi: 10.1007/978-3-319-17248-4_7
- Download File: PMBS14-Roofline.pdf (pdf: 340 KB)
Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Mary Hall, "Converting Stencils to Accumulations for Communication-Avoiding Optimization in Geometric Multigrid", Workshop on Stencil Computations (WOSC), October 2014,
- Download File: wosc14chill.pdf (pdf: 973 KB)
Samuel Williams, Mike Lijewski, Ann Almgren, Brian Van Straalen, Erin Carson, Nicholas Knight, James Demmel, "s-step Krylov subspace methods as bottom solvers for geometric multigrid", 2014 IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS), January 2014, 1149-1158, doi: 10.1109/IPDPS.2014.119
- Download File: ipdps14cabicgstabfinal.pdf (pdf: 943 KB)
- Download File: ipdps14CABiCGStabtalk.pdf (pdf: 944 KB)
Protonu Basu, Anand Venkat, Mary Hall, Samuel Williams, Brian Van Straalen, Leonid Oliker, "Compiler generation and autotuning of communication-avoiding operators for geometric multigrid", 20th International Conference on High Performance Computing (HiPC), December 2013, 452-461,
- Download File: hipc13chill.pdf (pdf: 989 KB)
P. Basu, A. Venkat, M. Hall, S. Williams, B. Van Straalen, L. Oliker, "Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid", Workshop on Stencil Computations (WOSC), 2013,
Christopher D. Krieger, Michelle Mills Strout, Catherine Olschanowsky, Andrew Stone, Stephen Guzik, Xinfeng Gao, Carlo Bertolli, Paul H.J. Kelly, Gihan Mudalige, Brian Van Straalen, Sam Williams, "Loop chaining: A programming abstraction for balancing locality and parallelism", 2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), May 2013, 375-384, doi: 10.1109/IPDPSW.2013.68
S. Williams, D. Kalamkar, A. Singh, A. Deshpande, B. Van Straalen, M. Smelyanskiy, A. Almgren, P. Dubey, J. Shalf, L. Oliker, "Optimization of Geometric Multigrid for Emerging Multi- and Manycore Processors", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), November 2012, doi: 10.1109/SC.2012.85
- Download File: sc12-mg.pdf (pdf: 808 KB)
- Download File: sc12mgtalk.pdf (pdf: 1.9 MB)
B. Van Straalen, P. Colella, D. T. Graves, N. Keen, "Petascale Block-Structured AMR Applications Without Distributed Meta-data", Euro-Par 2011 Parallel Processing: 17th International Conference, Bordeaux, France, August 29 - September 2, 2011, Proceedings, Part II, Lecture Notes in Computer Science 6853, Springer, ISBN 978-3-642-23396-8, 2011,
- Download File: EuroPar2011bvs.pdf (pdf: 400 KB)
Deines, E., Weber, G.H., Garth, C., Van Straalen, B., Borovikov, S., Martin, D.F., and Joy, K.I., "On the computation of integral curves in adaptive mesh refinement vector fields", Proceedings of Dagstuhl Seminar on Scientific Visualization 2009, Schloss Dagstuhl, 2011, 2:73-91, LBNL 4972E,
- Download File: 7.pdf (pdf: 799 KB)
Chaopeng Shen, David Trebotich, Sergi Molins, Daniel T. Graves, B. Van Straalen, T. Ligocki, C.I. Steefel, "High performance computations of subsurface reactive transport processes at the pore scale", Proceedings of SciDAC, 2011,
- Download File: SciDAC2011sim.pdf (pdf: 1.1 MB)
Gunther Weber, "Recent advances in VisIt: AMR streamlines and query-driven visualization", 2010,
- Download File: LBNL-3185E.pdf (pdf: 2.1 MB)
B. Van Straalen, J. Shalf, T. Ligocki, N. Keen, W.-S. Yang, "Scalability Challenges for Massively Parallel AMR Applications", 23rd IEEE International Symposium on Parallel and Distributed Processing (IPDPS), 2009, 1-12,
- Download File: ipdps09submit.pdf (pdf: 529 KB)
- Download File: ipdps09finalcertified.pdf (pdf: 366 KB)
G.H. Weber, V. Beckner, H. Childs, T. Ligocki, M. Miller, B. van Straalen, E.W. Bethel, "Visualization of Scalar Adaptive Mesh Refinement Data", Numerical Modeling of Space Plasma Flows: Astronum-2007 (Astronomical Society of the Pacific Conference Series), April 2008, 385:309-320, LBNL 220E,
- Download File: LBNL-220E.pdf (pdf: 1.5 MB)
D. Trebotich, B. Van Straalen, D. Graves, and P. Colella, "Performance of Embedded Boundary Methods for CFD with Complex Geometry", J. Phys.: Conf. Ser. 125 012083, 2008,
- Download File: SciDAC2008-EBPerform.pdf (pdf: 167 KB)
P. Colella, D. Graves, T. Ligocki, D. Trebotich, and B. Van Straalen, "Embedded Boundary Algorithms and Software for Partial Differential Equations", J. Phys.: Conf. Ser. 125 012084, 2008,
- Download File: SciDAC2008-EBAlgor.pdf (pdf: 972 KB)
Phillip Colella, John Bell, Noel Keen, Terry Ligocki, Michael Lijewski, Brian van Straalen, "Performance and Scaling of Locally-Structured Grid Methods for Partial Differential Equations", presented at SciDAC 2007 Annual Meeting, 2007,
- Download File: AMRPerformance.pdf (pdf: 386 KB)
Kevin Long, Brian Van Straalen, "PDESolve: an object-oriented PDE analysis environment", Object Oriented Methods for Interoperable Scientific and Engineering Computing: Proceedings of the 1998 SIAM Workshop, 1998, 99:225,
Book Chapters
B. Van Straalen, D. Trebotich, A. Ovsyannikov and D.T. Graves, "Scalable Structured Adaptive Mesh Refinement with Complex Geometry", Exascale Scientific Applications: Programming Approaches for Scalability, Performance, and Portability, edited by Straatsma, T., Antypas, K., Williams, T., (Chapman and Hall/CRC: November 9, 2017)
Presentations/Talks
Samuel Williams, Mark Adams, Brian Van Straalen, Performance Portability in Hybrid and Heterogeneous Multigrid Solvers, Copper Mountain, March 2016,
- Download File: CU16SWWilliams.pptx (pptx: 1 MB)
Reports
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.9.0", Lawrence Berkeley National Laboratory Tech Report LBNL-2001560, December 2023, doi: 10.25344/S4P01J
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes. UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
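As a hedged illustration of the asynchrony-by-default design described above, the sketch below issues a one-sided rput, overlaps it with local work, and only then waits on the returned future. The ring exchange and the names slot and nbr are assumptions for the example, not code from the guide.

```cpp
#include <upcxx/upcxx.hpp>

int main() {
  upcxx::init();
  int me = upcxx::rank_me(), n = upcxx::rank_n();

  // Publish one double per rank in the shared segment (names assumed).
  upcxx::dist_object<upcxx::global_ptr<double>> slot(upcxx::new_<double>(0.0));
  upcxx::global_ptr<double> nbr = slot.fetch((me + 1) % n).wait();

  // Asynchronous one-sided put: rput returns a future immediately.
  upcxx::future<> f = upcxx::rput(1.0 + me, nbr);

  // ...independent local computation can overlap the transfer here...

  f.wait();          // operation completion of the put
  upcxx::barrier();  // after this, every rank's slot has been written

  upcxx::delete_(*slot);
  upcxx::finalize();
}
```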
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.
UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2020.10.0", Lawrence Berkeley National Laboratory Tech Report, October 2020, LBNL 2001368, doi: 10.25344/S4HG6Q
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2020.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2020, LBNL 2001269, doi: 10.25344/S4P88Z
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2019, LBNL 2001236, doi: 10.25344/S4V30R
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
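The DAG-chaining idea in this abstract can be made concrete with a short sketch (assumed names, not specification text): two independent rgets are joined with when_all, and the chained continuation fires only once both dependencies are satisfied.

```cpp
#include <upcxx/upcxx.hpp>
#include <iostream>

// gpA and gpB are assumed to be global pointers to remote ints.
upcxx::future<> sum_remote(upcxx::global_ptr<int> gpA,
                           upcxx::global_ptr<int> gpB) {
  upcxx::future<int> a = upcxx::rget(gpA);  // two independent, in-flight gets
  upcxx::future<int> b = upcxx::rget(gpB);
  // when_all joins the futures; the continuation is the final node of
  // this small DAG and runs only when both values are ready.
  return upcxx::when_all(a, b).then([](int x, int y) {
    std::cout << "sum = " << x + y << "\n";
  });
}
```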
John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2019, LBNL 2001191, doi: 10.25344/S4F301
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2018.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2018, LBNL 2001180, doi: 10.25344/S49G6V
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 8", Lawrence Berkeley National Laboratory Tech Report, September 26, 2018, LBNL 2001179, doi: 10.25344/S45P4X
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Bryce Lelbach, Brian Van Straalen, "UPC++ Specification v1.0, Draft 6", Lawrence Berkeley National Laboratory Tech Report, March 26, 2018, LBNL 2001135, doi: 10.2172/1430689
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian Van Straalen, "UPC++ Programmer’s Guide, v1.0-2018.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2018, LBNL 2001136, doi: 10.2172/1430693
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer’s Guide, v1.0-2017.9", Lawrence Berkeley National Laboratory Tech Report, September 2017, LBNL 2001065, doi: 10.2172/1398522
UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Bryce Lelbach, Brian Van Straalen, "UPC++ Specification v1.0, Draft 4", Lawrence Berkeley National Laboratory Tech Report, September 27, 2017, LBNL 2001066, doi: 10.2172/1398521
UPC++ is a C++11 library providing classes and functions that support Asynchronous Partitioned Global Address Space (APGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
P. Colella, D. T. Graves, T. J. Ligocki, G.H. Miller, D. Modiano, P.O. Schwartz, B. Van Straalen, J. Pillod, D. Trebotich, M. Barad, "EBChombo Software Package for Cartesian Grid, Embedded Boundary Applications", Lawrence Berkeley National Laboratory Technical Report LBNL-6615E, January 9, 2015,
- Download File: ebmain.pdf (pdf: 681 KB)
M. Adams, P. Colella, D. T. Graves, J.N. Johnson, N.D. Keen, T. J. Ligocki, D. F. Martin, P.W. McCorquodale, D. Modiano, P.O. Schwartz, T.D. Sternberg, B. Van Straalen, "Chombo Software Package for AMR Applications - Design Document", Lawrence Berkeley National Laboratory Technical Report LBNL-6616E, January 9, 2015,
- Download File: chomboDesign.pdf (pdf: 994 KB)
Mark F. Adams, Jed Brown, John Shalf, Brian Van Straalen, Erich Strohmaier, Samuel Williams, "HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems", LBNL Technical Report, 2014, LBNL 6630E,
- Download File: hpgmg.pdf (pdf: 183 KB)
Samuel Williams, Dhiraj D. Kalamkar, Amik Singh, Anand M. Deshpande, Brian Van Straalen, Mikhail Smelyanskiy, Ann Almgren, Pradeep Dubey, John Shalf, Leonid Oliker, "Implementation and Optimization of miniGMG - a Compact Geometric Multigrid Benchmark", December 2012, LBNL 6676E,
- Download File: miniGMGLBNL-6676E.pdf (pdf: 906 KB)
Brian Van Straalen, David Trebotich, Terry Ligocki, Daniel T. Graves, Phillip Colella, Michael Barad, "An Adaptive Cartesian Grid Embedded Boundary Method for the Incompressible Navier Stokes Equations in Complex Geometry", Lawrence Berkeley National Laboratory Technical Report LBNL-1003767, 2012,
- Download File: paper5.pdf (pdf: 360 KB)
We present a second-order accurate projection method to solve the incompressible Navier-Stokes equations on irregular domains in two and three dimensions. We use a finite-volume discretization obtained from intersecting the irregular domain boundary with a Cartesian grid. We address the small-cell stability problem associated with such methods by hybridizing a conservative discretization of the advective terms with a stable, nonconservative discretization at irregular control volumes, and redistributing the difference to nearby cells. Our projection is based upon a finite-volume discretization of Poisson's equation. We use a second-order, $L^\infty$-stable algorithm to advance in time. Block-structured local refinement is applied in space. The resulting method is second-order accurate in $L^1$ for smooth problems. We demonstrate the method on benchmark problems for flow past a cylinder in 2D and a sphere in 3D, as well as flows in 3D geometries obtained from image data.
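For orientation, the fractional-step structure such a projection method typically follows can be written schematically as below. The notation is standard and assumed here, not reproduced from the report: an intermediate velocity is advanced with the hybridized advective discretization, and the Poisson solve mentioned in the abstract recovers the correction that enforces incompressibility.

```latex
% Schematic projection step (standard notation, assumed):
%   1. advance an intermediate velocity u* using the hybrid advective terms
%   2. solve a finite-volume Poisson problem for the correction potential
%   3. project u* onto the space of divergence-free fields
\begin{align*}
  \frac{\mathbf{u}^{*}-\mathbf{u}^{n}}{\Delta t}
    &= -\big[(\mathbf{u}\cdot\nabla)\mathbf{u}\big]^{n+1/2}
       + \nu\,\nabla^{2}\mathbf{u}^{*}, \\
  \nabla^{2}\phi &= \frac{\nabla\cdot\mathbf{u}^{*}}{\Delta t}, \\
  \mathbf{u}^{n+1} &= \mathbf{u}^{*}-\Delta t\,\nabla\phi .
\end{align*}
```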
M. Christen, N. Keen, T. Ligocki, L. Oliker, J. Shalf, B. van Straalen, S. Williams, "Automatic Thread-Level Parallelization in the Chombo AMR Library", LBNL Technical Report, 2011, LBNL 5109E,
Posters
Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++ (ECP'19)", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019,
Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet-EX: PGAS Support for Exascale Applications and Runtimes", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18) Research Poster, November 2018,
Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work is driven by the emerging need for adaptive, lightweight communication in irregular applications at exascale. We present an overview of UPC++ and GASNet-EX, including examples and performance results.
GASNet-EX is a portable, high-performance communication library, leveraging hardware support to efficiently implement Active Messages and Remote Memory Access (RMA). UPC++ provides higher-level abstractions appropriate for PGAS programming such as: one-sided communication (RMA), remote procedure call, locality-aware APIs for user-defined distributed objects, and robust support for asynchronous execution to hide latency. Both libraries have been redesigned relative to their predecessors to meet the needs of exascale computing. While both libraries continue to evolve, the system already demonstrates improvements in microbenchmarks and application proxies.