Scott B. Baden

My interests are in languages, run-time and source-to-source translation as a means of enhancing performance. I was a post-doc at LBNL in the late 80s and early 90s, after receiving my Ph.D. at U.C. Berkeley in 1987.
From 2016 to 2019 I was PI of the ECP-funded Pagoda Project ("Lightweight Communication and Global Address Space Support for Exascale Applications"), the LBNL-funded LDRD project "Automated Translation of Code to Large Scale Programming Systems," and institutional PI for the ASCR-funded project "Validating Extreme-scale Resilience with Veracity."
I am Professor Emeritus in the Department of Computer Science and Engineering at UCSD, where I was a ladder-track faculty member for 27 years. You can learn about my UCSD research at my UCSD web page, including publications.
Journal Articles
Tan Nguyen, Pietro Cicotti, Eric Bylaska, Dan Quinlan, and Scott Baden, "Automatic Translation of MPI Source into a Latency-tolerant, Data-driven Form", Journal of Parallel and Distributed Computing, February 21, 2017
Han Suk Kim, Didem Unat, Scott Baden, Jurgen Schulze, "A new approach to interactive viewpoint selection for volume data sets", Information Visualization, February 25, 2013, doi: 10.1177/1473871612467631
Tan Nguyen, Daniel Hefenbrock, Jason Oberg, Ryan Kastner and Scott Baden, "A software-based dynamic-warp scheduling approach for load-balancing the Viola-Jones face detection algorithm on GPUs", Journal of Parallel and Distributed Computing, January 31, 2013
Mitesh Meswani, Laura Carrington, Didem Unat, Allan Snavely, Scott Baden, Stephen Poole, "Modeling and predicting performance of high performance computing applications on hardware accelerators", International Journal of High Performance Computing Applications, December 28, 2012
Didem Unat, Jun Zhou, Yifeng Cui, Scott B. Baden, Xing Cai, "Accelerating a 3D Finite Difference Earthquake Simulation with a C-to-CUDA Translator", Computing in Science and Engineering, May 2012, Vol 14:48-59
McCorquodale, P., Colella, P., Balls, G.T., and Baden, S.B., "A Local Corrections Algorithm for Solving Poisson's Equation in Three Dimensions", Communications in Applied Mathematics and Computational Science, Vol. 2, No. 1 (2007), pp. 57-81, doi: 10.2140/camcos.2007.2.57
Conference Papers
Alexander Pöppl, Scott Baden, Michael Bader, "A UPC++ Actor Library and Its Evaluation On a Shallow Water Proxy Application", 2019 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI (PAW-ATM), ACM, November 17, 2019, doi: 10.1109/PAW-ATM49560.2019.00007
Programmability is one of the key challenges of Exascale Computing. Using the actor model for distributed computations may be one solution. The actor model separates computation from communication while still enabling their overlap. Each actor possesses specified communication endpoints to publish and receive information. Computations are undertaken based on the data available on these channels. We present a library that implements this programming model using UPC++, a PGAS library, and evaluate three different parallelization strategies, one based on rank-sequential execution, one based on multiple threads in a rank, and one based on OpenMP tasks. In an evaluation of our library using shallow water proxy applications, our solution compares favorably against an earlier implementation based on X10, and a BSP-based approach.
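The actor model summarized in the abstract above can be illustrated with a small, language-neutral sketch. It is written in Python, not UPC++, and it is not the library described in the paper. The names `Channel`, `Doubler`, and `run_sequential` are hypothetical, and the scheduler only mimics the rank-sequential strategy in spirit: an actor fires when data is available on its input channel.

```python
from collections import deque


class Channel:
    """A FIFO link between one actor's output port and another's input port."""

    def __init__(self):
        self._items = deque()

    def publish(self, value):
        self._items.append(value)

    def available(self):
        return len(self._items) > 0

    def receive(self):
        return self._items.popleft()


class Doubler:
    """Hypothetical actor: fires only when its input channel holds data."""

    def __init__(self, inbox, outbox):
        self.inbox, self.outbox = inbox, outbox

    def can_fire(self):
        return self.inbox.available()

    def fire(self):
        self.outbox.publish(2 * self.inbox.receive())


def run_sequential(actors):
    """Fire ready actors one at a time until no actor can make progress."""
    progress = True
    while progress:
        progress = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                progress = True


def demo():
    source, middle, sink = Channel(), Channel(), Channel()
    # Listing order is deliberately "wrong": data availability, not
    # program order, decides which actor runs.
    actors = [Doubler(middle, sink), Doubler(source, middle)]
    for x in (1, 2, 3):
        source.publish(x)
    run_sequential(actors)
    results = []
    while sink.available():
        results.append(sink.receive())
    return results
```

Calling `demo()` pushes three values through two chained actors; each value is doubled twice regardless of the order in which the actors are listed.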
John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H
John Bachan, Dan Bonachea, Paul H Hargrove, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Scott B Baden, "The UPC++ PGAS library for Exascale Computing", Proceedings of the Second Annual PGAS Applications Workshop (PAW17), November 13, 2017, doi: 10.1145/3144779.3169108
We describe UPC++ V1.0, a C++11 library that supports APGAS programming. UPC++ targets distributed data structures where communication is irregular or fine-grained. The key abstractions are global pointers, asynchronous programming via RPC, and futures. Global pointers incorporate ownership information useful in optimizing for locality. Futures capture data readiness state, are useful for scheduling and also enable the programmer to chain operations to execute asynchronously as high-latency dependencies become satisfied, via continuations. The interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and closely resemble those used in modern C++. Communication in UPC++ runs at close to hardware speeds by utilizing the low-overhead GASNet-EX communication library.
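The idea of chaining operations through futures and continuations can be sketched with Python's standard `concurrent.futures.Future`. This is an analogy only: it does not use or reproduce the UPC++ API, and the helper `then` is a hypothetical name introduced here for illustration.

```python
from concurrent.futures import Future


def then(fut, fn):
    """Return a new future holding fn(result) once `fut` becomes ready."""
    chained = Future()

    def on_ready(done):
        try:
            chained.set_result(fn(done.result()))
        except Exception as exc:
            chained.set_exception(exc)

    fut.add_done_callback(on_ready)
    return chained


def demo():
    remote_value = Future()              # stands in for a pending remote read
    doubled = then(remote_value, lambda v: v * 2)
    shifted = then(doubled, lambda v: v + 1)
    ready_before = shifted.done()        # nothing has run yet
    remote_value.set_result(20)          # the dependency becomes satisfied
    return ready_before, shifted.result()
```

The two continuations are registered before any data exists; they run only when the first future is given a value, which is the sense in which the chain executes as dependencies become satisfied.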
SM Martin, MJ Berger, SB Baden, "Toucan - A Translator for Communication Tolerant MPI Applications", Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017, June 2017, 998-1007, doi: 10.1109/IPDPS.2017.44
We discuss early results with Toucan, a source-to-source translator that automatically restructures C/C++ MPI applications to overlap communication with computation. We co-designed the translator and runtime system to enable dynamic, dependence-driven execution of MPI applications, and require only a modest amount of programmer annotation. Co-design was essential to realizing overlap through dynamic code block reordering and avoiding the limitations of static code relocation and inlining. We demonstrate that Toucan hides significant communication in four representative applications running on up to 24K cores of NERSC's Edison platform. Using Toucan, we have hidden from 33% to 85% of the communication overhead, with performance meeting or exceeding that of painstakingly hand-written overlap variants.
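For readers unfamiliar with the term, the following is a minimal hand-written sketch of overlapping communication with computation, the kind of restructuring the abstract says Toucan automates. It is not Toucan's output or runtime and uses no MPI. The function names are hypothetical, and a sleeping worker thread stands in for a message exchange.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def exchange_halo(left_neighbor_value, right_neighbor_value, delay=0.05):
    """Stand-in for a message exchange; the sleep models network latency."""
    time.sleep(delay)
    return left_neighbor_value, right_neighbor_value


def smooth(cells, left_ghost, right_ghost):
    """Reference version: three-point average after all data has arrived."""
    padded = [left_ghost] + cells + [right_ghost]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]


def smooth_with_overlap(cells, left_neighbor_value, right_neighbor_value):
    """Same result, but interior work proceeds while the exchange is in flight."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the exchange, then do the work that does not depend on it.
        pending = pool.submit(exchange_halo,
                              left_neighbor_value, right_neighbor_value)
        interior = [(cells[i - 1] + cells[i] + cells[i + 1]) / 3
                    for i in range(1, len(cells) - 1)]
        # Only the two boundary cells have to wait for the ghost values.
        left_ghost, right_ghost = pending.result()
    first = (left_ghost + cells[0] + cells[1]) / 3
    last = (cells[-2] + cells[-1] + right_ghost) / 3
    return [first] + interior + [last]
```

The overlapped version assumes at least two cells. It produces the same values as `smooth`, since only the order in which the work is issued changes.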
Tan Nguyen and Scott Baden, "LU Factorization: Towards Hiding Communication Overheads With A Lookahead-free Algorithm", IEEE Cluster 2015, Chicago, IL, September 8, 2015
Tan Nguyen and Scott Baden, "Bamboo - Preliminary scaling results on multiple hybrid nodes of Knights Corner and Sandy Bridge processors", WOLFHPC: Workshop on Domain-Specific Languages and High-Level Frameworks for HPC, November 19, 2013
Didem Unat, Xing Cai, Scott Baden, "Optimizing the Aliev-Panfilov Model of Cardiac Excitation on Heterogeneous Systems", Para 2010: State of the Art in Scientific and Parallel Computing, June 6, 2013
T. Nguyen, P. Cicotti, E. Bylaska, D. Quinlan and S. B. Baden, "Bamboo: Translating MPI applications to a latency-tolerant, data-driven form", Proceedings of the 2012 ACM/IEEE conference on Supercomputing (SC12), November 14, 2012
Mitesh R. Meswani, Laura Carrington, Didem Unat, Allan Snavely, Scott B. Baden, Stephen Poole, "Modeling and Predicting Performance of High Performance Computing Applications on Hardware Accelerators (workshop version)", IPDPS Workshops, IEEE Computer Society, 2012
Han Suk Kim, Didem Unat, Scott B. Baden, Jürgen P. Schulze, "Interactive Data-centric Viewpoint Selection", Visualization and Data Analysis, Proc. SPIE 8294, January 2012
Mitesh R. Meswani, Laura Carrington, Didem Unat, Joshua Peraza, Allan Snavely, Scott Baden, Stephen Poole, "Modeling and Predicting Application Performance on Hardware Accelerators", International Symposium on Workload Characterization (IISWC), IEEE, November 2011, doi: 10.1109/IISWC.2011.6114198
Didem Unat, Xing Cai, Scott B. Baden, "Mint: realizing CUDA performance in 3D stencil methods with annotated C", ICS '11 Proceedings of the international conference on Supercomputing, ACM, June 2011, 214-224, doi: 10.1145/1995896.1995932
Daniel Hefenbrock, Jason Oberg, Nhat Tan Nguyen Thanh, Ryan Kastner and Scott B. Baden, "Accelerating Viola-Jones Face Detection to FPGA-Level using GPUs", Proc 18th Annual International IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '10), May 3, 2010
Didem Unat, Theodore Hromadka III, Scott B. Baden, "An Adaptive Sub-sampling Method for In-memory Compression of Scientific Data", DCC, IEEE Computer Society, 2009
McCorquodale, P., Colella, P., Balls, G., Baden, S.B., "A Scalable Parallel Poisson Solver with Infinite-Domain Boundary Conditions", Proceedings of the 7th Workshop on High Performance Scientific and Engineering Computing, Oslo, Norway, June 2005
Balls, G.T., Baden, S.B., Colella, P., "SCALLOP: A Highly Scalable Parallel Poisson Solver in Three Dimensions", Proceedings, SC'03, Phoenix, Arizona, November 2003
Presentations/Talks
Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (ALCF'20), Argonne Leadership Computing Facility (ALCF) Webinar Series, May 27, 2020
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided RMA communication and Remote Procedure Calls (RPC), along with futures and promises. These constructs enable the programmer to express dependencies between asynchronous computations and data movement. UPC++ supports the implementation of simple, regular data structures as well as more elaborate distributed data structures where communication is fine-grained, irregular, or both. The library’s support for asynchrony enables the application to aggressively overlap and schedule communication and computation to reduce wait times.
UPC++ is highly portable and runs on platforms from laptops to supercomputers, with native implementations for HPC interconnects. As a C++ library, it interoperates smoothly with existing numerical libraries and on-node programming models (e.g., OpenMP, CUDA).
In this webinar, hosted by DOE’s Exascale Computing Project and the ALCF, we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through basic algorithm implementations. We will also look at irregular applications and show how they can take advantage of UPC++ features to optimize their performance.
Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick, UPC++: A PGAS/RPC Library for Asynchronous Exascale Communication in C++ (ECP'20), Tutorial at Exascale Computing Project (ECP) Annual Meeting 2020, February 6, 2020
UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided RMA communication and Remote Procedure Calls (RPC), along with futures and promises. These constructs enable the programmer to express dependencies between asynchronous computations and data movement. UPC++ supports the implementation of simple, regular data structures as well as more elaborate distributed data structures where communication is fine-grained, irregular, or both. The library’s support for asynchrony enables the application to aggressively overlap and schedule communication and computation to reduce wait times.
UPC++ is highly portable and runs on platforms from laptops to supercomputers, with native implementations for HPC interconnects. As a C++ library, it interoperates smoothly with existing numerical libraries and on-node programming models (e.g., OpenMP, CUDA).
In this tutorial we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through basic algorithm implementations. We will also look at irregular applications and show how they can take advantage of UPC++ features to optimize their performance.
Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Kathy Yelick, UPC++ Tutorial (NERSC Dec 2019), National Energy Research Scientific Computing Center (NERSC), December 16, 2019
This event was a repeat of the tutorial delivered on November 1, with the hands-on component restored. That component had been omitted from the earlier session due to uncertainty surrounding the power outage at NERSC.
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. UPC++ provides mechanisms for low-overhead one-sided communication, moving computation to data through remote-procedure calls, and expressing dependencies between asynchronous computations and data movement. It is particularly well-suited for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces are designed to be composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds.
In this tutorial we introduced basic concepts and advanced optimization techniques of UPC++. We discussed the UPC++ memory and execution models and walked through implementing basic algorithms in UPC++. We also discussed irregular applications and how to take advantage of UPC++ features to optimize their performance. The tutorial included hands-on exercises with basic UPC++ constructs. Registrants were given access to run their UPC++ exercises on NERSC’s Cori (currently the 14th-fastest computer in the world).
Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick, UPC++ Tutorial (NERSC Nov 2019), National Energy Research Scientific Computing Center (NERSC), November 1, 2019
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. UPC++ provides mechanisms for low-overhead one-sided communication, moving computation to data through remote-procedure calls, and expressing dependencies between asynchronous computations and data movement. It is particularly well-suited for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces are designed to be composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds.
In this tutorial we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through implementing basic algorithms in UPC++. We will also look at irregular applications and how to take advantage of UPC++ features to optimize their performance.
Tan Nguyen and Scott Baden, Automating the communication-computation overlap with Bamboo, 2013 SIAM conference on Computational Science and Engineering, February 25, 2013
Reports
John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.9.0", Lawrence Berkeley National Laboratory Tech Report LBNL-2001560, December 2023, doi: 10.25344/S4P01J
UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes. UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.
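The memory model described above (a global address space made of per-process shared segments, accessed with one-sided operations and RPC) can be pictured with a toy, single-process model. This is a sketch in Python under stated assumptions, not UPC++ code: `GlobalPtr` and `ToyPgas` are hypothetical names, there is no real communication, and every operation completes immediately rather than asynchronously.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalPtr:
    """Hypothetical global pointer: which rank owns the data, and where."""
    rank: int
    offset: int


class ToyPgas:
    """Single-process model of a partitioned global address space."""

    def __init__(self, ranks, segment_size):
        # One shared segment per simulated rank.
        self.segments = [[0] * segment_size for _ in range(ranks)]

    def put(self, value, dest):
        # One-sided write: the owning rank takes no action.
        self.segments[dest.rank][dest.offset] = value

    def get(self, src):
        # One-sided read from the owner's segment.
        return self.segments[src.rank][src.offset]

    def rpc(self, rank, fn, *args):
        # Run fn "at" the target rank, handing it that rank's segment.
        return fn(self.segments[rank], *args)


def demo():
    space = ToyPgas(ranks=4, segment_size=8)
    ptr = GlobalPtr(rank=2, offset=5)
    space.put(42, ptr)                                   # move data to rank 2
    total = space.rpc(2, lambda segment: sum(segment))   # move computation
    return space.get(ptr), total
```

The contrast between `put`/`get` and `rpc` mirrors the distinction the abstract draws between moving data and moving computation to where the data resides.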
John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2020.10.0", Lawrence Berkeley National Laboratory Tech Report, October 2020, LBNL 2001368, doi: 10.25344/S4HG6Q
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 8", Lawrence Berkeley National Laboratory Tech Report, September 26, 2018, LBNL 2001179, doi: 10.25344/S45P4X
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Bryce Lelbach, Brian Van Straalen, "UPC++ Specification v1.0, Draft 6", Lawrence Berkeley National Laboratory Tech Report, March 26, 2018, LBNL 2001135, doi: 10.2172/1430689
UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Bryce Lelbach, Brian Van Straalen, "UPC++ Specification v1.0, Draft 4", Lawrence Berkeley National Laboratory Tech Report, September 27, 2017, LBNL 2001066, doi: 10.2172/1398521
UPC++ is a C++11 library providing classes and functions that support Asynchronous Partitioned Global Address Space (APGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
Web Articles
"Pagoda: Communication Software Libraries for Exascale Computing", Mike Bernhardt, Lawrence Berkeley National Laboratory CS Area Communications, April 5, 2018,
A Berkeley Lab team leads the development of communication software libraries with low operating overheads to tap the high performance of DOE’s exascale computers.
Posters
Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++ (ECP'19)", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019
Scott B. Baden, Paul H. Hargrove, Dan Bonachea, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - GASNet-EX (ECP'19)", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019
Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet-EX: PGAS Support for Exascale Applications and Runtimes", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18) Research Poster, November 2018
Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work is driven by the emerging need for adaptive, lightweight communication in irregular applications at exascale. We present an overview of UPC++ and GASNet-EX, including examples and performance results.
GASNet-EX is a portable, high-performance communication library, leveraging hardware support to efficiently implement Active Messages and Remote Memory Access (RMA). UPC++ provides higher-level abstractions appropriate for PGAS programming such as: one-sided communication (RMA), remote procedure call, locality-aware APIs for user-defined distributed objects, and robust support for asynchronous execution to hide latency. Both libraries have been redesigned relative to their predecessors to meet the needs of exascale computing. While both libraries continue to evolve, the system already demonstrates improvements in microbenchmarks and application proxies.