Benjamin Brock is a Ph.D. candidate in computer science at UC Berkeley advised by Katherine Yelick and Aydın Buluç. His research focuses on building new programming environments, libraries, and tools for high-performance computing.
See his campus web page for current research projects and complete publications.
Katherine Yelick, Aydın Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, Leonid Oliker, "The parallelism motifs of genomic data analysis", Philosophical Transactions of The Royal Society A: Mathematical, Physical and Engineering Sciences, 2020,
Taylor Groves, Ben Brock, Yuxin Chen, Khaled Ibrahim, Lenny Oliker, Nicholas J. Wright, Samuel Williams, Katherine Yelick, "Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2020,
- Download File: PMBS20-NVSHMEM-final.pdf (pdf: 659 KB)
Benjamin A. Brock, Yuxin Chen, Jiakun Yan, John Owens, Aydın Buluç, Katherine Yelick, "RDMA vs. RPC for Implementing Distributed Data Structures", Proceedings of the 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3), Denver, CO, USA, IEEE, November 18, 2019, doi: 10.1109/IA349570.2019.00009
Distributed data structures are key to implementing scalable applications for scientific simulations and data analysis. In this paper we look at two implementation styles for distributed data structures: remote direct memory access (RDMA) and remote procedure call (RPC). We focus on operations that require individual accesses to remote portions of a distributed data structure, e.g., accessing a hash table bucket or distributed queue, rather than global operations in which all processors collectively exchange information. We look at the trade-offs between the two styles through microbenchmarks and a performance model that approximates the cost of each. The RDMA operations have direct hardware support in the network and therefore lower latency and overhead, while the RPC operations are more expressive but higher cost and can suffer from lack of attentiveness from the remote side. We also run experiments to compare the real-world performance of RDMA- and RPC-based data structure operations with the predicted performance to evaluate the accuracy of our model, and show that while the model does not always precisely predict running time, it allows us to choose the best implementation in the examples shown. We believe this analysis will assist developers in designing data structures that will perform well on current network architectures, as well as network architects in providing better support for this class of distributed data structures.
Benjamin Brock, Aydın Buluç, Katherine Yelick, "BCL: A Cross-Platform Distributed Data Structures Library", ICPP 2019: Proceedings of the 48th International Conference on Parallel Processing, Kyoto, Japan, Association for Computing Machinery, August 2019, doi: 10.1145/3337821.3337912
One-sided communication is a useful paradigm for irregular parallel applications, but most one-sided programming environments, including MPI's one-sided interface and PGAS programming languages, lack application-level libraries to support these applications. We present the Berkeley Container Library, a set of generic, cross-platform, high-performance data structures for irregular applications, including queues, hash tables, Bloom filters and more. BCL is written in C++ using an internal DSL called the BCL Core that provides one-sided communication primitives such as remote get and remote put operations. The BCL Core has backends for MPI, OpenSHMEM, GASNet-EX, and UPC++, allowing BCL data structures to be used natively in programs written using any of these programming environments. Along with our internal DSL, we present the BCL ObjectContainer abstraction, which allows BCL data structures to transparently serialize complex data types while maintaining efficiency for primitive types. We also introduce the set of BCL data structures and evaluate their performance across a number of high-performance computing systems, demonstrating that BCL programs are competitive with hand-optimized code, even while hiding many of the underlying details of message aggregation, serialization, and synchronization.