Benjamin Brock is a Ph.D. candidate in computer science at UC Berkeley advised by Katherine Yelick and Aydın Buluç. His research focuses on building new programming environments, libraries, and tools for high-performance computing.
See his campus web page for current research projects and complete publications.
Katherine Yelick, Aydın Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, Leonid Oliker, "The parallelism motifs of genomic data analysis", Philosophical Transactions of The Royal Society A: Mathematical, Physical and Engineering Sciences, 2020,
O Selvitopi, B Brock, I Nisa, A Tripathy, K Yelick, A Buluç, "Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication", Proceedings of the International Conference on Supercomputing, January 2021, 431--442, doi: 10.1145/3447818.3461472
Benjamin Brock, Aydin Buluç, Timothy G Mattson, Scott McMillan, José E Moreira, Roger Pearce, Oguz Selvitopi, Trevor Steil, "Considerations for a Distributed GraphBLAS API", IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, May 2020, doi: 10.1109/IPDPSW50202.2020.00048
T Groves, B Brock, Y Chen, KZ Ibrahim, L Oliker, NJ Wright, S Williams, K Yelick, "Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches", Proceedings of PMBS 2020: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis, January 2020, 126--137, doi: 10.1109/PMBS51919.2020.00016
- Download File: PMBS20-NVSHMEM-final.pdf (pdf: 659 KB)
B Brock, A Buluç, K Yelick, "BCL: A cross-platform distributed data structures library", ACM International Conference Proceeding Series, January 2019, doi: 10.1145/3337821.3337912
One-sided communication is a useful paradigm for irregular parallel applications, but most one-sided programming environments, including MPI's one-sided interface and PGAS programming languages, lack application-level libraries to support these applications. We present the Berkeley Container Library, a set of generic, cross-platform, high-performance data structures for irregular applications, including queues, hash tables, Bloom filters and more. BCL is written in C++ using an internal DSL called the BCL Core that provides one-sided communication primitives such as remote get and remote put operations. The BCL Core has backends for MPI, OpenSHMEM, GASNet-EX, and UPC++, allowing BCL data structures to be used natively in programs written using any of these programming environments. Along with our internal DSL, we present the BCL ObjectContainer abstraction, which allows BCL data structures to transparently serialize complex data types while maintaining efficiency for primitive types. We also introduce the set of BCL data structures and evaluate their performance across a number of high-performance computing systems, demonstrating that BCL programs are competitive with hand-optimized code, even while hiding many of the underlying details of message aggregation, serialization, and synchronization.
BA Brock, Y Chen, J Yan, J Owens, A Buluç, K Yelick, "RDMA vs. RPC for implementing distributed data structures", 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019, January 1, 2019, 17--22, doi: 10.1109/IA349570.2019.00009
Distributed data structures are key to implementing scalable applications for scientific simulations and data analysis. In this paper we look at two implementation styles for distributed data structures: remote direct memory access (RDMA) and remote procedure call (RPC). We focus on operations that require individual accesses to remote portions of a distributed data structure, e.g., accessing a hash table bucket or distributed queue, rather than global operations in which all processors collectively exchange information. We look at the trade-offs between the two styles through microbenchmarks and a performance model that approximates the cost of each. The RDMA operations have direct hardware support in the network and therefore lower latency and overhead, while the RPC operations are more expressive but higher cost and can suffer from lack of attentiveness from the remote side. We also run experiments to compare the real-world performance of RDMA- and RPC-based data structure operations with the predicted performance to evaluate the accuracy of our model, and show that while the model does not always precisely predict running time, it allows us to choose the best implementation in the examples shown. We believe this analysis will assist developers in designing data structures that will perform well on current network architectures, as well as network architects in providing better support for this class of distributed data structures.