Steven Hofmeyr Ph.D.

Computer Science Department

SHofmeyr@lbl.gov

Research Interests

Large-scale, high performance metagenome assembly.
Modeling of viral infections and immune responses within the lungs.
Large-scale agent-based modeling of disease spread in populations.
Modeling of complex systems, such as the Internet and financial markets.
Parallel computing, including HPC and GPU programming, scheduling, and load balancing.
Implementation of data structures such as filters and hash tables on GPUs.
Information security, particularly understanding the impact of policies and regulations and the dynamic interplay of chronic attack and defense on a large scale.
Operating systems to support parallel applications.
Genetic algorithms and evolutionary computation.

Awards

The 2020 IEEE Security and Privacy Test-of-Time award for the 1996 paper, "A Sense of Self for Unix Processes".
InfoWorld Innovators of the Year 2004
MIT Technology Review TR100 Innovators of the Year 2003

Journal Articles

Jhe-Yu Liou, Muaaz Awan, Kirtus Leyba, Petr Sulc, Steven Hofmeyr, Carole-Jean Wu, Stephanie Forrest, "Evolving to find optimizations humans miss: using evolutionary computation to improve GPU code for bioinformatics applications", ACM Transactions on Evolutionary Learning and Optimization, November 15, 2024, doi: 10.1145/3703920

Hofmeyr S, Buluç A, Riley R, Egan R, Selvitopi O, Oliker L, Yelick K, Shakya M, Youtsey B, Azad A, "Exabiome: Advancing Microbial Science through Exascale Computing", Computing in Science & Engineering, April 1, 2024, doi: 10.1109/MCSE.2024.3402546

Oliver T, Varghese N, Roux S, Schulz F, Huntemann M, Clum A, Foster B, Foster B, Riley R, LaButti K, Egan R, Hajek P, Mukherjee S, Ovchinnikova G, Reddy TBK, Calhoun S, Hayes RD, Rohwer RR, Zhou Z, Daum C, Copeland A, Chen I-MA, Ivanova NN, Kyrpides NC, Mouncey NJ, del Rio TG, Grigoriev IV, Hofmeyr S, Oliker L, Yelick K, Anantharaman K, McMahon KD, Woyke T, Eloe-Fadrosh EA, "Coassembly and binning of a twenty-year metagenomic time-series from Lake Mendota", Nature Scientific Data, January 1, 2024, doi: 10.1038/S41597-024-03826-8

Riley R, Bowers RM, Camargo AP, Campbell A, Egan R, Eloe-Fadrosh EA, Foster B, Hofmeyr S, Huntemann M, Kellom M, Kimbrel JA, Oliker L, Yelick K, Pett-Ridge J, Salamov A, Varghese NJ, Clum A, "Terabase-Scale Coassembly of a Tropical Soil Microbiome", Microbiology Spectrum, August 17, 2023, doi: 10.1128/SPECTRUM.00200-23

Meyer F, Fritz A, Deng Z-L, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh H-J, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC, "Critical Assessment of Metagenome Interpretation: the second round of challenges", Nature Methods, April 1, 2022, doi: 10.1038/S41592-022-01431-4

Melanie E. Moses, Steven Hofmeyr, Judy L Cannon, Akil Andrews, Rebekah Gridley, Monica Hinga, Kirtus Leyba, Abigail Pribisova, Vanessa Surjadidjaja, Humayra Tasnim, Stephanie Forrest, "Spatially distributed infection increases viral load in a computational model of SARS-CoV-2 lung infection", PLOS Computational Biology, December 2021, 17(12), doi: 10.1371/journal.pcbi.1009735

Muaaz G Awan, Jack Deslippe, Aydin Buluc, Oguz Selvitopi, Steven Hofmeyr, Leonid Oliker, Katherine Yelick, "ADEPT: a domain independent sequence alignment strategy for GPU architectures", BMC Bioinformatics, September 2020, 21, doi: 10.1186/s12859-020-03720-1

Steven Hofmeyr, Rob Egan, Evangelos Georganas, Alex C Copeland, Robert Riley, Alicia Clum, Emiley Eloe-Fadrosh, Simon Roux, Eugene Goltsman, Aydin Buluc, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, "Terabase-scale metagenome coassembly with MetaHipMer", Scientific Reports, June 1, 2020, 10, doi: 10.1038/s41598-020-67416-5

Download File: s41598-020-67416-5.pdf (pdf: 1.4 MB)

Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly) and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes, with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform coassembly and multiassembly, and assemble both abundant, high strain variation genomes, and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer’s scaling power. MetaHipMer is available for public use under an open source license and all datasets used in the paper are available for public download.

Katherine Yelick, Aydın Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, Leonid Oliker, "The parallelism motifs of genomic data analysis", Philosophical Transactions of The Royal Society A: Mathematical, Physical and Engineering Sciences, 2020,

M. Ferroni, JA Colmenares, S Hofmeyr, JD Kubiatowicz, MD Santambrogio, "Enabling power-awareness for the Xen Hypervisor", ACM SIGBED Review, March 20, 2018, 1:36-42,

M Ferroni, A Corna, A Damiani, R Brondolin, JA Colmenares, S Hofmeyr, JD Kubiatowicz, MD Santambrogio, "Power consumption models for multi-tenant server infrastructures", ACM Transactions on Architecture and Code Optimization, 2017, 14, doi: 10.1145/3148965

Khaled Z. Ibrahim, Steven Hofmeyr, Costin Iancu, "The Case for Partitioning Virtual Machines on Manycore Architectures", IEEE TPDS, April 17, 2014,

Download File: TPDSsubmit.pdf (pdf: 5.6 MB)

S Hofmeyr, J Colmenares, J Kubiatowicz, C Iancu, "Juggle: Addressing Extrinsic Load Imbalances in SPMD Applications on Multicore Computers", Cluster Computing, 2012,

S. Hofmeyr, "The information security technology arms race", Crosstalk: The Journal of Defense Software Engineering, October 1, 2005,

S. Hofmeyr, "New approaches to security: lessons from nature", Secure Convergence Journal, June 1, 2005,

S. Hofmeyr, "Host intrusion detection: part of the operating system or on top of the operating system", Computers & Security, February 1, 2005,

S Hofmeyr, "The implications of immunology for secure systems design", Computers & Security, January 1, 2004, 23:453--455, doi: 10.1016/S0167-4048(04)00166-X

S Hofmeyr, "A new approach to security: Learning from immunology", Information Systems Security, January 1, 2003, 12:29--35, doi: 10.1201/1086/43648.12.4.20030901/77303.6

S. Hofmeyr, "Why today's security technologies are so inadequate: history, implications and new approaches", Information Systems Security, January 1, 2003,

S. Forrest, S. Hofmeyr, "Engineering an immune system", Graft, June 1, 2001,

SA Hofmeyr, S Forrest, "Architecture for an artificial immune system", Evolutionary Computation, 2000, 8:443--473, doi: 10.1162/106365600568257

SA Hofmeyr, S Forrest, A Somayaji, "Intrusion Detection Using Sequences of System Calls", J. Comput. Secur., January 1, 1998, 6:151--180, doi: 10.3233/JCS-980109

AP Kosoresow, SA Hofmeyr, "Intrusion detection via system call traces", IEEE Software, January 1, 1997, 14:35--41, doi: 10.1109/52.605929

S Forrest, SA Hofmeyr, A Somayaji, "Computer Immunology", Communications of the ACM, January 1, 1997, 40:88--96, doi: 10.1145/262793.262811

Conference Papers

Leyba K, Hofmeyr S, Forrest S, Cannon J, Moses M, "SIMCoV-GPU: Accelerating an Agent-Based Model for Exascale", HPDC '24, August 30, 2024, doi: 10.1145/3625549.3658692

Popovici DT, Awan MG, Guidi G, Egan R, Hofmeyr S, Oliker L, Yelick K, "Designing Efficient SIMD Kernels for High Performance Sequence Alignment", 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 19, 2023, doi: 10.1109/IPDPSW59300.2023.00038

McCoy H, Hofmeyr S, Yelick K, Pandey P, "High-Performance Filters for GPUs", Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, February 25, 2023, doi: 10.1145/3572848.3577507

"Singleton Sieving: Overcoming the Memory/Speed Trade-Off in Exascale k-mer Analysis", SIAM Conference on Applied and Computational Discrete Algorithms (ACDA23), January 1, 2023, doi: 10.25344/S4TP4T

Liou J-Y, Awan M, Hofmeyr S, Forrest S, Wu C-J, "Understanding the Power of Evolutionary Computation for GPU Code Optimization", 2022 IEEE International Symposium on Workload Characterization (IISWC), August 11, 2022, doi: 10.1109/IISWC55918.2022.00025

MG Awan, S Hofmeyr, R Egan, N Ding, A Buluc, J Deslippe, L Oliker, K Yelick, "Accelerating Large Scale de novo Metagenome Assembly Using GPUs", International Conference for High Performance Computing, Networking, Storage and Analysis, SC, January 1, 2021, doi: 10.1145/3458817.3476212

C. Imes, S. Hofmeyr, D. I. D. Kang, J. P. Walters, "A Case Study and Characterization of a Many-socket, Multi-tier NUMA HPC Platform", 2020 IEEE/ACM 6th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing (HiPar), November 12, 2020, doi: 10.1109/LLVMHPCHiPar51896.2020.00013

A Zeni, G Guidi, M Ellis, N Ding, MD Santambrogio, S Hofmeyr, A Buluc, L Oliker, K Yelick, "LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment", Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020, 2020, 462--471, doi: 10.1109/IPDPS47924.2020.00055

F Peverelli, LD Tucci, MD Santambrogio, N Ding, S Hofmeyr, A Buluc, L Oliker, K Yelick, "GPU accelerated partial order multiple sequence alignment for long reads self-correction", Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020, 2020, 174--182, doi: 10.1109/IPDPSW50202.2020.00039

John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed, "UPC++: A High-Performance Communication Framework for Asynchronous Computation", 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, IEEE, May 2019, doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).

We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.

UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.
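
The distributed hash table motif mentioned in this abstract can be illustrated with a minimal single-process sketch. It is not UPC++ code and does not reflect the paper's implementation: the class `SimulatedDHT`, its `_rpc` helper, and the use of CRC32 for key placement are all assumptions made for illustration. The idea shown is only that each key has one owning rank and that inserts and lookups are shipped to that owner, much as an RPC would move the operation to the data.

```python
import zlib


class SimulatedDHT:
    """Toy single-process stand-in for a hash table partitioned across ranks."""

    def __init__(self, n_ranks):
        # One local dictionary per simulated rank (hypothetical layout).
        self.local_maps = [dict() for _ in range(n_ranks)]

    def owner(self, key):
        # Deterministic hash so the same key always maps to the same rank.
        return zlib.crc32(key.encode("utf-8")) % len(self.local_maps)

    def _rpc(self, rank, fn, *args):
        # Stand-in for a remote procedure call: run fn on the owner's local map.
        return fn(self.local_maps[rank], *args)

    def insert(self, key, value):
        # Ship the update to the rank that owns the key.
        self._rpc(self.owner(key), lambda m, k, v: m.__setitem__(k, v), key, value)

    def find(self, key):
        # Ship the lookup to the owner; returns None when the key is absent.
        return self._rpc(self.owner(key), lambda m, k: m.get(k), key)
```

In a real distributed setting the call in `_rpc` would cross process boundaries and return asynchronously; here it runs immediately so the ownership logic can be read in isolation.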

E Georganas, R Egan, S Hofmeyr, E Goltsman, B Arndt, A Tritt, A Buluc, L Oliker, K Yelick, "Extreme scale de novo metagenome assembly", Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018, 2019, 122--134, doi: 10.1109/SC.2018.00013

C Imes, S Hofmeyr, H Hofmann, "Energy-efficient application resource scheduling using machine learning classifiers", ACM International Conference Proceeding Series, 2018, doi: 10.1145/3225058.3225088

L Di Tucci, D Conficconi, A Comodi, S Hofmeyr, D Donofrio, MD Santambrogio, "A parallel, energy efficient hardware architecture for the merAligner on FPGA using chisel HCL", Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018, 2018, 214--217, doi: 10.1109/IPDPSW.2018.00041

John Bachan, Dan Bonachea, Paul H Hargrove, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Scott B Baden, "The UPC++ PGAS library for Exascale Computing", Proceedings of the Second Annual PGAS Applications Workshop (PAW17), November 13, 2017, doi: 10.1145/3144779.3169108

We describe UPC++ V1.0, a C++11 library that supports APGAS programming. UPC++ targets distributed data structures where communication is irregular or fine-grained. The key abstractions are global pointers, asynchronous programming via RPC, and futures. Global pointers incorporate ownership information useful in optimizing for locality. Futures capture data readiness state, are useful for scheduling and also enable the programmer to chain operations to execute asynchronously as high-latency dependencies become satisfied, via continuations. The interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and closely resemble those used in modern C++. Communication in UPC++ runs at close to hardware speeds by utilizing the low-overhead GASNet-EX communication library.
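
The abstract's description of futures chained through continuations can be sketched with Python's standard library. This is an analogy under stated assumptions, not the UPC++ interface: the helper name `then` is hypothetical and is built here on `concurrent.futures.Future`, which provides only `add_done_callback`.

```python
from concurrent.futures import Future


def then(fut, fn):
    """Return a new Future that will hold fn(result) once fut completes."""
    chained = Future()

    def _on_done(done):
        # Runs when fut becomes ready; forwards either the value or the error.
        try:
            chained.set_result(fn(done.result()))
        except BaseException as exc:
            chained.set_exception(exc)

    fut.add_done_callback(_on_done)
    return chained


# Usage: build the chain first, satisfy the dependency later.
source = Future()
total = then(then(source, lambda x: x + 1), lambda x: x * 2)
source.set_result(20)
print(total.result())
```

The chain is declared before any value exists, and each step runs only when the step it depends on has become ready, which is the scheduling property the abstract attributes to continuations.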

M Ellis, E Georganas, R Egan, S Hofmeyr, A Buluç, B Cook, L Oliker, K Yelick, "Performance characterization of de novo genome assembly on leading parallel systems", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, 10417 LN:79--91, doi: 10.1007/978-3-319-64203-1_6

E Georganas, M Ellis, R Egan, S Hofmeyr, A Buluç, B Cook, L Oliker, K Yelick, "MerBench: PGAS benchmarks for high performance genome assembly", Proceedings of PAW 2017: 2nd Annual PGAS Applications Workshop - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis, 2017, 2017-Jan:1--4, doi: 10.1145/3144779.3169109

B Edwards, S Hofmeyr, S Forrest, "Hype and heavy tails: A closer look at data breaches", Journal of Cybersecurity, 2016, 2:3--14, doi: 10.1093/cybsec/tyw003

S Hofmeyr, C Iancu, J Colmenares, E Roman, B Austin, "Time-Sharing Redux for Large-Scale HPC Systems", Proceedings - 18th IEEE International Conference on High Performance Computing and Communications, 14th IEEE International Conference on Smart City and 2nd IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2016, 2016, 301--308, doi: 10.1109/HPCC-SmartCity-DSS.2016.0051

M Ferroni, JA Colmenares, S Hofmeyr, JD Kubiatowicz, MD Santambrogio, "Enabling power-awareness for the Xen hypervisor", CEUR Workshop Proceedings, 2016, 1697,

E Georganas, A Buluç, J Chapman, S Hofmeyr, C Aluru, R Egan, L Oliker, D Rokhsar, K Yelick, "HipMer: An extreme-scale de novo genome assembler", International Conference for High Performance Computing, Networking, Storage and Analysis, SC, January 1, 2015, 15-20-No, doi: 10.1145/2807591.2807664

B Edwards, S Hofmeyr, S Forrest, M Van Eeten, "Analyzing and modeling longitudinal security data: Promise and pitfalls", ACM International Conference Proceeding Series, 2015, 7-11-Dec:391--400, doi: 10.1145/2818000.2818010

JA Colmenares, G Eads, S Hofmeyr, S Bird, M Moretó, D Chou, B Gluzman, E Roman, DB Bartolini, N Mor, K Asanović, JD Kubiatowicz, "Tessellation: Refactoring the OS around explicit resource containers with continuous adaptation", Proceedings - Design Automation Conference, 2013, doi: 10.1145/2463209.2488827

B Edwards, T Moore, G Stelle, S Hofmeyr, S Forrest, "Beyond the blacklist: Modeling malware spread and the effect of interventions", Proceedings New Security Paradigms Workshop, January 1, 2012, 53--65,

S. Hofmeyr, T. Moore, S. Forrest, B. Edwards, G. Stelle, "Modeling Internet-Scale Policies for Cleaning up Malware", Workshop on the Economics of Information Security (WEIS 2011), June 14, 2011, doi: 10.1007/978-1-4614-1981-5_7

Download File: weis2011-cleaning-malware.pdf (pdf: 1.1 MB)

KZ Ibrahim, S Hofmeyr, C Iancu, E Roman, "Optimized pre-copy live migration for memory intensive applications", Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2011, doi: 10.1145/2063384.2063437

Download File: VMMigrationsubmit.pdf (pdf: 791 KB)

S Hofmeyr, JA Colmenares, C Iancu, J Kubiatowicz, "Juggle: Proactive load balancing on multicore computers", Proceedings of the IEEE International Symposium on High Performance Distributed Computing, 2011, 3--14, doi: 10.1145/1996130.1996134

KZ Ibrahim, S Hofmeyr, C Iancu, "Characterizing the performance of parallel applications on multi-socket virtual machines", Proceedings - 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2011, 2011, 1--12, doi: 10.1109/CCGrid.2011.50

Download File: ccgrid11submit.pdf (pdf: 427 KB)

JA Colmenares, S Bird, G Eads, S Hofmeyr, A Kim, R Poddar, H Alkaff, K Asanović, J Kubiatowicz, "Tessellation operating system: Building a real-time, responsive, high-throughput client OS for many-core architectures", 2011 IEEE Hot Chips 23 Symposium, HCS 2011, 2011, doi: 10.1109/HOTCHIPS.2011.7477518

J. A. Colmenares, S. Bird, H. Cook, P. Pearce, D. Zhu, J. Shalf, S. Hofmeyr, K. Asanovic, J. Kubiatowicz, "Resource Management in the Tessellation Manycore OS", 2nd Usenix Workshop on Hot Topics in Parallelism (HotPar), June 15, 2010,

Costin Iancu, Steven Hofmeyr, Filip Blagojević, Yili Zheng, "Oversubscription on multicore processors", Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing (IPDPS), April 2010, doi: 10.1109/IPDPS.2010.5470434

Download File: ovsub.pdf (pdf: 449 KB)

Existing multicore systems already provide deep levels of thread parallelism; hybrid programming models and composability of parallel libraries are very active areas of research within the scientific programming community. As more applications and libraries become parallel, scenarios where multiple threads compete for a core are unavoidable. In this paper we evaluate the impact of task oversubscription on the performance of MPI, OpenMP and UPC implementations of the NAS Parallel Benchmarks on UMA and NUMA multi-socket architectures. We evaluate explicit thread affinity management against the default Linux load balancing and discuss sharing and partitioning system management techniques. Our results indicate that oversubscription provides beneficial effects for applications running in competitive environments. Sharing all the available cores between applications provides better throughput than explicit partitioning. Modest levels of oversubscription improve system throughput by 27% and provide better performance isolation of applications from their co-runners: best overall throughput is always observed when applications share cores and each is executed with multiple threads per core. Rather than “resource” symbiosis, our results indicate that the determining behavioral factor when applications share a system is the granularity of the synchronization operations.

S Hofmeyr, C Iancu, F Blagojević, "Load balancing on speed", Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, January 1, 2010, 147--157, doi: 10.1145/1693453.1693475

Download File: ppopp141-hofmeyr.pdf (pdf: 1.1 MB)

R. Liu, K. Klues, S. Bird, S. Hofmeyr, K. Asanovic, J. D. Kubiatowicz, "Tessellation: Space-Time Partitioning in a Manycore Client OS", First USENIX Workshop on Hot Topics in Parallelism, June 15, 2009,

C Iancu, S Hofmeyr, "Runtime optimization of vector operations on large scale SMP clusters", Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, 2008, 122--132, doi: 10.1145/1454115.1454134

Download File: pact120-iancu.pdf (pdf: 1.2 MB)

S Forrest, S Hofmeyr, A Somayaji, "The evolution of system-call monitoring", Proceedings - Annual Computer Security Applications Conference, ACSAC, January 1, 2008, 418--430, doi: 10.1109/ACSAC.2008.54

SA Hofmeyr, S Forrest, "Immunity by Design: An Artificial Immune System", GECCO’99, San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., January 1, 1999, 1289--1296,

S. Hofmeyr, S. Forrest, P. D'haeseleer, "An immunological approach to distributed network intrusion detection", Recent Advances in Intrusion Detection (RAID), September 14, 1998,

A Somayaji, S Hofmeyr, S Forrest, "Principles of a computer immune system", Proceedings New Security Paradigms Workshop, January 1, 1998, Part F12:75--82, doi: 10.1145/283699.283742

S Forrest, SA Hofmeyr, A Somayaji, TA Longstaff, "A Sense of Self for Unix Processes", SP ’96, Washington, DC, USA, IEEE Computer Society, January 1, 1996, 120--128,

Book Chapters

E. Georganas, S. Hofmeyr, L. Oliker, R. Egan, D. Rokhsar, A. Buluc, K. Yelick, "Extreme-scale de novo genome assembly", Exascale Scientific Applications: Scalability and Performance Portability, edited by T.P. Straatsma, K. B. Antypas, T. J. Williams, (November 13, 2017) doi: 10.1201/b21930

S. Hofmeyr, "An interpretive introduction to the immune system", Design Principles for the Immune System and Other Distributed Autonomous Systems, (June 14, 2001)

S Forrest, S Hofmeyr, "Immunology as Information Processing", Design Principles for the Immune System and Other Distributed Autonomous Systems, (June 1, 2001)

Presentation/Talks

Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick, UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (ALCF'20), Argonne Leadership Computing Facility (ALCF) Webinar Series, May 27, 2020,

UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided RMA communication and Remote Procedure Calls (RPC), along with futures and promises. These constructs enable the programmer to express dependencies between asynchronous computations and data movement. UPC++ supports the implementation of simple, regular data structures as well as more elaborate distributed data structures where communication is fine-grained, irregular, or both. The library’s support for asynchrony enables the application to aggressively overlap and schedule communication and computation to reduce wait times.

UPC++ is highly portable and runs on platforms from laptops to supercomputers, with native implementations for HPC interconnects. As a C++ library, it interoperates smoothly with existing numerical libraries and on-node programming models (e.g., OpenMP, CUDA).

In this webinar, hosted by DOE’s Exascale Computing Project and the ALCF, we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through basic algorithm implementations. We will also look at irregular applications and show how they can take advantage of UPC++ features to optimize their performance.

Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick, UPC++: A PGAS/RPC Library for Asynchronous Exascale Communication in C++ (ECP'20), Tutorial at Exascale Computing Project (ECP) Annual Meeting 2020, February 6, 2020,

In this tutorial we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through basic algorithm implementations. We will also look at irregular applications and show how they can take advantage of UPC++ features to optimize their performance.

Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Kathy Yelick, UPC++ Tutorial (NERSC Dec 2019), National Energy Research Scientific Computing Center (NERSC), December 16, 2019,

This event was a repeat of the tutorial delivered on November 1, but with the restoration of the hands-on component which was omitted due to uncertainty surrounding the power outage at NERSC.

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. UPC++ provides mechanisms for low-overhead one-sided communication, moving computation to data through remote-procedure calls, and expressing dependencies between asynchronous computations and data movement. It is particularly well-suited for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces are designed to be composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds.

In this tutorial we introduced basic concepts and advanced optimization techniques of UPC++. We discussed the UPC++ memory and execution models and walked through implementing basic algorithms in UPC++. We also discussed irregular applications and how to take advantage of UPC++ features to optimize their performance. The tutorial included hands-on exercises with basic UPC++ constructs. Registrants were given access to run their UPC++ exercises on NERSC’s Cori (currently the #14 fastest computer in the world).

Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick, UPC++ Tutorial (NERSC Nov 2019), National Energy Research Scientific Computing Center (NERSC), November 1, 2019,

In this tutorial we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through implementing basic algorithms in UPC++. We will also look at irregular applications and how to take advantage of UPC++ features to optimize their performance.

Yili Zheng, Filip Blagojevic, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Costin Iancu, Seung-Jai Min, Katherine Yelick, Getting Multicore Performance with UPC, SIAM Conference on Parallel Processing for Scientific Computing, February 2010,

Download File: Multicore-Performance-with-UPC-SIAMPP10-Zheng.pdf (pdf: 933 KB)

Steven Hofmeyr, New approaches to security: lessons from nature, CSI Netsec, June 1, 2005,

Reports

John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.9.0", Lawrence Berkeley National Laboratory Tech Report LBNL-2001560, December 2023, doi: 10.25344/S4P01J

UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes. UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.

John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2023.3.0", Lawrence Berkeley National Laboratory Tech Report, March 30, 2023, LBNL 2001517, doi: 10.25344/S43591

UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.

John Bachan, Scott B. Baden, Dan Bonachea, Johnny Corbino, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.9.0", Lawrence Berkeley National Laboratory Tech Report, September 30, 2022, LBNL 2001479, doi: 10.25344/S4QW26

John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q

John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T

John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2020.10.0", Lawrence Berkeley National Laboratory Tech Report, October 2020, LBNL 2001368, doi: 10.25344/S4HG6Q

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single-program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2020.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2020, LBNL 2001269, doi: 10.25344/S4P88Z

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2019, LBNL 2001236, doi: 10.25344/S4V30R

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ v1.0 Specification, Revision 2019.9.0", Lawrence Berkeley National Laboratory Tech Report, September 14, 2019, LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2019.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2019, LBNL 2001191, doi: 10.25344/S4F301

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 10", Lawrence Berkeley National Laboratory Tech Report, March 15, 2019, LBNL 2001192, doi: 10.25344/S4JS30

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer's Guide, v1.0-2018.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2018, LBNL 2001180, doi: 10.25344/S49G6V

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Specification v1.0, Draft 8", Lawrence Berkeley National Laboratory Tech Report, September 26, 2018, LBNL 2001179, doi: 10.25344/S45P4X

John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Bryce Lelbach, Brian Van Straalen, "UPC++ Specification v1.0, Draft 6", Lawrence Berkeley National Laboratory Tech Report, March 26, 2018, LBNL 2001135, doi: 10.2172/1430689

John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian Van Straalen, "UPC++ Programmer’s Guide, v1.0-2018.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2018, LBNL 2001136, doi: 10.2172/1430693

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single-program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ Programmer’s Guide, v1.0-2017.9", Lawrence Berkeley National Laboratory Tech Report, September 2017, LBNL 2001065, doi: 10.2172/1398522

UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single-program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Bryce Lelbach, Brian Van Straalen, "UPC++ Specification v1.0, Draft 4", Lawrence Berkeley National Laboratory Tech Report, September 27, 2017, LBNL 2001066, doi: 10.2172/1398521

UPC++ is a C++11 library providing classes and functions that support Asynchronous Partitioned Global Address Space (APGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

G. Eads, J. Colmenares, S. Hofmeyr, S. Bird, D. Bartolini, D. Chou, B. Glutzman, K. Asanovic, J. D. Kubiatowicz, "Building an Adaptive Operating System for Predictability and Efficiency", University of California, Berkeley Technical Report No. UCB/EECS-2014-137, 2014

B Edwards, S Hofmeyr, G Stelle, S Forrest, "Internet Topology over Time", arXiv:1202.3993, February 17, 2012

P Beckman, R Brightwell, BR de Supinski, M Gokhale, S Hofmeyr, S Krishnamoorthy, M Lang, B Maccabe, J Shalf, M Snir, "Exascale Operating Systems and Runtime Software Report", 2012

Posters

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "Pagoda: Lightweight Communications and Global Address Space Support for Exascale Applications - UPC++ (ECP'19)", Poster at Exascale Computing Project (ECP) Annual Meeting 2019, January 2019

Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet-EX: PGAS Support for Exascale Applications and Runtimes", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18) Research Poster, November 2018

Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work is driven by the emerging need for adaptive, lightweight communication in irregular applications at exascale. We present an overview of UPC++ and GASNet-EX, including examples and performance results.

GASNet-EX is a portable, high-performance communication library, leveraging hardware support to efficiently implement Active Messages and Remote Memory Access (RMA). UPC++ provides higher-level abstractions appropriate for PGAS programming, such as one-sided communication (RMA), remote procedure call, locality-aware APIs for user-defined distributed objects, and robust support for asynchronous execution to hide latency. Both libraries have been redesigned relative to their predecessors to meet the needs of exascale computing. While both libraries continue to evolve, the system already demonstrates improvements in microbenchmarks and application proxies.

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'18)", Poster at Exascale Computing Project (ECP) Annual Meeting 2018, February 2018

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian Van Straalen, "UPC++: a PGAS C++ Library", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'17) Research Poster, November 2017

John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'17)", Poster at Exascale Computing Project (ECP) Annual Meeting 2017, January 2, 2017

Others

S Hofmeyr, "Why today’s security technologies are so inadequate: History, implications, and new approaches", Information Security Management Handbook, Sixth Edition, Pages: 2623-2627, 2007