Computer Languages & Systems Software

Colin MacLean

Colin Alexander MacLean
HPC Computer Systems Engineer

Biographical Sketch

Colin MacLean is a member of the CLaSS Group working on the Pagoda Project. Before joining CLaSS, he was a member of the NERSC Advanced Technology Group (ATG), and before that a Research Programmer at the National Center for Supercomputing Applications at the University of Illinois. He has an M.Sc. in High Performance Computing from the University of Edinburgh and a B.Sc. in Nanotechnology from the University of Leeds.

Journal Articles

Tan Nguyen, Colin MacLean, Marco Siracusa, Douglas Doerfler, Nicholas J. Wright, Samuel Williams, "FPGA‐based HPC accelerators: An evaluation on performance and energy efficiency", CCPE, August 22, 2021, doi: 10.1002/cpe.6570

Colin A. MacLean, Neil C. Hong, James Prendergast, "hapbin: An Efficient Program for Performing Haplotype-Based Scans for Positive Selection in Large Genomic Datasets", Mol Biol Evol, November 2015, 32(11):3027-9, doi: 10.1093/molbev/msv172

Conference Papers

Daniel Waters, Colin A. MacLean, Dan Bonachea, Paul H. Hargrove, "Demonstrating UPC++/Kokkos Interoperability in a Heat Conduction Simulation (Extended Abstract)", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S4630V

We describe the replacement of MPI with UPC++ in an existing Kokkos code that simulates heat conduction within a rectangular 3D object, as well as an analysis of the new code’s performance on CUDA accelerators. The key challenges were packing the halos in Kokkos data structures in a way that allowed for UPC++ remote memory access, and streamlining synchronization costs. Additional UPC++ abstractions used included global pointers, distributed objects, remote procedure calls, and futures. We also make use of the device allocator concept to facilitate data management in memory with unique properties, such as GPUs. Our results demonstrate that despite the algorithm’s good semantic match to message passing abstractions, straightforward modifications to use UPC++ communication deliver vastly improved performance and scalability in the common case. We find the one-sided UPC++ version written in a natural way exhibits good performance, whereas the message-passing version written in a straightforward way exhibits performance anomalies. We argue this represents a productivity benefit for one-sided communication models.

Tan Nguyen, Samuel Williams, Marco Siracusa, Colin MacLean, Douglas Doerfler, Nicholas J. Wright, "The Performance and Energy Efficiency Potential of FPGAs in Scientific Computing", (BEST PAPER) Performance Modeling, Benchmarking, and Simulation of High Performance Computer Systems (PMBS), November 2020

Colin A. MacLean, HonWai Leong, Jeremy Enos, "Improving the start-up time of python applications on large scale HPC systems", Proceedings of HPCSYSPROS 2017, Denver, CO, November 2017, doi: 10.1145/3155105.3155107

Colin MacLean, "Python Usage Metrics on Blue Waters", Cray User Group, Redmond, WA, May 2017

Colin A. MacLean, "Maintaining Large Software Stacks in a Cray Ecosystem with Gentoo Portage", Cray User Group, London, England, May 2016

Posters

Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes", Poster at Exascale Computing Project (ECP) Annual Meeting 2021, April 2021

We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.

UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC).  The combination of these two features yields performant, scalable solutions to problems of interest within ECP.

GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients. GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.