Paul H. Hargrove

Paul H. Hargrove, Ph.D.
Computer Systems Engineer IV
Future Technologies Group
Computer and Data Sciences Department
Phone: +1 510 495 2352
Fax: +1 510 486 6900
Lawrence Berkeley National Laboratory
One Cyclotron Rd MS50A1148
Berkeley, CA 94720

Education

Stanford University, Palo Alto, CA, Ph.D., 2003.
Scientific Computing – Computational Mathematics Program

Cornell University, Ithaca, NY, B.A., 1994.
Triple major: Computer Science, Physics (magna cum laude), and Mathematics

Biographical Sketch

Since September 2000, Paul has been a full-time PI at Lawrence Berkeley National Laboratory (LBNL).  His current research interests include Resilience and Network Communications, both in the context of High-Performance Computing (HPC). Current software projects include Berkeley Lab Checkpoint/Restart (BLCR) for Linux, Global Address Space Networking (GASNet), and Berkeley Unified Parallel C (UPC).

BLCR began as a DOE-funded effort to produce a production-quality system-level checkpointing implementation suitable for use in preemptive scheduling, migration, and fault tolerance. The BLCR implementation work is now part of the larger DEGAS project at LBNL. Paul was the LBNL PI for two five-year multi-institution projects that funded the initial BLCR efforts, and is still the lead developer of the BLCR implementation.

GASNet is a joint project between LBNL and the University of California at Berkeley, funded by the DOE Office of Science and the Department of Defense. GASNet provides an abstraction of a network interconnect suitable for implementation of Partitioned Global Address Space (PGAS) languages such as UPC, Titanium and Co-Array FORTRAN, and is the network runtime layer for the UPC and Titanium implementations at LBNL and UCB, plus several external PGAS compilers. GASNet is intended for use by library writers and as a compilation target, unlike MPI, which is an application programmer’s interface. To support PGAS languages, GASNet is oriented toward one-sided memory-to-memory transfers with a rich set of non-blocking primitives. Paul’s most significant contributions to GASNet are the InfiniBand and PAMI ports of GASNet, implementation of a library abstracting hardware atomic operations, and ongoing work to add collective operations support to the GASNet specification and implementation. Paul maintains GASNet’s network-specific code for InfiniBand, Myrinet GM, Cray GNI, Cray Portals and IBM LAPI. Paul is also an active developer and the lead maintainer of the network-independent portion of the Berkeley UPC runtime library.

Journal Articles

Rajesh Nishtala, Yili Zheng, Paul Hargrove, Katherine A. Yelick, "Tuning collective communication for Partitioned Global Address Space programming models", Parallel Computing, 2011, 37(9):576-591,

Hongzhang Shan, Filip Blagojevic, Seung-Jai Min, Paul Hargrove, Haoqiang Jin, Karl Fuerlinger, Alice Koniges, Nicholas J. Wright, "A Programming Model Performance Study Using the NAS Parallel Benchmarks", Scientific Programming - Exploring Languages for Expressing Medium to Massive On-Chip Parallelism, August 1, 2010, vol. 18,

Katherine A. Yelick, Dan Bonachea, Wei-Yu Chen, Phillip Colella, Kaushik Datta, Jason Duell, Susan L. Graham, Paul Hargrove, Paul N. Hilfinger, Parry Husbands, Costin Iancu, Amir Kamil, Rajesh Nishtala, Jimmy Su, Michael L. Welcome, Tong Wen, "Productivity and performance using partitioned global address space languages", Parallel Symbolic Computation (PASCO), 2007,

Conference Papers

George Almasi, Paul Hargrove, Gabriel Tanase and Yili Zheng, "UPC Collectives Library 2.0", Fifth Conference on Partitioned Global Address Space Programming Models (PGAS11), October 17, 2011,

Chang-Seo Park, Koushik Sen, Paul Hargrove, Costin Iancu, "Efficient data race detection for distributed memory parallel programs", Supercomputing (SC), 2011,

Filip Blagojevic, Paul Hargrove, Costin Iancu, and Katherine Yelick, "Hybrid PGAS Runtime Support for Multicore Nodes", Fourth Conference on Partitioned Global Address Space Programming Model (PGAS10), October 2010,

R. Nishtala, P. Hargrove, D. Bonachea, K. Yelick, "Scaling Communication-Intensive Applications on BlueGene/P Using One-Sided Communication and Overlap", 23rd International Parallel & Distributed Processing Symposium (IPDPS), Rome, May 2009,

D. Bonachea, P. Hargrove, M. Welcome, K. Yelick, "Porting GASNet to Portals: Partitioned Global Address Space (PGAS) Language Support for the Cray XT", Cray Users Group (CUG), May 2009,

Katherine Yelick, Dan Bonachea, Wei-Yu Chen, Phillip Colella, Kaushik Datta, Jason Duell, Susan L. Graham, Paul Hargrove, Paul Hilfinger, Parry Husbands, Costin Iancu, Amir Kamil, Rajesh Nishtala, Jimmy Su, Michael Welcome, Tong Wen, "Productivity and Performance Using Partitioned Global Address Space Languages", Parallel Symbolic Computation (PASCO'07), July 2007,

Partitioned Global Address Space (PGAS) languages combine the programming convenience of shared memory with the locality and performance control of message passing. One such language, Unified Parallel C (UPC), is an extension of ISO C defined by a consortium that boasts multiple proprietary and open source compilers. Another PGAS language, Titanium, is a dialect of Java™ designed for high performance scientific computation. In this paper we describe some of the highlights of two related projects, the Titanium project centered at U.C. Berkeley and the UPC project centered at Lawrence Berkeley National Laboratory. Both compilers use a source-to-source strategy that translates the parallel languages to C with calls to a communication layer called GASNet. The result is portable high-performance compilers that run on a large variety of shared and distributed memory multiprocessors. Both projects combine compiler, runtime, and application efforts to demonstrate some of the performance and productivity advantages to these languages.

Paul Hargrove, Jason Duell, Eric Roman, "Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters", Proceedings of SciDAC 2006, June 27, 2006,

Costin Iancu, Parry Husbands, Paul Hargrove, "HUNTing the Overlap", IEEE Parallel Architectures and Compilation Techniques (PACT), 2005,

S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, E. Roman, "The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing", Los Alamos Computer Science Institute Symposium Proceedings (LACSI'03), Santa Fe, NM, October 2003,

Christian Bell, Dan Bonachea, Yannick Cote, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Michael L. Welcome, Katherine A. Yelick, "An Evaluation of Current High-Performance Networks", IPDPS - IEEE International Parallel & Distributed Processing Symposium, 2003,

Presentation/Talks

Paul H. Hargrove, UPC Language Full-day Tutorial, Workshop at UC Berkeley, July 12, 2012,

Paul H. Hargrove, UPC Language Half-day Tutorial, Workshop at UC Berkeley, June 15, 2011,

Paul H. Hargrove, Introduction to UPC, CScADS Workshop, July 21, 2010,

Yili Zheng, Filip Blagojevic, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Costin Iancu, Seung-Jai Min, Katherine Yelick, Getting Multicore Performance with UPC, SIAM Conference on Parallel Processing for Scientific Computing, February 2010,

Rajesh Nishtala, Yili Zheng, Paul H. Hargrove, Katherine Yelick, UPC at Scale, SIAM Conference on Parallel Processing for Scientific Computing, February 25, 2010,

Yili Zheng, Costin Iancu, Paul H. Hargrove, Seung-Jai Min, Katherine Yelick, Extending Unified Parallel C for GPU Computing, SIAM Conference on Parallel Processing for Scientific Computing, February 24, 2010,

Paul H. Hargrove, A Brief Introduction to BLCR (Berkeley Lab Checkpoint/Restart), SIAM Conference on Parallel Processing for Scientific Computing, February 24, 2010,

Paul Hargrove, Jason Duell, Eric Roman, Berkeley Lab Checkpoint/Restart (BLCR): Status and Future Plans, Dagstuhl Seminar: Fault Tolerance in High-Performance Computing and Grids, May 2009,

Paul Hargrove, Jason Duell, Eric Roman, System-level Checkpoint/Restart with BLCR, TeraGrid 2009 Fault Tolerance Workshop, March 19, 2009,

Paul Hargrove, Jason Duell, Eric Roman, System-level Checkpoint/Restart with BLCR, Los Alamos Computer Science Symposium (LACSS08), October 15, 2008,

Paul Hargrove, Jason Duell, Eric Roman, Advanced Checkpoint Fault Tolerance Solutions for HPC, Workshop on Trends, Technologies and Collaborative Opportunities in High Performance and Grid Computing, Bangkok and Phuket, Thailand, June 9, 2008,

Paul H. Hargrove, Dan Bonachea, Christian Bell, Experiences Implementing Partitioned Global Address Space (PGAS) Languages on InfiniBand, OpenFabrics Alliance 2008 International Sonoma Workshop, April 2008,

Paul Hargrove, Jason Duell and Eric Roman, An Overview of Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters, Presentation to ParLab group at UC Berkeley, March 18, 2008,

Paul Hargrove, Eric Roman, Jason Duell, Job Preemption with BLCR, Urgent Computing Workshop, April 25, 2007,

Dan Bonachea, Rajesh Nishtala, Paul Hargrove, Katherine Yelick, Efficient Point-to-Point Synchronization in UPC, 2nd Conf. on Partitioned Global Address Space Programming Models (PGAS06), October 4, 2006,

J. Duell, P. Hargrove, E. Roman, An Overview of Berkeley Lab's Linux Checkpoint/Restart, Presentation at LLNL, January 2004,

Reports

W. Kramer, J. Carter, D. Skinner, L. Oliker, P. Husbands, P. Hargrove, J. Shalf, O. Marques, E. Ng, A. Drummond, K. Yelick, "Software Roadmap to Plug and Play Petaflop/s", 2006,

J. Duell, P. Hargrove, E. Roman, "The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart", LBNL Technical Report, December 2002, LBNL 54941,

J. Duell, P. Hargrove, E. Roman, "Requirements for Linux Checkpoint/Restart", LBNL Technical Report, May 2002, LBNL 49659,

Posters

Dan Bonachea, Rajesh Nishtala, Paul Hargrove, Mike Welcome, Kathy Yelick, "Optimized Collectives for PGAS Languages with One-Sided Communication", Poster Session at SuperComputing 2006, November 2006,

P.H. Hargrove, J.C. Duell, "Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters", Proceedings of SciDAC 2006, June 27, 2006,

Dan O. Bonachea, Christian Bell, Rajesh Nishtala, Kaushik Datta, Parry Husbands, Paul Hargrove, Katherine Yelick, "The Performance and Productivity Benefits of Global Address Space Languages", Poster Session at SuperComputing 2005, November 2005,

Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Wei Tu, Mike Welcome, Kathy Yelick, "GASNet 2 - An Alternative High-Performance Communication Interface", Poster Session at SuperComputing 2004, November 9, 2004,

Others

Paul Hargrove, Brock Palen, Jeff Squyres, RCE 12: BLCR, RCE Podcast (interview), June 19, 2009,

Brock Palen and Jeff Squyres speak with Paul Hargrove of the Berkeley Lab Checkpoint/Restart (BLCR) project.