Berkeley Lab Checkpoint/Restart (BLCR) Publications
Items are listed chronologically within each section. Since BLCR is an ongoing research project, information in later publications may supersede that in earlier ones, and some aspects of BLCR may not be accurately reflected by any of these publications. In all cases the documentation that accompanies a given source distribution of BLCR should be considered more authoritative than any document here. If in doubt, ask.
Papers And Technical Reports
- Duell, J., Hargrove, P., and Roman, E. Requirements for Linux Checkpoint/Restart. Berkeley Lab Technical Report (publication LBNL-49659), May 2002.
- Duell, J., Hargrove, P., and Roman, E. The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart. Berkeley Lab Technical Report (publication LBNL-54941), December 2002.
- Roman, E. A Survey of Checkpoint/Restart Implementations. Berkeley Lab Technical Report (publication LBNL-54942), July 2002.
- Sriram Sankaran, Jeffrey M. Squyres, Brian Barrett, Andrew Lumsdaine, Jason Duell, Paul Hargrove, and Eric Roman. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing. In LACSI Symposium, October 2003. (publication LBNL-53808 Proc.)
- Paul H. Hargrove and Jason C. Duell. Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters. In Proceedings of SciDAC 2006: June 2006. (publication LBNL-60520)
Invited Presentations
- Duell, J., Hargrove, P., and Roman, E. An Overview of Berkeley Lab's Linux Checkpoint/Restart. Presented January 2004 at LLNL.
- Paul H. Hargrove and Jason C. Duell. Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters (Poster). Given at SciDAC 2006: June 2006.
- Paul Hargrove (Joint work with Eric Roman and Jason Duell). Job Preemption with BLCR. Urgent Computing Workshop: April 25-26, 2007, Argonne, IL.
- Paul Hargrove (Joint work with Jason Duell and Eric Roman). An Overview of Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters. Presentation to UC Berkeley CS Dept ParLab group: March 2008.
- Paul Hargrove (Joint work with Eric Roman and Jason Duell). Advanced Checkpoint Fault Tolerance Solutions for HPC. Workshop on Trends, Technologies and Collaborative Opportunities in High Performance and Grid Computing: June 9-10, 2008, Bangkok, Thailand and June 12, 2008, Phuket, Thailand.
- Paul Hargrove (Joint work with Eric Roman and Jason Duell). System-level Checkpoint/Restart with BLCR. Los Alamos Computer Science Symposium (LACSS08): Oct 13-15, 2008, Santa Fe, NM.
- Paul Hargrove (Joint work with Eric Roman and Jason Duell). System-level Checkpoint/Restart with BLCR. TeraGrid 2009 Fault Tolerance Workshop: Mar 19-20, 2009, Albuquerque, NM.
- Paul Hargrove (Joint work with Eric Roman and Jason Duell). Berkeley Lab Checkpoint/Restart (BLCR): Status and Future Plans. Dagstuhl Seminar "Fault Tolerance in High-Performance Computing and Grids": May 3-8, 2009, Wadern, Germany.
BLCR in the Media
- March 30, 2009
Berkeley Lab Checkpoint Restart Improves Productivity: an article about BLCR was featured in "Berkeley Lab Computing Sciences News", the monthly newsletter of Lawrence Berkeley National Lab's Computing Sciences Division.
- March 31, 2009
The Computing Sciences News item, above, was picked up by HPCWire.
- June 19, 2009
The weekly Research, Computing and Engineering (RCE) Podcast broadcast a show about BLCR (direct MP3 for RCE12).