Berkeley Lab Checkpoint/Restart Flyer for SC2004
Berkeley Lab Checkpoint/Restart (BLCR) for Linux
Researchers at Berkeley Lab's Future Technologies Group (FTG) are implementing checkpoint/restart for Linux. Our goal is to provide a robust, production-quality implementation that checkpoints a wide range of applications without requiring changes to application code. We also require a preemptive implementation, so that checkpointing can be used as a batch scheduling tool. Our strategy for meeting these goals is to implement BLCR mostly at the Linux kernel level. Implementing checkpoint/restart at the kernel level poses many challenges, but has many benefits. Traditional UNIX checkpointing, provided by some vendors, requires very large changes to the UNIX kernel. Since we lack full control over the Linux kernel, our checkpoint implementation is designed not as a kernel patch but as a kernel module, so that BLCR is an addition to the kernel rather than a change to existing code. Our work builds upon the vmadump kernel module from BProc.
Support for MPI Applications
In addition to the kernel module, BLCR is designed with a user space library component for extensibility, especially for checkpointing communications such as those present in an MPI application. The user space library allows an application or library to block checkpoints during critical sections, and to register special handlers for checkpoint and restart events. Using this interface, FTG and the Indiana University Open Systems Lab have added checkpoint/restart support to LAM/MPI. This work has taught us a great deal about how checkpoint/restart should behave in an MPI library. Our results are described in [1]. Checkpoint/restart is available in the 7.x versions of LAM/MPI, including TCP/IP and gm support. More information and downloads are available at http://www.lam-mpi.org
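The two capabilities described above, blocking checkpoints during critical sections and registering handlers for checkpoint and restart events, can be illustrated with a small conceptual model. This is only a sketch: the class `CheckpointCoordinator` and all of its method names are hypothetical and are not the BLCR library interface. It assumes a single-threaded caller and merely records events in a log instead of saving any process state.

```python
class CheckpointCoordinator:
    """Toy model (hypothetical, not the BLCR API) of handlers plus critical sections."""

    def __init__(self):
        self._critical_depth = 0   # nesting depth of open critical sections
        self._pending = False      # True if a request arrived inside a critical section
        self._handlers = []        # list of (on_checkpoint, on_restart) pairs
        self.log = []              # record of events, in order

    def register_handler(self, on_checkpoint, on_restart):
        """Register callbacks to run at checkpoint time and at restart time."""
        self._handlers.append((on_checkpoint, on_restart))

    def enter_critical_section(self):
        self._critical_depth += 1

    def leave_critical_section(self):
        self._critical_depth -= 1
        # A request that was deferred is honored once the last section closes.
        if self._critical_depth == 0 and self._pending:
            self._pending = False
            self._take_checkpoint()

    def request_checkpoint(self):
        """Return True if taken immediately, False if deferred."""
        if self._critical_depth > 0:
            self._pending = True
            return False
        self._take_checkpoint()
        return True

    def _take_checkpoint(self):
        for on_checkpoint, _ in self._handlers:
            on_checkpoint()
        self.log.append("state saved")   # stand-in for writing the checkpoint

    def restart(self):
        """Model of resuming from a saved checkpoint: run the restart handlers."""
        for _, on_restart in self._handlers:
            on_restart()


def demo():
    coord = CheckpointCoordinator()
    # Hypothetical handlers an MPI-like library might register.
    coord.register_handler(
        on_checkpoint=lambda: coord.log.append("drain network"),
        on_restart=lambda: coord.log.append("reconnect network"),
    )
    coord.enter_critical_section()
    taken_immediately = coord.request_checkpoint()   # deferred: inside a section
    coord.log.append("critical work done")
    coord.leave_critical_section()                   # deferred checkpoint runs here
    coord.restart()
    return taken_immediately, coord.log
```

In this model a communication library would register one handler pair to quiesce its connections before the checkpoint and re-establish them after a restart, while wrapping operations that must not be interrupted in a critical section.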
Support for Scalable Systems Software
The DOE-funded SciDAC Scalable Systems Software ISIC is building a set of software components and interfaces for resource management of large parallel computers, independent of architecture and operating system. Checkpoint/restart is an integral part of the center's general resource management strategy. FTG is funded to develop BLCR as the reference implementation of checkpoint/restart for the Scalable Systems Software Suite, and to specify standard (implementation-neutral) interfaces between checkpoint/restart and other software components in the suite. More information and downloads of the Scalable Systems Software reference suite are available at http://www.csm.ornl.gov/oscar/sss/
Feature Highlights for BLCR
- Fully Open Source (GPL and LGPL) licensing
- Fully SMP safe
- Rebuilds the virtual address space and restores registers
- Supports both the LinuxThreads and new NPTL implementations of POSIX threads
- Restores file descriptors, and state associated with an open file
- Restores signal handlers, signal mask, and pending signals
- Restores the process ID (PID), thread group ID (TGID), parent process ID (PPID), and process tree to their state at checkpoint time
- Tested against a variety of Linux 2.4 kernels, distributions and C library versions
- Currently implemented only for IA32
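To make the idea of restoring "state associated with an open file" concrete, the sketch below saves and restores a file descriptor's path, access flags, and offset from user space. It is an illustration under stated assumptions, not BLCR's mechanism: BLCR does this work inside its kernel module, and the helper names `save_fd_state` and `restore_fd_state` are hypothetical. The sketch also assumes a Unix-like system and that the file still exists at the same path on restart.

```python
import fcntl
import os
import tempfile


def save_fd_state(fd, path):
    """Record what is needed to reopen `fd` later (hypothetical helper)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    return {
        "path": path,
        # Keep only the access mode and append flag for the reopen.
        "flags": flags & (os.O_ACCMODE | os.O_APPEND),
        "offset": os.lseek(fd, 0, os.SEEK_CUR),   # current file position
    }


def restore_fd_state(state):
    """Reopen the file and seek back to the saved offset (hypothetical helper)."""
    fd = os.open(state["path"], state["flags"])
    os.lseek(fd, state["offset"], os.SEEK_SET)
    return fd


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "wb") as f:
            f.write(b"hello checkpoint")

        fd = os.open(path, os.O_RDONLY)
        os.read(fd, 6)                      # consume b"hello "
        state = save_fd_state(fd, path)
        os.close(fd)                        # stands in for the process going away

        new_fd = restore_fd_state(state)
        rest = os.read(new_fd, 100)         # reading resumes at the saved offset
        os.close(new_fd)
        return state["offset"], rest
```

A user-space approach like this cannot guarantee that the restored descriptor gets the same descriptor number or that it covers every kind of descriptor; those are among the reasons a kernel-level implementation is attractive.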
Platforms Supported by BLCR
BLCR is presently limited to IA32 hardware, and has been developed and tested primarily on Red Hat 8 and Red Hat 9 distributions. However, wide portability is a primary goal. Therefore, BLCR is periodically tested against a variety of Linux kernels and distributions:
- Vanilla Linux 2.4 kernels 2.4.0 through 2.4.27
- Full distributions with their associated kernels:
  - Red Hat 7.1 through 7.3, Red Hat 8, and Red Hat 9
  - SuSE Linux 9
  - CentOS 3.1
- GNU C library (glibc) versions 2.1 through 2.3
Plans for BLCR
Next steps include:
- Support for special device files such as /dev/null, /dev/zero and /dev/random
- Coherent checkpoints of process groups and sessions
  - Process group support allows checkpointing of command pipelines (e.g. grep foo bar | sort)
  - Session support eases integration with most batch systems and allows checkpointing of login shells
- Support for Linux 2.6
- Support for Opteron, IA64 and PowerPC architectures
BLCR Project Participants
Berkeley Lab Future Technologies Group (FTG)
- Paul Hargrove
- Jason Duell
- Eric Roman (alumnus)
Indiana University Open Systems Lab
- Andrew Lumsdaine
- Jeff Squyres
- Sriram Sankaran (alumnus)
- Brian Barrett (alumnus)
Source Code Availability and Further Information
BLCR is now available from the Future Technologies Group web site.
References
- [1] Sriram Sankaran, Jeffrey M. Squyres, Brian Barrett, Andrew Lumsdaine, Jason Duell, Paul Hargrove, and Eric Roman. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing. In LACSI Symposium, October 2003.
- [2] Jason Duell, Paul Hargrove, and Eric Roman. The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart.