Berkeley Lab Checkpoint/Restart (BLCR) Administrator's Guide

This guide describes how to install, configure, and maintain Berkeley Lab Checkpoint/Restart (BLCR) for Linux.

System Requirements

BLCR consists of three kernel modules, some user-level libraries, and several command-line executables. No kernel patching is required.

BLCR has been engineered to work with a wide range of Linux kernels:

BLCR uses assembly code to save some program state (most notably the CPU registers). This means that the BLCR kernel modules are not portable across CPU architectures "out of the box". Currently only x86 and x86_64 systems are fully tested with BLCR. The 0.6.0 release is the first to include experimental support for PowerPC64 and for ARM. The PowerPC port works for both 32- and 64-bit applications, but currently requires a 64-bit kernel. Porting BLCR to a different CPU is not a large software effort if one has sufficient Linux kernel experience and knowledge of the target CPU's ABI and instruction set. Please contact us if you are interested in contributing a port. We are especially interested in hearing from anyone with the time and equipment to port the PowerPC code to 32-bit kernels.

Installing/Configuring BLCR

To build checkpoint/restart, you need the source (or at least the headers) for the kernel you are building against, along with that kernel's System.map or vmlinux file.

If you run into trouble when following the instructions below, make sure to check both our FAQ (especially the "Build/Install Questions" section) and our bug database, located at http://mantis.lbl.gov/bugzilla/. Your install problem may have already been solved!

Configuring BLCR

BLCR builds and installs much like any other autotools-based distribution:

    % tar zxf blcr-<VERSION>.tar.gz
    % cd blcr-<VERSION>
    % mkdir builddir
    % cd builddir
    % ../configure [ options ]
    % make
    % make install

Depending on which kernel you are building against, and where you wish to put the BLCR libraries, there are a number of options to configure that you need to consider.

We strongly recommend that you configure and build BLCR in a directory other than the one containing the BLCR source code (some configure options actually require this). In the example above, the build is conducted in a subdirectory of the source directory named 'builddir'. Any writable location is fine, but you will then need to invoke configure by its correct path in place of the '../configure' used in the example.

Check the FAQ if you run into issues building BLCR on your system.

Choosing an installation directory

By default BLCR will install into /usr/local. To choose a different directory tree to install into, pass the '--prefix=DIR' flag to configure. However, be aware that using a location other than /usr/local or /usr may require additions to your users' PATH, MANPATH and LD_LIBRARY_PATH environment variables (more details below).

Building against a kernel other than the one that's running

By default, BLCR builds against the kernel that is running on the system at configure time, and looks in a number of standard locations (/usr/src/linux, etc.) for the above files that correspond to it. If you're building for a kernel other than the one running at the time of the build (or if the source for the running kernel is in a non-standard location), you'll need to pass configure the --with-linux=DIR option, where DIR is the directory containing that kernel's source (or headers).

Unless System.map or vmlinux exists in the directory given to --with-linux, you'll also need to pass one of the following two options:
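Whether that extra option is needed can be checked before running configure. The helper below is a hypothetical sketch (the name blcr_needs_symbol_option is ours, not BLCR's). It only looks for the two files named above in the directory you plan to pass to --with-linux. It does not know the names of the extra options; '../configure --help' should list them.

```shell
# Hypothetical helper (not part of BLCR): given the directory you intend to
# pass to --with-linux, report whether configure will also need one of the
# System.map/vmlinux options. Returns 0 if an extra option IS needed.
#   usage: blcr_needs_symbol_option KERNEL_DIR
blcr_needs_symbol_option() {
    if [ -f "$1/System.map" ] || [ -f "$1/vmlinux" ]; then
        echo "$1 contains System.map or vmlinux: no extra option needed"
        return 1
    fi
    echo "$1 has neither System.map nor vmlinux: an extra option is needed"
    return 0
}

# e.g.  blcr_needs_symbol_option /usr/src/linux-2.6.12 && ../configure --help
```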

Building 32-bit application support on a 64-bit platform

BLCR's build logic is capable of building both 64-bit and 32-bit libraries at the same time on most of the 64-bit platforms it supports. However, because this feature is new and does not work well with certain setups, it is disabled by default. To enable it, pass configure the --enable-multilib option.

If configuration fails with this option specified, you can still configure without it to get support for 64-bit applications only.
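The configure options covered in this section can be combined in a single run. The wrapper below is a hypothetical sketch: blcr_configure and the BLCR_PREFIX, BLCR_KERNEL, BLCR_MULTILIB, BLCR_CONFIGURE and DRY_RUN variables are our own names, not part of BLCR. By default it only prints the command it would run, so you can review it first.

```shell
# Hypothetical wrapper (not part of BLCR): collect configure options from
# environment variables, then print the command (default) or run it when
# DRY_RUN=0. Extra arguments are passed to configure after these options.
#   BLCR_PREFIX=DIR   -> --prefix=DIR
#   BLCR_KERNEL=DIR   -> --with-linux=DIR   (building for a non-running kernel)
#   BLCR_MULTILIB=1   -> --enable-multilib  (32-bit libraries on 64-bit hosts)
blcr_configure() {
    # Prepend in reverse order so the final order is prefix, linux, multilib.
    if [ "${BLCR_MULTILIB:-0}" = 1 ]; then set -- --enable-multilib "$@"; fi
    if [ -n "${BLCR_KERNEL:-}" ]; then set -- "--with-linux=$BLCR_KERNEL" "$@"; fi
    if [ -n "${BLCR_PREFIX:-}" ]; then set -- "--prefix=$BLCR_PREFIX" "$@"; fi
    set -- "${BLCR_CONFIGURE:-../configure}" "$@"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, running 'BLCR_PREFIX=/opt/blcr BLCR_MULTILIB=1 blcr_configure' from the build directory prints '../configure --prefix=/opt/blcr --enable-multilib'. Adding DRY_RUN=0 actually runs it.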

Compiling BLCR

Just type 'make':
    % make

Testing your build (optional, but recommended)

As with many autotools-based packages, BLCR includes a 'check' make target. However, it cannot run the tests until the kernel modules are loaded (and will tell you so if you forget). Because the not-yet-installed kernel modules are spread throughout the BLCR build directory, an 'insmod' make target is provided to automate loading them. If you are not running as root, "make insmod" will try to use the 'sudo' utility to perform the insmod operations as root. However, it is neither necessary nor recommended to run the tests themselves as root. So, if 'sudo' is installed and configured to allow your user, we recommend running the following as a non-root user:
    % make insmod check

This may prompt for a password, depending on how 'sudo' is configured. If the 'sudo' utility is not installed (or not configured for your user), the following steps are equivalent:
    % su
    Password:[type root password here]
    # make insmod
    # exit
    % make check

If the modules fail to load, then your kernel is not supported and you'll need to report this as a bug to the BLCR team, after first checking the bug database to ensure the problem isn't already known (or even fixed). Similarly, if one or more tests fail, we'll want to know that too. However, if the only failures are one or two tests that say "restart/timeout", then you should first try increasing the timeout as follows (assuming the kernel modules have already been loaded):
    % make check CRUT_TIMEOUT=120

The 'CRUT_TIMEOUT' value is in seconds, with a default of 60 (CRUT is an acronym for Checkpoint/Restart Unit Test).

Tests marked 'SKIP' are neither a 'PASS', nor a 'FAIL' - instead they indicate a test that was not actually run. So don't be alarmed if you see one or more tests marked 'SKIP'. This happens when a given test is not applicable to your system (for instance the hugetlbfs test is skipped when no writable mountpoint for hugetlbfs is found).

We do not advise continuing to install BLCR if any tests 'FAIL' (other than timeouts correctable by raising CRUT_TIMEOUT sufficiently).
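When 'make check' prints many results, a saved log is easier to judge from a summary. The sketch below is hypothetical (blcr_summarize_check is our own name). It assumes each result appears on its own line beginning with "PASS:", "FAIL:" or "SKIP:", as automake-style harnesses print them. If your BLCR version reports results differently, adjust the patterns. Remember that a lone "restart/timeout" failure may only need a larger CRUT_TIMEOUT.

```shell
# Hypothetical helper (not part of BLCR): summarize a saved `make check` log.
# ASSUMPTION: results are lines beginning with "PASS:", "FAIL:" or "SKIP:".
# Reads the named file, or standard input if no file is given.
# Prints each FAIL line plus a one-line tally; exits non-zero on any FAIL.
blcr_summarize_check() {
    awk '
        /^PASS:/ { pass++ }
        /^SKIP:/ { skip++ }
        /^FAIL:/ { fail++; print "failed -> " $0 }
        END {
            printf "pass=%d fail=%d skip=%d\n", pass, fail, skip
            exit (fail > 0)
        }
    ' "${1:--}"
}

# e.g.  make check 2>&1 | tee check.log
#       blcr_summarize_check check.log || echo "do not install yet"
```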

Installing BLCR

Use the standard 'install' make target to install the BLCR utilities and libraries, and to place the kernel modules in the standard location for your kernel:

    % make install
or, if you prefer stripped binaries:
    % make install-strip

Loading the Kernel Modules

Before you can checkpoint/restart applications, the kernel modules need to be loaded into your kernel. The kernel modules are placed into a subdirectory of the lib/blcr (or lib64/blcr) branch of the installation directory. In this example, we'll assume the installation prefix was the default /usr/local and that your kernel is version 2.6.12-1.234 for an x86. Thus, for this example the kernel modules are in the directory /usr/local/lib/blcr/2.6.12-1.234/. There are three kernel modules in this directory which must be loaded (in the correct order) for BLCR to function.

As root, load the kernel modules in this order:

    # /sbin/insmod /usr/local/lib/blcr/2.6.12-1.234/blcr_imports.ko
    # /sbin/insmod /usr/local/lib/blcr/2.6.12-1.234/blcr_vmadump.ko
    # /sbin/insmod /usr/local/lib/blcr/2.6.12-1.234/blcr.ko
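Because the order matters, it can help to wrap the three commands in a small function. This is a minimal sketch, assuming the directory layout described above. blcr_load_modules is a hypothetical name, and the usage comment assumes the per-kernel subdirectory is named after the kernel release as printed by 'uname -r'. With DRY_RUN=1 it prints the insmod commands instead of running them.

```shell
# Hypothetical helper (not part of BLCR): load the three BLCR modules from
# MODDIR in the required order, stopping at the first failure.
# Set DRY_RUN=1 to print the commands instead of running them.
#   usage (as root): blcr_load_modules MODDIR
blcr_load_modules() {
    blcr_moddir=$1
    for blcr_mod in blcr_imports blcr_vmadump blcr; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "/sbin/insmod $blcr_moddir/$blcr_mod.ko"
        else
            /sbin/insmod "$blcr_moddir/$blcr_mod.ko" || return 1
        fi
    done
}

# e.g. (assumes the subdirectory matches the running kernel's release):
#   blcr_load_modules /usr/local/lib/blcr/"$(uname -r)"
```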

You may wish to set up your system to load these modules by default at boot time. The exact mechanism for doing so differs between Linux distributions, and thus requires an experienced system administrator. However, a template init script is provided as etc/blcr.rc in the BLCR source directory.

Updating ld.so.cache

Nearly all Linux distributions use a caching mechanism for resolving dynamic library dependencies. If you have installed BLCR's shared library in a directory that is cached by the mechanism, then you will need to update this cache. To do so, run the ldconfig command as root; no command-line arguments are needed.

It should always be safe to run the ldconfig command, even if BLCR did not install its library in a directory managed by the cache. However, if you wish to skip this step when it is unnecessary, note that BLCR's shared library (libcr.so) is in a cached directory if you configured with --prefix= or --libdir= options that cause it to be installed in:

Note that if you passed no --prefix= or --libdir= options to BLCR's configure script, then you should check /etc/ld.so.conf and /etc/ld.so.conf.d/ for /usr/local/lib (the default location) to determine if you actually need to run the ldconfig command.
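The check described above can be scripted. The following is a hypothetical sketch (blcr_libdir_is_cached is our own name). It is a plain text match and does not follow "include" lines that point outside ld.so.conf.d. It also does not know about ldconfig's built-in trusted directories such as /lib and /usr/lib, so running ldconfig remains the safe default.

```shell
# Hypothetical helper (not part of BLCR): succeed if LIBDIR appears as a
# line of its own in ld.so.conf or in a *.conf file under ld.so.conf.d.
# CONFDIR defaults to /etc; it is a parameter only so the check can be
# tried against a scratch copy.
#   usage: blcr_libdir_is_cached LIBDIR [CONFDIR]
blcr_libdir_is_cached() {
    blcr_confdir=${2:-/etc}
    for blcr_conf in "$blcr_confdir/ld.so.conf" "$blcr_confdir"/ld.so.conf.d/*.conf; do
        # Skips missing files, including the unexpanded glob when
        # ld.so.conf.d contains no *.conf files.
        [ -f "$blcr_conf" ] || continue
        if grep -Fxq -e "$1" "$blcr_conf"; then
            return 0
        fi
    done
    return 1
}

# e.g.  blcr_libdir_is_cached /usr/local/lib && echo "run ldconfig as root"
```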

Configuring Users' environments

Finally, you may wish to add the appropriate BLCR directories to your users' default PATH, LD_LIBRARY_PATH, and MANPATH environment variables. You may either modify the /etc/profile and/or /etc/cshrc files, or add new files in the /etc/profile.d directory. Alternatively, you may provide environment modules that accomplish the same thing. In the following examples, replace PREFIX with the installation prefix (such as /usr/local):

For Bourne-style shells:

    $ PATH=$PATH:PREFIX/bin
    $ MANPATH=$MANPATH:PREFIX/man
    $ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:PREFIX/lib
    $ export PATH MANPATH LD_LIBRARY_PATH

For csh-style shells:

    % setenv PATH ${PATH}:PREFIX/bin
    % setenv MANPATH ${MANPATH}:PREFIX/man
    % setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:PREFIX/lib

It is worth noting that if the BLCR libraries are installed in a directory named in /etc/ld.so.conf or /etc/ld.so.conf.d/, then you do not need to add that directory to LD_LIBRARY_PATH. Similarly, you may find it unnecessary to add to PATH and/or MANPATH if BLCR has been installed in a location that is already searched.
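If you use /etc/profile.d, the two files can be generated for a specific prefix. The helper below is a hypothetical sketch (blcr_write_profiles is our own name). Besides substituting PREFIX, it makes two small changes to the examples above. First, the Bourne-shell file avoids giving an unset LD_LIBRARY_PATH an empty entry. Second, the csh file tests whether MANPATH and LD_LIBRARY_PATH are set before expanding them, because csh treats expansion of an unset variable as an error.

```shell
# Hypothetical helper (not part of BLCR): write blcr.sh and blcr.csh for
# /etc/profile.d, with PREFIX replaced by the real installation prefix.
#   usage: blcr_write_profiles PREFIX OUTDIR
blcr_write_profiles() {
    blcr_prefix=$1
    blcr_outdir=$2
    # Bourne shells: ${VAR:+$VAR:} adds the ':' only when VAR is non-empty,
    # so an unset LD_LIBRARY_PATH does not gain an empty entry (which the
    # dynamic linker would read as the current directory). MANPATH keeps the
    # plain form: an empty MANPATH then yields a leading ':', which man-db
    # (for one) reads as "also search the default path".
    cat > "$blcr_outdir/blcr.sh" <<EOF
PATH=\${PATH:+\$PATH:}$blcr_prefix/bin
MANPATH=\$MANPATH:$blcr_prefix/man
LD_LIBRARY_PATH=\${LD_LIBRARY_PATH:+\$LD_LIBRARY_PATH:}$blcr_prefix/lib
export PATH MANPATH LD_LIBRARY_PATH
EOF
    # csh: expanding an unset variable is an error ("Undefined variable"),
    # so test $?VAR first; the else branches mirror the Bourne-shell results.
    cat > "$blcr_outdir/blcr.csh" <<EOF
setenv PATH \${PATH}:$blcr_prefix/bin
if (\$?MANPATH) then
    setenv MANPATH \${MANPATH}:$blcr_prefix/man
else
    setenv MANPATH :$blcr_prefix/man
endif
if (\$?LD_LIBRARY_PATH) then
    setenv LD_LIBRARY_PATH \${LD_LIBRARY_PATH}:$blcr_prefix/lib
else
    setenv LD_LIBRARY_PATH $blcr_prefix/lib
endif
EOF
}

# e.g. (as root):  blcr_write_profiles /opt/blcr /etc/profile.d
```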

Uninstalling BLCR

If you preserve the BLCR build tree, then there is a standard 'uninstall' make target available to remove the files copied by the 'install' target.

Making RPMs from the BLCR sources

An alternate way to install BLCR is to build a binary RPM for your system, which you can then install. This has certain advantages (such as making upgrading easier, especially if you maintain BLCR on multiple systems).

Building binary RPMs from the source tarball

Once you've configured BLCR with any options your system requires, the simplest method for building RPMs is just:

    % make rpms

If successful, the new RPM packages will be in the rpm/RPMS subdirectory of the build tree. The resulting packages will be for whichever kernel you configured for.

Building a binary RPM from source RPMs

You may also wish to start from a source RPM (with a .src.rpm suffix) rather than the .tar.gz version of the BLCR distribution. Source RPMs are available on our website. These source RPMs are set up to build for the running kernel, to configure with --prefix=/usr, and to configure with --enable-multilib on 64-bit platforms. Alternatively, the 'make rpms' step above will create a source RPM in the rpm/SRPMS subdirectory of the build tree, valid for the configured kernel.

If you build as root, the binary RPMs will be placed in a subdirectory of /usr/src/redhat/RPMS. If you are not root, however, you may first need to configure an output location for rpmbuild (IBM has published a page with information on doing this). Personally, we prefer not to build as root.
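One widely used approach, which is not necessarily the one the IBM page describes, is to point rpm's %_topdir macro at a directory you own via ~/.rpmmacros. The helper below is a hypothetical sketch (blcr_setup_rpm_topdir is our own name). Its two arguments default to ~/rpmbuild and ~/.rpmmacros and exist only so the sketch can be tried in a scratch location.

```shell
# Hypothetical helper (not part of BLCR): prepare a per-user rpmbuild tree.
#   usage: blcr_setup_rpm_topdir [TOPDIR] [MACROS_FILE]
blcr_setup_rpm_topdir() {
    blcr_topdir=${1:-$HOME/rpmbuild}
    blcr_macros=${2:-$HOME/.rpmmacros}
    # rpmbuild expects these standard subdirectories under %_topdir.
    for blcr_sub in BUILD RPMS SOURCES SPECS SRPMS; do
        mkdir -p "$blcr_topdir/$blcr_sub" || return 1
    done
    # Append a %_topdir line unless the macros file already defines one.
    if ! grep -q '^%_topdir' "$blcr_macros" 2>/dev/null; then
        printf '%%_topdir %s\n' "$blcr_topdir" >> "$blcr_macros"
    fi
}
```

With that in place, a non-root rpmbuild writes its binary RPMs under ~/rpmbuild/RPMS/ARCH rather than /usr/src/redhat/RPMS.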

To build binary RPMs from the source RPM, use

    % rpmbuild --rebuild blcr-X.Y.Z-N.src.rpm --target ARCH

replacing blcr-X.Y.Z-N.src.rpm with the correct filename, and ARCH with a specific target CPU. If you don't know your target, try "uname -p" to determine it. If you don't specify a --target, the default will depend on the version of rpmbuild and may be i386 (which will be rejected). See the documentation for rpmbuild for more information on building binary RPMs from source RPMs.
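The target selection above can be wrapped so the command is shown before it runs. The helper below is a hypothetical sketch (blcr_rebuild_srpm is our own name). If no ARCH is given, it uses 'uname -p' and falls back to 'uname -m' when 'uname -p' prints "unknown", which GNU uname does on some systems. It prints the command by default and runs it only with DRY_RUN=0.

```shell
# Hypothetical helper (not part of BLCR): print (default) or, with
# DRY_RUN=0, run the rpmbuild command for a BLCR source RPM.
#   usage: blcr_rebuild_srpm SRPM [ARCH]
blcr_rebuild_srpm() {
    blcr_srpm=$1
    blcr_arch=${2:-}
    if [ -z "$blcr_arch" ]; then
        blcr_arch=$(uname -p 2>/dev/null)
        case $blcr_arch in
            ''|unknown) blcr_arch=$(uname -m) ;;
        esac
    fi
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "rpmbuild --rebuild $blcr_srpm --target $blcr_arch"
    else
        rpmbuild --rebuild "$blcr_srpm" --target "$blcr_arch"
    fi
}
```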

The RPMs should build without error. However, if not building for the running kernel, you may see a warning about this. You will see the location of the binary RPMs in the last few lines of output from rpmbuild - something like this:

    Wrote: /usr/src/redhat/RPMS/i686/blcr-0.6.5-1.i686.rpm
    Wrote: /usr/src/redhat/RPMS/i686/blcr-libs-0.6.5-1.i686.rpm
    Wrote: /usr/src/redhat/RPMS/i686/blcr-devel-0.6.5-1.i686.rpm
    Wrote: /usr/src/redhat/RPMS/i686/blcr-modules_2.6.12_1.234-0.6.5-1.i686.rpm
    Wrote: /usr/src/redhat/RPMS/i686/blcr-testsuite-0.6.5-1.i686.rpm

You should note that the kernel version 2.6.12-1.234 has become 2.6.12_1.234 in the name of the blcr-modules package (a change of a hyphen to an underscore).
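When scripting installs across many nodes, it can be handy to derive that package name from the kernel release. This is a minimal sketch, assuming the hyphen-to-underscore substitution described above is the only change. blcr_modules_pkg is a hypothetical name, and the BLCR version-release string is passed in by the caller.

```shell
# Hypothetical helper (not part of BLCR): build the blcr-modules package
# name-version-release for a kernel release, turning '-' into '_' in the
# kernel part only.
#   usage: blcr_modules_pkg KERNEL_RELEASE BLCR_VERSION-RELEASE
blcr_modules_pkg() {
    printf 'blcr-modules_%s-%s\n' "$(printf '%s\n' "$1" | sed 's/-/_/g')" "$2"
}

# e.g.  rpm -q "$(blcr_modules_pkg "$(uname -r)" 0.6.5-1)"
```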

In most cases, you will want to install the blcr, blcr-libs and blcr-modules binary RPMs. The blcr-devel RPM is only required on machines on which you will be compiling/linking source code against BLCR's libraries. So, for a cluster you may want to install blcr-devel only on the front-end node(s).

The blcr-testsuite RPM is optional. You may install and run the testsuite (/usr/libexec/blcr-testsuite/RUN_ME) if you wish to verify correct operation of BLCR. You may be asked to do this if you report bugs to us.

For more information

For more information on Berkeley Lab Checkpoint/Restart for Linux, visit the project home page: http://ftg.lbl.gov/checkpoint
To report bugs (or look for bug fixes prior to reporting new ones), visit http://mantis.lbl.gov/bugzilla