K. Wu A. Canning H. D. Simon
Lawrence Berkeley National Laboratory/NERSC, Berkeley, CA 94720.
Email: {kwu, acanning, hdsimon}@lbl.gov.
In the heart of most electronic structure simulation programs, there is a routine to find the solution of eigenvalue problems. Solving these eigenvalue problems usually dominates the computer time used for the whole simulation[6]. Because of their physical properties, these eigenvalue problems are always symmetric real or Hermitian. The dimensions of the matrices involved are usually very large and a large number of eigenvalues and their corresponding eigenvectors are needed to compute the desired physical quantities. To solve this type of problems, we introduce a variant of the Lanczos method called the thick-restart Lanczos method. In material science, this method is most appropriate for non-selfconsistent cases where the eigenvalue problems are linear and the number of required eigenvalues is relatively small compared to the size of the matrix.
The Lanczos method is very simple and yet effective in finding eigenvalues. It is also well suited for parallel computing. There are two common ways of implementing the Lanczos method depending on whether the Lanczos vectors are stored. When the Lanczos vectors are not stored, they may lose orthogonality and the Lanczos method may generate spurious eigenvalues [2, 10]. Though spurious eigenvalues can be effectively identified, we still prefer not to deal with the spurious eigenvalues. When the Lanczos vectors are stored, the loss of orthogonality problem can be corrected by re-orthogonalization [4, 5, 7]. No spurious eigenvalue is generated in this case. However, because each Lanczos step generates one vector, a large amount of computer memory may be required to store all the Lanczos vectors. To limit the maximum amount of memory used, we typically restart the Lanczos algorithm after a certain number of steps. The restarted versions usually use considerably more matrix-vector multiplications than the non-restarted version. In recent years, newly developed restarting strategies have significantly reduced the number of matrix-vector multiplications used. The two most successful ones are the implicit restarting technique [1, 3, 8] and the dynamic thick-restart technique [9, 12]. For symmetric or Hermitian eigenvalue problems, these two schemes are equivalent. Because the thick-restart scheme is easier to implement and it is slightly more flexible than the implicit restarted scheme [9, 12], the new method described here uses the thick-restart scheme. Other thick-restart eigenvalue methods, e.g., the thick-restart Davidson method, can be applied on symmetric eigenvalue problems as well. Compared to them, the main advantage of the new scheme is that it uses less arithmetic operations by taking full advantage of the symmetry of the matrix [13].
The thick-restart Lanczos algorithm combines the Lanczos algorithm with the thick-restart technique to form a new restarted eigenvalue method. It is designed to solve symmetric or Hermitian eigenvalue problems of the form,
where A is the matrix,
is
an eigenvalue of A and
is the corresponding
eigenvector. The goal of the Lanczos eigenvalue method is to find
approximate values of
and
which will be denoted by
and x.
The Lanczos method for eigenvalue problems can be roughly split into two
parts, one to construct the Lanczos basis and one to compute the
approximate solutions using the Rayleigh-Ritz projection [5]. The approximate eigenvalues and
eigenvectors computed using Rayleigh-Ritz projection are commonly
referred to as the Ritz values and the Ritz vectors [5]. In the restarted Lanczos algorithm, we
construct the basis and perform the Rayleigh-Ritz projection as usual.
In addition, it also selects some appropriate Ritz pairs for restarting.
The derivation of the thick-restart Lanczos algorithm can be found
elsewhere [13]. Here we will describe a
version of it that is suitable for floating-point arithmetic
implementation. The main difference between this one and the one for
exact arithmetic is that this one has a re-orthogonalization
step. The re-orthogonalization scheme shown here guarantees that the
Lanczos vectors are orthogonal to machine precision (
)
and coefficients
and
are accurate to order of
.
Assuming there is enough computer memory to store m+1 Lanczos vectors, the thick-restart Lanczos algorithm progressively builds its basis vectors as follows.
When restarting, the quantities
,
,
and
shall satisfy
The most important ingredient of the thick-restart Lanczos algorithm is
the restarting strategy. A successful restarting scheme can
significantly reduce the time needed to find the wanted eigenvalues.
The algorithm outlined next is a simple procedure, however, it captures
the essence of a successful restarting scheme, i.e., it saves
significant portion of the current basis across restarting. For
convenience, we define
and
.
In the actual implementation, the quantities with hat,
,
and
occupy the same storage
as the corresponding quantities without hat,
,
and
. We distinguish them here
only to make it clear what are new quantities to be saved and what are
old quantities to be discarded. It is easy to verify that
,
and
satisfy Equation 1 [13], which means that
they can be used to restart the Lanczos algorithm. When entering this
algorithm for the first time, it is hard to satisfy Equation 1 with k>1. Thus this algorithm is usually
started with one vector.
When using the above algorithm, the user need to select a m. The minimum size for m is Neig+6, where Neig is the number of eigenvalues wanted. In many cases, increasing the size of m will increase the effectiveness of the restarted Lanczos algorithm. The algorithm can be implemented to utilize the space reserved for the eigenvectors, in which case, only m-Neig vectors are needed workspace for the restarted Lanczos algorithm. Typically, the restarted Lanczos method may need 20 -- 50 vectors as workspace.
What makes the above algorithm different from the naive explicit restarted Lanczos method is that k is significantly larger than one. When k is set to one during restarting phase, the thick-restart Lanczos algorithm reduces to a simple explicitly restarted Lanczos algorithm. The explicitly restarted Lanczos algorithm is usually effective in find one extreme eigenvalue. The thick-restart Davidson method and Arnoldi method have been shown to be effective in find a large number when eigenvalues [1, 9, 12].
We have given all implementation details except how to generate random
vectors and how to perform convergence test. It is rear for the
re-orthogonalization procedure to actually need a random vector. When a
random vector is needed, one can simple add a few random numbers to the
current
and orthogonalize the new
against all existing Lanczos vectors. Typical
convergence tests for symmetric eigenvalue problems use either residual
norms or estimated errors in the eigenvalues. In the experiment
reported later, we declare Ritz pairs converged if their residual norm
is less than
,
.
We will use two cases of one electronic structure simulation to
demonstrate the effectiveness of the new method. The simulation program
was used by Zunger and colleagues to compute the electron distribution
of the semiconductor-embedded quantum dots [11, 14]. The
simulation uses the empirical pseudopotential method which generates one
linear eigenvalue problem for each physical system. The eigenvalues of
interests are those near the energy band gap because they are directly
related to observable electrical properties of the material. The two
test cases are a 512-atom InGaP system and a 9000-atom InGaAs system.
Both are embedded in InGa surroundings. Let H denote the
discrete form of the Hamiltonian given by the empirical pseudopotential
method. The program computes a number of eigenvalues near the band gap
by computing the smallest eigenvalues of
with appropriately
chosen
. When
is
near the bottom of the gap, the top of the valence band will be computed
and when
is near the top of the
gap, the bottom of the conduction band is computed. The matrix H
is Hermitian. Both eigenvalues and eigenvectors are computed during the
simulation. The eigenvectors are represented as Fourier components.
Because of the symmetry, only half of the Fourier components are stored.
In the 512-atom case, the grid in real space is 48
48
48 and only 6603 Fourier components are stored in
Fourier space. In the 9000-atom case, the grid in real space is 240
36
320 and only 137919
Fourier components are stored.
The Conjugate Gradient (CG) method were used on this problem [6, 11]. The particular implementation of the CG method alternates between applying 50 CG steps on each individual approximate eigenvector and performing Rayleigh-Ritz projection on the whole set. The tests we conducted compare the CG scheme with the new restarted Lanczos method. The experiment is conducted on a massively parallel compute, the Cray T3E 900. The matrix-vector multiplication is specially optimized for the particular platform and is highly efficient. Since the matrix-vector multiplication is the dominate operation in both CG and the restarted Lanczos method, they both exhibit good parallel efficiency.
Table 1: Time (seconds) used to find eigenvectors
of quantum dots.
Table 1 shows the time used by the CG scheme
and the restarted Lanczos (TRL) scheme to find the eigenvectors of the
two test problems. The time shown includes the time to setup the
problem and time to write out the solutions for further processing. Neig
is the number of eigenpairs computed. The CG scheme failed to compute
100 eigenpairs within 4 hours in the 9000-atom case. We feel that
existing data have established a clear trend, therefore did not attempt
to restart the computations. The basis size m used to obtain the
results in Table 1 is Neig+50. Since the
space used to store the eigenvectors are used to store the Lanczos
vectors as well, We used 51 extra vectors as workspace to store the
Lanczos vectors
. The restarted Lanczos method
uses more workspace than the CG scheme.
The quality of the solutions computed by both schemes are about the
same. The eigenvalues computed by the two schemes agree with each other
to the specified tolerance (
) in all cases. This
indicates that the thick-restart Lanczos method can compute multiple
eigenvalues as well as the CG scheme used.
From Table 1, we see that as more
eigenvectors are computed, the restarted Lanczos method becomes more
competitive. In many cases, the restarted Lanczos method uses only 25%
of the time used by CG. We attribute this performance difference to two
major factors. One, this particular CG scheme computes each eigenvector
independently while the restarted Lanczos method computes all the
eigenvectors together. Starting with any vector, using CG to minimize
the Rayleigh quotient, the resulting vector usually converges to the
eigenvector with the smallest eigenvalue. When trying to find 100
eigenvectors, the 100 individual CG iterations performs significant
amount of redundant work. In contrast, the thick-restart Lanczos method
can efficiently use the matrix-vector multiplications to extract the
desired solutions [1, 9, 12]. The
second reason that the restarted Lanczos method is more efficient is
that we are able to reduce the amount of arithmetic work done by the
Lanczos method by using the symmetry of the problem. The CG scheme did
not attempt to take advantage of this symmetry. Because of this
symmetry, all dot-product operations only produce real numbers and all
eigenvalues and eigenvectors of
are real. This reduces
the arithmetic operations expect those in the matrix-vector
multiplication by half. This reduction can significantly reduces the
execution time of the re-orthogonalization steps especially when the
basis size m is large.
The thick-restart Lanczos algorithm presented in this paper is most appropriate for non-selfconsistent computations. On two such cases, we have demonstrated that the restarted Lanczos method uses much less time to compute the same eigenvalues and eigenvectors than the Conjugate Gradient method. In most material science computations, e.g., self-consistent fields and molecular dynamics computations, there are good initial guesses for the eigenvectors from previous setp[6]. Since the thick-restart Lanczos algorithm can only take advantage of one starting vector, alternative schemes such as the initial guesses as a block version of the thick-restart Lanczos algorithm and the Conjugate Gradient method, may be more appropriate.
The program used to conduct the tests are from Dr. Alex Zunger and Lin-Wang Wang of National Renewal Energy Lab. The authors would like express their sincere gratitude to them for the permission to use the program and for numerous help during the numerical experiment.
This work was supported by the Director, Office of Energy Research, Office of Laboratory Policy and Infrastructure Management, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Energy Research of the U.S. Department of Energy.
Kesheng Wu
Fri Aug 7 23:53:39 PDT 1998