A new Lanczos method
for electronic structure calculations

K. Wu    A. Canning    H. D. Simon
Lawrence Berkeley National Laboratory/NERSC, Berkeley, CA 94720.
Email: {kwu, acanning, hdsimon}@lbl.gov.



1. Introduction

In the heart of most electronic structure simulation programs, there is a routine to find the solution of eigenvalue problems. Solving these eigenvalue problems usually dominates the computer time used for the whole simulation[6]. Because of their physical properties, these eigenvalue problems are always symmetric real or Hermitian. The dimensions of the matrices involved are usually very large and a large number of eigenvalues and their corresponding eigenvectors are needed to compute the desired physical quantities. To solve this type of problems, we introduce a variant of the Lanczos method called the thick-restart Lanczos method. In material science, this method is most appropriate for non-selfconsistent cases where the eigenvalue problems are linear and the number of required eigenvalues is relatively small compared to the size of the matrix.

The Lanczos method is very simple and yet effective in finding eigenvalues. It is also well suited for parallel computing. There are two common ways of implementing the Lanczos method depending on whether the Lanczos vectors are stored. When the Lanczos vectors are not stored, they may lose orthogonality and the Lanczos method may generate spurious eigenvalues [2, 10]. Though spurious eigenvalues can be effectively identified, we still prefer not to deal with the spurious eigenvalues. When the Lanczos vectors are stored, the loss of orthogonality problem can be corrected by re-orthogonalization [4, 5, 7]. No spurious eigenvalue is generated in this case. However, because each Lanczos step generates one vector, a large amount of computer memory may be required to store all the Lanczos vectors. To limit the maximum amount of memory used, we typically restart the Lanczos algorithm after a certain number of steps. The restarted versions usually use considerably more matrix-vector multiplications than the non-restarted version. In recent years, newly developed restarting strategies have significantly reduced the number of matrix-vector multiplications used. The two most successful ones are the implicit restarting technique [1, 3, 8] and the dynamic thick-restart technique [9, 12]. For symmetric or Hermitian eigenvalue problems, these two schemes are equivalent. Because the thick-restart scheme is easier to implement and it is slightly more flexible than the implicit restarted scheme [9, 12], the new method described here uses the thick-restart scheme. Other thick-restart eigenvalue methods, e.g., the thick-restart Davidson method, can be applied on symmetric eigenvalue problems as well. Compared to them, the main advantage of the new scheme is that it uses less arithmetic operations by taking full advantage of the symmetry of the matrix [13].

2. The thick-restart Lanczos algorithm

The thick-restart Lanczos algorithm combines the Lanczos algorithm with the thick-restart technique to form a new restarted eigenvalue method. It is designed to solve symmetric or Hermitian eigenvalue problems of the form,

displaymath352

where A is the matrix, tex2html_wrap_inline356 is an eigenvalue of A and tex2html_wrap_inline360 is the corresponding eigenvector. The goal of the Lanczos eigenvalue method is to find approximate values of tex2html_wrap_inline356 and tex2html_wrap_inline360 which will be denoted by tex2html_wrap_inline366 and x.

The Lanczos method for eigenvalue problems can be roughly split into two parts, one to construct the Lanczos basis and one to compute the approximate solutions using the Rayleigh-Ritz projection [5]. The approximate eigenvalues and eigenvectors computed using Rayleigh-Ritz projection are commonly referred to as the Ritz values and the Ritz vectors [5]. In the restarted Lanczos algorithm, we construct the basis and perform the Rayleigh-Ritz projection as usual. In addition, it also selects some appropriate Ritz pairs for restarting. The derivation of the thick-restart Lanczos algorithm can be found elsewhere [13]. Here we will describe a version of it that is suitable for floating-point arithmetic implementation. The main difference between this one and the one for exact arithmetic is that this one has a re-orthogonalization step. The re-orthogonalization scheme shown here guarantees that the Lanczos vectors are orthogonal to machine precision ( tex2html_wrap_inline370 ) and coefficients tex2html_wrap_inline372 and tex2html_wrap_inline374 are accurate to order of tex2html_wrap300.

Assuming there is enough computer memory to store m+1 Lanczos vectors, the thick-restart Lanczos algorithm progressively builds its basis vectors as follows.

 
Initialization To start solving a new eigenvalue problem, take a starting vector, normalize it and store the result in tex2html_wrap_inline378 (k=0).

When restarting, the quantities tex2html_wrap301 , tex2html_wrap302, tex2html_wrap303 and tex2html_wrap_inline382 shall satisfy   equation22

Iterate For tex2html_wrap304 ,
  1. tex2html_wrap305 ,
  2. tex2html_wrap306 ,
  3. orthogonalization: If tex2html_wrap307,
      equation31
    else
      equation38
  4. re-orthogonalization:
    • If tex2html_wrap307 , tex2html_wrap309 , else tex2html_wrap310.
    • If tex2html_wrap311, perform the local re-orthogonalization
        equation50
      else if tex2html_wrap312, perform the global re-orthogonalization
        equation61
      else, replace tex2html_wrap_inline384 with a random vector that is orthogonal to tex2html_wrap313.
    • Before updating tex2html_wrap_inline384 using Equation 4 or 5, update tex2html_wrap_inline372 , tex2html_wrap314.
  5. normalization: tex2html_wrap315 , tex2html_wrap316.
    If tex2html_wrap_inline384 is a random vector, set tex2html_wrap_inline374 to zero.

The most important ingredient of the thick-restart Lanczos algorithm is the restarting strategy. A successful restarting scheme can significantly reduce the time needed to find the wanted eigenvalues. The algorithm outlined next is a simple procedure, however, it captures the essence of a successful restarting scheme, i.e., it saves significant portion of the current basis across restarting. For convenience, we define tex2html_wrap317 and tex2html_wrap318.

equation81

Restarting scheme

  1. Find all eigenvalues and eigenvectors of tex2html_wrap_inline394 , tex2html_wrap319 , where the columns of Y are eigenvectors and the diagonal matrix D contains the eigenvalues. The Ritz values are tex2html_wrap320.
  2. If the smallest eigenvalues are sought, order the Ritz values in ascending order, else order the Ritz values in descending order. Order the columns of Y accordingly.
  3. Let tex2html_wrap321 , where tex2html_wrap_inline402 is the number of converged eigenvalues and tex2html_wrap_inline404 is the number of desired eigenvalues.
  4. Denote the first k columns of Y as tex2html_wrap_inline410 , replace the first k columns of tex2html_wrap_inline414 with tex2html_wrap322 , i.e., tex2html_wrap323 , and tex2html_wrap324. The corresponding tex2html_wrap_inline416 and tex2html_wrap_inline418 are defined as:
    equation104

In the actual implementation, the quantities with hat, tex2html_wrap_inline420 , tex2html_wrap_inline416 and tex2html_wrap_inline418 occupy the same storage as the corresponding quantities without hat, tex2html_wrap_inline426 , tex2html_wrap_inline372 and tex2html_wrap_inline374. We distinguish them here only to make it clear what are new quantities to be saved and what are old quantities to be discarded. It is easy to verify that tex2html_wrap_inline420 , tex2html_wrap_inline416 and tex2html_wrap_inline418 satisfy Equation 1 [13], which means that they can be used to restart the Lanczos algorithm. When entering this algorithm for the first time, it is hard to satisfy Equation 1 with k>1. Thus this algorithm is usually started with one vector.

When using the above algorithm, the user need to select a m. The minimum size for m is Neig+6, where Neig is the number of eigenvalues wanted. In many cases, increasing the size of m will increase the effectiveness of the restarted Lanczos algorithm. The algorithm can be implemented to utilize the space reserved for the eigenvectors, in which case, only m-Neig vectors are needed workspace for the restarted Lanczos algorithm. Typically, the restarted Lanczos method may need 20 -- 50 vectors as workspace.

What makes the above algorithm different from the naive explicit restarted Lanczos method is that k is significantly larger than one. When k is set to one during restarting phase, the thick-restart Lanczos algorithm reduces to a simple explicitly restarted Lanczos algorithm. The explicitly restarted Lanczos algorithm is usually effective in find one extreme eigenvalue. The thick-restart Davidson method and Arnoldi method have been shown to be effective in find a large number when eigenvalues [1, 9, 12].

We have given all implementation details except how to generate random vectors and how to perform convergence test. It is rear for the re-orthogonalization procedure to actually need a random vector. When a random vector is needed, one can simple add a few random numbers to the current tex2html_wrap_inline384 and orthogonalize the new tex2html_wrap_inline384 against all existing Lanczos vectors. Typical convergence tests for symmetric eigenvalue problems use either residual norms or estimated errors in the eigenvalues. In the experiment reported later, we declare Ritz pairs converged if their residual norm is less than tex2html_wrap_inline448 , tex2html_wrap325.

3. Performance

We will use two cases of one electronic structure simulation to demonstrate the effectiveness of the new method. The simulation program was used by Zunger and colleagues to compute the electron distribution of the semiconductor-embedded quantum dots [11, 14]. The simulation uses the empirical pseudopotential method which generates one linear eigenvalue problem for each physical system. The eigenvalues of interests are those near the energy band gap because they are directly related to observable electrical properties of the material. The two test cases are a 512-atom InGaP system and a 9000-atom InGaAs system. Both are embedded in InGa surroundings. Let H denote the discrete form of the Hamiltonian given by the empirical pseudopotential method. The program computes a number of eigenvalues near the band gap by computing the smallest eigenvalues of tex2html_wrap450 with appropriately chosen tex2html_wrap_inline456. When tex2html_wrap_inline456 is near the bottom of the gap, the top of the valence band will be computed and when tex2html_wrap_inline456 is near the top of the gap, the bottom of the conduction band is computed. The matrix H is Hermitian. Both eigenvalues and eigenvectors are computed during the simulation. The eigenvectors are represented as Fourier components. Because of the symmetry, only half of the Fourier components are stored. In the 512-atom case, the grid in real space is 48 tex2html_wrap_inline464 48 tex2html_wrap_inline464 48 and only 6603 Fourier components are stored in Fourier space. In the 9000-atom case, the grid in real space is 240 tex2html_wrap_inline464 36 tex2html_wrap_inline464 320 and only 137919 Fourier components are stored.

The Conjugate Gradient (CG) method were used on this problem [6, 11]. The particular implementation of the CG method alternates between applying 50 CG steps on each individual approximate eigenvector and performing Rayleigh-Ritz projection on the whole set. The tests we conducted compare the CG scheme with the new restarted Lanczos method. The experiment is conducted on a massively parallel compute, the Cray T3E 900. The matrix-vector multiplication is specially optimized for the particular platform and is highly efficient. Since the matrix-vector multiplication is the dominate operation in both CG and the restarted Lanczos method, they both exhibit good parallel efficiency.

   table136
Table 1: Time (seconds) used to find eigenvectors of quantum dots.

Table 1 shows the time used by the CG scheme and the restarted Lanczos (TRL) scheme to find the eigenvectors of the two test problems. The time shown includes the time to setup the problem and time to write out the solutions for further processing. Neig is the number of eigenpairs computed. The CG scheme failed to compute 100 eigenpairs within 4 hours in the 9000-atom case. We feel that existing data have established a clear trend, therefore did not attempt to restart the computations. The basis size m used to obtain the results in Table 1 is Neig+50. Since the space used to store the eigenvectors are used to store the Lanczos vectors as well, We used 51 extra vectors as workspace to store the Lanczos vectors tex2html_wrap451. The restarted Lanczos method uses more workspace than the CG scheme.

The quality of the solutions computed by both schemes are about the same. The eigenvalues computed by the two schemes agree with each other to the specified tolerance (tex2html_wrap_inline448) in all cases. This indicates that the thick-restart Lanczos method can compute multiple eigenvalues as well as the CG scheme used.

From Table 1, we see that as more eigenvectors are computed, the restarted Lanczos method becomes more competitive. In many cases, the restarted Lanczos method uses only 25% of the time used by CG. We attribute this performance difference to two major factors. One, this particular CG scheme computes each eigenvector independently while the restarted Lanczos method computes all the eigenvectors together. Starting with any vector, using CG to minimize the Rayleigh quotient, the resulting vector usually converges to the eigenvector with the smallest eigenvalue. When trying to find 100 eigenvectors, the 100 individual CG iterations performs significant amount of redundant work. In contrast, the thick-restart Lanczos method can efficiently use the matrix-vector multiplications to extract the desired solutions [1, 9, 12]. The second reason that the restarted Lanczos method is more efficient is that we are able to reduce the amount of arithmetic work done by the Lanczos method by using the symmetry of the problem. The CG scheme did not attempt to take advantage of this symmetry. Because of this symmetry, all dot-product operations only produce real numbers and all eigenvalues and eigenvectors of tex2html_wrap_inline394 are real. This reduces the arithmetic operations expect those in the matrix-vector multiplication by half. This reduction can significantly reduces the execution time of the re-orthogonalization steps especially when the basis size m is large.

The thick-restart Lanczos algorithm presented in this paper is most appropriate for non-selfconsistent computations. On two such cases, we have demonstrated that the restarted Lanczos method uses much less time to compute the same eigenvalues and eigenvectors than the Conjugate Gradient method. In most material science computations, e.g., self-consistent fields and molecular dynamics computations, there are good initial guesses for the eigenvectors from previous setp[6]. Since the thick-restart Lanczos algorithm can only take advantage of one starting vector, alternative schemes such as the initial guesses as a block version of the thick-restart Lanczos algorithm and the Conjugate Gradient method, may be more appropriate.

4. Acknowledgment

The program used to conduct the tests are from Dr. Alex Zunger and Lin-Wang Wang of National Renewal Energy Lab. The authors would like express their sincere gratitude to them for the permission to use the program and for numerous help during the numerical experiment.

This work was supported by the Director, Office of Energy Research, Office of Laboratory Policy and Infrastructure Management, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Energy Research of the U.S. Department of Energy.

5. References

1
D. Calvetti, L. Reichel, and D. Sorensen. An implicitly restarted Lanczos method for large symmetric eigenvalue problems. Electronic Transactions on Numerical Analysis, 2:1-21, 1994.

2
J. Cullum and R. A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computations: Theory, volume 3 of Progress in Scientific Computing. Birkhauser, Boston, 1985.

3
R. Lehoucq, D. Sorensen, and C. Yang. ARPACK USERS GUIDE: solution of large scale eigenvalue problems with implicitly restarted Arnoldi methods. SIAM, Philadelphia, PA, 1998.

4
B. N. Parlett and D. Scott. The Lanczos algorithm with selective orthogonalization. Math. Comp., 33:217-2388, 1979.

5
Beresford N. Parlett. The symmetric eigenvalue problem. Classics in Applied Mathematics. SIAM, Philadelphia, PA, 1998.

6
M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias, and J. D. Joannopoulos. Iterative minimization techniques for ab initio total-energy calculations: molecular dynamics and conjugate gradients. Review of Modern Physics, 64:1045-1097, 1992.

7
Horst D. Simon. Analysis of the symmetric Lanczos algorithm with reorthogonalization methods. Lin. Alg. Appl., 61:101-131, 1984.

8
D. S. Sorensen. Implicit application of polynomial filters in a K-step Arnoldi method. SIAM J. Matrix Anal. Appl., 13(1):357-385, 1992.

9
A. Stathopoulos, Y. Saad, and K. Wu. Dynamic thick restarting of the Davidson and the implicitly restarted Arnoldi methods. SIAM J. Sci. Comput., 19(1):227-245, 1998.

10
L.-W. Wang and A. Zunger. Large scale electronic structure calculations using the Lanczos method. Computational Materials Science, 2:326-340, 1994.

11
L.-W. Wang and A. Zunger. Solving Schrodinger's equation around a desired energy: application to silicon quantum dots. Journal of Chemical Physics, 100(3):2394-7, February 1994.

12
Kesheng Wu. Preconditioned Techniques for Large Eigenvalue Problems. PhD thesis, University of Minnesota, 1997. An updated version also appears as Technical Report TR97-038 at the Computer Science Department.

13
Kesheng Wu and Horst Simon. Thick-restart Lanczos method for symmetric eigenvalue problems. Technical Report 41412, Lawrence Berkeley National Laboratory, 1998.

14
Alex Zunger. Electronic-structure theory of semiconductor quantum dots. MRS Bulletin, pages 35-42, February 1998.


software
related work

Kesheng Wu
Fri Aug 7 23:53:39 PDT 1998