# Research Areas

## 1. Dense linear algebra

A number of applications require the solution of eigenvalue problems and singular value decomposition (SVD) of matrices that do not have a particular structure or that are often dense. Usually, dense matrices are first mapped into a tridiagonal matrix (for eigenvalue problems) or a bidiagonal matrix (SVD); solutions obtained from these more compact forms are then mapped back into solutions for the original matrices. We have developed fast sequential algorithms for the solution of eigenvalue problems for tridiagonal matrices, and for the computation of singular values of bidiagonal matrices with high relative accuracy. These algorithms have been implemented in the widely used LAPACK library of dense linear algebra computations. We have also developed an algorithm for the computation of subsets of singular values and vectors, based on an associated tridiagonal eigenvalue problem, which offers an advantageous alternative for applications that do not require a full SVD. The mapping of dense matrices into a more compact form, for large applications of interest, have become a significant bottleneck, especially in numerical libraries like ScaLAPACK. There is therefore a need for strategies that speed up that part of the computations, or that obtain the compact forms in two (or potentially more) stages, e.g. dense to band then band to tridiagonal. We are interested in such strategies for improving the overall performance of dense eigensolvers and SVD calculations. These strategies are important for existing highly parallel multi-core computer architectures; they are expected to become even more relevant on upcoming computer architectures. For more information, contact Osni Marques.

**2. Factorization-based sparse solvers and preconditioners**

We anticipate that more and more simulation codes encompass multiphysics components in multiple spatial and length scales. The resulting discretized sparse linear systems can be highly indefinite, nonsymmetric and extremely ill-conditioned. For such problems, factorization based algorithms are often the most robust algorithmic choices among many alternatives, either being used as direct solvers, or as coarse-grid solvers in multigrid, or as preconditioners for iterative solvers which otherwise rarely converge. Our team has a long track record of conducting innovative algorithm research and producing widely used software packages with sparse factorization algorithms. We are actively conducting R&D in the following areas.

ver using LU factorization for general nonsymmetric sparse linear systems.**Sparse direct sol***Our sparse direct solver SuperLU has long been used in many DOE simulation codes for climate modeling, cosmology, fusion reactors, nuclear reactors, particle accelerator design, porous media / subsurface flow, etc. It is also used as an internal (e.g., coarse-grid) solver in other high level solver algorithms, including hypre, PETSc, SUNDIALS, and Trillinos. The number of downloads of SuperLU in FY15 reached the new peak of over 38000. For more information, contact Sherry Li.*... Our sparse direct symmetric solver symPACK, currently under development, is planned to be used in many DOE codes for electronic structure computations, eigenvalue computations, etc. symPACK relies on efficient communication and scheduling strategies to deliver scalable performance while providing optimal memory consumption. For more information, contact Mathias Jacquelin.**Sparse direct solver using Cholesky or LDL' factorizations for symmetric linear systems**.In addition to exploiting structural sparsity as in 1 and 2 above, this class of algorithms also exploits additional data-sparsity in the dense blocks in the sparse factors. This leads to linear scaling O(N) or nearly so O(N log N) arithmetic complexity for PDEs with smooth kernels. In essence, the HSS representation can be considered as an algebraic generalization of the fast multipole (FMM) method, Our newly developed software package STRUMPACK was released less than a year ago, and already has over 200 downloads. This demonstrates that these types of algorithms are applicable to broad application domains. For more information, contact Pieter Ghysels.**Sparse preconditioners using low-rank approximate factorization algorithms such as Hierarchically Semi-Separable (HSS) structure.**In this method, the global system is first partitioned into smaller interior subdomain systems, which are connected only through separators. The subdomain variables are first eliminated independently with a parallel direct solver. The interface variables are then used to form approximate Schur complement. Finally, the schur complement system is solved iteratively using appoximate Schur complement as a preconditioner. Our solver library PDSLin is used in the accelerator modeling production code Omega3P. For more information, contact Sherry Li.**Non-overlapping domain decomposition based hybrid sparse solver.****Resilient solvers at extreme scale. (TOORSES)**

**3. Eigenvalue calculations **

**The goal of this project is to develop scalable and robust algorithms and solvers for tackling large-scale linear and nonlinear eigenvalue problems. Techniques to be considered include, but are not limited to, implicitly restarted Lanczos and Arnoldi iterations for linear eigenvalue problems; nonlinear Arnoldi, Jacobi-Davidson, rational Krylov, second order Arnoldi methods for one-parameter nonlinear eigenvalue problems; and optimization approaches for solving eigenvalue problems with nonlinearity in the eigenvectors. For more information, contact Chao Yang.**

**Fast Algorithms for Electronic Structure Analysis**

**The goal of this project is to develop fast numerical methods for performing ground-state electronic structure analysis that is based on the Kohn-Sham density functional theory (KSDFT). The standard implementation of KSDFT, which requires partially diagonalizing the Kohn-Sham Hamiltonian scales cubically with respect to the number of electrons in the system. This is prohibitively expensive for large molecule and crystals. Our past and ongoing efforts include: **

- Developing a novel discretization method called adaptive local basis (ALB) expansion to systematically discretize the Kohn-Sham Hamiltonian with a small number of degrees of freedom;
- Developing a novel numerical method called the pole expansion and selected inversion (PEXSI) method to evaluate the electron density in KSDFT without diagonalizing the Kohn-Sham Hamiltonian. The complexity of this approach is at most N^2, where N is the number of electrons;
- Developing a novel accelerating or preconditioning technique for accelerating solution of the Kohn-Sham problem. This technique requires solving an elliptic partial differential equation. Thus we call this approach elliptic preconditioning. It is applicable to insulating, metallic and hybrid/surface systems.

Numerical Methods for Materials at Excited States

The goal of this project is to investigate and develop numerical methods for excited state electronic structure analysis. Excited state calculations are usually performed after a ground state electronic structure analysis has been completed. They are important in a number of applications such as understanding the photovoltaic processes and designing energy efficient batteries. Compared to ground state calculations, the theory and algorithms for excited state calculations are less well developed from a mathematical point of view. The cost of existing computational methods is generally much higher. This project will investigate and develop numerical methods for the GW theory, as well as wavefunction methods such as the configuration interaction and the coupled cluster methods. For more information, contact Lin Lin and Chao Yang.

## 4. Mathematical and computational methods for data analysis in experimental sciences

Our group is actively involved in LBNL's CAMERA project, which is to develop and deploy new mathematical and high performance algorithms to solve the data analysis problems from the experimental facilities, such as Advanced Light Source, Molecular Foundary. The following are highlights of the projects.

*Next generation computing for X-ray science.*We develop new theory, computational methods, HPC code and user interface for computing X-ray scattering patterns with complex nanostructures for ALS and NGLS. Our High-performance glancing-incidence small-angle x-ray scattering (GISAXS) simulation code, HipGISAXS, has been tested on a Cray XE6 CPU cluster with over 140,000 CPU cores and on a Cray XK6 GPU cluster with over 900 GPU nodes. It is three- to four-orders of magnitude faster than the existing code, and allows simulations with multiple scatterings. This is joint work with the Advanced Light Source. Computational Research Divsion: Xiaoye Li (PI), Slim Chourou, Abhinav Sarje. Advanced Light Source: Elaine Chan, Alexander Hexemer.