## Intel Parallel Computing Center at LBNL

The Intel Parallel Computing Center (IPCC) at Lawrence Berkeley National Laboratory (LBNL) is focused on advancing the open-source NWChem computational chemistry software on next-generation multicore and manycore high-performance computing systems. The aim is to create optimized versions of this important and widely used scientific application that will enable the scientific community to pursue new frontiers in chemistry and materials modeling.

Over the course of the project, the goal is to deliver enhanced versions of NWChem with significantly higher overall performance on today's manycore machines. Research and development will focus on exposing more parallelism in the code. The work ranges from simple modifications, such as adding or modifying OpenMP pragmas and refactoring code to enable vectorization, through repeatable patterns for performance improvement, all the way to exploring new algorithmic approaches that better exploit manycore architectures. Because NWChem is open source, all modifications will be available to the whole community of users, maximizing the impact of the project.
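As a rough illustration of the simplest category of changes, here is a minimal, hypothetical C sketch (not NWChem source) of adding an OpenMP pragma to a loop that has been written to vectorize well. The function name `axpy_sum` and the kernel itself are illustrative assumptions. The arrays are accessed with unit stride and the pointers are `restrict`-qualified so the compiler may assume they do not alias; `#pragma omp parallel for simd reduction(+:sum)` (OpenMP 4.0 or later) splits the iterations across threads and asks the compiler to vectorize each thread's chunk. Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```c
#include <stddef.h>

/* Hypothetical kernel for illustration only (not taken from NWChem):
   updates y[i] += a * x[i] and returns the sum of the updated y. */
double axpy_sum(size_t n, double a, const double *restrict x, double *restrict y)
{
    double sum = 0.0;

    /* "parallel for" distributes iterations over threads, "simd" requests
       vectorization of each thread's chunk, and the reduction clause gives
       each thread a private partial sum that is combined at the end.
       restrict promises the compiler that x and y do not overlap. */
    #pragma omp parallel for simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
        sum += y[i];
    }
    return sum;
}
```

For example, with `x[i] = i` and `y[i] = 1.0` for `i = 0..7` and `a = 2.0`, the updated `y` is 1, 3, 5, ..., 15 and the call returns 64.0.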

The IPCC will also undertake an extensive outreach and education effort to ensure that the lessons learned are disseminated to the broader user community at the National Energy Research Scientific Computing Center (NERSC). The aim is to supplement the training and outreach efforts NERSC is already undertaking to support users of its Intel® Xeon Phi™ (Knights Landing)-based Cori supercomputer, deployed in 2016.

#### Some highlights from LBNL's IPCC

- Fully threaded ab initio plane-wave density functional theory code, with an optimized multithreaded implementation of the Lagrange multiplier, achieves a 7.8× speedup over a sequential run using 55 threads
- New MPI+OpenMP hybrid implementation attains up to 65× better performance for the triples part of CCSD(T) compared to the existing MPI implementation
- Optimization of two-electron integrals and Fock matrix construction leads to a 2.5× speedup
- IPCC work highlighted in Supercomputing Magazine
- Fully OpenMP-threaded plane-wave code with a highly optimized Lagrange multiplier algorithm (20× speedup over the conventional algorithm) is shown to scale to 68 threads on one KNL node, and to 4×68 threads on four KNL nodes
- The paper "Towards highly scalable Ab Initio Molecular Dynamics (AIMD) simulations on the Intel Knights Landing manycore processor" was accepted at IPDPS 2017
- NWChem plane-wave work highlighted at LBNL
- Plane-wave work highlighted in HPCwire

#### Conference papers describing results of the IPCC

- “Advancing Algorithms to Increase Performance of Correlated and Dynamical Electronic Structure Simulation”, in CMMSE 2016: Proceedings of the 16th International Conference on Mathematical Methods in Science and Engineering (2016).
- H. Shan, S. Williams, W.A. de Jong, L. Oliker, “Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture”, in PMAM 2015: The 2015 International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 58–67 (2015).
- H. Shan, B. Austin, W.A. de Jong, L. Oliker, N. Wright, E. Apra, “Performance Tuning of Fock Matrix and Two-Electron Integral Calculations for NWChem on Leading HPC Platforms”, in High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation. Lecture Notes in Computer Science, pp. 261–280 (2014).

#### Some of the presentations that highlighted and acknowledged the IPCC

- W.A. de Jong, Kostas Vogiatzis, Laura Gagliardi, Chao Yang, Jiri Brabec, M. Jacquelin, Eric Bylaska, “Advancing Algorithms of Correlated and Dynamical Electronic Structure Simulations”, presented at International Conference on Algorithms and Applications for Excited State Electronic Structure Theories 2016, Beijing, China on August 9, 2016.
- W.A. de Jong, M. Jacquelin, L. Gagliardi, “Advancing Algorithms to Increase Performance of Correlated and Dynamical Electronic Structure Simulations”, presented at CMMSE 2016, Rota, Spain on July 6, 2016.
- W.A. de Jong, M. Jacquelin, L. Gagliardi, “Advancing Algorithms to Increase Performance of Electronic Structure Simulations on Many-Core Architectures”, presented at SIAM-PP 2016, Paris, France on April 15, 2016.
- W.A. de Jong, H. Shan, M. Jacquelin, M. Chabbi, “Using Next-generation Architectures to Model Large and Complex Molecular Environments”, presented at SIAM-CSE 2015, Salt Lake City, UT on March 17, 2015.
- W.A. de Jong, L. Lin, C. Yang, H. Shan, L. Oliker, “Modeling large-scale molecular environments with new mathematics approaches and next-generation architectures”, presented at International Conference on Theoretical and High Performance Computational Chemistry, Beijing, China on September 16, 2014.
- W.A. de Jong, L. Lin, C. Yang, H. Shan, L. Oliker, “Towards modeling complex mesoscale molecular environments”, presented at Computational and Mathematical Methods in Science and Engineering (CMMSE) 2014, Cádiz, Spain on July 5, 2014.