# EDGAR: Energy-efficient Data and Graph Algorithms Research

*This work is supported through a DOE Early Career award from the Office of Advanced Scientific Computing Research (ASCR) (Applied Mathematics) for the period of 2013-2018.*

Data are fundamental sources of insight for experimental and computational sciences. The Department of Energy acknowledges the challenges posed by fast-growing scientific data sets and more complex data. The graph abstraction provides a natural way to represent relationships among complex fast-growing scientific data sets. On future exascale systems, power consumption is of primary concern yet existing graph algorithms consume too much energy per useful operation due to their high communication costs, lack of locality, and inability to exploit hierarchy. This project explores methods to increase the energy efficiency of parallel graph algorithms and data mining tasks. A new family of algorithms will be developed to drastically reduce the energy footprint and running time of the graph and sparse matrix computations that form the basis of various data mining techniques. This project will also exploit the well-known duality between graph and sparse matrices to develop communication-avoiding graph algorithms that consume significantly less power. This project is relevant to DOE mission-critical science including bioinformatics and genomics with particular emphasis on plant genomics that can result in better biofuels through efficient genetic mapping, climate science where recent graph-based methods show increased accuracy in hurricane predictions, and combustion science where graph search techniques are used to analyze extreme-scale simulation data.

Currently, we are designing new parallel algorithms that reduce the communication costs of data and graph analysis algorithms. In many cases, these algorithms have very low computational intensities, hence their execution times are bounded by their communication requirements. In addition to improving the running time drastically, reducing communication will also help improve the energy consumption of data and graph algorithms in exascale. This is because the power required to transmit data also depends on the length of the wire: the farther the data, the higher the power usage.

## LBNL Researchers

- Aydın Buluç (PI)
- Ariful Azad
- Carl Yang (LBL/UC Davis)

## Presentations and Slides

- ASCR PI Meeting (2017): Poster 1, Poster 2
- ASCR PI Meeting (2013)
- Highlights from Calender Year 2014
- Highlights from Calender Year 2015
- Highlights from Calender Year 2016

## Progress Reports

Collaborators

- Leonid Oliker (LBL)
- Samuel Williams (LBL)
- The BeBOP group
- Edgar Solomonik (UIUC)
- Oded Schwartz (Hebrew University)
- Grey Ballard (Wake Forest)

- James Demmel (UCB/LBL)

- The Graph Algorithm Platform
- Scott Beamer (LBL)
- Krste Asanović (UCB/LBL)

- The Combinatorial Scientific Computing Lab
- John R. Gilbert (UCSB/LBL)
- Veronika Strnadova-Neeley

- ExaBiome (ECP) project
- Evangelos Georganas
- Kathy Yelick
- Dan Rokhsar

**Publications**

### 2018

### Ariful Azad, Georgios A. Pavlopoulos, Christos A. Ouzounis, Nikos C. Kyrpides, Aydin Buluç, "HipMCL: A high-performance parallel implementation of the Markov cluster algorithm for large scale networks", Nucleic Acids Research, April 2018,

### 2017

### Yang You, Aydin Buluc, James Demmel, "Scaling deep learning on GPU and Knights Landing clusters", Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17), 2017,

### Aydin Buluc, Tim Mattson, Scott McMillan, Jose Moreira, Carl Yang, "Design of the GraphBLAS API for C", IEEE Workshop on Graph Algorithm Building Blocks, IPDPSW, 2017,

- Download File: GABB17.pdf (pdf: 359 KB)

### Ariful Azad, Aydin Buluc, "A work-efficient parallel sparse matrix-sparse vector multiplication algorithm", IEEE International Parallel & Distributed Processing Symposium (IPDPS), Orlando, FL, May 2017,

- Download File: SpMSpV-ipdps17.pdf (pdf: 422 KB)

### Ariful Azad, Mathias Jacquelin, Aydin Bulu\cc, Esmond G Ng, "The reverse Cuthill-McKee algorithm in distributed-memory", Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International, January 2017, 22--31,

- Download File: RCM-ipdps17.pdf (pdf: 1.1 MB)

### 2016

### Ariful Azad, Grey Ballard, Aydin Buluc, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, Samuel Williams, "Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication", SIAM Journal on Scientific Computing, 38(6), C624–C651, November 2016, doi: 10.1137/15M104253X

- Download File: SISC-SpGEMM.pdf (pdf: 1.5 MB)

### Jeremy Kepner, Peter Aaltonen, David Bader, Aydin Buluç, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, José Moreira, John Owens, Carl Yang, Marcin Zalewski, Timothy Mattson., "Mathematical foundations of the GraphBLAS", IEEE High Performance Extreme Computing (HPEC), September 1, 2016,

### Ariful Azad, Aydın Buluç, "A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs", Parallel Computing, June 2016,

### Ariful Azad, Aydin Buluç, "Distributed-Memory Algorithms for Maximum Cardinality Matching in Bipartite Graphs", IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2016,

- Download File: MaximumMatchingDist-IPDPS16.pdf (pdf: 620 KB)

### Ariful Azad, Aydın Buluç, Alex Pothen, "Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting", IEEE Transactions on Parallel and Distributed Systems (TPDS), May 2016,

- Download File: matchingGraft-TPDS.pdf (pdf: 1.4 MB)

### Ariful Azad, Aydın Buluç, Distributed-memory algorithms for cardinality matching using matrix algebra, SIAM Conference on Parallel Processing for Scientific Computing (PP), Paris, France, April 2016,

- Download File: Azad-SIAM-PP16.pdf (pdf: 1.3 MB)

### P Koanantakool, A Azad, A Buluc, D Morozov, SY Oh, L Oliker, K Yelick, "Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication", Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016, January 2016, 842--853, doi: 10.1109/IPDPS.2016.117

### 2015

### A. Azad, G. Ballard, A. Buluc, J. Demmel, J. Gilbert, L. Grigori, O. Schwartz, S. Toledo, S. Williams, Parallel Sparse Matrix-Matrix Multiplication and Its Use in Triangle Counting and Enumeration, SIAM ALA, October 26, 2015,

- Download File: SIAM-LA-2015-SpGEMM.pdf (pdf: 3 MB)
- Download File: SIAM-LA-2015-SpGEMM.pptx (pptx: 2.5 MB)

### Ariful Azad, Aydin Buluc, "Distributed-Memory Algorithms for Maximal Cardinality Matching using Matrix Algebra", IEEE Cluster, Chicago, IL, September 2015,

- Download File: maximalMatching-distribute.pdf (pdf: 659 KB)

### Aydin Buluç, Scott Beamer, Kamesh Madduri, Krste Asanović, David Patterson., "Distributed-memory breadth-first search on massive graphs.", In D. Bader (editor), Parallel Graph Algorithms. CRC Press/Taylor-Francis, ( 2015)

### Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, Katherine Yelick, "MerAligner: A Fully Parallel Sequence Aligner", IEEE 29th International Parallel and Distributed Processing Symposium (IPDPS), May 2015, 561--570, doi: 10.1109/IPDPS.2015.96

Aligning a set of query sequences to a set of target sequences is an important task in bioinformatics. In this work we present merAligner, a highly parallel sequence aligner that implements a seed -- and -- extend algorithm and employs parallelism in all of its components. MerAligner relies on a high performance distributed hash table (seed index) and uses one-sided communication capabilities of the Unified Parallel C to facilitate a fine-grained parallelism. We leverage communication optimizations at the construction of the distributed hash table and software caching schemes to reduce communication during the aligning phase. Additionally, merAligner preprocesses the target sequences to extract properties enabling exact sequence matching with minimal communication. Finally, we efficiently parallelize the I/O intensive phases and implement an effective load balancing scheme. Results show that merAligner exhibits efficient scaling up to thousands of cores on a Cray XC30 supercomputer using real human and wheat genome data while significantly outperforming existing parallel alignment tools.

### Ariful Azad, Aydin Buluc, John Gilbert, "Parallel Triangle Counting and Enumeration using Matrix Algebra", Workshop on Graph Algorithms Building Blocks (GABB), in conjunction with IPDPS, IEEE, May 2015,

- Download File: triangles-gabb.pdf (pdf: 384 KB)

### Ariful Azad, Aydin Buluç, Alex Pothen, "A Parallel Tree Grafting Algorithm for Maximum Cardinality Matching in Bipartite Graphs", International Parallel and Distributed Processing Symposium (IPDPS), May 2015,

- Download File: matchingGraft.pdf (pdf: 518 KB)

### Aydin Buluç, Henning Meyerhenke, Ilya Safro, Peter Sanders, Christian Schulz., "Recent advances in graph partitioning", ArXiv, ( 2015)

### J. Chapman, M. Mascher, A. Buluç, K. Barry, E. Georganas, A. Session, V. Strnadova, J. Jenkins, S. Sehgal, L. Oliker, J Schmutz, K. Yelick, U. Scholz, R. Waugh, J. Poland, G. Muehlbauer, N. Stein, D. Rokhsar, "A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome", Genome biology, 2015,

### E Georganas, A Buluç, J Chapman, S Hofmeyr, C Aluru, R Egan, L Oliker, D Rokhsar, K Yelick, "HipMer: An extreme-scale de novo genome assembler", International Conference for High Performance Computing, Networking, Storage and Analysis, SC, January 1, 2015, 15-20-No, doi: 10.1145/2807591.2807664

### 2014

### Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, Katherine Yelick, "Parallel de Bruijn Graph Construction and Traversal for de Novo Genome Assembly", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), November 16, 2014, 437--448, doi: 10.1109/SC.2014.41

- Download File: sc14genome.pdf (pdf: 719 KB)

### Adam Lugowski, Shoaib Kamil, Aydın Buluç, Samuel Williams, Erika Duriakova, Leonid Oliker, Armando Fox, John R. Gilbert,, "Parallel processing of filtered queries in attributed semantic graphs", Journal of Parallel and Distributed Computing (JPDC), September 2014, doi: 10.1016/j.jpdc.2014.08.010

### H. M. Aktulga, A. Buluc, S. Williams, C. Yang, "Optimizing Sparse Matrix-Multiple Vector Multiplication for Nuclear Configuration Interaction Calculations", International Parallel and Distributed Processing Symposium (IPDPS 2014), May 2014, doi: 10.1109/IPDPS.2014.125

- Download File: ipdps14mfdnfinal.pdf (pdf: 631 KB)

### 2013

###
Tim Mattson, David Bader, Jon Berry, Aydin Buluc, Jack Dongarra, Christos Faloutsos, John Feo, John Gilbert, Joseph Gonzalez, Bruce

Hendrickson, Jeremy Kepner, Charles Leiserson, Andrew Lumsdaine, David Padua, Stephen Poole, Steve Reinhardt, Mike Stonebraker, Steve Wallach,

Andrew Yoo,
"Standards for Graph Algorithm Primitives",
HPEC,
2013,

### Grey Ballard, Aydin Buluç, James Demmel, Laura Grigori, Benjamin Lipshitz, Oded Schwartz, Sivan Toledo, "Communication optimal parallel multiplication of sparse random matrices", SPAA 2013: The 25th ACM Symposium on Parallelism in Algorithms and Architectures, Montreal, Canada, 2013, 222-231, doi: 10.1145/2486159.2486196

- Download File: spaa134-ballard.pdf (pdf: 301 KB)

### E. Solomonik, A. Buluç, J. Demmel, "Minimizing communication in all-pairs shortest paths", International Parallel and Distributed Processing Symposium (IPDPS), 2013,

- Download File: 25dapspipdps13.pdf (pdf: 256 KB)

### Scott Beamer, Aydın Buluç, Krste Asanović, David A Patterson, "Distributed Memory Breadth-First Search Revisited: Enabling Bottom-Up Search", Proc. Workshop on Multithreaded Architectures and Applications (MTAAP), in conjunction with IPDPS, 2013,

- Download File: mtaapbottomup2D.pdf (pdf: 357 KB)