GraphBLAS: Building Blocks For High Performance Graph Analytics
Berkeley Lab Researchers Contribute to GraphBLAS and Will Leverage it for Exascale Applications
November 21, 2017
Contact: Linda Vu, [email protected], +1 510.495.2402
Many of us thought linear algebra and graph structures were concepts we’d never again have to deal with after high school. However, these concepts underpin a variety of transactions, from Internet searches to cryptography, artificial intelligence and even operation of the power grid. They are also vital to many computational science and parallel computing applications.
Now after nearly five years of collaboration between researchers in academia, industry and national research laboratories—including Aydın Buluç, a scientist in Lawrence Berkeley National Laboratory’s (Berkeley Lab’s) Computational Research Division (CRD)—GraphBLAS, a collection of standardized building blocks for graph algorithms in the language of linear algebra, is publicly available.
“When people talk about artificial intelligence, big data and data analytics, significant pieces of those come down to graph structures, which is a way of representing relationships between items,” says Tim Mattson, an Intel Senior Principal Engineer and a member of the GraphBLAS collaboration.
In the era of big data, Mattson notes that there is an increasing interest in finding patterns and connections in this information by building graphs and exploring their properties.
“This is a newish application area for scalable supercomputing. Graph problems are fairly straightforward to write down mathematically; however, getting them to work on petabytes of data, on a highly distributed system and in a reasonable amount of time, is actually very difficult,” he adds. “But if we can view graphs as linear algebra problems—which have been central to science and engineering applications in high performance computing for decades—then we can immediately apply everything that we’ve learned from parallel supercomputing over the last 35 years to graphs.”
This is where Buluç’s experience proved to be extremely useful. Buluç began applying linear algebra to high performance graph analysis nearly a decade ago when he was a graduate student at the University of California, Santa Barbara (UCSB). For his Ph.D. thesis, he created Combinatorial BLAS, an extensible distributed-memory parallel graph library offering a small but powerful set of linear algebra primitives specifically targeting graph analytics. This library later partly inspired the bigger GraphBLAS effort. Another CRD scientist, Ariful Azad, is a major contributor to the Combinatorial BLAS library and to graph applications that use Combinatorial BLAS for scalability.
After earning his doctorate, Buluç continued this work at Berkeley Lab as a Luis Alvarez Fellow and then as a research scientist. Along the way he also began collaborating with Jeremy Kepner, a Lincoln Laboratory Fellow at the Massachusetts Institute of Technology (MIT). Mattson notes that Buluç and Kepner were driving forces in the modern push to get the community to think about graphs as linear algebra. Mattson connected with both researchers via Buluç’s thesis advisor, UCSB Professor John Gilbert.
“Aydın Buluç was a leader in demonstrating highly scalable implementations of matrix-based graph algorithms; his work inspired others to try similar approaches,” says Kepner. “Tim Mattson then championed the idea that a GraphBLAS standard would allow hardware people and software people to work together and magnify our collective efforts.”
According to Mattson, the impetus to create a standard BLAS (Basic Linear Algebra Subprograms) for graph analytics came in 2012 when Intel launched a Science and Technology Center for Big Data at MIT to produce new data management systems and compute architectures for Big Data. As one of the center’s principal investigators, Mattson began building a team of researchers from academic and research institutions across the country with experience in high performance graph analysis, including Buluç, Gilbert and Kepner.
Over the next several years, the collaboration worked to define the mathematical concepts that would go into GraphBLAS. Because this software library was going to be publicly available, it couldn’t be overwhelming for users. So the team aimed to identify the smallest number of linear algebra functions needed to get the job done. Once the team agreed on the mathematical concepts, a subset of the researchers spent a couple more years binding GraphBLAS to the C programming language.
“GraphBLAS is an elegantly small number of functions that are feasible to implement in hardware, as we have demonstrated in the Lincoln Laboratory Graph Processor,” says Kepner. “It allows us to explore graphs with powerful mathematical properties such as associativity, commutativity and distributivity. More recently, GraphBLAS has begun to be of interest to people beyond the graph community, including machine learning, databases, and simulations.”
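To make the idea of graphs in the language of linear algebra concrete, here is a minimal sketch of breadth-first search written as repeated matrix-vector products. It is not the GraphBLAS C API: the function names `bfs_levels` and `adjacency_from_edges` are hypothetical, the matrix is a dense list of lists for readability, and the product uses a Boolean semiring (OR in place of addition, AND in place of multiplication) chosen for this illustration.

```python
def adjacency_from_edges(n, edges):
    """Build a dense Boolean adjacency matrix: adj[i][j] is True for an edge i -> j."""
    adj = [[False] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = True
    return adj


def bfs_levels(adj, source):
    """Breadth-first search as repeated vector-matrix products.

    Each step computes next[j] = OR over i of (frontier[i] AND adj[i][j]),
    then masks out vertices that already have a level assigned.
    Returns the BFS level of every vertex, or -1 if it is unreachable.
    """
    n = len(adj)
    levels = [-1] * n
    frontier = [False] * n
    frontier[source] = True
    level = 0
    while any(frontier):
        # Record the level of every vertex in the current frontier.
        for v in range(n):
            if frontier[v]:
                levels[v] = level
        # One Boolean-semiring product, masked by the unvisited vertices.
        frontier = [
            levels[j] == -1 and any(frontier[i] and adj[i][j] for i in range(n))
            for j in range(n)
        ]
        level += 1
    return levels


if __name__ == "__main__":
    # Hypothetical 5-vertex graph; vertex 4 has no incoming edges.
    graph = adjacency_from_edges(5, [(0, 1), (0, 2), (1, 3), (2, 3)])
    print(bfs_levels(graph, 0))
```

Swapping the OR/AND pair for another pair of operations, such as min and plus, turns the same loop into a different graph algorithm, which is one reason properties such as associativity and distributivity matter in this setting. A production library would use sparse storage rather than the dense lists assumed here.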
Thanks to the efforts of Texas A&M Professor Timothy Davis, GraphBLAS will soon be in many major Linux distributions and in many of the most popular mathematical programming environments in the world. Additionally, hardware manufacturers are starting to build computers specifically designed to accelerate these operations. Kepner notes that the confluence of these efforts will allow millions to enjoy the benefits of GraphBLAS.
As members of the ExaGraph: Combinatorial Methods for Enabling Exascale Applications Co-Design Center, which is part of the Department of Energy’s Exascale Computing Project (ECP), Buluç and Azad plan to use GraphBLAS to develop graph and combinatorial algorithms for exascale systems.
“As science problems get bigger, more computing power will be necessary to address these challenges,” says Buluç. “These large-scale applications have many computational components that we call motifs; that’s the whole idea of co-design. Exascale applications are a patchwork of different motifs, and if we optimize all other motifs for exascale and ignore the graph and combinatorics motifs, we’ll hit a performance bottleneck. That’s why this work is so important.”
Basics of BLAS
Many people are familiar with programming languages (like Python, C, C++, Java, and thousands of others) that are used to create a variety of software and applications. These high-level languages make coding relatively easy for humans but make little sense to computer hardware, which only comprehends the low-level binary language of ones and zeros. Low-level programs essentially give the programmer more control over how the computer hardware will perform, which means the developer can tune software for optimal performance on a particular machine.
It turns out that only a small number of low-level routines are required to perform most common linear algebra computing operations. So in 1979, researchers at NASA’s Jet Propulsion Laboratory, Sandia National Laboratories and the University of Texas at Austin publicly released Basic Linear Algebra Subprograms (BLAS), a library of these low-level routines for performing linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations and matrix multiplication.
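As a rough illustration of the kinds of operations listed above, the sketch below writes three of them in plain Python. The names `axpy`, `dot` and `gemm` echo conventional BLAS routine names, but these functions are illustrative stand-ins and not bindings to any BLAS library. The helper `matvec` is a hypothetical example of a routine built on top of the smaller blocks.

```python
def axpy(alpha, x, y):
    """Linear combination of two vectors: returns alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(a, x):
    """Matrix-vector product built from dot, one row at a time."""
    return [dot(row, x) for row in a]


def gemm(a, b):
    """Matrix-matrix product built from dot over rows of a and columns of b."""
    cols_b = list(zip(*b))  # transpose b so each column is easy to access
    return [[dot(row, col) for col in cols_b] for row in a]


if __name__ == "__main__":
    print(axpy(2, [1, 2, 3], [10, 20, 30]))
    print(dot([1, 2, 3], [4, 5, 6]))
    print(matvec([[1, 2], [3, 4]], [1, 1]))
    print(gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because `matvec` and `gemm` are written only in terms of `dot`, a faster `dot` would speed them up without any change to their own code. This mirrors, in miniature, how software built on an optimized library of building blocks inherits its performance.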
“When BLAS entered the scene in the late 1970s and early 1980s, it was transformative,” says Mattson. “Instead of handcrafting linear algebra algorithms from scratch, I could build them off of these common building blocks. And if a vendor would optimize those common building blocks just for their hardware, I could gain the benefits of that optimization pretty much for free.”
In subsequent years, various research collaborations created a variety of BLAS libraries for different tasks. Realizing the benefits to users, vendors also worked with researchers to optimize these building blocks to run on their hardware. GraphBLAS is essentially a continuation of this BLAS heritage.
“My hope is that GraphBLAS will be just as remarkable for those doing high performance graph analytics,” adds Mattson.
In addition to Buluç, Gilbert, Kepner and Mattson, other members of the GraphBLAS steering committee include David Bader of Georgia Tech and Henning Meyerhenke of Karlsruhe Institute of Technology.
The Department of Energy’s Office of Science partially funded the development of GraphBLAS through the Office of Advanced Scientific Computing Research’s Applied Math Early Career program. Buluç was also awarded a DOE Office of Science Early Career Research Program award in 2013.
Download the GraphBLAS specification and reference implementations here: http://graphblas.org/
About Berkeley Lab
Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 14 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.