Performance and Algorithms Research

Scalable Graph Learning for Scientific Discovery

This project targets graph representation learning (GRL), which has been transforming many scientific domains of importance to the Department of Energy, such as structural biology, computational chemistry, particle physics, transportation, and program analysis. However, GRL on large-scale problems is extremely expensive in terms of both memory and computation. Distributed-memory parallelism is needed for training large GRL models, and our communication-avoiding sparse matrix algorithms provide a solid foundation for scaling GRL to unprecedented concurrencies. To make our algorithms memory-frugal, we will develop distributed-memory sampling algorithms for GRL training. Furthermore, rich scientific data often cannot be accurately represented as a simple graph and instead requires more complex network structures such as hypergraphs. Therefore, we also need efficient distributed-memory learning on hypergraphs and other higher-level network structures for versatility. In summary, this project will provide scalable, memory-frugal, and versatile methods for GRL. We specifically target models with high learnability and expressive power.
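To illustrate why sparse matrix algorithms are central to scaling GRL, the minimal sketch below (illustrative only, not the project's actual kernels) shows how a single GCN-style layer reduces to a sparse-times-dense matrix product (SpMM). In distributed-memory training, the adjacency and feature matrices are partitioned across processes, and this SpMM is the communication bottleneck that communication-avoiding algorithms address. All sizes and names here are assumptions for demonstration.

```python
# Sketch: a GCN-style layer as SpMM, H_next = ReLU(A_hat @ H @ W),
# where A_hat is the normalized sparse adjacency matrix.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

n, f_in, f_out = 1000, 64, 32                                     # nodes, input/output feature widths (illustrative)
A = sp.random(n, n, density=0.01, format="csr", random_state=0)   # random sparse adjacency
A = A + A.T + sp.identity(n)                                      # symmetrize and add self-loops

deg = np.asarray(A.sum(axis=1)).ravel()                           # node degrees
D_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt                               # symmetric normalization

H = rng.standard_normal((n, f_in))                                # node feature matrix
W = rng.standard_normal((f_in, f_out))                            # learnable layer weights

H_next = np.maximum(A_hat @ (H @ W), 0.0)                         # SpMM + dense GEMM + ReLU
print(H_next.shape)                                               # (1000, 32)
```

In a distributed setting, A_hat and H would be block-partitioned across processes, so the cost of the SpMM is dominated by exchanging feature rows between partitions; memory-frugal sampling approaches reduce this by aggregating over a sampled neighborhood rather than the full adjacency.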