Performance and Algorithms Research

Parallel Primitives for Randomized Algorithms on Sparse Data

The Parallel Primitives for Randomized Algorithms on Sparse Data project targets scalable randomized methods for scientific data analysis. Specifically, we are concerned with sparse problems arising in machine learning, graph algorithms, and computational biology, and we develop efficient parallel randomized building blocks that substantially accelerate those problems.

Graphs, trees, and hash tables are the data structures most commonly used for computations on discrete data. Our project targets efficient computations on these data structures and provides scalable parallel building blocks for them. Our specific toolkit is randomization, which has been shown to be effective across domains where deterministic algorithms are prohibitively expensive to run. Randomized algorithms are often claimed to be simpler and easier to parallelize, but this is cold comfort for domain scientists who are left without provably efficient computational building blocks from which to build their applications.

Our objective is to move the field of randomized methods on discrete, sparse structures from a disparate collection of ingenious methods, known only to a specialized subset of computer scientists, to a well-understood and prevalent computational tool available to domain scientists.
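To give a flavor of what a randomized building block on sparse, discrete data looks like, the sketch below implements a textbook Count-Min sketch: a hash-based structure that estimates item frequencies in sublinear space and is naturally parallelizable (each row, and each stream partition, can be processed independently and merged by addition). This is a minimal, generic illustration, not code from the project; the width, depth, and seeding scheme are illustrative assumptions.

```python
# Minimal Count-Min sketch: a randomized, hash-based frequency estimator.
# Illustrative only; parameters and hashing scheme are assumptions.
import random


class CountMinSketch:
    def __init__(self, width=1024, depth=4, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        # One independent hash seed per row of the sketch table.
        self.seeds = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Map (seed, item) to a column in the given row.
        return hash((self.seeds[row], item)) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Estimates never undercount; hash collisions can only inflate them,
        # and taking the minimum across rows bounds the overestimate.
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))


if __name__ == "__main__":
    sketch = CountMinSketch()
    stream = ["a"] * 1000 + ["b"] * 10 + ["c"]
    for token in stream:
        sketch.add(token)
    print(sketch.estimate("a"), sketch.estimate("b"), sketch.estimate("c"))
```

Because two sketches built with the same seeds can be merged by elementwise addition of their tables, this kind of primitive composes cleanly with distributed or multithreaded execution, which is the general pattern the project's building blocks aim to support at scale.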