The Complex Systems Group participated in a number of other projects, most of which were led by researchers in other groups or divisions at LBNL. Here is a brief list:
- Computational, data management, and analysis methods for the study of a rapidly expanding genome and metagenome sequence data space. CXG's Buluc is working with computational biologists at LBNL to determine whether significant savings in storage and processing time are possible by using a pangenomic representation of metagenomes and highly similar isolate organisms. A pangenome consists of a core (shared by all strains of the same species) and a variable part (present in some strains but absent from at least one). New computational methods are needed to analyze pangenomes. The high-level goal of this project is high-performance data mining using a graph representation of the pangenome, enabling the discovery of genomic variations and core genomes. So far, researchers have accelerated the clustering step of pangenome identification by more than an order of magnitude and developed the capability to identify genomic islands.
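The core/variable decomposition described above can be sketched in a few lines: the core is the set of genes shared by every strain, and the variable part is everything else in the pangenome. This is only an illustrative toy, not the project's actual pipeline; the strain and gene names are hypothetical.

```python
# Toy pangenome decomposition: core = genes in all strains,
# variable = genes missing from at least one strain.
# Strain and gene names below are made up for illustration.

strains = {
    "strain_A": {"geneA", "geneB", "geneC"},
    "strain_B": {"geneA", "geneB", "geneD"},
    "strain_C": {"geneA", "geneB", "geneC", "geneE"},
}

core = set.intersection(*strains.values())      # shared by every strain
pangenome = set.union(*strains.values())        # all genes seen anywhere
variable = pangenome - core                     # the variable part

print(sorted(core))      # ['geneA', 'geneB']
print(sorted(variable))  # ['geneC', 'geneD', 'geneE']
```

At real metagenome scale this intersection/union view becomes a large graph-clustering problem, which is where the high-performance methods mentioned above come in.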
- Evaluation of Emerging Execution Models. Working with researchers in the Future Technologies Group, CXG's Buluc is investigating dynamically scheduled Bulk Synchronous Parallel (BSP) execution models as an alternative to statically scheduled BSP models, which have been the dominant approach to parallel computing for over two decades. Although dynamic asynchronous models have been proposed and even demonstrated, their broader adoption has been hampered by the difficulty of scaling up the approach and by the overheads of dynamic scheduling. However, the explosive growth in system parallelism, the need for more robust fault recovery, new sources of performance heterogeneity (fault recovery, power management, resource contention), and the need to support more adaptive unstructured algorithms (graphs, adaptive refinement) have prompted a re-examination of execution models that overcome the challenges of emerging hardware technology and are better matched to the requirements of dynamic/adaptive workloads. The goal of this project is to evaluate the need for emerging dynamic/asynchronous execution models, assess the merits of alternative implementations, and identify and quantify the deficiencies of each.
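The static-versus-dynamic distinction at the heart of this study can be illustrated with a toy makespan calculation: under static scheduling each worker receives a fixed block of work and the slowest block delays the next BSP barrier, whereas dynamic scheduling lets lightly loaded workers absorb extra tasks. The task costs below are synthetic, chosen only to model performance heterogeneity (e.g., one task slowed by fault recovery or power throttling).

```python
# Contrast static partitioning with dynamic (greedy list) scheduling.
# Task costs are synthetic; the long task models a heterogeneity event.

tasks = [8, 1, 1, 1, 1, 1, 1, 1]  # one slow task among uniform ones

def static_makespan(costs, nworkers):
    # Fixed contiguous block per worker; the slowest block sets the
    # time to the next BSP barrier.
    block = len(costs) // nworkers
    return max(sum(costs[i * block:(i + 1) * block])
               for i in range(nworkers))

def dynamic_makespan(costs, nworkers):
    # Greedy list scheduling: each task goes to the currently
    # least-loaded worker, mimicking a shared work queue.
    loads = [0] * nworkers
    for c in costs:
        loads[loads.index(min(loads))] += c
    return max(loads)

print(static_makespan(tasks, 2))   # 11: one worker holds the slow task plus 3 more
print(dynamic_makespan(tasks, 2))  # 8: the other worker absorbs the remaining tasks
```

The dynamic schedule is faster here precisely because work migrates away from the slow spot, but the bookkeeping it models is also the source of the scheduling overheads the project aims to quantify.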
- High-Performance Parallel Graph Analysis for Key Genomics Computations. CXG's Buluc is collaborating with computational biologists at LBNL's Joint Genome Institute to determine whether graph-analytic software can be used in high-level genomics calculations. Technological advances in computing and sequencing have enabled bioinformatics to develop at an unprecedented rate, especially in terms of the volume of available data requiring analysis. Many of the underlying problems, including genome assembly and protein clustering, can be conveniently modeled as large-scale graphs and analyzed with state-of-the-art graph algorithms. However, biologists face significant challenges in effectively leveraging supercomputers because of the complexity of parallelizing these classes of computations on distributed-memory systems. The goal of this project is to deliver unprecedented computational capability for large-graph analytics in key bioinformatics applications through the development and integration of flexible, high-performance parallel graph software packages. The work targets three specific high-level genomics computations, allowing us to significantly advance the analysis capabilities in those areas while simultaneously driving the development of parallel graph and data-analysis software. By collaborating closely with computational biologists at the Joint Genome Institute who are directly engaged in solving specific data-intensive problems, we will ensure the relevance of the new tools to state-of-the-science genomics problems.
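As one concrete instance of the graph formulations mentioned above, protein clustering can be cast as finding connected components in a similarity graph, where vertices are proteins and edges link pairs with significant sequence similarity. The sketch below is a minimal serial version of that idea; the protein IDs and similarity edges are hypothetical, and the project itself targets distributed-memory parallel implementations of such kernels.

```python
# Toy protein clustering as connected components of a similarity graph.
# Protein IDs and similarity edges are made up for illustration.

from collections import defaultdict

edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]  # hypothetical similarity hits

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def connected_components(adj):
    # Iterative depth-first search over each unvisited vertex.
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components

print(connected_components(adj))  # two clusters: {p1, p2, p3} and {p4, p5}
```

At the scale of real sequence datasets the similarity graph has billions of edges, which is why the serial traversal above must be replaced by parallel graph software of the kind this project develops.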
- Gamma Radiation Storage Feasibility Study. CXG's Alex Slepoy is participating in a feasibility study (funded by NA-22, the research arm of DOE's National Nuclear Security Administration) to determine the viability of long-term storage, curation, and distribution of hundreds of terabytes of gamma radiation background data. The study includes a survey of the operational and research communities of gamma detection subject matter experts to determine the utility of storing the data long-term, as well as the potential use cases. Slepoy has interviewed dozens of subject matter experts, surveyed a body of literature, presented a poster, worked closely with the data collection project (including writing code for their data processing needs, to understand the parameters of the data to be stored), and taken an active role in the planning and organization of a MISTI Data Workshop.