Scientific Data Management Research

Sifting Through a Trillion Electrons

SDM's Surendra Byna and colleagues from Berkeley Lab’s Computational Research Division teamed up with researchers to develop novel software strategies for storing, mining, and analyzing massive datase…


Catching Turbulence in the Solar Wind

Massive datasets plus modelling, visualization and analytics allow researchers to "see" the unseen: the turbulence in solar winds.


Arie Shoshani Earns Lifetime Achievement Award

More than 25 years ago, Arie Shoshani realized that researchers were facing significant challenges in organizing, managing and analyzing their scientific data. He set out to develop computer applicati…

The Scientific Data Management (SDM) group enables and accelerates scientific discoveries through effective data management and analysis tools and libraries. The group's research and development efforts focus on (1) scalable storage and I/O strategies, (2) autonomous data management infrastructure, (3) data life-cycle management, and (4) workflow optimization and automation. Our group actively works with data generation and analysis workflows to reduce the complexity of large scientific analyses, including complex real-time workflows that could drive the next generation of scientific user facilities. Members of the SDM group work closely with application scientists throughout the DOE community and with academic and industry researchers around the world. The group has a strong history of publications and contributes to many widely used software systems. We have made substantial contributions to well-known I/O libraries, including HDF5 and ADIOS, and are the primary developers of FastBit, FasTensor, and other tools.

Interim Group Leader: Alex Sim

» Visit the Scientific Data Management (SDM) site.

SDM Publications

Conditional Recurrent Neural Networks for Enhancing Throughput Prediction and Slow File Transfers Detection in Large Science Workflows

January 10, 2025

DISTRI: Development and Integration of Simulation Tools for Resilient Infrastructure

December 15, 2024

Evaluating Performance Trade-offs of Caching Strategies for AI-Powered Querying Systems

December 15, 2024

TensorSearch: Parallel Similarity Search on Tensors

December 15, 2024

Analyzing Parallel I/O

November 21, 2024

SWARM: Scientific Workflow Applications on Resilient Metasystem

November 20, 2024

Predicting Dataset Popularity for Improved Distributed Content Caching in High Energy Physics

November 19, 2024

Comparing Cache Utilization Trends for Regional Scientific Caches with Transfer Learning Models

November 19, 2024

SWARM: Scientific Workflow Applications on Resilient Metasystem

November 19, 2024

IO500: The High-Performance Storage Community

November 19, 2024

Drishti: I/O Insights for All

November 19, 2024

A Study of a Deterministic Networking Framework for Latency Critical Large Scientific Data Transfers

November 18, 2024

Enabling Data Reduction for Flash-X Simulations

November 18, 2024

Accurate in-situ in-transit analysis of particle diffusion for large-scale tokamak simulation

November 18, 2024

Imb-FinDiff: Conditional Diffusion Models for Class Imbalance Synthesis of Financial Tabular Data

November 17, 2024

Exploring the Proactive Data Containers Runtime System in VAST - A Case Study

November 17, 2024

BULKI - Binary Unified Layout for Key-value Interchange

November 17, 2024

Exploring Data Caching Policy with Data Access Patterns from dCache Logs

October 21, 2024

Comparing Cache Utilization Trends for Regional Data Caches

October 21, 2024

HDF5 in the Exascale Era: Delivering Efficient and Scalable Parallel I/O for Exascale Applications

October 16, 2024