Scientific Data Management Research

Sifting Through a Trillion Electrons

SDM's Surendra Byna and colleagues from Berkeley Lab’s Computational Research Division teamed up with researchers to develop novel software strategies for storing, mining, and analyzing massive datase…


Catching Turbulence in the Solar Wind

Massive datasets plus modelling, visualization and analytics allow researchers to "see" the unseen: the turbulence in solar winds.

Arie Shoshani Earns Lifetime Achievement Award

More than 25 years ago, Arie Shoshani realized that researchers were facing significant challenges in organizing, managing and analyzing their scientific data. He set out to develop computer applicati…

The Scientific Data Management (SDM) group enables and accelerates scientific discoveries through effective data management and analysis tools and libraries. The group's research and development efforts focus on (1) scalable storage and I/O strategies, (2) autonomous data management infrastructure, (3) data life-cycle management, and (4) workflow optimization and automation. Our group works actively with data generation and analysis workflows to reduce the complexity of large scientific analyses, including complex real-time workflows that could drive the next generation of scientific user facilities. Members of the SDM group work closely with application scientists throughout the DOE community and with academic and industry researchers around the world. The group has a strong history of publications and contributes to many widely used software systems. We have made strong contributions to well-known I/O libraries, including HDF5 and ADIOS, and are the primary developers of software such as FastBit and FasTensor.

Group Leader: John Wu

Visit the Scientific Data Management (SDM) site.

SDM Publications

Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective

January 8, 2023

Data Throughput Performance Trends of Regional Scientific Data Cache

November 15, 2022

Predicting Scientific Dataset Popularity Using dCache Logs

November 15, 2022

Design and Implementation of Dynamic I/O Control Scheme for Large Scale Distributed File Systems

July 1, 2022

What Makes You Hold onto That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions

July 1, 2022

Predicting Slow Connections in Scientific Computing

June 30, 2022

Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches

June 30, 2022

Access Trends of In-network Cache for Scientific Data

June 30, 2022

SNTA’22: The 5th Workshop on Systems and Network Telemetry and Analytics

June 28, 2022

LBNL Superfacility Project Report

June 22, 2022

Adaptive Optimization for Sparse Data on Heterogeneous GPUs

May 30, 2022

Extract Dynamic Information To Improve Time Series Modeling: a Case Study with Scientific Workflow

May 20, 2022

Using Multi-resolution Data to Accelerate Neural Network Training in Scientific Applications

May 16, 2022

Enhancing IoT Anomaly Detection Performance for Federated Learning

May 1, 2022

Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization

April 5, 2022

Transparent Asynchronous Parallel I/O using Background Threads

April 4, 2022

Deploying in-network caches in support of distributed scientific data sharing

March 15, 2022

Automating Data Management Through Unified Runtime Systems

January 24, 2022

Support for In-Flight Data Analyses in Scientific Workflows

January 24, 2022

Data access pattern analysis for dCache storage system

January 12, 2022