Scientific Data Management Research

Sifting Through a Trillion Electrons

SDM's Surendra Byna and colleagues from Berkeley Lab’s Computational Research Division teamed up with researchers to develop novel software strategies for storing, mining, and analyzing massive datase…


Catching Turbulence in the Solar Wind

Massive datasets plus modelling, visualization and analytics allow researchers to "see" the unseen: the turbulence in solar winds.

Arie Shoshani Earns Lifetime Achievement Award

More than 25 years ago, Arie Shoshani realized that researchers were facing significant challenges in organizing, managing and analyzing their scientific data. He set out to develop computer applicati…

The Scientific Data Management (SDM) group enables and accelerates scientific discoveries through effective data management and analysis tools and libraries. The group's research and development efforts focus on (1) scalable storage and I/O strategies, (2) autonomous data management infrastructure, (3) data life-cycle management, and (4) workflow optimization and automation. The group works directly with data generation and analysis workflows to reduce the complexity of large scientific analyses, including complex real-time workflows that could drive the next generation of scientific user facilities. Members of the SDM group work closely with application scientists throughout the DOE community, as well as with academic and industry researchers around the world. The group has a strong publication record and contributes to many widely used software systems, including the well-known I/O libraries HDF5 and ADIOS, and is the primary developer of tools such as FastBit and FasTensor.

Group Leader: John Wu

»Visit the Scientific Data Management (SDM) site.

SDM Publications

Comparative Study of the Cache Utilization Trends for Regional Scientific Data Caches

November 13, 2023

Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective

July 5, 2023

Leveraging History to Predict Abnormal Transfers in Distributed Workflows

July 1, 2023

Analyzing Transatlantic Network Traffic Patterns with Scientific Data Caches

June 20, 2023

AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis

June 16, 2023

ESnet's In-Network Caching Pilot

June 5, 2023

Predicting Resource Usage Trends with Southern California Petabyte Scale Cache

May 8, 2023

Understanding Data Access Patterns for dCache System

May 8, 2023

Experiences in deploying in-network data caches

May 8, 2023

Design and Implementation of I/O Performance Prediction Scheme on HPC Systems through Large-scale Log Analysis

May 1, 2023

Effectiveness and predictability of in-network storage cache for Scientific Workflows

February 20, 2023

Locating Partial Discharges in Power Transformers with Convolutional Iterative Filtering

February 6, 2023

Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective

January 8, 2023

Design and Implementation of Burst Buffer Over-Subscription Scheme for HPC Storage Systems

January 4, 2023

Visualizing I/O bottlenecks with DXT Explorer 2.0

November 17, 2022

Predicting Scientific Dataset Popularity Using dCache Logs

November 16, 2022

Data Throughput Performance Trends of Regional Scientific Data Cache

November 15, 2022

Drishti: Guiding End-Users in the I/O Optimization Journey

November 14, 2022

Feature Engineering and Classification Models for Partial Discharge in Power Transformers

October 25, 2022

April 2019 Darshan counters from the Cori supercomputer [Data set]

August 1, 2022