Scientific Data Management Research

Sifting Through a Trillion Electrons

SDM's Surendra Byna and colleagues from Berkeley Lab’s Computational Research Division teamed up with researchers to develop novel software strategies for storing, mining, and analyzing massive datase…


Catching Turbulence in the Solar Wind

Massive datasets plus modelling, visualization and analytics allow researchers to "see" the unseen: the turbulence in solar winds.

Arie Shoshani Earns Lifetime Achievement Award

More than 25 years ago, Arie Shoshani realized that researchers were facing significant challenges in organizing, managing and analyzing their scientific data. He set out to develop computer applicati…

The Scientific Data Management (SDM) group enables and accelerates scientific discoveries through effective data management and analysis tools and libraries. The group’s research and development efforts focus on (1) scalable storage and I/O strategies, (2) autonomous data management infrastructure, (3) data life-cycle management, and (4) workflow optimization and automation. The group works directly with data generation and analysis workflows to reduce the complexity of large scientific analyses, including complex real-time workflows that could drive the next generation of scientific user facilities. Members of the SDM group work closely with application scientists throughout the DOE community and with academic and industry researchers around the world. The group has a strong publication record and contributes to many widely used software systems, including the well-known I/O libraries HDF5 and ADIOS. Its members are also the primary developers of FastBit and FasTensor, among other tools.

Group Leader: John Wu

»Visit the Scientific Data Management (SDM) site.

SDM Publications

ESnet's In-Network Caching Pilot

June 5, 2023

Predicting Resource Usage Trends with Southern California Petabyte Scale Cache

May 8, 2023

Understanding Data Access Patterns for dCache System

May 8, 2023

Experiences in deploying in-network data caches

May 8, 2023

Effectiveness and predictability of in-network storage cache for Scientific Workflows

February 20, 2023

Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective

January 8, 2023

Design and Implementation of Burst Buffer Over-Subscription Scheme for HPC Storage Systems

January 4, 2023

Visualizing I/O bottlenecks with DXT Explorer 2.0

November 17, 2022

Predicting Scientific Dataset Popularity Using dCache Logs

November 16, 2022

Data Throughput Performance Trends of Regional Scientific Data Cache

November 15, 2022

Drishti: Guiding End-Users in the I/O Optimization Journey

November 14, 2022

Feature Engineering and Classification Models for Partial Discharge in Power Transformers

October 25, 2022

April 2019 Darshan counters from the Cori supercomputer [Data set]

August 1, 2022

Design and Implementation of Dynamic I/O Control Scheme for Large Scale Distributed File Systems

July 1, 2022

What Makes You Hold onto That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions

July 1, 2022

Predicting Slow Connections in Scientific Computing

June 30, 2022

Studying Scientific Data Lifecycle in On-demand Distributed Storage Caches

June 30, 2022

Access Trends of In-network Cache for Scientific Data

June 30, 2022

SNTA’22: The 5th Workshop on Systems and Network Telemetry and Analytics

June 28, 2022

Access Patterns and Performance Behaviors of Multi-layer Supercomputer I/O Subsystems under Production Load

June 27, 2022