Scientific Data Management Research

Sifting Through a Trillion Electrons

SDM's Surendra Byna and colleagues from Berkeley Lab’s Computational Research Division teamed up with researchers to develop novel software strategies for storing, mining, and analyzing massive datase…

Catching Turbulence in the Solar Wind

Massive datasets plus modelling, visualization and analytics allow researchers to "see" the unseen: the turbulence in solar winds.

Arie Shoshani Earns Lifetime Achievement Award

More than 25 years ago, Arie Shoshani realized that researchers were facing significant challenges in organizing, managing and analyzing their scientific data. He set out to develop computer applicati…

The Scientific Data Management (SDM) group enables and accelerates scientific discoveries through effective data management and analysis tools and libraries. The group's research and development efforts focus on (1) scalable storage and I/O strategies, (2) autonomous data management infrastructure, (3) data life-cycle management, and (4) workflow optimization and automation. We work actively with data generation and analysis workflows to reduce the complexity of large scientific analyses, including complex real-time workflows that could drive the next generation of scientific user facilities. Members of the SDM group work closely with application scientists throughout the DOE community and with academic and industry researchers around the world. The group has a strong history of publications and contributes to many widely used software systems. We have made strong contributions to well-known I/O libraries, including HDF5 and ADIOS, and are the primary developers of FastBit and FasTensor, among other tools.

Interim Group Leader: Alex Sim

»Visit the Scientific Data Management (SDM) site.

SDM Publications

Automatic Data Transformation Using Large Language Model – An Experimental Study on Building Energy Data

December 15, 2023

Counterfactual Analysis: A Case Study on Impact of External Events on Building Energy Consumption

December 15, 2023

Experiences in deploying in-network data caches

December 14, 2023

Predicting Resource Utilization Trends with Southern California Petabyte Scale Cache

December 14, 2023

Understanding Data Access Patterns for dCache System

December 14, 2023

Preparing Spectral Data for Machine Learning: A Study of Geological Classification from Aerial Surveys

December 10, 2023

Comparative Study of the Cache Utilization Trends for Regional Scientific Data Caches

November 13, 2023

Enabling Agile Analysis of I/O Performance Data with PyDarshan

November 12, 2023

I/O Access Patterns in HPC Applications: A 360-Degree Survey

September 15, 2023

Uncovering I/O demands on HPC platforms: Peeking under the hood of Santos Dumont

August 18, 2023

Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective

July 5, 2023

Leveraging History to Predict Abnormal Transfers in Distributed Workflows

July 1, 2023

Analyzing Transatlantic Network Traffic Patterns with Scientific Data Caches

June 20, 2023

AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis

June 16, 2023

ESnet's In-Network Caching Pilot

June 5, 2023

Illuminating the I/O Optimization Path of Scientific Applications

May 10, 2023

Understanding Data Access Patterns for dCache System

May 8, 2023

Predicting Resource Usage Trends with Southern California Petabyte Scale Cache

May 8, 2023

Experiences in deploying in-network data caches

May 8, 2023

Design and Implementation of I/O Performance Prediction Scheme on HPC Systems through Large-scale Log Analysis

May 1, 2023