Scientific Data Management Research

Sifting Through a Trillion Electrons

SDM's Surendra Byna and colleagues from Berkeley Lab’s Computational Research Division teamed up with researchers to develop novel software strategies for storing, mining, and analyzing massive datase…


Catching Turbulence in the Solar Wind

Massive datasets combined with modeling, visualization, and analytics allow researchers to "see" the unseen: turbulence in the solar wind.

Arie Shoshani Earns Lifetime Achievement Award

More than 25 years ago, Arie Shoshani realized that researchers were facing significant challenges in organizing, managing and analyzing their scientific data. He set out to develop computer applicati…

The Scientific Data Management (SDM) group enables and accelerates scientific discovery through effective data management and analysis tools and libraries. The group's research and development efforts focus on (1) scalable storage and I/O strategies, (2) autonomous data management infrastructure, (3) data life-cycle management, and (4) workflow optimization and automation. We actively work with data generation and analysis workflows to reduce the complexity of large scientific analyses, including complex real-time workflows that could drive the next generation of scientific user facilities. Members of the SDM group work closely with application scientists throughout the DOE community and with academic and industry researchers around the world. The group has a strong publication record and contributes to many widely used software systems. We have made significant contributions to well-known I/O libraries, including HDF5 and ADIOS, and are the primary developers of FastBit, FasTensor, and other tools.

Interim Group Leader: Alex Sim

Visit the Scientific Data Management (SDM) site.

SDM Publications

A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis

May 27, 2024

Drilling Down I/O Bottlenecks with Cross-layer I/O Profile Exploration

May 27, 2024

TunIO: An AI-powered Framework for Optimizing HPC I/O

May 27, 2024

Serving Deep Learning Model in Relational Databases

March 25, 2024

Detecting Anomalies in Time Series Using Kernel Density Approaches

March 15, 2024

h5bench: Exploring HDF5 Access Patterns Performance in Pre-Exascale Platforms

January 31, 2024

Counterfactual Analysis: A Case Study on Impact of External Events on Building Energy Consumption

December 15, 2023

Automatic Data Transformation Using Large Language Model – An Experimental Study on Building Energy Data

December 15, 2023

Experiences in deploying in-network data caches

December 14, 2023

Predicting Resource Utilization Trends with Southern California Petabyte Scale Cache

December 14, 2023

Understanding Data Access Patterns for dCache System

December 14, 2023

Preparing Spectral Data for Machine Learning: A Study of Geological Classification from Aerial Surveys

December 10, 2023

Comparative Study of the Cache Utilization Trends for Regional Scientific Data Caches

November 13, 2023

Elephants Sharing the Highway: Studying TCP Fairness in Large Transfers over High Throughput Links

November 13, 2023

Enabling Agile Analysis of I/O Performance Data with PyDarshan

November 12, 2023

I/O Access Patterns in HPC Applications: A 360-Degree Survey

September 15, 2023

Uncovering I/O demands on HPC platforms: Peeking under the hood of Santos Dumont

August 18, 2023

Gender Gaps in Mode Usage, Vehicle Ownership, and Spatial Mobility When Entering Parenthood: A Life Course Perspective

July 5, 2023

Leveraging History to Predict Abnormal Transfers in Distributed Workflows

July 1, 2023

Analyzing Transatlantic Network Traffic Patterns with Scientific Data Caches

June 20, 2023