Scientific Data Management Research

Sifting Through a Trillion Electrons

SDM's Surendra Byna and colleagues from Berkeley Lab’s Computational Research Division teamed up with researchers to develop novel software strategies for storing, mining, and analyzing massive datase...

Catching Turbulence in the Solar Wind

Massive datasets, combined with modeling, visualization, and analytics, allow researchers to "see" the unseen: turbulence in the solar wind.

Arie Shoshani Earns Lifetime Achievement Award

More than 25 years ago, Arie Shoshani realized that researchers were facing significant challenges in organizing, managing and analyzing their scientific data. He set out to develop computer applicati...

The Scientific Data Management (SDM) group enables and accelerates scientific discoveries through effective data management and analysis tools and libraries. The SDM group’s research and development efforts focus on (1) scalable storage and I/O strategies, (2) autonomous data management infrastructure, (3) data life-cycle management, and (4) workflow optimization and automation. Our group works directly on data generation and analysis workflows to reduce the complexity of large scientific analyses, including complex real-time workflows that could drive the next generation of scientific user facilities. Members of the SDM group work closely with application scientists throughout the DOE community and with academic and industry researchers around the world. The group has a strong publication record and contributes to many widely used software systems. We have made strong contributions to well-known I/O libraries, including HDF5 and ADIOS, and are the primary developers of FastBit, FasTensor, and other software.

Group Leader: John Wu

»Visit the Scientific Data Management (SDM) site.

SDM Publications

Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization

April 5, 2022

Transparent Asynchronous Parallel I/O using Background Threads

April 4, 2022

Data access pattern analysis for dCache storage system

January 12, 2022

What Makes You Hold on to That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions

January 11, 2022

Optimizing Performance of Parallel I/O Accesses to Non-contiguous Blocks in Multiple Array Variables

December 19, 2021

Asynchronous I/O Strategy for Large-Scale Deep Learning Applications

December 17, 2021

An In-Depth I/O Pattern Analysis in HPC Systems

December 17, 2021

Performance of the Gold Standard and Machine Learning in Predicting Vehicle Transactions

December 15, 2021

SCTuner: An Auto-tuner Addressing Dynamic I/O Needs on Supercomputer I/O Sub-systems

November 21, 2021

Data-Aware Storage Tiering for Deep Learning

November 21, 2021

I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis

November 21, 2021

Exploiting User Activeness for Data Retention in HPC Systems

November 21, 2021

Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2

November 18, 2021

Performance Prediction of Large Data Transfers

November 17, 2021

Predicting WAN Traffic Volumes using Fourier and Multivariate SARIMA Approach

November 3, 2021

Network traffic performance analysis from passive measurements using gradient boosting machine learning

October 25, 2021

Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers

October 13, 2021

h5bench: HDF5 I/O Kernel Suite for Exercising HPC I/O Patterns

September 30, 2021

Analyzing scientific data sharing patterns with in-network data caching

September 14, 2021

Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights

September 1, 2021
