Scientific Data Management Research

Sifting Through a Trillion Electrons

SDM's Surendra Byna and colleagues from Berkeley Lab’s Computational Research Division teamed up with researchers to develop novel software strategies for storing, mining, and analyzing massive datase...


Catching Turbulence in the Solar Wind

Massive datasets plus modelling, visualization and analytics allow researchers to "see" the unseen: the turbulence in solar winds.

Arie Shoshani Earns Lifetime Achievement Award

More than 25 years ago, Arie Shoshani realized that researchers were facing significant challenges in organizing, managing and analyzing their scientific data. He set out to develop computer applicati...

The Scientific Data Management (SDM) group enables and accelerates scientific discoveries through effective data management and analysis tools and libraries. The group's research and development efforts focus on (1) scalable storage and I/O strategies, (2) autonomous data management infrastructure, (3) data life-cycle management, and (4) workflow optimization and automation. Our group works actively with data generation and analysis workflows to reduce the complexity of large scientific analyses, including complex real-time workflows that could drive the next generation of scientific user facilities. Members of the SDM group work closely with application scientists throughout the DOE community and with academic and industry researchers around the world. The group has a strong publication record and contributes to many widely used software systems. We have made substantial contributions to well-known I/O libraries, including HDF5 and ADIOS, and we are the primary developers of FastBit and FasTensor, among other tools.

Group Leader: John Wu

Visit the Scientific Data Management (SDM) site.

SDM Publications

What Makes You Hold on to That Old Car? Joint Insights from Machine Learning and Multinomial Logit on Vehicle-level Transaction Decisions

January 11, 2022

Asynchronous I/O Strategy for Large-Scale Deep Learning Applications

December 17, 2021

An In-Depth I/O Pattern Analysis in HPC Systems

December 17, 2021

Performance Prediction of Large Data Transfers

November 17, 2021

Analyzing scientific data sharing patterns with in-network data caching

September 14, 2021

Enhancing IoT Anomaly Detection Performance for Federated Learning

July 9, 2021

Automated Variable Selection for Network Anomaly Detection

July 1, 2021

Analyzing scientific data sharing patterns with in-network data caching

June 21, 2021

Access Patterns of Disk Cache for Large Scientific Archive

June 21, 2021

GPU-based Classification for Wireless Intrusion Detection

June 21, 2021

Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures

May 17, 2021

Improving Botnet Detection with Recurrent Neural Network and Transfer Learning

April 26, 2021

Network Traffic Performance Analysis and Anomaly Detection using Supervised Machine Learning

March 31, 2021

An empirical study of I/O separation for burst buffers in HPC systems

February 1, 2021

Clustering Life Course to Understand the Heterogeneous Effects of Life Events, Gender and Generation on Habitual Travel Modes

December 19, 2020

Enhancing IoT Anomaly Detection Performance for Federated Learning

December 17, 2020

Effective Missing Value Imputation Methods for Building Monitoring Data

December 10, 2020

Combining Ambient Noise and Distributed Acoustic Sensing (DAS) Deployed on Dark Fiber Networks for High-resolution Imaging at the Basin Scale

December 9, 2020

Deep Learning for Surface Wave Identification in Distributed Acoustic Sensing Data

December 8, 2020

Botnets Detection Using Recurrent Variational Autoencoder

December 7, 2020
