
Data Science & Technology Software

Collaborative Web Applications

      • ALSHub - ALSHub is a website used for managing users of the ALS facility and their proposals, experiment safety details, and beamtime
      • AmeriFlux web application - Allows users to upload and download data. Future features include personalized sets of tower sites, personalized dashboard-style reporting, and data visualization.
      • CLEER Model - The Cloud Energy and Emissions Research (CLEER) Model is a comprehensive user-friendly open-access model for assessing the net energy and emissions implications of cloud services in different regions and at different levels of market adoption.
      • eProject Builder - A secure web-based data entry and tracking system for energy savings performance contract (ESPC) projects
      • KBase Narrative - A web application for systems biology that tracks the analyses, data, and thought processes of an analysis in a way that is reproducible and shareable.
      • Materials Project - Web application for materials design; by computing properties of all known materials, the Materials Project aims to remove guesswork from materials design in a variety of applications.
      • OpenMSI - Web application for management, storage, visualization, and statistical analysis of Mass Spectrometry Imaging (MSI) data
      • PDG Workspace - A set of web applications used to author and edit the Review of Particle Physics. It includes interfaces for adding literature to review, adding measurements found in that literature, and viewing the final published results for general public consumption.

Data Management

      • Berkeley Storage Manager (BeStMan) - LBNL implementation of the Storage Resource Manager (SRM) standard interface.
      • FastBit - An implementation of the FastBit bitmap indexing and searching algorithms.
      • FastQuery - A parallel indexing system for scientific data based on FastBit
      • pymatgen-db - An add-on to the Python Materials Genomics (pymatgen) library that allows the creation of Materials Project-style databases for managing materials data.
      • SPADE - A JEE application that takes files from wherever they are produced, e.g., at an experiment, and delivers them into a data warehouse from which they can be retrieved for analysis and also archived
      • SRM-Lite - A simple command-line tool with pluggable support for file transfer protocols, including scp, hpn-scp, and sftp
      • SDS framework (please ask permission) - An automatic data management system for exascale computing.
      • ArrayUDF - A MapReduce-style system for analyzing scientific data stored as tensors (multidimensional arrays).
      • DataElevator - Software for moving data within a hierarchical storage system, e.g., one that includes a burst buffer.
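FastBit is built around bitmap indexes. The plain-Python sketch below illustrates only the basic idea: one bitmap per distinct value, range conditions answered by OR-ing bitmaps, and multi-column conditions answered by AND-ing them. It assumes small integer-valued columns and uncompressed bitmaps (Python ints used as bitsets). It is not FastBit's API, and FastBit itself additionally compresses its bitmaps (Word-Aligned Hybrid encoding).

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def build_bitmap_index(values: Sequence[int]) -> Dict[int, int]:
    """Equality-encoded bitmap index: one bitmap per distinct value.

    Bit i of a value's bitmap is set when row i holds that value. Python ints
    serve as uncompressed, arbitrary-length bitsets in this sketch.
    """
    bitmaps = defaultdict(int)
    for row, value in enumerate(values):
        bitmaps[value] |= 1 << row
    return dict(bitmaps)


def range_bitmap(index: Dict[int, int], low: int, high: int) -> int:
    """Bitmap of rows with low <= value <= high, built by OR-ing per-value bitmaps."""
    result = 0
    for value, bitmap in index.items():
        if low <= value <= high:
            result |= bitmap
    return result


def to_rows(bitmap: int) -> List[int]:
    """Decode a bitmap into the row numbers whose bits are set."""
    return [row for row in range(bitmap.bit_length()) if bitmap >> row & 1]


if __name__ == "__main__":
    temperature = build_bitmap_index([3, 7, 3, 9, 5, 7])
    pressure = build_bitmap_index([1, 0, 1, 1, 1, 0])
    # Multi-column query: 5 <= temperature <= 7 AND pressure == 1, as one bitwise AND.
    hits = range_bitmap(temperature, 5, 7) & pressure[1]
    print(to_rows(hits))  # [4]
```

The query never scans the raw column values. It works entirely on bitmaps, which is why bitmap indexes suit read-mostly scientific data with many selective, multi-attribute queries.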

Networking, Monitoring, and Security

      • B.I.4NERSC - Analytical methods for unveiling information buried in data files from monitoring software at NERSC
      • BulkDataMover - A scalable data transfer management tool for the GridFTP transfer protocol
      • DataMover-Lite - End-user data downloading tool for ESGF climate data
      • ESG2Net100 - Library enabling minimal memory copy from disk I/O to network I/O
      • Hive Mind - Lightweight, decentralized, intrusion detection based on mobile agents and swarm intelligence
      • LBNL Physics-Based Intrusion Detection Bro Modules - A set of signatures for use with the Zeek (née Bro) Network Security Monitor that analyze communication with a physical system and compare the effects of that communication with a physical simulation of the device. 
      • LBNL DDoS Detection on Science Networks - Monitors network logs to detect denial-of-service attacks on "research and education" networks, distinguishing such attacks from the sustained, high-volume network flows characteristic of large science projects, referred to as "elephant flows."
      • LBNL Stream-Processing Architecture for Real-time Cyber-physical Security (SPARCS) - Extracts data from distribution-level phasor measurement units (PMUs) and power quality meters, along with SCADA traffic captured over the network; enables physically distributed, hierarchical processing of that data; stores the data in one or more databases; and provides both software APIs and a graphical, web-based front end for inspecting the data.
      • Analytics for Stream-Processing Architecture for Real-time Cyber-physical Security (Analytic-SPARCS) - A set of analytics that monitor both power measurements collected by distribution grid phasor measurement units (µPMUs) and SCADA communication in order to detect cyber attacks against equipment located in distribution grid substations.
      • LBNL Disruption Tolerant Key Management Monitoring for Stream-Processing Architecture for Real-time Cyber-physical Security (DTKM-SPARCS) - A set of signatures that monitor the Disruption-Tolerant Key Management protocol developed by PNNL as part of the DOE CEDS program.
      • Research Network Transfer Performance Predictor (netperf-predict) - This software contains two sets of analysis routines for predicting the percentage of retransmitted packets on network flows. The first directory contains code that applies random forest regression to predict the number of retransmitted packets on each flow, operating on time-series data from the tstat tool, which outputs flow-like data. The second directory also applies random forest regression but additionally incorporates a “smoothing” routine that increases accuracy in some situations.
      • NetLogger - Methodology and set of software tools for debugging and performance analysis of complex distributed applications.
      • StorNet - Storage and network bandwidth coordination system.
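netperf-predict targets the share of retransmitted packets on each flow. As a hedged illustration of that target quantity only (the random forest models and the smoothing routine are not shown), the sketch below aggregates hypothetical per-interval flow samples into a per-flow retransmission percentage. The sample layout `(flow_id, packets_sent, packets_retransmitted)` is an assumption for illustration, not tstat's actual output format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-interval sample: (flow_id, packets_sent, packets_retransmitted).
# This layout is illustrative only; it is not tstat's real column set.
Sample = Tuple[str, int, int]


def retransmit_percentages(samples: Iterable[Sample]) -> Dict[str, float]:
    """Per-flow percentage of retransmitted packets, summed over all intervals."""
    sent: Dict[str, int] = defaultdict(int)
    retrans: Dict[str, int] = defaultdict(int)
    for flow_id, packets, retransmitted in samples:
        sent[flow_id] += packets
        retrans[flow_id] += retransmitted
    # Flows that sent no packets get 0.0 instead of a division by zero.
    return {
        flow_id: (100.0 * retrans[flow_id] / total if total else 0.0)
        for flow_id, total in sent.items()
    }


if __name__ == "__main__":
    samples = [
        ("flow-a", 1000, 10),
        ("flow-a", 1000, 20),
        ("flow-b", 500, 0),
    ]
    print(retransmit_percentages(samples))  # {'flow-a': 1.5, 'flow-b': 0.0}
```

Summing the counts before dividing weights each interval by its traffic. Averaging per-interval percentages instead would let nearly idle intervals distort the result.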

Visualization Algorithms and Applications

      • BrainFormat - The LBNL BrainFormat library specifies a general data format standardization framework and implements a novel file format for management and storage of neuroscience data
      • Dionysus - Library for computation of persistent homology
      • QuantCT - Quantitative analysis of micro-tomography images
      • SHARP - A Multi-GPU Ptychographic Reconstruction Toolkit
      • tess and tess2 - Libraries to compute Delaunay and Voronoi tessellations in HPC environments
      • VisIt - A distributed, parallel visualization and graphical analysis tool for data defined on two- and three-dimensional (2D and 3D) meshes.
      • See also OpenMSI under Collaborative Web Applications, above
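Persistent homology records when topological features appear (birth) and disappear (death) as a filtration grows. The sketch below illustrates the idea in its simplest setting only: 0-dimensional persistence (connected components) for the sublevel-set filtration of a 1D signal, computed with union-find and the "elder rule." It is not Dionysus's API. Dionysus handles general simplicial complexes and higher homology dimensions.

```python
import math
from typing import List, Sequence, Tuple


def persistence_0d(heights: Sequence[float]) -> List[Tuple[float, float]]:
    """(birth, death) pairs of components in the sublevel-set filtration of a path.

    Vertex i enters at heights[i]; edge (i, i+1) enters at
    max(heights[i], heights[i+1]). Zero-length pairs are omitted.
    """
    n = len(heights)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    added = [False] * n
    pairs: List[Tuple[float, float]] = []
    # Add vertices in order of height; an edge to an already-added neighbor
    # enters the filtration at the current (larger) height.
    for i in sorted(range(n), key=lambda k: heights[k]):
        added[i] = True
        for j in (i - 1, i + 1):
            if not (0 <= j < n and added[j]):
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # Elder rule: each root is its component's minimum; the younger
            # component (higher minimum) dies at the current height.
            young, old = (ri, rj) if heights[ri] > heights[rj] else (rj, ri)
            if heights[young] < heights[i]:
                pairs.append((heights[young], heights[i]))
            parent[young] = old
    pairs.append((heights[find(0)], math.inf))  # the oldest component never dies
    return pairs


if __name__ == "__main__":
    print(persistence_0d([0, 3, 1, 4, 2]))  # [(1, 3), (2, 4), (0, inf)]
```

Each local minimum of the signal gives birth to a component. Each pair's lifetime (death minus birth) measures how prominent that minimum is, which is the kind of summary persistence-based analysis relies on.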

Workflows

      • FireWorks - Scientific workflow software that includes dynamic workflows, failure-detection routines, and built-in tools and execution modes for running high-throughput computations at large computing centers.
      • Tigres - A programming library for composing and executing large-scale, data-intensive scientific workflows on systems ranging from desktops to supercomputers. Tigres addresses the challenge of enabling collaborative analysis of DOE science data through a new concept of reusable “templates” that let scientists easily compose, run, and manage collaborative computational tasks. These templates define common computation patterns used in analyzing a data set.
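The template idea, reusable patterns that compose tasks, can be sketched with two plain-Python combinators. The names `sequence` and `parallel` and their signatures are illustrative assumptions, not Tigres's actual API. The sketch only shows how pattern-level building blocks let a pipeline be assembled from ordinary functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

Task = Callable[[Any], Any]


def sequence(*tasks: Task) -> Task:
    """Hypothetical 'sequence' template: each task's output feeds the next task."""
    def run(data: Any) -> Any:
        for task in tasks:
            data = task(data)
        return data
    return run


def parallel(*tasks: Task) -> Callable[[Any], List[Any]]:
    """Hypothetical 'parallel' template: independent tasks share one input and run concurrently."""
    def run(data: Any) -> List[Any]:
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(task, data) for task in tasks]
            # Results are returned in the order the tasks were given.
            return [future.result() for future in futures]
    return run


def drop_missing(values: List[Any]) -> List[Any]:
    """Example task: remove missing (None) entries."""
    return [v for v in values if v is not None]


def mean(values: List[float]) -> float:
    """Example task: arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


if __name__ == "__main__":
    pipeline = sequence(drop_missing, parallel(min, max, mean))
    print(pipeline([4, None, 1, 7]))  # [1, 7, 4.0]
```

Because each template returns an ordinary callable, templates nest: a parallel stage can sit inside a sequence. This mirrors the idea of building an analysis from reusable computation patterns rather than hand-wiring every task.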