Data Science & Technology Software

Collaborative Web Applications

  • ALSHub - A website for managing users of the Advanced Light Source (ALS) facility and their proposals, experiment safety details, and beamtime.
  • AmeriFlux web application - Allows users to upload and download data. Planned features include personalized sets of tower sites, personalized dashboard-style reporting, and data visualization.
  • CLEER Model - The Cloud Energy and Emissions Research (CLEER) Model is a comprehensive, user-friendly, open-access model for assessing the net energy and emissions implications of cloud services in different regions and at different levels of market adoption.
  • eProject Builder - A secure web-based data entry and tracking system for energy savings performance contract (ESPC) projects.
  • KBase Narrative - A web application for systems biology that tracks the analyses, data, and thought processes of an analysis in a way that is reproducible and shareable.
  • Materials Project - Web application for materials design; by computing the properties of all known materials, the Materials Project aims to remove guesswork from materials design in a variety of applications.
  • OpenMSI - Web application for the management, storage, visualization, and statistical analysis of mass spectrometry imaging (MSI) data.
  • PDG Workspace - A set of web applications used to author and edit the Review of Particle Physics. It includes interfaces for adding literature to review, adding measurements found in that literature, and viewing the final published results for general public consumption.

Data Management

  • AUDAS - Attribute-based unified data access system, based on FastBit.
  • Berkeley Storage Manager (BeStMan) - LBNL implementation of the Storage Resource Manager (SRM) standard interface.
  • FastBit - An implementation of the FastBit bitmap indexing and searching algorithms.
  • FastQuery - A parallel indexing system for scientific data, based on FastBit.
  • pymatgen-db - An add-on to the Python Materials Genomics (pymatgen) library that allows the creation of Materials Project-style databases for managing materials data.
  • SPADE - A Java EE application that takes files from wherever they are produced (e.g., at an experiment) and delivers them into a data warehouse, from which they can be retrieved for analysis and also archived.
  • SRM-Lite - A simple command-line tool with pluggable support for file transfer protocols, including scp, hpn-scp, and sftp.

Networking, Monitoring, and Security

  • B.I.4NERSC - Analytical methods for unveiling information buried in data files from monitoring software at NERSC.
  • BulkDataMover - A scalable data transfer management tool for the GridFTP transfer protocol.
  • DataMover-Lite - End-user data downloading tool for ESGF climate data.
  • ESG2Net100 - Library enabling minimal memory copying from disk I/O to network I/O.
  • Hive Mind - Lightweight, decentralized intrusion detection based on mobile agents and swarm intelligence.
  • LBNL Physics-Based Intrusion Detection Bro Modules (GitHub) - Combine network traces and simulation to compare the effects of communication with the simulated physical behavior of a device.
  • NetLogger - Methodology and set of software tools for debugging and performance analysis of complex distributed applications.
  • StorNet - Storage and network bandwidth coordination system.

Visualization Algorithms and Applications

  • BrainFormat - The LBNL BrainFormat library specifies a general data format standardization framework and implements a novel file format for the management and storage of neuroscience data.
  • Dionysus - Library for computing persistent homology.
  • QuantCT - Quantitative analysis of micro-tomography images.
  • SHARP - A multi-GPU ptychographic reconstruction toolkit.
  • tess and tess2 - Libraries to compute Delaunay and Voronoi tessellations in HPC environments.
  • VisIt - A distributed, parallel visualization and graphical analysis tool for data defined on two- and three-dimensional (2D and 3D) meshes.
  • See also OpenMSI under Collaborative Web Applications, above.

Scientific Workflows

  • FireWorks - Scientific workflow software that includes dynamic workflows, failure-detection routines, and built-in tools and execution modes for running high-throughput computations at large computing centers.
  • Tigres - A programming library for composing and executing large-scale, data-intensive scientific workflows on platforms ranging from desktops to supercomputers. Tigres addresses the challenge of enabling collaborative analysis of DOE science data through a new concept of reusable "templates" that let scientists easily compose, run, and manage collaborative computational tasks. These templates define common computation patterns used in analyzing a data set.