Usable Data Systems Group

Cybersecurity for Scientific and High-Performance Computing

Berkeley Lab Computing Sciences Research actively participates in numerous projects on security for science, including high-performance computing and high-throughput networking environments. These projects include collaborations with many academic, national laboratory, and industry partners. R&D sponsors have included the Department of Energy (DOE) ASCR programs, the National Nuclear Security Administration (NNSA), the National Science Foundation (NSF) SaTC program and OAC office, and the National Security Agency, among others.

Learn more at the LBNL Cybersecurity for Scientific and High-Performance Computing R&D Web Site.

A partial listing of current and recent projects specifically focused on security for high-performance, scientific computing and high-throughput, scientific networking is as follows:

  1. Data Enclaves for Scientific Computing. This project will develop secure computation architectures to ensure the trustworthiness of scientific data while addressing gaps left by existing solutions for scientific workflows, in particular the specific power, performance, and usability needs from the edge to the HPC center. It is led by Sean Peisert, Venkatesh Akella, and Jason Lowe-Power. See Data Enclaves for Scientific Computing website.
  2. Trusted CI, the NSF Cybersecurity Center of Excellence. The mission of Trusted CI is to improve the cybersecurity of NSF computational science and engineering projects, while allowing those projects to focus on their science endeavors. The PI of this center at the University of Illinois is Jim Basney. Sean Peisert is Deputy Director and Co-PI and leads LBNL’s role in Trusted CI. See Trusted CI project website.
  3. Privacy-Preserving Data Analysis for Scientific Discovery. This project aims to produce methods, processes, and architectures, applicable to a variety of scientific computing domains, that enable querying, machine learning, and analysis of data while protecting against the release of sensitive information beyond pre-defined bounds. It is supported by LBNL CSR funds and is led by Sean Peisert. See Privacy-Preserving Data Analysis project website.

Several recent projects include the following:

  1. Toward a Hardware/Software Co-Design Framework for Ensuring the Integrity of Exascale Scientific Data.  This project takes a broad look at several aspects of security and scientific integrity issues in HPC systems.  It is funded by DOE ASCR and is led by Sean Peisert.  See Scientific Computing Integrity project website.
  2. Democratizing Health Research Through Privacy-Protecting Synthetic Data. This project aims to enable significantly broader use of health data by creating differentially private synthetic data sets. It will also contribute to solutions focused on the coronavirus pandemic. It is supported by the UC Davis CeDAR. See Synthetic Data Privacy project website.
  3. Network Measurement, Analysis and Visualization. NetSage is a network measurement, analysis, and visualization service, funded by the National Science Foundation, that is designed to address the needs of today's international networks. This project is co-led by Sean Peisert at LBNL. See NetSage project website.
  4. Distributed Detection of DDoS Attacks on the WAN.   This project is examining ways in which operators of wide-area networks (WANs) can better use their vantage points to detect DDoS attacks before they reach individual sites. It is particularly focused on large-scale science traffic as seen in ESnet and certain other national and regional “research and education” networks.  This project is funded by DOE's iJC3 Cyber R&D program and is led by Sean Peisert at Berkeley Lab. See DDoS Detection project website.
  5. Inferring Computing Activity Using Physical Sensors. This project is using power data to identify computational operations, particularly in high-performance and cloud computing environments. This project is led by Sean Peisert at Berkeley Lab. See project website for inferring computing activity with power data.
  6. A Mathematical and Data-Driven Approach to Intrusion Detection for High-Performance Computing. In this project, CRD researchers developed mathematical and statistical techniques to analyze the access and use of high-performance computer systems. This project was funded by the US Department of Energy’s Applied Mathematics Section. Berkeley Lab, the lead institution for the project, also funded UC Davis and the International Computer Science Institute (ICSI) at UC Berkeley in this activity via subcontracts from Berkeley Lab. See Mathematical Approach to Intrusion Detection in HPC project website.
  7. DALHIS – Data Analysis on Large-scale Heterogeneous Infrastructures for Science. The DALHIS associate team is a collaboration between the Myriads Inria project team (Rennes, France), the Avalon Inria project team (Lyon, France), and the Berkeley Lab Data Science and Technology (DST) department (Berkeley, USA). This portion of the DALHIS project focuses on cybersecurity to enable an integrated scientific data analysis ecosystem that accelerates the pace of scientific insight.

Key Representative Publications:

Sean Peisert, “Trustworthy Scientific Computing,” Communications of the ACM (CACM), 64(5):18–21, May 2021.

Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, and Sean Peisert, “Performance Analysis of Scientific Computing Workloads on General Purpose TEEs,” Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 17–21, 2021.

Sean Peisert, Eli Dart, William K. Barnett, James Cuff, Robert L. Grossman, Edward Balas, Ari Berman, Anurag Shankar, and Brian Tierney, “The Medical Science DMZ: A Network Design Pattern for Data-Intensive Medical Science,” Journal of the American Medical Informatics Association (JAMIA), 25(3):267–274, March 2018.

Sean Peisert, “Security in High-Performance Computing Environments,” Communications of the ACM (CACM), 60(9):72–80, September 2017.

Sean Peisert, Von Welch, Andrew Adams, Michael Dopheide, Susan Sons, RuthAnne Bevier, Rich LeDuc, Pascal Meunier, Stephen Schwab, Karen Stocks, Ilkay Altintas, James Cuff, Reagan Moore, and Warren Raquel, “Open Science Cyber Risk Profile,” February 2017.

Sean Whalen, Sean Peisert, Matt Bishop, “Multiclass Classification of Distributed Memory Parallel Computations,” Pattern Recognition Letters (PRL), 34(3):322-329, February 2013.

Sean Whalen, Sophie Engle, Sean Peisert, Matt Bishop, “Network-Theoretic Classification of Parallel Computation Patterns,” International Journal of High Performance Computing Applications (IJHPCA), 26(2):159-169, May 2012.

Software

A portion of the software developed through this project can be downloaded via GitHub.

