HPC and Scientific Networking Security
The Data Science and Technology Department actively participates in a number of projects on security for scientific, high-performance computing systems and high-bandwidth research and education networks. Research sponsors have typically included DOE's ASCR program, NSF's SaTC program, and NSF's OAC, among others.
LBNL has had a leadership role in security in scientific computing environments for many years, including the development of the Zeek (Bro) Network Security Monitor, the 100G performance enhancements of Zeek (Bro), and Zeek/Bro's commercial spin-off, Corelight, Inc., as well as leading several DOE-sponsored activities related to defining a cybersecurity research program within the DOE Office of Science. More recently, LBNL led the coordination of the "Cyber R&D" Enterprise Cyber Capability (ECC) of the DOE-wide Integrated Joint Cybersecurity Coordination Center (iJC3) — a sponsored R&D program that currently involves ten DOE National Laboratories as performers.
Learn more at the LBNL HPC and Scientific Networking Cybersecurity R&D Web Site.
A partial listing of current and recent projects specifically focused on security for high-performance, scientific computing and high-throughput, scientific networking is as follows:
- Toward a Hardware/Software Co-Design Framework for Ensuring the Integrity of Exascale Scientific Data. This project takes a broad look at several aspects of security and scientific integrity issues in HPC systems. It is funded by DOE ASCR and is led by Sean Peisert. See Scientific Computing Integrity project website.
- Trusted CI, the National Science Foundation Cybersecurity Center of Excellence. The mission of Trusted CI is to improve the cybersecurity of NSF computational science and engineering projects, while allowing those projects to focus on their science endeavors. The PI of this center at Indiana University is Von Welch. LBNL’s role in this center is led by Sean Peisert. See Trusted CI project website.
- Privacy-Preserving Data Analysis for Scientific Discovery. This project aims to produce methods, processes, and architectures, applicable to a variety of scientific computing domains, that enable querying, machine learning, and analysis of data while protecting against the release of sensitive information beyond pre-defined bounds. It is supported by LBNL CSR funds and is led by Sean Peisert. See Privacy-Preserving Data Analysis project website.
Other recent projects include the following:
- Democratizing Health Research Through Privacy-Protecting Synthetic Data. This project aims to enable significantly broader use of health data by creating differentially private synthetic data sets. It will also contribute to efforts focused on the coronavirus pandemic. It is supported by the UC Davis CeDAR. See Synthetic Data Privacy project website.
- Network Measurement, Analysis and Visualization. NetSage is a network measurement, analysis and visualization service funded by the National Science Foundation and is designed to address the needs of today's international networks. This project is co-led by Sean Peisert at LBNL. See NetSage project website.
- Distributed Detection of DDoS Attacks on the WAN. This project is examining ways in which operators of wide-area networks (WANs) can better use their vantage points to detect DDoS attacks before they reach individual sites. It is particularly focused on large-scale science traffic as seen in ESnet and certain other national and regional "research and education" networks. This project is funded by DOE's iJC3 Cyber R&D program and is led by Sean Peisert at LBNL. See DDoS Detection project website.
- Inferring Computing Activity Using Physical Sensors. This project is using power data to identify computational operations, particularly in high-performance and cloud computing environments. This project is led by Sean Peisert at LBNL. See project website for inferring computing activity with power data.
- A Mathematical and Data-Driven Approach to Intrusion Detection for High-Performance Computing. In this project, CRD researchers developed mathematical and statistical techniques to analyze the access and use of high-performance computer systems. This project was funded by the U.S. Department of Energy's Applied Mathematics Section. LBNL, which was the lead institution for the project, also funded UC Davis and the International Computer Science Institute (ICSI) at UC Berkeley in this activity via subcontracts from LBNL. See Mathematical Approach to Intrusion Detection in HPC project website.
- DALHIS – Data Analysis on Large-scale Heterogeneous Infrastructures for Science. The DALHIS associate team is a collaboration between the Myriads Inria project-team (Rennes, France), the Avalon Inria project-team (Lyon, France), and the LBNL Data Science and Technology (DST) department (Berkeley, USA). This portion of the DALHIS project focuses on cybersecurity to enable an integrated scientific data analysis ecosystem that accelerates the pace of scientific insight.
Key Representative Publications:
Sean Peisert, Eli Dart, William K. Barnett, James Cuff, Robert L. Grossman, Edward Balas, Ari Berman, Anurag Shankar, and Brian Tierney, "The Medical Science DMZ: A Network Design Pattern for Data-Intensive Medical Science", Journal of the American Medical Informatics Association (JAMIA), 25(3):267–274, March 2018.
Sean Peisert, “Security in High-Performance Computing Environments”, Communications of the ACM (CACM), 60(9):72-80, September 2017.
Sean Peisert, Von Welch, Andrew Adams, Michael Dopheide, Susan Sons, RuthAnne Bevier, Rich LeDuc, Pascal Meunier, Stephen Schwab, Karen Stocks, Ilkay Altintas, James Cuff, Reagan Moore, and Warren Raquel, “Open Science Cyber Risk Profile,” February 2017.
Sean Whalen, Sean Peisert, Matt Bishop, “Multiclass Classification of Distributed Memory Parallel Computations,” Pattern Recognition Letters (PRL), 34(3):322-329, February 2013.
Sean Whalen, Sophie Engle, Sean Peisert, Matt Bishop, “Network-Theoretic Classification of Parallel Computation Patterns,” International Journal of High Performance Computing Applications (IJHPCA), 26(2):159-169, May 2012.
A portion of the software developed through this project can be downloaded via GitHub.