About SciData

What We Do

The Scientific Data Division fosters breakthrough discoveries through the application and development of novel data science methods, technologies, and infrastructures in partnership with science domain experts.

Machine Learning

With increasingly powerful scientific instruments, researchers can see things at a microscopic and atomic scale, measure vibrations imperceptible to the human eye, and capture high-resolution images of objects millions of light-years away. But those instruments produce vastly larger datasets than ever before. Machine-learning techniques, often tightly integrated with high-performance computing resources, can automatically derive models from that data to identify features, reduce complexity, and control experiments.

Our work in machine learning involves developing and sharing the algorithms, software, tools, and libraries that are fundamental to scientific machine learning. We build on Berkeley Lab’s foundational work in mathematics to develop methods that are consistent with physical laws, robust in the presence of noisy or biased data, and capable of being interpreted and explained in scientifically meaningful ways.

Data Infrastructure

In partnership with science domain experts, we build tools and models that transform data generated by simulations, experiments, and observations into a form that researchers can manipulate to gain scientific insight.

Our focus includes data transformation and processing pipelines, data management tools and techniques, advanced scientific workflow tools, storage and I/O technologies, data indexing and searching, in situ feature-extraction algorithms, and software platforms that help scientists process and analyze the information they’ve collected. This work often includes building a strong user-facing component for entering the pipeline and producing final data products, as well as tools for data movement and cybersecurity. We also create methods and techniques that support the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles, including web-based science gateways and data repositories.

Security and Privacy for Scientific Computing and Energy Technologies

We advance and leverage security techniques and privacy-preserving approaches to solve important problems in scientific cyberinfrastructure and energy delivery systems. Our focus is on developing novel, practical, and user-centered security and privacy methods that enable scientific research that would otherwise be impeded and that address important problems in energy delivery systems.

Software Engineering and Sustainability

We lead software engineering, software release management, and software development efforts that promote the visibility, usability, reliability, and sustainability of software products in domains from chemical engineering to high-energy physics. Using established software development methodologies, we create and maintain sustainable software collaboratively, in teams that vary in size and composition from several people within Berkeley Lab to dozens of people across multiple labs and university partners.

Science Partnerships

Just as our research and development programs benefit science, domain scientists influence and contribute to the direction and content of our research. We work in partnership with experts in domains ranging from biology and materials science to fundamental physics and climate science to develop tools, techniques, and technologies that meet the data-analysis challenges posed by present and future experimental, observational, and simulation data.

Got an idea for a collaboration?

We welcome inquiries.