Taking Big Steps, Berkeley Lab Scientists Shape Improvements in Topological Optimization
April 15, 2025
By Patrick Riley
Contact: cscomms@lbl.gov

Berkeley Lab scientists have transformed topological data analysis by developing a faster method to identify significant features in high-dimensional datasets. This breakthrough, akin to distinguishing true mountain summits like Mt. Everest (pictured) from minor bumps, enhances efficiency and precision in fields such as machine learning and materials science, paving the way for more insightful and accurate data-driven discoveries. (Credit: Wikimedia Commons)
Berkeley Lab scientists have developed a faster method for optimizing topological data analysis, a mathematical technique that helps researchers study the shape of data and uncover complex structures in high-dimensional datasets. Their novel approach accelerates the process of extracting meaningful patterns from noisy data by requiring significantly fewer steps than the standard procedure, making it much more efficient. This advancement, which has shown particular promise in machine learning, materials science, and biology, could have far-reaching impacts on data analysis and decision-making.
Their new method was detailed in a paper titled “Topological Optimization with Big Steps,” published in Discrete & Computational Geometry.
“When people are trying to apply topological data analysis to machine learning, they quite often have to do this topological optimization. And so we’re giving them a much more efficient tool,” said Arnur Nigmetov, a computer systems engineer with the Machine Learning and Analytics Group of Berkeley Lab’s Scientific Data Division and lead author of the paper.
Everest, Kilimanjaro, and Topological Data Analysis
One can think of topological data analysis as similar to studying mountains: While Everest and Kilimanjaro have smaller ‘bumps’ along their slopes, these aren’t considered separate peaks. In topological data analysis, these ‘bumps’ are viewed as noise.
Researchers use math to separate meaningful patterns from random noise. Features in the data emerge (their ‘birth’) and later disappear (their ‘death’). The longer a feature lasts between birth and death, the more significant it is. For example, in materials science, stable voids created by atoms tend to last longer, while temporary, noisy pockets disappear quickly. This process helps researchers focus on the important features of complex data.
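The birth-and-death idea can be made concrete with a small sketch on a one-dimensional “mountain profile.” This is only an illustration, not the method from the paper: the function names `peak_persistence` and `significant_peaks`, the sample heights, and the convention of giving the highest peak a death value of negative infinity are all assumptions made for this example. Sweeping downward from the top, each peak is born at its own height and dies at the height where its slope merges into that of a taller peak.

```python
def peak_persistence(heights):
    """Return (peak_index, birth, death) for each peak of a 1D height profile.

    Sweeps from the highest point downward. A peak is born at its own height
    and dies at the saddle height where it merges into a taller peak.
    The tallest peak never merges; its death is set to -inf by convention.
    """
    n = len(heights)
    parent = [None] * n  # None means "not yet reached by the sweep"

    def find(i):
        # Union-find lookup with path halving; the root is the component's peak.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda k: -heights[k]):
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # The lower of the two peaks dies at the current height.
                young, old = (rj, ri) if heights[ri] > heights[rj] else (ri, rj)
                if heights[young] > heights[i]:  # skip zero-persistence points
                    pairs.append((young, heights[young], heights[i]))
                parent[young] = old
    if n:
        top = find(0)
        pairs.append((top, heights[top], float("-inf")))
    return pairs


def significant_peaks(heights, min_persistence):
    """Indices of peaks whose persistence (birth - death) meets the threshold."""
    return sorted(
        idx for idx, birth, death in peak_persistence(heights)
        if birth - death >= min_persistence
    )


profile = [0, 30, 25, 80, 76, 79, 40, 10, 50, 0]
print(significant_peaks(profile, min_persistence=10))  # [3, 8]
```

In this made-up profile, the bumps at heights 79 and 30 last only 3 and 5 units before merging into the main summit, so a threshold of 10 treats them as noise and keeps the two real peaks.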
Topological optimization adjusts the data itself: it can smooth out noise, making unwanted ‘bumps’ disappear. It also enables researchers to adjust data to follow specific patterns, using techniques like backpropagation to speed up the process.
In machine learning, topological data analysis helps correct overfitting, which occurs when a model gets distracted by insignificant features.
“When the network overfits, it’s carving out little pockets around noisy data. It’s like treating a small bump on a mountain as an important peak,” said Dmitriy Morozov, a staff scientist in the Machine Learning and Analytics Group of Berkeley Lab’s Scientific Data Division and co-author of the paper.
To fix this, researchers add a term to the model’s training objective that encourages it to focus on the more significant features, like real mountain peaks, helping it generalize better and avoid overfitting.
“By squashing noise and promoting meaningful features, we can make our models smarter and more efficient, much like a hiker learning to navigate the landscape by focusing only on the true peaks, not the distractions along the way.”
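One generic way such an added term can look is sketched below. It is an assumption for illustration, not the formulation used in the paper: the names `noise_penalty` and `regularized_loss`, the threshold, the weight, and the sample numbers are all hypothetical. The sketch takes a list of (birth, death) pairs and charges a cost for every feature whose lifetime falls below a threshold, so that reducing the total cost means flattening the small bumps while leaving long-lived features alone.

```python
def noise_penalty(diagram, threshold):
    """Sum the lifetimes of features that last less than `threshold`.

    `diagram` is a list of (birth, death) pairs. Long-lived features
    contribute nothing, so only short-lived "bumps" are penalized.
    """
    lifetimes = (abs(death - birth) for birth, death in diagram)
    return sum(life for life in lifetimes if life < threshold)


def regularized_loss(data_loss, diagram, threshold, weight):
    """Combine the ordinary fitting loss with the weighted topological penalty."""
    return data_loss + weight * noise_penalty(diagram, threshold)


# Hypothetical diagram: two short-lived bumps and one long-lived feature.
diagram = [(79, 76), (30, 25), (50, 10)]
print(noise_penalty(diagram, threshold=10))               # 8
print(regularized_loss(2.0, diagram, threshold=10, weight=0.5))  # 6.0
```

This only evaluates the penalty. Using it during training would additionally require computing how the penalty changes with the model’s parameters, which is the costly optimization step the article goes on to discuss.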
‘A Whole New Way of Thinking’
The Berkeley Lab scientists’ new algorithm addresses a long-standing challenge in topological data analysis: the inefficiency of optimization. Previously, optimizing topology required numerous small, incremental steps, making the process slow and cumbersome.
“Everyone who was doing topological optimization before struggled with this inefficiency problem,” said Nigmetov, referring to the many optimization steps previously required. “It’s a problem that has both theoretical and visual appeal and is relevant to the community.”
The pair’s breakthrough was recognizing a more efficient way to approach the problem. Rather than relying on these many small steps, their algorithm revealed a faster, more straightforward method of optimizing persistent homology—a core technique in topological data analysis.
“The idea that you can open this black box and figure out what’s going on inside is very unexpected,” Morozov said. “Usually, this kind of analysis is impossible, which is why everybody was just taking these small, local steps. So the fact that there is this extra structure and it’s pretty straightforward to compute, and you can make use of it, I think that was a big surprise, not just to us, but to everybody who has heard of this work.”
Morozov and Nigmetov’s work was initiated under Laboratory Directed Research and Development (LDRD) funding from Berkeley Lab, provided by the U.S. Department of Energy (DOE). It was also supported by the Scientific Discovery through Advanced Computing (SciDAC) program and the Mathematical Multifaceted Integrated Capability Centers (MMICCs) program.
According to Morozov, this funding gave the researchers the time needed to thoroughly consider the problem. Their approach could have a significant impact on materials science, enabling researchers to manipulate and design materials with specific properties. “The ability to generate materials with prescribed topology could be a game-changer,” Morozov said.
The researchers also hope their work will advance the understanding of persistence diagrams, a core tool in topological data analysis. “Every new mathematical insight leads to fresh ways of thinking,” Morozov said.
About Berkeley Lab
Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 16 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.