Updated Workflows for New LHC ERA

Berkeley Lab scientists contribute to new software that helps physics maximize next-generation supercomputing architectures

February 22, 2016

Linda Vu, lvu@lbl.gov, 510.495.2402

A view inside the liquid-argon calorimeter endcap. Photo Credit: The ATLAS Experiment at CERN, atlas.ch

After a massive upgrade, the Large Hadron Collider (LHC), the world’s most powerful particle collider is now smashing particles at an unprecedented 13 tera-electron-volts (TeV)—nearly double the energy of its previous run from 2010-2012. In just one second, the LHC can now produce up to 1 billion collisions and generate up to 10 gigabytes of data in its quest to push the boundaries of known physics. And over the next decade, the LHC will be further upgraded to generate about 10 times more collisions and data.

To deal with the new data deluge, researchers working on one of the LHC’s largest experiments—ATLAS—are relying on updated workflow management tools developed primarily by a group of researchers at the Lawrence Berkeley National Laboratory (Berkeley Lab). Papers highlighting these tools were recently published in the Journal of Physics: Conference Series.

“The issue with High Luminosity LHC is that we are producing ever-increasing amounts of data, faster than Moore’s Law and cannot actually see how we can do all of the computing that we need to do with the current software that we have,” says Paolo Calafiura, a scientist in Berkeley Lab’s Computational Research Division (CRD). “If we don’t either find new hardware to run our software or new technologies to make our software run faster in ways we can’t anticipate, the only choice that we have left is to be more selective in the collision events that we record. But, this decision will of course impact the science and nobody wants to do that.”

To tackle this problem, Calafiura and his colleagues of the Berkeley Lab ATLAS Software group are developing new software tools called Yoda and AthenaMP to speed up the analysis of the data by leveraging the capabilities of next-generation Department of Energy (DOE) supercomputers like the National Energy Research Scientific Computing Center’s (NERSC’s) Cori system, as well as DOE’s current Leadership Computing Facilities, to analyze ATLAS data.

Yoda: Treating Single Supercomputers like the LHC Computing Grid

Around the world, researchers rely on the LHC Computing Grid to process the petabytes of data collected by LHC detectors every year. The grid comprises 170 networked computing centers in 36 countries. CERN’s computing center, where the LHC is located, is ‘Tier 0’ of the grid. It processes the raw LHC data, and then divides it into chunks for the other Tiers. Twelve ‘Tier 1’ computing centers then accept the data directly from CERN’s computers, further process the information and then break it down into even more chunks for the hundreds of computing centers further down the grid. Once a computer finishes its analysis, it sends the findings to a centralized computer and accepts a new chunk of data.

Like air traffic controllers, special software manages workflow on the computing grid for each of the LHC experiments. The software is responsible for breaking down the data, directing the data to its destination, telling systems on the grid when to execute an analysis and when to store information. To deal with the added deluge of data from the LHC’s upgraded ATLAS experiment, Vakhtang Tsulaia from the Berkeley Lab’s ATLAS Software group added another layer of software to the grid called Yoda Event Service system.

The researchers note that the idea with Yoda is to replicate the LHC Computing Grid workflow on a supercomputer. So as soon as a job arrives at the supercomputer, Yoda will breakdown the data chunk into even smaller units, representing individual events or event ranges, and then assign those jobs to different compute nodes. Because only the portion of the job that will be processed is sent to the compute node, computing resources no longer need to stage the entire file before executing a job, so processing happens relatively quickly.

To efficiently take advantage of available HPC resources, Yoda is also flexible enough to adapt to a variety of scheduling options—from back filling to large time allocations. After processing the individual events or event ranges, Yoda saves the output to the supercomputer’s shared file system so that these jobs can be terminated at anytime with minimal data losses. This means that Yoda jobs can now be submitted to the HPC batch queue in back filling mode. So if the supercomputer is not utilizing all of its cores for a certain amount of time, Yoda can automatically detect that and submit a properly sized job to the batch queue to utilize those resources.

“Yoda acts like a daemon that is constantly submitting jobs to take advantage of available resources, this is what we call opportunistic computing,” says Calafiura.

In early 2015 the team tested Yoda’s performance by running ATLAS jobs from the previous LHC run on NERSC’s Edison supercomputer and successfully scaled up to 50,000 computer processor cores.

AthenaMP: Adapting ATLAS Workloads for Massively Parallel Systems

In addition to Yoda, the Berkeley Lab ATLAS software group also developed the AthenaMP software that allows the ATLAS reconstruction, simulation and data analysis framework to run efficiently on massively parallel systems.

“Memory has always been a scare resource for ATLAS reconstruction jobs. In order to optimally exploit all available CPU-cores on a given compute node, we needed to have a mechanism that would allow the sharing of memory pages between processes or threads,” says Calafiura.

AthenaMP addresses the memory problem by leveraging the Linux fork and copy-on-write mechanisms. So when a node receives a task to process, the job is initialized on one core and sub-processes are forked to other cores, which then process all of the events assigned to the initial task. This strategy allows for the sharing of memory pages between event processors running on the same compute node.

By running ATLAS reconstruction in one AthenaMP job with several worker processes, the team notes that they achieved a significantly reduced overall memory footprint when compared to running the same number of independent serial jobs. And, for certain configurations of the ATLAS production jobs they’ve managed to reduce the memory usage by a factor of two.

“Our goal is to get onto more hardware and these tools help us do that. The massive scale of many high performance systems means that even a small fraction of computing power can yield large returns in processing throughput for high energy physics,” says Calafiura.

This work was supported by DOE’s Office of Science.

Read the papers:

Fine grained event processing on HPCs with the ATLAS Yoda system
http://iopscience.iop.org/article/10.1088/1742-6596/664/9/092025

Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP): http://iopscience.iop.org/article/10.1088/1742-6596/664/7/072050

About Berkeley Lab

Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 16 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.