NetLogger Helps Supernova Factory Improve Data Analysis

May 12, 2005

The Nearby Supernova Factory (SNfactory) project, established at Berkeley Lab in 2002, aims to dramatically increase the discovery of nearby Type Ia supernovae by applying assembly-line efficiencies to the collection, analysis and retrieval of large amounts of astronomical data.

To date, the program has resulted in the discovery of about 150 Type Ia supernovae – about three times the entire number reported before the project was started. Type Ia supernovae are important celestial bodies because they are used as “standard candles” for gauging the expansion of the universe.

Contributing to the SNfactory's remarkable discovery rate is its custom-developed “data pipeline” software. The pipeline fills with up to 50 gigabytes (billion bytes) of data per night from wide-field cameras built and operated by the Jet Propulsion Laboratory's Near Earth Asteroid Tracking program (NEAT). NEAT uses remote telescopes in Southern California and Hawaii.

Around 25,000 new images are captured each day, and the goal is to complete all processing before the next day’s images arrive. Image data is copied in real time from the Mt. Palomar Observatory in Southern California to a mass storage system at NERSC. Then the image data is copied to a large shared disk array on a 344-node cluster called PDSF. Each image is 8 MB (uncompressed), and the processing of each image requires between 5 and 25 reference images, for a total disk space requirement of about 0.5 TB each day.

Supernovae are found by comparing recently acquired telescope images with older reference images. If there is a source of light in the new image that did not exist in the old image, it could be a supernova. Subtracting the new image from the reference image identifies new light sources. This process is quite delicate: aligning the images, matching the point-spread functions, and matching the photometry and bias all require precise calibration.
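As a rough illustration of the comparison step, the sketch below flags pixels that are much brighter in a new image than in a reference image. It is a minimal example, not the SNfactory pipeline code: the function name, the threshold, and the tiny 3x3 images are all hypothetical, and it assumes the two images have already been aligned and calibrated to the same scale. It takes the difference as new minus reference so that new light appears as a positive excess.

```python
# Illustrative sketch only: not the SNfactory pipeline code.
# Assumes both images are already aligned and calibrated to the same scale.

def find_new_sources(new_image, reference_image, threshold):
    """Return (row, col, excess) for every pixel that is brighter in
    new_image than in reference_image by more than threshold."""
    if len(new_image) != len(reference_image):
        raise ValueError("images must have the same number of rows")
    candidates = []
    for r, (new_row, ref_row) in enumerate(zip(new_image, reference_image)):
        if len(new_row) != len(ref_row):
            raise ValueError("rows must have the same length")
        for c, (new_px, ref_px) in enumerate(zip(new_row, ref_row)):
            excess = new_px - ref_px
            if excess > threshold:
                candidates.append((r, c, excess))
    return candidates


# Hypothetical pixel values, chosen only for the example.
reference = [
    [10, 11, 10],
    [12, 50, 11],   # an existing star at (1, 1)
    [10, 10, 12],
]
new = [
    [11, 10, 10],
    [12, 51, 11],   # the same star, nearly unchanged
    [10, 10, 90],   # a new light source at (2, 2)
]

candidates = find_new_sources(new, reference, threshold=20)
print(candidates)  # [(2, 2, 78)]
```

The existing star cancels out in the difference, while the new source survives the threshold. A real pipeline would first have to perform the alignment and calibration steps described above, which this sketch skips entirely.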

Because of the high demand put on all the resources in the pipeline, making sure that the data flow smoothly and can be analyzed quickly and correctly is critical to the project's overall success. While there are a number of tools for evaluating the performance of single systems, identifying the workflow bottlenecks in a distributed system such as the SNfactory requires a different type of application.

For the past 10 years, Brian Tierney and others in the Collaborative Computing Technologies Group have been developing the NetLogger toolkit as part of the Distributed Monitoring Framework project. NetLogger is a set of libraries and tools to support end-to-end monitoring of distributed applications. During the past few months, the team has been working closely with the SNfactory project to help debug and tune their application.

“NetLogger has been extremely useful in the debugging and commissioning of our data processing pipeline,” said Stephen Bailey, one of the lead developers on the SNfactory project. “It has helped us identify bugs and processing bottlenecks in order to improve our efficiency and data quality. It additionally has allowed real time monitoring of the data processing to quickly identify problems that need immediate attention. This debugging, commissioning, and monitoring would have taken much longer without NetLogger.”

Tierney and Bailey, along with Dan Gunter of the Collaborative Computing Technologies Group, have written a paper entitled “Scalable Analysis of Distributed Workflow Traces,” which will be presented at the 2005 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'05) to be held June 27-30 in Las Vegas. The paper can be found at <http://dsd.lbl.gov/publications/NetLogger-SNFactory.pdf>.

“The first problem the SNfactory scientists asked us to solve was to figure out why some of their workflows were failing without any error messages as to the cause,” Tierney said. “Even when error messages were generated, the SNfactory application produced thousands of log files, and it was very difficult to locate the log messages related to failed workflows. NetLogger was very useful for easily characterizing where the failures were occurring so they would know where to focus debugging efforts.”
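One general way to narrow down silent failures of this kind is to group log events by workflow and look for workflows that started but never reported completion. The sketch below illustrates that idea only. The log format (`ts=`, `event=`, `wf=` fields), the event names, and the function names are hypothetical, and it should not be read as NetLogger's actual format, API, or method.

```python
# Illustrative sketch only. The log format (ts=, event=, wf=) and the
# event names are hypothetical, not NetLogger's actual format or API.

def parse_line(line):
    """Parse a 'key=value key=value ...' line into a dict."""
    return dict(field.split("=", 1) for field in line.split())


def find_incomplete_workflows(lines, end_event="workflow.end"):
    """Group events by workflow ID and report workflows with no end event.

    Returns {workflow_id: name of the last event seen for that workflow}.
    """
    events_by_workflow = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_line(line)
        events_by_workflow.setdefault(record["wf"], []).append(record)

    incomplete = {}
    for wf_id, events in events_by_workflow.items():
        # Order each workflow's events by timestamp.
        events.sort(key=lambda e: float(e["ts"]))
        if all(e["event"] != end_event for e in events):
            incomplete[wf_id] = events[-1]["event"]
    return incomplete


# Hypothetical log lines, as if merged from several log files.
log_lines = [
    "ts=1.0 event=workflow.start wf=A",
    "ts=1.5 event=workflow.start wf=B",
    "ts=2.0 event=subtract.start wf=A",
    "ts=2.4 event=subtract.start wf=B",
    "ts=3.0 event=subtract.end wf=A",
    "ts=3.5 event=workflow.end wf=A",
]

incomplete = find_incomplete_workflows(log_lines)
print(incomplete)  # {'B': 'subtract.start'}
```

Here workflow B never logs an end event, and its last recorded event points to the stage where it stopped, which is the kind of hint that tells developers where to focus debugging even when no error message was written.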

About Berkeley Lab

Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 16 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.