Supercomputing Report Card
Assessment of application performance on six supercomputers pinpoints petascale challenge
February 15, 2007
A comprehensive supercomputer performance evaluation undertaken by CRD scientists has won a Best Paper award in the Application track at the IEEE International Parallel and Distributed Processing Symposium (IPDPS), underscoring the significant contribution the research makes to improving scientific applications for the arrival of petascale computing.
The paper, “Scientific Application Performance on Candidate PetaScale Platforms,” is the culmination of four years of exhaustive study that examined the performance of six codes, representing a wide range of research disciplines, on six supercomputers throughout the country.
Headed by Lenny Oliker, the research not only answered key questions about high-performance systems, it also identified the shortcomings to overcome in order to scale the codes to run on next-generation supercomputers.
“If you go to the literature today and look for comparisons on how the machines differ from one another on realistic, large-scale applications, there is surprisingly little information out there,” Oliker said. “The research benefits everyone from application scientists to next-generation supercomputer designers.”
The research is a collaboration between computer and application scientists at CRD and NERSC. The co-authors are Andrew Canning, Jonathan Carter, Costin Iancu, Michael Lijewski, Shoaib Kamil, John Shalf, Hongzhang Shan and Erich Strohmaier. Stephane Ethier from the Princeton Plasma Physics Laboratory and Tom Goodale from Louisiana State University also contributed to the work.
The IEEE symposium is scheduled to honor the paper’s authors and have them present the paper on March 28 in Long Beach, California.
“This is indeed an outstanding accomplishment,” said Horst Simon, Associate Lab Director of Computing Sciences and head of CRD and NERSC. “Getting into the program is already a big deal. Getting best paper is exceptional.”
The project traces its genesis to the 2002 birth of Earth Simulator, which reigned on the TOP500 list for two and a half years and upstaged the supercomputing community in the United States. Oliker and his team were among the small number of U.S. scientists given access to Earth Simulator and traveled to Japan several times to evaluate the system. The researchers, including Ethier, have reported their findings in over 10 technical papers.
The Earth Simulator evaluation prompted Oliker and other American researchers to kick-start a project to compare the performance of widely used supercomputers in national labs across the United States. The team set out to determine how well the codes currently used by scientists would fare on the supercomputers and what tradeoffs exist among the various system designs.
Oliker’s team chose six codes that represented a broad spectrum of research areas: magnetic fusion (GTC), fluid dynamics (ELBM3D), astrophysics (Cactus), high energy physics (BeamBeam3D), materials science (PARATEC) and AMR gas dynamics (HyperCLaw).
They ran these six codes on each of the six supercomputers that, in turn, represented a wide range of architectures. The systems were Bassi and Jacquard from Lawrence Berkeley National Laboratory, Jaguar and Phoenix from Oak Ridge National Laboratory, Blue Gene/L from Argonne National Laboratory and another Blue Gene/L from IBM Thomas J. Watson Research Center.
Bassi is an IBM Power5 system with 888 compute processors (111 8-way nodes) that runs AIX. Jacquard contains 640 single-core AMD Opteron processors (320 2-way nodes) and runs Linux 2.6.5, while Jaguar features 14,400 dual-core Opteron processors (5,200 2-way nodes) and runs Catamount. Phoenix is a vector-based Cray X1E system containing 768 processors (96 8-way MSP nodes) and runs UNICOS/mp. The Blue Gene/L at Argonne is an IBM PowerPC 440-based system with 2,048 processors (1,024 2-way nodes) and runs SuSE Linux OS (SLES9). The Blue Gene/L at IBM’s research center contains 40,000 processors.
The study produced results that showed the strengths and weaknesses of various high-performance systems. For example, the Power5-based Bassi achieved the highest per-processor raw performance running four of the six codes. The X1E-based Phoenix system, on the other hand, produced impressive raw performance on GTC and ELBM3D. However, applications with nonvectorizable portions didn’t perform as well on Phoenix as a result of the “imbalance between the scalar and vector processors,” the researchers said.
“Our results indicate that our evaluated codes have the potential to effectively utilize petascale resources,” Oliker and others wrote in the paper. “However, several applications will require re-engineering to incorporate the additional levels of parallelism necessary to achieve the vast concurrency of upcoming ultra-scale systems.”
The team identified PARATEC and BeamBeam3D as among the applications that would benefit from retooling.
The scientists’ findings have been presented at the SC03, SC04 and SC05 conferences, as well as in the recently published SIAM book on parallel programming.
Previous research on scientific application behavior for differing supercomputers also garnered Oliker and other researchers best-paper awards at the SC99 and SC2000 conferences.