Face-to-Face Discussion Helps Fusion Scientists Solve Interface Problem
January 24, 2005
Sometimes, $14 can go a long way. For the price of a train ticket from Manhattan to Princeton, CRD’s Sherry Li met with scientists at the Princeton Plasma Physics Laboratory (PPPL), and together they solved problems that were keeping a new fusion code from running fully in parallel.
Li, a member of the Scientific Computing Group and one of the key developers of the SuperLU library of solvers, had been consulting with Steve Jardin’s group at PPPL for several months as the fusion researchers worked to develop a newer, faster version of their legacy code, known as M3D.
M3D used an explicit method to solve partial differential equations, an approach that requires many small time steps and therefore takes longer to run. The new version, called M3D-C1, uses an implicit scheme that allows much larger time steps, so fewer steps are needed to reach the solution. “However, the matrix is much more difficult to solve, and many solvers cannot solve it,” Li said.
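The trade-off between the two approaches can be illustrated with a toy problem. The sketch below is not M3D or M3D-C1 code; it assumes a one-dimensional diffusion equation with zero boundary values, and the function names (`explicit_step`, `implicit_step`, `run`) are hypothetical. The explicit step is cheap but only stable for small time steps, while the implicit step stays stable for large ones at the cost of solving a linear system at every step. Here that system is a simple tridiagonal matrix handled by the Thomas algorithm; a production code would hand a far harder matrix to a solver library.

```python
def explicit_step(u, r):
    """One forward-Euler step of u_t = u_xx with zero boundary values.

    r = dt / dx**2; this update is only stable for r <= 0.5.
    """
    n = len(u)
    new = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        new.append(u[i] + r * (left - 2.0 * u[i] + right))
    return new


def implicit_step(u, r):
    """One backward-Euler step: solve (I - r*L) u_new = u.

    The matrix is tridiagonal (diagonal 1 + 2r, off-diagonals -r),
    so the Thomas algorithm solves it directly.
    """
    n = len(u)
    diag = 1.0 + 2.0 * r
    off = -r
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag
    d[0] = u[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (u[i] - off * d[i - 1]) / denom
    new = [0.0] * n
    new[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        new[i] = d[i] - c[i] * new[i + 1]
    return new


def run(step, n_points, r, n_steps):
    """Advance a unit spike n_steps times; return the largest magnitude."""
    u = [0.0] * n_points
    u[n_points // 2] = 1.0
    for _ in range(n_steps):
        u = step(u, r)
    return max(abs(v) for v in u)


if __name__ == "__main__":
    print("explicit, small step:", run(explicit_step, 20, 0.4, 50))
    print("explicit, large step:", run(explicit_step, 20, 5.0, 50))
    print("implicit, large step:", run(implicit_step, 20, 5.0, 50))
```

With the large step, the explicit update grows without bound while the implicit update stays bounded, which is why an implicit code can take fewer, larger steps as long as a solver can handle the matrix.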
Several months ago, the team began using SuperLU as their solver. Although the fusion group computes on Seaborg, the IBM supercomputer at NERSC, they had acquired a 16-processor SGI Altix system to do local development of their new code.
As problems arose, Li and the PPPL team exchanged emails to try to resolve them. The team even sent their code to her, but she could not find the sticking points just by reviewing it.
“It was getting more complicated – they know the physics part and I know the solver part,” Li said. “We finally decided it might be better to sit down in person and look over the code.”
So, while attending the 16th International Conference on Domain Decomposition Methods at NYU’s Courant Institute in January, Li slipped away for a day and took the train to Princeton.
“They educated me more on how their code worked and we looked at the interface,” she said. “M3D is written in Fortran 90 and my code is in C, so we needed to build some new wrappers.”
In the process of debugging in real time, they were able to identify a word-type inconsistency in the interface that caused the SGI implementation to fail for the largest problem sizes.
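The article does not say exactly what the inconsistency was. As a purely illustrative sketch, one common kind of word-type mismatch at a Fortran-to-C boundary is an integer written with one width and read with another. The example below assumes a 64-bit value being read as a 32-bit one; the helper name `as_int32` is hypothetical and has nothing to do with the SuperLU interface. It shows how such a mismatch can go unnoticed on small problems and surface only when sizes grow.

```python
import struct


def as_int32(value):
    """Keep only the low 32 bits of a 64-bit integer, read as signed.

    This mimics a callee that expects a 4-byte integer receiving the
    first 4 bytes of an 8-byte little-endian integer (an assumption
    made for illustration only).
    """
    low_bytes = struct.pack("<q", value)[:4]
    return struct.unpack("<i", low_bytes)[0]


if __name__ == "__main__":
    small = 1000            # fits in 32 bits, so it survives intact
    large = 3_000_000_000   # exceeds the signed 32-bit range
    print(small, "->", as_int32(small))
    print(large, "->", as_int32(large))
```

A small count passes through unchanged, while a count beyond the signed 32-bit range comes out as a different, negative number.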
“Even though you have been very responsive via email during the last few months, there was really no substitute for your actually being here to witness and diagnose the problems we were having,” Jardin wrote to Li after their meeting. “Thank you so much for making a special trip from your conference to help us debug the implementation of your distributed SuperLU software on our local SGI Altix. This has really made a big impact. As a result of your visit, not only do we understand your SuperLU-dist much better, but we are now able to run our largest jobs in a fully parallel mode, with even better than "ideal" scaling. This will really help us in our code-development activities for the new M3D-C1 code, and will also make our use of NERSC for this code much more productive.”
While the immediate results demonstrate the value of an in-person collaborative approach, the success is also an example of how DOE’s Scientific Discovery through Advanced Computing (SciDAC) program is meeting its goal of developing advanced tools through collaboration. The SuperLU development is partly funded by the Terascale Optimal PDE Simulations (TOPS) SciDAC project, while M3D-C1 is funded by the fusion Center for Extended Magnetohydrodynamic Modeling (CEMM) SciDAC project.