Configuring HPGMG-FV as a Co-Design Proxy
AMR combustion applications like LMC from the Combustion Co-Design Center create several levels of AMR refinement and, within each level, maintain many variables for every point in space. As the code evolves, we expect it to track up to 100 chemical species (e.g., CO, NO, CO2). The mass fraction of each species requires a multigrid (MG) diffusion solve, and HPGMG-FV can serve as a proxy for these diffusion solves. However, since there may be up to 10 AMR levels, each with 100 chemical species, the memory available to any one MG solve may be small (0.1-1% of a node's total addressable memory).
By default, HPGMG-FV solves Poisson's equation (-b div beta grad u = f). However, LMC requires solving the Helmholtz equation (a*alpha*u - b div beta grad u = f). Compile HPGMG-FV with -DUSE_HELMHOLTZ to replace the default Poisson solve with the Helmholtz solve.
Memory may constrain LMC to roughly 2M DOF per level per NUMA node. Running HPGMG-FV with one 128³ box or eight 64³ boxes per NUMA node (128³ = 8 × 64³ = 2,097,152 DOF) is a reasonable proxy for this problem size; thus, invoking ./hpgmg-fv 7 1 or ./hpgmg-fv 6 8 can proxy it. It is equally interesting to determine how performance varies with the degree of parallelism per node. Increasing or reducing the DOF per node to make better use of manycore- and accelerator-based systems can inform the debate over whether these technologies improve time to solution or instead affect computing consolidation.
Because the default version (v0.3) of HPGMG-FV is now fourth order, one should also explore the performance of the second-order version in order to understand the performance challenges of today's applications compared to emerging solvers. To do so, change operators.fv4.c to operators.fv2.c in finite-volume/source/local.mk and rebuild. Alternatively, one can simply compile operators.fv2.c instead of operators.fv4.c.
Alternate Programming Models and Languages
To explore performance and productivity tradeoffs, HPGMG has been ported to several programming models. First, NVIDIA created a CUDA 6/7 implementation that can be used to gauge the performance implications of managed memory when running multigrid solvers on NVIDIA GPUs. Second, a UPC++ implementation was created to quantify the benefits of one-sided communication within the V-cycle (ghost-zone exchanges, restrictions, and interpolations). Finally, to quantify the productivity benefits of writing code in a high-level description, v0.2 of HPGMG was ported to the Halide DSL, which provides single-source portability to both CPUs and GPUs.