June 2018 Results
Here are the results for the fourth-order (operators.fv4.c / operators.flux.c) HPGMG-FV implementation (v0.3). Each machine was free to use any amount of memory per node, and three problem sizes were benchmarked: h (the maximum size), 2h (max/8), and 4h (max/64). Note that 'OMP' is the number of OpenMP (or other) threads per process and 'ACC' is the number of accelerators per process. Where a system has multiple entries, they represent baseline and optimized implementations.
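As a quick check on where the max/8 and max/64 sizes come from: HPGMG-FV discretizes a three-dimensional domain, so doubling the grid spacing halves the number of cells along each axis. Writing n (a symbol introduced only for this illustration) for the number of cells per axis of a cubic problem at spacing h:

```latex
N_h = n^3, \qquad
N_{2h} = \left(\frac{n}{2}\right)^3 = \frac{N_h}{8}, \qquad
N_{4h} = \left(\frac{n}{4}\right)^3 = \frac{N_h}{64}
```

The same factor of 8 per coarsening holds for non-cubic boxes, since each axis is halved independently.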
Currently, machines are ranked by peak DOF/s (almost invariably attained at problem size h). However, we are considering alternative metrics such as the sum, mean, geometric mean, and median across the three problem sizes. Feedback from the community is welcome. Note that, due to scheduling and allocation limitations, some machines were evaluated at reduced concurrency.
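A minimal sketch of how the current peak metric compares with the alternatives under discussion, using the rows in the table below that report all three problem sizes. The dictionary labels and the helper names `metrics` and `rank_by` are hypothetical, chosen for this illustration only.

```python
import statistics

# Rates in 10^9 DOF/s at problem sizes (h, 2h, 4h), copied from the table rows
# that report all three sizes. The dictionary keys are labels chosen for this sketch.
RESULTS = {
    "Titan (CPU-only)": (161.0, 82.5, 23.7),
    "Piz Daint (CPU-only)": (85.1, 62.6, 24.7),
    "NEC SX-ACE (HLRS)": (3.24, 1.77, 0.751),
}


def metrics(rates):
    """Current ranking metric (peak) plus the alternatives under discussion."""
    return {
        "peak": max(rates),
        "sum": sum(rates),
        "mean": statistics.mean(rates),
        "geomean": statistics.geometric_mean(rates),  # requires Python 3.8+
        "median": statistics.median(rates),
    }


def rank_by(results, metric):
    """Return system labels ordered best to worst under the chosen metric."""
    return sorted(results, key=lambda name: metrics(results[name])[metric], reverse=True)


for name, rates in RESULTS.items():
    print(name, {k: round(v, 2) for k, v in metrics(rates).items()})
print("by geometric mean:", rank_by(RESULTS, "geomean"))
```

These three rows keep the same order under every metric. The metrics only diverge on a fuller list, where a system whose rate falls off steeply at 2h and 4h can lose ground under the mean, geometric mean, or median even if its peak is high.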
Rank | Site | System | 10^9 DOF/s (h, 2h, 4h) | MPI | OMP | ACC | DOF per Process | Top500 Rank |
1 | RIKEN Center for Computational Science (R-CCS) | K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect, Fujitsu | 1243 | 82944 | 8 | 0 | 72M | - |
2 | National Supercomputing Center in Wuxi | Sunway TaihuLight - Sunway MPP, SW26010 260C 1.45GHz, Sunway | 1036 | 131072 | 1 | 1 | 32M | - |
3 | DOE / SC / LBNL / NERSC, United States | Cori - Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect, Cray | 859 | 65536 | 8 | 0 | 16M | - |
4 | DOE / SC / Argonne National Laboratory | Mira - BlueGene/Q, Power BQC 16C 1.60GHz, Custom interconnect, IBM | 500 | 49152 | 64 | 0 | 36M | - |
5 | HLRS - Höchstleistungsrechenzentrum Stuttgart | Hazel Hen - Cray XC40, Xeon E5-2680v3 12C 2.5GHz, Aries interconnect, Cray Inc. | 495 | 15408 | 12 | 0 | 192M | - |
6 | DOE / SC / Oak Ridge National Laboratory | Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x, Cray Inc. | 440 | 16384 | 4 | 1 | 32M | - |
  |  | (CPU-only) | 161, 82.5, 23.7 | 36864 | 8 | 0 | 48M |  |
7 | King Abdullah University of Science and Technology | Shaheen II - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect, Cray Inc. | 326 | 12288 | 16 | 0 | 144M | - |
8 | DOE / SC / LBNL / NERSC, United States | Edison - Cray XC30, Intel Xeon E5-2695v2 12C 2.4GHz, Aries interconnect, Cray Inc. | 296 | 10648 | 12 | 0 | 128M | - |
9 | Swiss National Supercomputing Centre (CSCS) | Piz Daint - Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x | 153 | 4096 | 8 | 1 | 32M | - |
  |  | (CPU-only) | 85.1, 62.6, 24.7 | 4096 | 8 | 0 | 16M | - |
10 | Cyberscience Center | SX-ACE, 4C 1GHz, IXS, NEC | 73.8 | 4096 | 1 | 0 | 128M | - |
11 | Leibniz Rechenzentrum (LRZ) | SuperMUC - iDataPlex DX360M4, Xeon E5-2680 8C 2.70GHz, Infiniband FDR, IBM/Lenovo | 72.5 | 4096 | 8 | 0 | 54M | - |
12 | DOE / EERE / NREL | Peregrine - Apollo 8000, Xeon E5-2670v3 12C 2.30GHz, Infiniband FDR | 10.0 | 1024 | 12 | 0 | 16M | - |
13 | DOE / EERE / NREL | Peregrine - Apollo 8000, Xeon E5-2695v2 12C 2.40GHz, Infiniband FDR, Hewlett Packard Enterprise | 5.29 | 512 | 12 | 0 | 16M | - |
14 | HLRS - Höchstleistungsrechenzentrum Stuttgart, Germany | NEC SX-ACE, 4C 1GHz, Custom interconnect, NEC | 3.24, 1.77, 0.751 | 256 | 1 | 0 | 32M | - |
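The MPI, DOF per Process, and DOF/s columns can be combined to estimate the aggregate problem size and the time per solve. This is a sketch under two assumptions that the table does not state: that "M" in the DOF per Process column means 2^20, and that the reported rate is total DOF divided by the time of one solve. The helper names are hypothetical.

```python
# Hypothetical helpers for reading the table. Assumptions (not stated in the table):
#  - "M" in "DOF per Process" is taken as 2**20; if it means 10**6, totals are ~4.6% smaller.
#  - the reported rate is total DOF divided by the time of one solve.

MI = 2**20


def total_dof(mpi_processes: int, dof_per_process_m: int) -> int:
    """Aggregate problem size at h: each MPI process owns the listed number of DOF."""
    return mpi_processes * dof_per_process_m * MI


def implied_solve_seconds(mpi_processes: int, dof_per_process_m: int, gdof_per_s: float) -> float:
    """Seconds per solve implied by the 10^9 DOF/s column (under the assumption above)."""
    return total_dof(mpi_processes, dof_per_process_m) / (gdof_per_s * 1e9)


# (system, MPI processes, DOF per process in M, 10^9 DOF/s at h), from the table.
ROWS = [
    ("K computer", 82944, 72, 1243),
    ("Sunway TaihuLight", 131072, 32, 1036),
    ("Cori", 65536, 16, 859),
]

for name, mpi, dof_m, rate in ROWS:
    n = total_dof(mpi, dof_m)
    t = implied_solve_seconds(mpi, dof_m, rate)
    print(f"{name}: {n:.3e} DOF, about {t:.2f} s per solve")
```

Under the 2^20 reading, the K computer and TaihuLight runs come out to exact cubes (18432^3 and 16384^3 DOF), which is at least consistent with that assumption.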