November 2016 Results
Here are the results for the fourth-order (operators.fv4.c) HPGMG-FV implementation (v0.3). Each machine was allowed to use any amount of memory per node, and three problem sizes were benchmarked: h (the maximum problem size), 2h (max/8), and 4h (max/64). In the table, 'OMP' is the number of OpenMP (or other) threads per process and 'ACC' is the number of accelerators per process. Multiple entries for a machine represent baseline and optimized implementations.
Machines are currently ranked by peak DOF/s (almost invariably attained at problem size h). We are, however, considering alternative metrics such as the sum, mean, geometric mean, and median across the problem sizes; feedback from the community is welcome. Note that, due to scheduling and allocation limitations, some machines were evaluated at reduced concurrency.
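To make the candidate ranking metrics concrete, the sketch below computes each of them for one machine's three per-size throughputs. The DOF/s figures used here are illustrative placeholders, not measurements from the table.

```python
import statistics

# Illustrative (not measured) throughputs for the three problem sizes
# h, 2h, and 4h, in units of 10^9 DOF/s.
dof_per_sec = {"h": 500.0, "2h": 320.0, "4h": 150.0}

values = list(dof_per_sec.values())

metrics = {
    "peak": max(values),  # the current ranking metric
    "sum": sum(values),
    "mean": statistics.mean(values),
    "geometric mean": statistics.geometric_mean(values),
    "median": statistics.median(values),
}

for name, value in metrics.items():
    print(f"{name:>14}: {value:7.1f} x 10^9 DOF/s")
```

Peak rewards the single best configuration, while the geometric mean and median penalize machines whose throughput falls off sharply at the smaller (2h, 4h) problem sizes.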
| Rank | Site | System | 10^9 DOF/s | MPI | OMP | ACC | DOF per Process | Top500 Rank |
|------|------|--------|------------|-----|-----|-----|-----------------|-------------|
| 1 | DOE / SC / Argonne National Laboratory, United States | Mira - BlueGene/Q, Power BQC 16C 1.60GHz, Custom interconnect (IBM) | 500 | 49152 | 64 | 0 | 36M | 6 |
|   |   | (baseline) | 395 | 49152 | 64 | 0 | 36M |   |
| 2 | HLRS - Höchstleistungsrechenzentrum Stuttgart | Hazel Hen - Cray XC40, Xeon E5-2680v3 12C 2.5GHz, Aries interconnect (Cray Inc.) | 495 | 15408 | 12 | 0 | 192M | 9 |
| 3 | DOE / SC / Oak Ridge National Laboratory | Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x (Cray Inc.) | 440 | 16384 | 4 | 1 | 32M | 3 |
|   |   | (CPU-only) | 161 | 36864 | 8 | 0 | 48M |   |
| 4 | King Abdullah University of Science and Technology | Shaheen II - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect (Cray Inc.) | 326 | 12288 | 16 | 0 | 144M | 10 |
| 5 | DOE / SC / LBNL / NERSC | Edison - Cray XC30, Intel Xeon E5-2695v2 12C 2.4GHz, Aries interconnect (Cray Inc.) | 296 | 10648 | 12 | 0 | 128M | 49 |
| 6 | Swiss National Supercomputing Centre (CSCS), Switzerland | Piz Daint - Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x (Cray Inc.) | 153 | 4096 | 8 | 1 | 32M | 8 |
|   |   | (CPU-only) | 85.1 | 4096 | 8 | 0 | 16M |   |
| 7 | Cyberscience Center | SX-ACE, 4C 1GHz, IXS | 73.8 | 4096 | 1 | 0 | 128M | - |
| 8 | Leibniz Rechenzentrum (LRZ) | SuperMUC - iDataPlex DX360M4, Xeon E5-2680 8C 2.70GHz, Infiniband FDR (IBM/Lenovo) | 72.5 | 4096 | 8 | 0 | 54M | 27 |
| 9 | DOE / EERE / NREL | Peregrine - Apollo 8000, Xeon E5-2670v3 12C 2.30GHz, Infiniband FDR (Hewlett Packard Enterprise) | 10.0 | 1024 | 12 | 0 | 16M | - |
| 10 | DOE / EERE / NREL | Peregrine - Apollo 8000, Xeon E5-2695v2 12C 2.40GHz, Infiniband FDR | 5.29 | 512 | 12 | 0 | 16M | - |
| 11 | HLRS - Höchstleistungsrechenzentrum Stuttgart | NEC SX-ACE, 4C 1GHz, Custom interconnect (NEC) | 3.24 | 256 | 1 | 0 | 32M | - |
| 12 | DOE / SC / LBNL / NERSC | Babbage - Xeon E5-2670 8C 2.600GHz, Intel Xeon Phi (KNC), Infiniband | 0.762 | 256 | 45 | 0 | 8M | - |
| 13 | DOE / SC / LBNL / NERSC, United States | Intel Xeon Phi 7250 68C 1.400GHz (KNL/QF) whitebox | 0.170 | 1 | 64 | 0 | 128M | - |
|   |   | (baseline) | 0.128 | 1 | 64 | 0 | 128M | - |