Performance and Algorithms Research

June 2016 Results

Here are the results for the fourth-order (operators.fv4.c) HPGMG-FV implementation (v0.3). Each machine was allowed to use any amount of memory per node, and three problem sizes were benchmarked: h (max), 2h (max/8), and 4h (max/64). 'OMP' is the number of OpenMP (or other) threads per process, and 'ACC' is the number of accelerators per process. Multiple entries for a system represent baseline and optimized implementations.
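The max/8 and max/64 factors follow from doubling the grid spacing in each of three dimensions. A minimal sketch of that arithmetic, assuming a uniform 3D grid; the function name `dof_divisor` is hypothetical and is not part of HPGMG:

```python
def dof_divisor(coarsening: int, dims: int = 3) -> int:
    """Factor by which the DOF count shrinks when the grid spacing h
    is multiplied by `coarsening` in each of `dims` dimensions."""
    return coarsening ** dims


# Problem sizes h, 2h, and 4h relative to the largest problem (max).
for factor in (1, 2, 4):
    label = "h" if factor == 1 else f"{factor}h"
    print(f"{label}: max/{dof_divisor(factor)}")
```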

Currently, machines are ranked by peak DOF/s (almost invariably achieved at problem size h). We are nevertheless considering alternate metrics such as the sum, mean, geometric mean, and median across the three problem sizes. Feedback from the community is welcome. Note that, due to scheduling and allocation limitations, some machines were evaluated at reduced concurrency.
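To illustrate how the choice of metric can change the ordering, here is a small sketch that ranks three systems by each candidate metric. The DOF/s values are copied from the table on this page; the names `results`, `metrics`, and `rank_by` are illustrative only, and this is not an official ranking procedure.

```python
from statistics import geometric_mean, mean, median

# DOF/s at h, 2h, and 4h for three systems, copied from the table.
results = {
    "Mira": (5.00e11, 3.13e11, 1.07e11),
    "Hazel Hen": (4.95e11, 4.11e11, 2.21e11),
    "Titan": (4.40e11, 1.63e11, 3.89e10),
}

# Candidate ranking metrics named in the text above.
metrics = {
    "peak": max,
    "sum": sum,
    "mean": mean,
    "geometric mean": geometric_mean,
    "median": median,
}


def rank_by(metric):
    """Return system names ordered from best to worst under `metric`."""
    return sorted(results, key=lambda name: metric(results[name]), reverse=True)


rankings = {label: rank_by(fn) for label, fn in metrics.items()}
for label, order in rankings.items():
    print(f"{label:>14}: {', '.join(order)}")
```

For these three rows, Mira leads on peak DOF/s, while Hazel Hen leads under the sum, mean, geometric mean, and median, because its performance falls off less at the smaller problem sizes.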

Rank  Site / Country / System / Vendor
DOF/s (h)  DOF/s (2h)  DOF/s (4h)  MPI  OMP  ACC  DOF per Process  Top500 Rank
1 DOE / SC / Argonne National Laboratory
United States
Mira - BlueGene/Q, Power BQC 16C 1.60GHz, Custom interconnect
IBM
5.00e11 3.13e11 1.07e11 49152 64  0 36M 6
     (baseline) 3.95e11 2.86e11 1.07e11 49152 64  0 36M  
2 HLRS - Höchstleistungsrechenzentrum Stuttgart
Germany
Hazel Hen - Cray XC40, Xeon E5-2680v3 12C 2.5GHz, Aries interconnect
Cray Inc.
4.95e11 4.11e11 2.21e11 15408 12  0 192M 9
3 DOE / SC / Oak Ridge National Laboratory
United States
Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x
Cray Inc.
4.40e11 1.63e11 3.89e10 16384 4 1 32M 3
    (CPU-only) 1.61e11 8.25e10 2.37e10 36864 8 0 48M  
4 King Abdullah University of Science and Technology
Saudi Arabia
Shaheen II - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect
Cray Inc.
3.26e11 2.87e11 1.75e11 12288 16 0 144M 10
5 DOE / SC / LBNL / NERSC
United States
Edison - Cray XC30, Intel Xeon E5-2695v2 12C 2.4GHz, Aries interconnect
Cray Inc.
2.96e11 2.46e11 1.27e11 10648 12 0 128M 49
6 Swiss National Supercomputing Centre (CSCS)
Switzerland
Piz Daint - Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x
Cray Inc.
1.53e11 6.88e10 1.85e10 4096 8 1 32M 8
    (CPU-only) 8.51e10 6.26e10 2.47e10 4096 8 0 16M  
7 Leibniz Rechenzentrum (LRZ)
Germany
SuperMUC - iDataPlex DX360M4, Xeon E5-2680 8C 2.70GHz, Infiniband FDR
IBM/Lenovo
7.25e10 5.25e10 2.80e10 4096 8 0 54M 27
8 DOE / EERE / NREL
United States
Peregrine - Apollo 8000, Xeon E5-2670v3 12C 2.30GHz, Infiniband FDR
Hewlett Packard Enterprise
1.00e10 3.24e09 4.42e08 1024 12 0 16M -
9 DOE / EERE / NREL
United States
Peregrine - Apollo 8000, Xeon E5-2695v2 12C 2.40GHz, Infiniband FDR
Hewlett Packard Enterprise
5.29e09 2.26e09 4.82e08 512 12 0 16M -
10 HLRS - Höchstleistungsrechenzentrum Stuttgart
Germany
NEC SX-ACE, 4C 1GHz, Custom interconnect
NEC
3.24e09 1.77e09 7.51e08 256 1  0 32M -
11 DOE / SC / LBNL / NERSC
United States
Babbage - Xeon E5-2670 8C 2.600GHz, Intel Xeon Phi (KNC), Infiniband
7.62e08 3.16e08 9.93e07 256 45 0 8M -
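For systems with two entries, the gain of the first entry over the "(baseline)" or "(CPU-only)" entry can be read off as a ratio of DOF/s at problem size h. The sketch below uses values copied from the table; the labels and the `speedup` helper are illustrative, and the two entries for a system may use different process and thread counts, so the ratios compare whole configurations rather than isolated optimizations.

```python
# (first entry DOF/s, baseline or CPU-only DOF/s) at problem size h,
# copied from the table above.
pairs = {
    "Mira (vs. baseline)": (5.00e11, 3.95e11),
    "Titan (vs. CPU-only)": (4.40e11, 1.61e11),
    "Piz Daint (vs. CPU-only)": (1.53e11, 8.51e10),
}


def speedup(fast: float, slow: float) -> float:
    """Ratio of the higher-performing entry to its reference entry."""
    return fast / slow


for label, (fast, slow) in pairs.items():
    print(f"{label}: {speedup(fast, slow):.2f}x")
```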