Performance and Algorithms Research

June 2017 Results

Here are the results for the fourth-order (operators.fv4.c) HPGMG-FV implementation (v0.3). Each machine was allowed to use any amount of memory per node, but three problem sizes were benchmarked: h (max), 2h (max/8), and 4h (max/64). Note that 'OMP' is the number of OpenMP (or other) threads per process, while 'ACC' is the number of accelerators per process. Multiple entries for a machine represent baseline and optimized implementations.
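The relationship between the three problem sizes can be sketched in a few lines. This is only an illustration: it assumes a 3D grid with one degree of freedom per cell, so doubling the mesh spacing divides the DOF count by 2^3 = 8, and the process count and 256^3 box size in the example are hypothetical, not taken from any machine listed here.

```python
def problem_sizes(max_dof):
    """Return DOF counts for the h, 2h and 4h problems, given the finest size."""
    # Doubling the mesh spacing in 3D halves the cells along each dimension,
    # so each coarser problem has 2**3 = 8 times fewer DOF than the previous one.
    return {"h": max_dof, "2h": max_dof // 8, "4h": max_dof // 64}


# Hypothetical configuration: 4096 processes, each owning a 256^3 box at spacing h.
sizes = problem_sizes(4096 * 256**3)

if __name__ == "__main__":
    for label, dof in sizes.items():
        print(f"{label:>2}: {dof:,} DOF")
```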

Currently, machines are ranked by peak DOF/s (almost invariably achieved at problem size h). Nevertheless, we are considering alternate metrics such as the sum, mean, geometric mean, and median. Feedback from the community is welcome. Note that, due to scheduling and allocation limitations, some machines were evaluated at reduced concurrency.
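To see how the choice of metric can change the ordering, here is a minimal sketch using three rows from the results table. It assumes each alternate metric is taken over a machine's three results (h, 2h, 4h), which the text does not state explicitly; the names `RESULTS`, `METRICS`, and `rank_by` are illustrative only.

```python
from statistics import geometric_mean, mean, median

# (h, 2h, 4h) throughput in 10^9 DOF/s, copied from the results table.
RESULTS = {
    "Cori": (859, 376, 87),
    "Sunway TaihuLight": (819, 486, 157),
    "Hazel Hen": (495, 411, 221),
}

# Candidate ranking metrics; "peak" is the one currently used.
METRICS = {
    "peak": max,
    "sum": sum,
    "mean": mean,
    "geometric mean": geometric_mean,
    "median": median,
}


def rank_by(metric_name):
    """Return machine names sorted from best to worst under the given metric."""
    score = METRICS[metric_name]
    return sorted(RESULTS, key=lambda name: score(RESULTS[name]), reverse=True)


if __name__ == "__main__":
    for metric_name in METRICS:
        print(f"{metric_name:>15}: {', '.join(rank_by(metric_name))}")
```

For these three rows, ranking by peak puts Cori first, whereas the sum, mean, geometric mean, and median all put Sunway TaihuLight first, because they reward machines whose throughput holds up at the smaller 2h and 4h problem sizes.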

Throughput columns (h, 2h, 4h) are in 10^9 DOF/s. MPI, OMP, and ACC describe the parallelization. A "-" in the Top500 Rank column is reproduced from the source.

| Rank | Site | System | Vendor | h | 2h | 4h | MPI | OMP | ACC | DOF per Process | Top500 Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | DOE / SC / LBNL / NERSC, United States | Cori - Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect | Cray | 859 | 376 | 87 | 65536 | 8 | 0 | 16M | 6 |
| 2 | National Supercomputing Center in Wuxi, China | Sunway TaihuLight - Sunway MPP, SW26010 260C 1.45GHz, Sunway | NRCPC | 819 | 486 | 157 | 131072 | 1 | 1 | 32M | 1 |
| 3 | DOE / SC / Argonne National Laboratory, United States | Mira - BlueGene/Q, Power BQC 16C 1.60GHz, Custom interconnect | IBM | 500 | 313 | 107 | 49152 | 64 | 0 | 36M | 9 |
| | | (baseline) | | 395 | 286 | 107 | 49152 | 64 | 0 | 36M | |
| 4 | HLRS - Höchstleistungsrechenzentrum Stuttgart, Germany | Hazel Hen - Cray XC40, Xeon E5-2680v3 12C 2.5GHz, Aries interconnect | Cray Inc. | 495 | 411 | 221 | 15408 | 12 | 0 | 192M | 17 |
| 5 | DOE / SC / Oak Ridge National Laboratory, United States | Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x | Cray Inc. | 440 | 163 | 38.9 | 16384 | 4 | 1 | 32M | 4 |
| | | (CPU-only) | | 161 | 82.5 | 23.7 | 36864 | 8 | 0 | 48M | |
| 6 | King Abdullah University of Science and Technology, Saudi Arabia | Shaheen II - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect | Cray Inc. | 326 | 287 | 175 | 12288 | 16 | 0 | 144M | 18 |
| 7 | DOE / SC / LBNL / NERSC, United States | Edison - Cray XC30, Intel Xeon E5-2695v2 12C 2.4GHz, Aries interconnect | Cray Inc. | 296 | 246 | 127 | 10648 | 12 | 0 | 128M | 72 |
| 8 | Swiss National Supercomputing Centre (CSCS), Switzerland | Piz Daint - Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x | Cray Inc. | 153 | 68.8 | 18.5 | 4096 | 8 | 1 | 32M | - |
| | | (CPU-only) | | 85.1 | 62.6 | 24.7 | 4096 | 8 | 0 | 16M | |
| 9 | Cyberscience Center, Tohoku University, Japan | SX-ACE, 4C 1GHz, IXS | NEC | 73.8 | 45.2 | 15.6 | 4096 | 1 | 0 | 128M | - |
| 10 | Leibniz Rechenzentrum (LRZ), Germany | SuperMUC - iDataPlex DX360M4, Xeon E5-2680 8C 2.70GHz, Infiniband FDR | IBM/Lenovo | 72.5 | 52.5 | 28.0 | 4096 | 8 | 0 | 54M | - |
| 11 | DOE / EERE / NREL, United States | Peregrine - Apollo 8000, Xeon E5-2670v3 12C 2.30GHz, Infiniband FDR | Hewlett Packard Enterprise | 10.0 | 3.24 | 0.442 | 1024 | 12 | 0 | 16M | - |
| 12 | DOE / EERE / NREL, United States | Peregrine - Apollo 8000, Xeon E5-2695v2 12C 2.40GHz, Infiniband FDR | Hewlett Packard Enterprise | 5.29 | 2.26 | 0.482 | 512 | 12 | 0 | 16M | - |
| 13 | HLRS - Höchstleistungsrechenzentrum Stuttgart, Germany | NEC SX-ACE, 4C 1GHz, Custom interconnect | NEC | 3.24 | 1.77 | 0.751 | 256 | 1 | 0 | 32M | - |