Performance and Algorithms Research

November 2018 Results

Here are the results for the fourth-order (operators.fv4.c / operators.flux.c) HPGMG-FV implementation (v0.3). Each machine was allowed to use any amount of memory per node, and three problem sizes were benchmarked on each: h (max), 2h (max/8), and 4h (max/64). Note that 'OMP' is the number of OpenMP (or other) threads per process, while 'ACC' is the number of accelerators per process. Multiple entries for a machine represent baseline and optimized implementations.
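The factors max/8 and max/64 are consistent with doubling the grid spacing in each of three dimensions, which divides the number of degrees of freedom by 2^3 = 8 per coarsening step. A minimal sketch of that arithmetic follows; the function name `problem_sizes` and the 256^3 starting size are illustrative assumptions, not values taken from the benchmark.

```python
def problem_sizes(max_dof):
    """Return the DOF count for the h, 2h, and 4h problem sizes.

    Assumes a 3D grid, so each doubling of the grid spacing
    reduces the DOF count by a factor of 2**3 = 8.
    """
    return {
        "h": max_dof,         # largest problem (max)
        "2h": max_dof // 8,   # max/8
        "4h": max_dof // 64,  # max/64
    }


# Hypothetical largest problem of 256^3 DOF (not a value from the table).
sizes = problem_sizes(256**3)
for label, dof in sizes.items():
    print(f"{label:>2}: {dof} DOF")
```

With this starting size, the 2h and 4h problems correspond to 128^3 and 64^3 DOF.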

Machines are currently ranked by peak DOF/s, which is almost invariably attained at problem size h. We are nevertheless considering alternate metrics such as the sum, mean, geometric mean, and median of the three results. Feedback from the community is welcome. Note that, due to scheduling and allocation limitations, some machines were evaluated at reduced concurrency.
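To illustrate how the choice of metric can change the ordering, the sketch below ranks three of the listed machines under each candidate metric. The DOF/s values are taken from the results on this page; the dictionary layout and the helper `rank_by` are assumptions made for the example, not part of HPGMG.

```python
from statistics import geometric_mean, mean, median

# DOF/s in units of 10^9 at problem sizes h, 2h, 4h (values from this page).
results = {
    "Cori": (859, 376, 87),
    "Mira": (500, 313, 107),
    "Hazel Hen": (495, 411, 221),
}

# Candidate ranking metrics named in the text above.
metrics = {
    "peak": max,
    "sum": sum,
    "mean": mean,
    "geometric mean": geometric_mean,
    "median": median,
}


def rank_by(metric_name):
    """Return machine names ordered from best to worst under one metric."""
    score = metrics[metric_name]
    return sorted(results, key=lambda name: score(results[name]), reverse=True)


for metric_name in metrics:
    print(f"{metric_name:>14}: {rank_by(metric_name)}")
```

For these three machines, ranking by peak places Cori first, while the geometric mean and median both favor Hazel Hen, whose throughput falls off less at the smaller problem sizes.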

| Rank | Site | System | h (10^9 DOF/s) | 2h (10^9 DOF/s) | 4h (10^9 DOF/s) | MPI | OMP | ACC | DOF per Process | Top500 Rank |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RIKEN Center for Computational Science (R-CCS), Japan | K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect, Fujitsu | 1243 | 897 | 412 | 82944 | 8 | 0 | 72M | - |
| 2 | National Supercomputing Center in Wuxi, China | Sunway TaihuLight - Sunway MPP, SW26010 260C 1.45GHz, Sunway, NRCPC | 1036 | 565 | 163 | 131072 | 1 | 1 | 32M | - |
| 3 | DOE / SC / LBNL / NERSC, United States | Cori - Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect, Cray | 859 | 376 | 87 | 65536 | 8 | 0 | 16M | - |
| 4 | DOE / SC / Argonne National Laboratory, United States | Mira - BlueGene/Q, Power BQC 16C 1.60GHz, Custom interconnect, IBM | 500 | 313 | 107 | 49152 | 64 | 0 | 36M | - |
| 5 | HLRS - Höchstleistungsrechenzentrum Stuttgart, Germany | Hazel Hen - Cray XC40, Xeon E5-2680v3 12C 2.5GHz, Aries interconnect, Cray Inc. | 495 | 411 | 221 | 15408 | 12 | 0 | 192M | - |
| 6 | DOE / SC / Oak Ridge National Laboratory, United States | Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x, Cray Inc. | 440 | 163 | 38.9 | 16384 | 4 | | 32M | - |
| | | (CPU-only) | 161 | 82.5 | 23.7 | 36864 | 8 | | 48M | |
| 7 | King Abdullah University of Science and Technology, Saudi Arabia | Shaheen II - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect, Cray Inc. | 326 | 287 | 175 | 12288 | 16 | 0 | 144M | - |
| 8 | DOE / SC / LBNL / NERSC, United States | Edison - Cray XC30, Intel Xeon E5-2695v2 12C 2.4GHz, Aries interconnect, Cray Inc. | 296 | 246 | 127 | 10648 | 12 | | 128M | - |
| 9 | Swiss National Supercomputing Centre (CSCS), Switzerland | Piz Daint - Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x, Cray Inc. | 153 | 68.8 | 18.5 | 4096 | 8 | 1 | 32M | - |
| | | (CPU-only) | 85.1 | 62.6 | 24.7 | 4096 | | | 16M | - |
| 10 | Cyberscience Center, Tohoku University, Japan | SX-ACE, 4C 1GHz, IXS, NEC | 73.8 | 45.2 | 15.6 | 4096 | 1 | 0 | 128M | - |
| 11 | Leibniz Rechenzentrum (LRZ), Germany | SuperMUC - iDataPlex DX360M4, Xeon E5-2680 8C 2.70GHz, Infiniband FDR, IBM/Lenovo | 72.5 | 52.5 | 28.0 | 4096 | 8 | 0 | 54M | - |
| 12 | National Supercomputer Centre (NSC), Sweden | Stratus - Intel Xeon Gold 6130 16c 2.1GHz, Omni-Path | 54.1 | 46.5 | 26.7 | 2496 | 8 | 0 | 141M | - |
| 13 | DOE / EERE / NREL, United States | Peregrine - Apollo 8000, Xeon E5-2670v3 12c 2.30GHz, Infiniband FDR, Hewlett Packard Enterprise | 10.0 | 3.24 | 0.442 | 1024 | 12 | 0 | 16M | - |
| 14 | DOE / EERE / NREL, United States | Peregrine - Apollo 8000, Xeon E5-2695v2 12c 2.40GHz, Infiniband FDR, Hewlett Packard Enterprise | 5.29 | 2.26 | 0.482 | 512 | 12 | 0 | 16M | - |
| 15 | HLRS - Höchstleistungsrechenzentrum Stuttgart, Germany | NEC SX-ACE, 4C 1GHz, Custom interconnect, NEC | 3.24 | 1.77 | 0.751 | 256 | 1 | | 32M | |