DEGAS: Dynamic Exascale Global Address Space Programming Environments
The Dynamic, Exascale Global Address Space programming environment (DEGAS) project will develop the next generation of programming models, runtime systems, and tools to meet the challenges of exascale systems. We will develop a new set of programming concepts based on a hierarchical model of parallelism and data locality, hierarchical fault containment and recovery for resilience, and introspective dynamic resource management. We will demonstrate these concepts using extensions to existing languages and evaluate their utility for applications. Our solution will address the following key challenges posed by exascale systems:
- Scalability: Efficient communication (extended GASNet) and synchronization mechanisms combined with compiler (ROSE) and runtime optimizations to minimize both.
- Programmability: Rich set of programming constructs based on a dynamic, resilient Partitioned Global Address Space (PGAS) model, demonstrated in multiple language dialects (C and FORTRAN).
- Performance Portability: Non-invasive profiling (IPM), deep code analysis (ROSE) and a dynamically Adaptive RunTime System (ARTS).
- Resilience: Containment Domains, state capture mechanisms, and lightweight, asynchronous recovery mechanisms.
- Energy Efficiency: Runtime energy adaptation and communication-optimal code generation.
- Interoperability: Runtime and language interoperability with MPI, OpenMP and libraries (Lithe).
The DEGAS team will work with Co-Design centers to drive the programming construct design, combined with information about hardware platforms as it emerges. We will also leverage ongoing discussions with other application and vendor stakeholders as well as mainstream language standards groups, augmented with advisory committees and semi-annual retreats involving broad representation from all three groups.
Our approach focuses on a vertically integrated programming and execution environment that incorporates the latest algorithmic approaches and application structures to effectively service ultra-scale science and energy applications. The primary focus areas of DEGAS are shown in Figure 1 along with the proposed integrated software stack.
- Katherine Yelick, Principal Investigator (LBNL)
- Krste Asanović (UC Berkeley)
- James Demmel (UC Berkeley)
- Mattan Erez (UT Austin)
- Paul Hargrove (LBNL)
- Steven Hofmeyr (LBNL)
- Costin Iancu (LBNL)
- Khaled Ibrahim (LBNL)
- John Mellor-Crummey (Rice University)
- Leonid Oliker (LBNL)
- Dan Quinlan (LLNL)
- Eric Roman (LBNL)
- Vivek Sarkar (Rice University)
- Erich Strohmaier (LBNL)
- Yili Zheng (LBNL)
Khaled Z. Ibrahim, Evgeny Epifanovsky, Samuel Williams, Anna I. Krylov, "Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends", Journal of Parallel and Distributed Computing (JPDC), February 2017, doi: 10.1016/j.jpdc.2017.02.010
Nicholas Chaimov, Khaled Z. Ibrahim, Samuel Williams, Costin Iancu, "Reaching Bandwidth Saturation Using Transparent Injection Parallelization", International Journal of High Performance Computing Applications (IJHPCA), November 2016, doi: 10.1177/1094342016672720
Nicholas Chaimov, Khaled Ibrahim, Samuel Williams, Costin Iancu, "Exploiting Communication Concurrency on High Performance Computing Systems", International Journal of High Performance Computing Applications (IJHPCA), April 17, 2015
- Download File: thorserv2.pdf (pdf: 1.7 MB)
Nathan Zhang, Michael Driscoll, Armando Fox, Charles Markley, Samuel Williams, Protonu Basu, "Snowflake: A Lightweight Portable Stencil DSL", High-level Parallel Programming Models and Supportive Environments (HIPS), May 2017
- Download File: hips17-snowflake.pdf (pdf: 475 KB)
M Ellis, E Georganas, R Egan, S Hofmeyr, A Buluç, B Cook, L Oliker, K Yelick, "Performance characterization of de novo genome assembly on leading parallel systems", Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, 10417:79--91, doi: 10.1007/978-3-319-64203-1_6
E Georganas, M Ellis, R Egan, S Hofmeyr, A Buluç, B Cook, L Oliker, K Yelick, "MerBench: PGAS benchmarks for high performance genome assembly", Proceedings of PAW 2017: 2nd Annual PGAS Applications Workshop - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis, 2017, 1--4, doi: 10.1145/3144779.3169109
H Shan, S Williams, Y Zheng, W Zhang, B Wang, S Ethier, Z Zhao, "Experiences of Applying One-Sided Communication to Nearest-Neighbor Communication", Proceedings of PAW 2016: 1st PGAS Applications Workshop (PAW), IEEE, January 2016, 17--24, doi: 10.1109/PAW.2016.008
- Download File: PAW16-stencil.pdf (pdf: 601 KB)
Hongzhang Shan, Samuel Williams, Yili Zheng, Amir Kamil, Katherine Yelick, "Implementing High-Performance Geometric Multigrid Solver With Naturally Grained Messages", 9th International Conference on Partitioned Global Address Space Programming Models (PGAS), September 2015
- Download File: pgas15-hpgmg.pdf (pdf: 803 KB)
Scott French, Yili Zheng, Barbara Romanowicz, Katherine Yelick, "Parallel Hessian Assembly for Seismic Waveform Inversion Using Global Updates", International Parallel and Distributed Processing Symposium (IPDPS), May 2015
Costin Iancu, Nicholas Chaimov, Khaled Z. Ibrahim, Samuel Williams, "Exploiting Communication Concurrency on High Performance Computing Systems", Programming Models and Applications for Multicores and Manycores (PMAM), February 2015
- Download File: pmam15-servers.pdf (pdf: 1.2 MB)
Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushik Sen, John Mellor-Crummey, Costin Iancu, "Barrier Elision for Production Parallel Programs", Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 5, 2015, 109--119, doi: 10.1145/2688500.2688502
- Download File: nwbar.pdf (pdf: 663 KB)
E Georganas, A Buluç, J Chapman, S Hofmeyr, C Aluru, R Egan, L Oliker, D Rokhsar, K Yelick, "HipMer: An extreme-scale de novo genome assembler", International Conference for High Performance Computing, Networking, Storage and Analysis (SC'15), November 2015, doi: 10.1145/2807591.2807664
Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, Katherine Yelick, "Parallel de Bruijn graph construction and traversal for de novo genome assembly", Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'14), November 2014
- Download File: sc14genome.pdf (pdf: 719 KB)
Hongzhang Shan, Amir Kamil, Samuel Williams, Yili Zheng, Katherine Yelick, "Evaluation of PGAS Communication Paradigms with Geometric Multigrid", 8th International Conference on Partitioned Global Address Space Programming Models (PGAS), October 2014, doi: 10.1145/2676870.2676874
- Download File: PGAS14-miniGMG.pdf (pdf: 1.2 MB)
Vivek Kumar, Yili Zheng, Vincent Cavé, Zoran Budimlic, Vivek Sarkar, "HabaneroUPC++: a Compiler-free PGAS Library", 8th International Conference on Partitioned Global Address Space Programming Models (PGAS), October 2014
Amir Kamil, Yili Zheng, Katherine Yelick, "A Local-View Array Library for Partitioned Global Address Space C++ Programs", ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, June 2014
Multidimensional arrays are an important data structure in many scientific applications. Unfortunately, built-in support for such arrays is inadequate in C++, particularly in the distributed setting where bulk communication operations are required for good performance. In this paper, we present a multidimensional library for partitioned global address space (PGAS) programs, supporting the one-sided remote access and bulk operations of the PGAS model. The library is based on Titanium arrays, which have proven to provide good productivity and performance. These arrays provide a local view of data, where each rank constructs its own portion of a global data structure, matching the local view of execution common to PGAS programs and providing maximum flexibility in structuring global data. Unlike Titanium, which has its own compiler with array-specific analyses, optimizations, and code generation, we implement multidimensional arrays solely through a C++ library. The main goal of this effort is to provide a library-based implementation that can match the productivity and performance of a compiler-based approach. We implement the array library as an extension to UPC++, a C++ library for PGAS programs, and we extend Titanium arrays with specializations to improve performance. We evaluate the array library by porting four Titanium benchmarks to UPC++, demonstrating that it can achieve up to 25% better performance than Titanium without a significant increase in programmer effort.
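The local-view idea in this abstract can be illustrated with a small sketch. It is not the UPC++ or Titanium API: the class `LocalArray` and its method `copy_from` are hypothetical names invented here, and the "ranks" are plain objects in a single Python process, so the bulk copy stands in for what would be a one-sided remote transfer in a real PGAS program. Each block is addressed by global indices, and a copy between two blocks touches only the intersection of their domains.

```python
from itertools import product


class LocalArray:
    """A rank-local block of a conceptually global N-d grid (hypothetical class)."""

    def __init__(self, lo, hi, fill=0.0):
        # The domain is the half-open box [lo, hi) in *global* index space.
        self.lo, self.hi = tuple(lo), tuple(hi)
        self.shape = tuple(h - l for l, h in zip(self.lo, self.hi))
        size = 1
        for extent in self.shape:
            size *= extent
        self.data = [fill] * size

    def _offset(self, index):
        # Row-major offset of a global index within this block.
        off = 0
        for i, l, extent in zip(index, self.lo, self.shape):
            if not (0 <= i - l < extent):
                raise IndexError(f"{index} outside domain {self.lo}..{self.hi}")
            off = off * extent + (i - l)
        return off

    def __getitem__(self, index):
        return self.data[self._offset(index)]

    def __setitem__(self, index, value):
        self.data[self._offset(index)] = value

    def copy_from(self, other):
        """Copy over the intersection of the two domains; return the element count."""
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        if any(l >= h for l, h in zip(lo, hi)):
            return 0  # domains do not overlap
        count = 0
        for index in product(*(range(l, h) for l, h in zip(lo, hi))):
            self[index] = other[index]
            count += 1
        return count


# Two simulated ranks each construct their own block of a 4 x 8 grid.
# Rank 0 owns columns 0..3 and allocates one extra ghost column (column 4).
rank0 = LocalArray(lo=(0, 0), hi=(4, 5))
rank1 = LocalArray(lo=(0, 4), hi=(4, 8), fill=1.0)

# The domains overlap only in column 4, so only the ghost column is filled.
copied = rank0.copy_from(rank1)
```

Because both blocks use global coordinates, the caller never computes which elements to exchange; the overlap of the two domains determines it, which is what makes ghost-zone updates compact in this style.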
Michael Garland, Manjunath Kudlur, Yili Zheng, "Designing a Unified Programming Model for Heterogeneous Machines", Supercomputing (SC), November 2012
Mads Kristensen, Yili Zheng, Brian Vinter, "PGAS for Distributed Numerical Python Targeting Multi-core Clusters", IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2012
Samuel Williams, "At Exascale, Will Bandwidth Be Free?", DOE ModSim Workshop, 2013
- Download File: modsim2013SWWilliams.pdf (pdf: 408 KB)