Performance and Algorithms Research

Hongzhang Shan

Computer Scientist
Phone: +1 510 495 2339

Shan's research interests focus on parallel programming models, performance tuning of large-scale applications, performance modeling and benchmarking for architecture evaluation, AI algorithms, and big data processing.

Journal Articles

Hongzhang Shan, Filip Blagojevic, Seung-Jai Min, Paul Hargrove, Haoqiang Jin, Karl Fuerlinger, Alice Koniges, Nicholas J. Wright, "A Programming Model Performance Study Using the NAS Parallel Benchmarks", Scientific Programming - Exploring Languages for Expressing Medium to Massive On-Chip Parallelism, August 1, 2010, vol. 18, doi: 10.3233/SPR-2010-0306

J. Borrill, L. Oliker, J. Shalf, H. Shan, A. Uselton, "HPC global file system performance analysis using a scientific-application derived benchmark", Parallel Computing, 2009, 35:358-373, doi: 10.1016/j.parco.2009.02.002

R. Biswas, L. Oliker, H. Shan, "Parallel Computing Strategies for Irregular Algorithms", Annual Review of Scalable Computing, April 2003,

Hongzhang Shan, Jaswinder P. Singh, Leonid Oliker, Rupak Biswas, "Message Passing and Shared Address Space Parallelism on an SMP Cluster", Parallel Computing Journal, Volume 29, Issue 2, February 2003,

H. Shan, J. P. Singh, L. Oliker, R. Biswas, "A Comparison of Three Programming Models for Adaptive Applications on the Origin2000", Journal of Parallel and Distributed Computing (JPDC), January 1, 2002, doi: 10.1006/jpdc.2001.1777

H. Shan, J. Singh, "A Comparison of Three Programming Models for Adaptive Applications on the Origin2000" (extended version), Journal of Parallel and Distributed Computing, 2002,

Conference Papers

Hongzhang Shan, Samuel Williams, Calvin W. Johnson, "Improving MPI Reduction Performance for Manycore Architectures with OpenMP and Data Compression", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2018,

Tuomas Koskela, Zakhar Matveev, Charlene Yang, Adetokunbo Adedoyin, Roman Belenov, Philippe Thierry, Zhengji Zhao, Rahulkumar Gayatri, Hongzhang Shan, Leonid Oliker, Jack Deslippe, Ron Green, and Samuel Williams, "A Novel Multi-Level Integrated Roofline Model Approach for Performance Characterization", ISC, June 2018,

Philip C. Roth, Hongzhang Shan, David Riegner, Nikolas Antolin, Sarat Sreepathi, Leonid Oliker, Samuel Williams, Shirley Moore, Wolfgang Windl, "Performance Analysis and Optimization of the RAMPAGE Metal Alloy Potential Generation Software", SIGPLAN International Workshop on Software Engineering for Parallel Systems (SEPS), October 2017,

Hongzhang Shan, Samuel Williams, Calvin Johnson, Kenneth McElvain, "A Locality-based Threading Algorithm for the Configuration-Interaction Method", Parallel and Distributed Scientific and Engineering Computing (PDSEC), June 2017,

H Shan, S Williams, Y Zheng, W Zhang, B Wang, S Ethier, Z Zhao, "Experiences of Applying One-Sided Communication to Nearest-Neighbor Communication", Proceedings of PAW 2016: 1st PGAS Applications Workshop (PAW), January 2016, 17--24, doi: 10.1109/PAW.2016.008

Hongzhang Shan, Kenneth McElvain, Calvin Johnson, Samuel Williams, W. Erich Ormand, "Parallel Implementation and Performance Optimization of the Configuration-Interaction Method", Supercomputing (SC), November 2015, doi: 10.1145/2807591.2807618

Hongzhang Shan, Samuel Williams, Yili Zheng, Amir Kamil, Katherine Yelick, "Implementing High-Performance Geometric Multigrid Solver with Naturally Grained Messages", 9th International Conference on Partitioned Global Address Space Programming Models (PGAS), September 2015, 38--46, doi: 10.1109/PGAS.2015.12

Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker, "Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture", Programming Models and Applications for Multicores and Manycores (PMAM), February 2015,

Hongzhang Shan, Amir Kamil, Samuel Williams, Yili Zheng, Katherine Yelick, "Evaluation of PGAS Communication Paradigms with Geometric Multigrid", Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models (PGAS), October 2014, doi: 10.1145/2676870.2676874

Partitioned Global Address Space (PGAS) languages and one-sided communication enable application developers to select the communication paradigm that balances the performance needs of applications with the productivity desires of programmers. In this paper, we evaluate three different one-sided communication paradigms in the context of geometric multigrid using the miniGMG benchmark. Although miniGMG's static, regular, and predictable communication does not exploit the ultimate potential of PGAS models, multigrid solvers appear in many contemporary applications and represent one of the most important communication patterns. We use UPC++, a PGAS extension of C++, as the vehicle for our evaluation, though our work is applicable to any of the existing PGAS languages and models. We compare performance with the highly tuned MPI baseline, and the results indicate that the most promising approach towards achieving performance and ease of programming is to use high-level abstractions, such as the multidimensional arrays provided by UPC++, that hide data aggregation and messaging in the runtime library.
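To make the one-sided style referred to above concrete, the following is a minimal sketch of a ghost-zone exchange built on one-sided puts. It is written against the current UPC++ v1.0 API (upcxx::rput, upcxx::dist_object), which postdates and differs from the UPC++ version evaluated in this paper; the buffer size and ring-neighbor pattern are purely illustrative and not taken from miniGMG.

#include <upcxx/upcxx.hpp>
#include <vector>

int main() {
    upcxx::init();

    const int nghost = 16;  // illustrative ghost-zone size, not from miniGMG

    // Each rank allocates a landing zone in its shared segment and publishes
    // the global pointer so that neighbors can write into it directly.
    upcxx::global_ptr<double> landing = upcxx::new_array<double>(nghost);
    upcxx::dist_object<upcxx::global_ptr<double>> directory(landing);

    int right = (upcxx::rank_me() + 1) % upcxx::rank_n();
    upcxx::global_ptr<double> remote = directory.fetch(right).wait();

    // One-sided put: the target rank posts no matching receive.
    std::vector<double> ghost(nghost, double(upcxx::rank_me()));
    upcxx::rput(ghost.data(), remote, nghost).wait();

    upcxx::barrier();            // ensure all puts have landed before use
    upcxx::delete_array(landing);
    upcxx::finalize();
    return 0;
}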

W.A. de Jong, L. Lin, H. Shan, C. Yang and L. Oliker, "Towards modelling complex mesoscale molecular environments", International Conference on Computational and Mathematical Methods in Science and Engineering (CMMSE), 2014,

Yili Zheng, Amir Kamil, Michael B. Driscoll, Hongzhang Shan, Katherine Yelick, "UPC++: A PGAS extension for C++", International Parallel and Distributed Processing Symposium (IPDPS), May 19, 2014, 1105--1114, doi: 10.1109/IPDPS.2014.115

Partitioned Global Address Space (PGAS) languages are convenient for expressing algorithms with large, random-access data, and they have proven to provide high performance and scalability through lightweight one-sided communication and locality control. While very convenient for moving data around the system, PGAS languages have taken different views on the model of computation, with the static Single Program Multiple Data (SPMD) model providing the best scalability. In this paper we present UPC++, a PGAS extension for C++ that has three main objectives: 1) to provide an object-oriented PGAS programming model in the context of the popular C++ language, 2) to add useful parallel programming idioms unavailable in UPC, such as asynchronous remote function invocation and multidimensional arrays, to support complex scientific applications, 3) to offer an easy on-ramp to PGAS programming through interoperability with other existing parallel programming systems (e.g., MPI, OpenMP, CUDA). We implement UPC++ with a "compiler-free" approach using C++ templates and runtime libraries. We borrow heavily from previous PGAS languages and describe the design decisions that led to this particular set of language features, providing significantly more expressiveness than UPC with very similar performance characteristics. We evaluate the programmability and performance of UPC++ using five benchmarks on two representative supercomputers, demonstrating that UPC++ can deliver excellent performance at large scale up to 32K cores while offering PGAS productivity features to C++ applications.
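As a concrete illustration of the asynchronous remote function invocation idiom mentioned in the abstract, here is a minimal sketch using the current UPC++ v1.0 API (upcxx::rpc). The v1.0 interface differs from the 2014 version described in the paper, and the ring-neighbor target and the lambda body are illustrative only.

#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
    upcxx::init();

    int target = (upcxx::rank_me() + 1) % upcxx::rank_n();

    // Asynchronous remote function invocation: the lambda executes on the
    // target rank; the returned future completes when the result arrives.
    upcxx::future<int> f = upcxx::rpc(target,
        [](int from) {
            return upcxx::rank_me() + from;   // runs on the remote rank
        },
        upcxx::rank_me());

    int result = f.wait();   // block until the remote call completes
    std::cout << "rank " << upcxx::rank_me() << " got " << result << "\n";

    upcxx::barrier();
    upcxx::finalize();
    return 0;
}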

Hongzhang Shan, Brian Austin, Wibe de Jong, Leonid Oliker, Nick Wright, Edoardo Apra, "Performance Tuning of Fock Matrix and Two Electron Integral Calculations for NWChem on Leading HPC Platforms", Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2013, doi: 10.1007/978-3-319-10214-6_13

Hongzhang Shan, Brian Austin, Nicholas Wright, Erich Strohmaier, John Shalf, Katherine Yelick, "Accelerating Applications at Scale Using One-Sided Communication", The 6th Conference on Partitioned Global Address Space Programming Models (PGAS), Santa Barbara, CA, October 10, 2012,

Hongzhang Shan, Erich Strohmaier, James Amundson, Eric G. Stern, "Optimizing The Advanced Accelerator Simulation Framework Synergia Using OpenMP", IWOMP'12: Proceedings of the 8th International Workshop on OpenMP, June 11, 2012,

H Shan, NJ Wright, J Shalf, K Yelick, M Wagner, N Wichmann, "A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI", PMBS 11 - Proceedings of the 2nd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, Co-located with SC 11, January 1, 2011, 13--14, doi: 10.1145/2088457.2088467

Hongzhang Shan, Erich Strohmaier, "Developing a Parameterized Performance Proxy for Sequential Scientific Kernels", 12th IEEE International Conference on High Performance Computing and Communications (HPCC), 2010, September 1, 2010, doi: 10.1109/HPCC.2010.50

Zhengji Zhao, Juan Meza, Byounghak Lee, Hongzhang Shan, Erich Strohmaier, David H. Bailey, Lin-Wang Wang, "The linearly scaling 3D fragment method for large scale electronic structure calculations", Journal of Physics: Conference Series, July 1, 2009,

Lin-Wang Wang, Byounghak Lee, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier, David H. Bailey, "Linearly scaling 3D fragment method for large-scale electronic structure calculations", Proceedings of SC08, November 2008,

Hongzhang Shan, Katie Antypas, John Shalf, "Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark", SC, 2008, 42,

J. Carter, Y. He, J. Shalf, H. Shan, E. Strohmaier, H. Wasserman, "The Performance Effect of Multi-core on Scientific Applications", Proceedings of Cray User Group, 2007, LBNL 62662,

Hongzhang Shan and John Shalf, "Using IOR to Analyze the I/O performance for HPC Platforms", CUG.org, 2007, LBNL 62647,

L. Oliker, A. Canning, J. Carter, C. Iancu, M. Lijewski, S. Kamil, J. Shalf, H. Shan, E. Strohmaier, S. Ethier, T. Goodale, "Scientific application performance on candidate petascale platforms", Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007, doi: 10.1109/IPDPS.2007.370259

J. Borrill, L. Oliker, J. Shalf, H. Shan, "Investigation Of Leading HPC I/O Performance Using A Scientific-Application Derived Benchmark", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2007,

H Shan, E Strohmaier, J Qiang, DH Bailey, K Yelick, "Performance modeling and optimization of a high energy colliding beam simulation code", Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC 06, January 2006, doi: 10.1145/1188455.1188557

L. Oliker, R. Biswas, J. Borrill, A. Canning, J. Carter, M.J. Djomehri, H. Shan, D. Skinner, "A performance evaluation of the Cray X1 for scientific applications", Lecture Notes in Computer Science, 2005, 3402:51-65,

E. Strohmaier, Hongzhang Shan, "Architecture Independent Performance Characterization and Benchmarking for Scientific Applications", International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Volendam, The Netherlands, October 2004,

Hongzhang Shan, E. Strohmaier, "Performance Characterization of Cray X1 and Their Implications for Application Performance Tuning", International Conference on Supercomputing (ICS), Saint-Malo, France, June 2004,

H. Shan, E. Strohmaier, L. Oliker, "Optimizing Performance of Superscalar Codes for a Single Cray X1 MSP", Proceedings of the 46th Cray User Group Conference (CUG), 2004,

L. Oliker, J. Borrill, A. Canning, J. Carter, H. Shan, D. Skinner, R. Biswas, M.J. Djomehri, "A Performance Evaluation of the Cray X1 for Scientific Applications", VECPAR'04: 6th International Meeting on High Performance Computing for Computational Science, 2004,

H. Shan, L. Oliker, R. Biswas, W. Smith, "Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration", International Conference on Advanced Computing and Communication (ADCOM), 2004,

H. Shan, L. Oliker, R. Biswas, "Job Superscheduler Architecture and Performance in Computational Grid Environments", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2003,

H. Shan, J. Singh, L. Oliker, R. Biswas, "Message Passing vs. Shared Address Space on a Cluster of SMPs", International Parallel & Distributed Processing Symposium (IPDPS), 2001,

H. Shan, J. Singh, L. Oliker, R. Biswas, "A Comparison of Three Programming Models for Adaptive Applications on the Origin2000", International Conference for High Performance Computing, Networking, Storage and Analysis (SC) - BEST STUDENT PAPER AWARD, 2000,

Book Chapters

David H. Bailey, Lin-Wang Wang, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier, Byounghak Lee, "Tuning an electronic structure code", Performance Tuning of Scientific Applications, edited by David H. Bailey, Robert F. Lucas, Samuel W. Williams (CRC Press, 2011), pages 339-354, doi: 10.1201/b10509

L. Oliker, A. Canning, J. Carter, C. Iancu, M. Lijewski, S. Kamil, J. Shalf, H. Shan, E. Strohmaier, S. Ethier, T. Goodale, "Performance Characteristics of Potential Petascale Scientific Applications", Petascale Computing: Algorithms and Applications, Chapman & Hall/CRC Computational Science Series, edited by David A. Bader (2007)


Presentations/Talks

John Shalf, Hongzhang Shan, Katie Antypas, I/O Requirements for HPC Applications, 2008,

Leonid Oliker, Julian Borrill, Hongzhang Shan, John Shalf, Investigation Of Leading HPC I/O Performance Using A Scientific-Application Derived Benchmark, 2007,

John Shalf, Hongzhang Shan, User Perspective on HPC I/O Requirements, 2007,

H. Shan, J. Singh, L. Oliker, R. Biswas, Design Strategies for Irregularly Adapting Parallel Applications, SIAM Conference on Parallel Processing, 2001,

Reports

Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker, "Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture", LBNL Technical Report, October 2014, LBNL 6806E,

J. Levesque, J. Larkin, M. Foster, J. Glenski, G. Geissler, S. Whalen, B. Waldecker, J. Carter, D. Skinner, Y. He, H. Wasserman, J. Shalf, H. Shan, E. Strohmaier, "Understanding and Mitigating Multicore Performance Issues on the AMD Opteron Architecture", 2007, LBNL 62500,

Hongzhang Shan, John Shalf, "Analysis of Parallel IO on Modern HPC Platforms", 2006,

  • Download File: IOR.doc (doc: 399 KB)

Analysis of the parallel IO requirements from a number of HPC applications, combined with microbenchmarks to aid in understanding their performance.
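To give a flavor of what such a microbenchmark looks like, the sketch below times a minimal IOR-style collective write with MPI-IO. It is a rough illustration under stated assumptions, not code from the report or from IOR itself; the file name "ior_test.dat", the 1 MiB per-rank transfer size, and the single shared-file access pattern are arbitrary choices.

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int block = 1 << 20;                 // 1 MiB per rank (illustrative)
    std::vector<char> buf(block, 'x');

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "ior_test.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    double t0 = MPI_Wtime();
    // Each rank writes its own contiguous segment of one shared file.
    MPI_File_write_at_all(fh, (MPI_Offset)rank * block, buf.data(),
                          block, MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);                       // include the flush in the timing
    double t1 = MPI_Wtime();

    if (rank == 0)
        std::printf("aggregate %.1f MiB in %.3f s\n",
                    (double)nprocs * block / 1048576.0, t1 - t0);

    MPI_Finalize();
    return 0;
}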