Performance Analysis of AI Hardware and Software
The performance characteristics of AI training and inference can be quite distinct from those of traditional HPC applications, despite sharing similar computational methods (large and small matrix multiplications, stencils, gather/scatter, etc.), albeit at reduced precision (single, half, BFLOAT16). Where possible, vendors are attempting to create specialized architectures tailored to the subset of computations used in AI training and inference. Understanding the interplay between science, AI method, framework, and architecture is essential not only in quantifying the computational potential of current and future architectures running AI models, but also in identifying the bottlenecks and the ultimate limits of today's models.
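The kind of bottleneck analysis described above can be illustrated with a simple roofline-style estimate. The Python sketch below computes the arithmetic intensity of a GEMM at FP32 and BF16 and compares it against an assumed machine balance; the peak FLOP/s, memory bandwidth, and matrix shapes are hypothetical values chosen purely for illustration, not measurements of any particular system or of the group's tools.

```python
# Minimal roofline-style sketch: estimate whether a GEMM at a given
# precision is compute- or bandwidth-bound.  All peak FLOP/s, bandwidth,
# and matrix-shape numbers are illustrative assumptions.

def gemm_arithmetic_intensity(m, n, k, bytes_per_element):
    """FLOPs per byte moved for C[m,n] = A[m,k] @ B[k,n], assuming each
    matrix is read or written from memory exactly once."""
    flops = 2.0 * m * n * k                              # one multiply + one add per inner step
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved

# Assumed machine balance for a hypothetical accelerator.
PEAK_FLOPS = {"fp32": 20e12, "bf16": 160e12}             # FLOP/s
MEM_BW = 1.5e12                                          # bytes/s

SHAPES = {"large square GEMM": (4096, 4096, 4096),
          "skinny GEMM": (4096, 32, 4096)}

for name, (m, n, k) in SHAPES.items():
    for precision, nbytes in [("fp32", 4), ("bf16", 2)]:
        ai = gemm_arithmetic_intensity(m, n, k, nbytes)
        attainable = min(PEAK_FLOPS[precision], MEM_BW * ai)
        bound = "compute" if attainable >= PEAK_FLOPS[precision] else "bandwidth"
        print(f"{name:18s} {precision}: AI = {ai:7.1f} FLOP/byte, "
              f"attainable = {attainable / 1e12:6.1f} TFLOP/s ({bound}-bound)")
```

Running this shows that a large square GEMM is compute-bound at both precisions under the assumed balance, while the skinny GEMM becomes bandwidth-bound, which is one reason AI workloads can behave quite differently from classic HPC kernels on the same hardware.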
Research Topics
Researchers
- Samuel Williams
- Nick Wright
- Khaled Ibrahim
- Hai Ah Nam
- Leonid Oliker
- Tan Nguyen
- Nan Ding
- Steve Farrell
- Wahid Bhimji
Publications
2022
Nan Ding, Samuel Williams, Hai Ah Nam, Taylor Groves, Muaaz Gul Awan, Christopher Daley, Oguz Selvitopi, Leonid Oliker, Nicholas Wright, "Methodology for Evaluating the Potential of Disaggregated Memory Systems", RESDIS, https://resdis.github.io/ws/2022/sc/, November 18, 2022.
- Download File: Methodology-for-Evaluating-the-Potential-of-Disaggregated-Memory-Systems.pdf (pdf: 5.1 MB)
K. Ibrahim, L. Oliker, "Preprocessing Pipeline Optimization for Scientific Deep-Learning Workloads", IPDPS 2022, June 3, 2022.
- Download File: SciML-optimization-12.pdf (pdf: 17 MB)
2021
Khaled Z. Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas J. Wright, Samuel Williams, "Architectural Requirements for Deep Learning Workloads in HPC Environments" (Best Paper), Performance Modeling, Benchmarking, and Simulation (PMBS), November 2021.
- Download File: pmbs21-DL-final.pdf (pdf: 632 KB)