# Performance Analysis of AI Hardware and Software

The performance characteristics of AI training and inference can be quite distinct from those of HPC applications, despite sharing similar computational methods (large/small matrix multiplications, stencils, gather/scatter, etc.), albeit at reduced precision (single, half, BFLOAT16). Where possible, vendors are attempting to create architectures specialized for the subset of computations used in AI training and inference. Understanding the interplay between science, AI method, framework, and architecture is essential not only in quantifying the computational potential of current and future architectures running AI models, but also in identifying the bottlenecks and the ultimate limits of today's models.
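As a small, self-contained illustration of the reduced-precision tradeoff described above (a sketch only, not drawn from the publications below; NumPy's float16 stands in for hardware half/BFLOAT16 formats, which NumPy does not provide), one can compare a single-precision matrix product against the same product with its operands rounded to half precision:

```python
import numpy as np

# Illustrative sketch: measure the error introduced when the operands of a
# matrix multiplication are rounded to half precision, as happens on
# reduced-precision AI accelerators.
rng = np.random.default_rng(seed=42)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

full = a @ b  # single-precision (float32) reference product
half = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

# Relative error introduced by the reduced-precision operands.
rel_err = np.abs(full - half).max() / np.abs(full).max()
print(f"max relative error at half precision: {rel_err:.2e}")
```

The error is small but nonzero, which is exactly the tradeoff specialized AI hardware exploits: reduced precision buys bandwidth and throughput at a modest accuracy cost that training is often tolerant of.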

## Researchers

- Samuel Williams
- Nick Wright
- Khaled Ibrahim
- Hai Ah Nam
- Leonid Oliker
- Tan Nguyen
- Nan Ding
- Steve Farrell
- Wahid Bhimji

## Publications

### Dan Bonachea, Paul H. Hargrove, "An Introduction to GASNet-EX for Chapel Users", 9th Annual Chapel Implementers and Users Workshop (CHIUW 2022), June 10, 2022

Have you ever typed "export CHPL_COMM=gasnet"? If you’ve used Chapel with multi-locale support on a system without "Cray" in the model name, then you’ve probably used GASNet. Did you ever wonder what GASNet is? What GASNet should mean to you? This talk aims to answer those questions and more. Chapel has system-specific implementations of multi-locale communication for Cray-branded systems including the Cray XC and HPE Cray EX lines. On other systems, Chapel communication uses the GASNet communication library embedded in third-party/gasnet. In this talk, that third-party will introduce itself to you in the first person.


### Paul H. Hargrove, Dan Bonachea, Amir Kamil, Colin A. MacLean, Damian Rouson, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'22)", Poster at Exascale Computing Project (ECP) Annual Meeting 2022, May 5, 2022

We present UPC++ and GASNet-EX, distributed libraries which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.

UPC++ is a C++ PGAS library, featuring APIs for Remote Procedure Call (RPC) and for Remote Memory Access (RMA) to host and GPU memories.  The combination of these two features yields performant, scalable solutions to problems of interest within ECP.

GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients.  GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.

### John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001453, doi: 10.25344/S41C7Q

UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.

UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.

### Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2022.3.0", Lawrence Berkeley National Laboratory Tech Report, March 2022, LBNL 2001452, doi: 10.25344/S4530J

UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.

### Daniel Waters, Colin A. MacLean, Dan Bonachea, Paul H. Hargrove, "Demonstrating UPC++/Kokkos Interoperability in a Heat Conduction Simulation (Extended Abstract)", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S4630V

We describe the replacement of MPI with UPC++ in an existing Kokkos code that simulates heat conduction within a rectangular 3D object, as well as an analysis of the new code’s performance on CUDA accelerators. The key challenges were packing the halos in Kokkos data structures in a way that allowed for UPC++ remote memory access, and streamlining synchronization costs. Additional UPC++ abstractions used included global pointers, distributed objects, remote procedure calls, and futures. We also make use of the device allocator concept to facilitate data management in memory with unique properties, such as GPUs. Our results demonstrate that despite the algorithm’s good semantic match to message passing abstractions, straightforward modifications to use UPC++ communication deliver vastly improved performance and scalability in the common case. We find the one-sided UPC++ version written in a natural way exhibits good performance, whereas the message-passing version written in a straightforward way exhibits performance anomalies. We argue this represents a productivity benefit for one-sided communication models.

### Amir Kamil, Dan Bonachea, "Optimization of Asynchronous Communication Operations through Eager Notifications", Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM), November 2021, doi: 10.25344/S42C71

UPC++ is a C++ library implementing the Asynchronous Partitioned Global Address Space (APGAS) model. We propose an enhancement to the completion mechanisms of UPC++ used to synchronize communication operations that is designed to reduce overhead for on-node operations. Our enhancement permits eager delivery of completion notification in cases where the data transfer semantics of an operation happen to complete synchronously, for example due to the use of shared-memory bypass. This semantic relaxation allows removing significant overhead from the critical path of the implementation in such cases. We evaluate our results on three different representative systems using a combination of microbenchmarks and five variations of the HPCChallenge RandomAccess benchmark implemented in UPC++ and run on a single node to accentuate the impact of locality. We find that in RMA versions of the benchmark written in a straightforward manner (without manually optimizing for locality), the new eager notification mode can provide up to a 25% speedup when synchronizing with promises and up to a 13.5x speedup when synchronizing with conjoined futures. We also evaluate our results using a graph matching application written with UPC++ RMA communication, where we measure overall speedups of as much as 11% in single-node runs of the unmodified application code, due to our transparent enhancements.

### Paul H. Hargrove, Dan Bonachea, Colin A. MacLean, Daniel Waters, "GASNet-EX Memory Kinds: Support for Device Memory in PGAS Programming Models", The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21) Research Poster, November 2021, doi: 10.25344/S4P306

Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work includes two major components: UPC++ (a C++ template library) and GASNet-EX (a portable, high-performance communication library). This poster describes recent advances in GASNet-EX to efficiently implement Remote Memory Access (RMA) operations to and from memory on accelerator devices such as GPUs. Performance is illustrated via benchmark results from UPC++ and the Legion programming system, both using GASNet-EX as their communications library.

### Katherine A. Yelick, Amir Kamil, Damian Rouson, Dan Bonachea, Paul H. Hargrove, "UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (SC21)", Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21), November 15, 2021

UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. UPC++ offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between computation and asynchronous data movement. UPC++ supports simple/regular data structures as well as more elaborate distributed applications where communication is fine-grained and/or irregular. UPC++ provides a uniform abstraction for one-sided RMA between host and GPU/accelerator memories anywhere in the system. UPC++'s support for aggressive asynchrony enables applications to effectively overlap communication and reduce latency stalls, while the underlying GASNet-EX communication library delivers efficient low-overhead RMA/RPC on HPC networks.

This tutorial introduces UPC++, covering the memory and execution models and basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into application proxy examples. We examine a few UPC++ applications with irregular communication (metagenomic assembler and COVID-19 simulation) and describe how they utilize UPC++ to optimize communication performance.

### Devarshi Ghoshal, Ludovico Bianchi, Abdelilah Essiari, Drew Paine, Sarah Poon, Michael Beach, Alpha N'Diaye, Patrick Huck, Lavanya Ramakrishnan, "Science Capsule: Towards Sharing and Reproducibility of Scientific Workflows", 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), November 15, 2021, doi: 10.1109/WORKS54523.2021.00014

Workflows are increasingly processing large volumes of data from scientific instruments, experiments and sensors. These workflows often consist of complex data processing and analysis steps that might include a diverse ecosystem of tools and also often involve human-in-the-loop steps. Sharing and reproducing these workflows with collaborators and the larger community is critical but hard to do without the entire context of the workflow including user notes and execution environment. In this paper, we describe Science Capsule, which is a framework to capture, share, and reproduce scientific workflows. Science Capsule captures, manages and represents both computational and human elements of a workflow. It automatically captures and processes events associated with the execution and data life cycle of workflows, and lets users add other types and forms of scientific artifacts. Science Capsule also allows users to create workflow "snapshots" that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows between users. Our results show that Science Capsule is capable of processing and organizing events in near real-time for high-throughput experimental and data analysis workflows without incurring any significant performance overheads.

### John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Daniel Waters, "UPC++ v1.0 Programmer’s Guide, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001424, doi: 10.25344/S4SW2T

UPC++ is a C++ library that supports Partitioned Global Address Space (PGAS) programming. It is designed for writing efficient, scalable parallel programs on distributed-memory parallel computers. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). The UPC++ control model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. The PGAS memory model additionally provides one-sided RMA communication to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ also features Remote Procedure Call (RPC) communication, making it easy to move computation to operate on data that resides on remote processes.

UPC++ was designed to support exascale high-performance computing, and the library interfaces and implementation are focused on maximizing scalability. In UPC++, all communication operations are syntactically explicit, which encourages programmers to consider the costs associated with communication and data movement. Moreover, all communication operations are asynchronous by default, encouraging programmers to seek opportunities for overlapping communication latencies with other useful work. UPC++ provides expressive and composable abstractions designed for efficiently managing aggressive use of asynchrony in programs. Together, these design principles are intended to enable programmers to write applications using UPC++ that perform well even on hundreds of thousands of cores.

### Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2021.9.0", Lawrence Berkeley National Laboratory Tech Report, September 2021, LBNL 2001425, doi: 10.25344/S4XK53

UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.

### Dan Bonachea, "UPC++ as_eager Working Group Draft, Revision 2020.6.2", Lawrence Berkeley National Laboratory Tech Report, August 9, 2021, LBNL 2001416, doi: 10.25344/S4FK5R

This draft proposes an extension for a new future-based completion variant that can be more effectively streamlined for RMA and atomic access operations that happen to be satisfied at runtime using purely node-local resources. Many such operations are most efficiently performed synchronously using load/store instructions on shared-memory mappings, where the actual access may only require a few CPU instructions. In such cases we believe it’s critical to minimize the overheads imposed by the UPC++ runtime and completion queues, in order to enable efficient operation on hierarchical node hardware using shared-memory bypass.

The new upcxx::{source,operation}_cx::as_eager_future() completion variant accomplishes this goal by relaxing the current restriction that future-returning access operations must return a non-ready future whose completion is deferred until a subsequent explicit invocation of user-level progress. This relaxation allows access operations that are completed synchronously to instead return a ready future, thereby avoiding most or all of the runtime costs associated with deferment of future completion and subsequent mandatory entry into the progress engine.

We additionally propose to make this new as_eager_future() completion variant the new default completion for communication operations that currently default to returning a future. This should encourage use of the streamlined variant, and may provide performance improvements to some codes without source changes. A mechanism is proposed to restore the legacy behavior on-demand for codes that might happen to rely on deferred completion for correctness.

Finally, we propose a new as_eager_promise() completion variant that extends analogous improvements to promise-based completion, and corresponding changes to the default behavior of as_promise().

### Drew Paine, Sarah Poon, Lavanya Ramakrishnan, "Investigating User Experiences with Data Abstractions on High Performance Computing Systems", June 29, 2021, LBNL-2001374

Scientific exploration generates expanding volumes of data that commonly require High Performance Computing (HPC) systems to facilitate research. HPC systems are complex ecosystems of hardware and software that frequently are not user friendly. The Usable Data Abstractions (UDA) project set out to build usable software for scientific workflows in HPC environments by undertaking multiple rounds of qualitative user research. Qualitative research investigates how individuals accomplish their work and our interview-based study surfaced a variety of insights about the experiences of working in and with HPC ecosystems. This report examines multiple facets to the experiences of scientists and developers using and supporting HPC systems. We discuss how stakeholders grasp the design and configuration of these systems, the impacts of abstraction layers on their ability to successfully do work, and the varied perceptions of time that shape this work. Examining the adoption of the Cori HPC at NERSC we explore the anticipations and lived experiences of users interacting with this system's novel storage feature, the Burst Buffer. We present lessons learned from across these insights to illustrate just some of the challenges HPC facilities and their stakeholders need to account for when procuring and supporting these essential scientific resources to ensure their usability and utility to a variety of scientific practices.

### Devarshi Ghoshal, Drew Paine, Gilberto Pastorello, Abdelrahman Elbashandy, Dan Gunter, Oluwamayowa Amusat, Lavanya Ramakrishnan, "Experiences with Reproducibility: Case Studies from Scientific Workflows", Proceedings of the 4th International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS'21), ACM, June 21, 2021, doi: 10.1145/3456287.3465478

Reproducible research is becoming essential for science to ensure transparency and for building trust. Additionally, reproducibility provides the cornerstone for sharing of methodology that can improve efficiency. Although several tools and studies focus on computational reproducibility, we need a better understanding about the gaps, issues, and challenges for enabling reproducibility of scientific results beyond the computational stages of a scientific pipeline. In this paper, we present five different case studies that highlight the reproducibility needs and challenges under various system and environmental conditions. Through the case studies, we present our experiences in reproducing different types of data and methods that exist in an experimental or analysis pipeline. We examine the human aspects of reproducibility while highlighting the things that worked, that did not work, and that could have worked better for each of the cases. Our experiences capture a wide range of scenarios and are applicable to a much broader audience who aim to integrate reproducibility in their everyday pipelines.

### Paul H. Hargrove, Dan Bonachea, Max Grossman, Amir Kamil, Colin A. MacLean, Daniel Waters, "UPC++ and GASNet: PGAS Support for Exascale Apps and Runtimes (ECP'21)", Poster at Exascale Computing Project (ECP) Annual Meeting 2021, April 2021

We present UPC++ and GASNet-EX, which together enable one-sided, lightweight communication such as arises in irregular applications, libraries and frameworks running on exascale systems.

UPC++ is a C++ PGAS library, featuring APIs for Remote Memory Access (RMA) and Remote Procedure Call (RPC).  The combination of these two features yields performant, scalable solutions to problems of interest within ECP.

GASNet-EX is PGAS communication middleware, providing the foundation for UPC++ and Legion, plus numerous non-ECP clients.  GASNet-EX RMA interfaces match or exceed the performance of MPI-RMA across a variety of pre-exascale systems.

### Marco Pritoni, Drew Paine, Gabriel Fierro, Cory Mosiman, Michael Poplawski, Joel Bender, Jessica Granderson, "Metadata Schemas and Ontologies for Building Energy Applications: A Critical Review and Use Case Analysis", Energies, April 6, 2021, doi: 10.3390/en14072024

Digital and intelligent buildings are critical to realizing efficient building energy operations and a smart grid. With the increasing digitalization of processes throughout the life cycle of buildings, data exchanged between stakeholders and between building systems have grown significantly. However, a lack of semantic interoperability between data in different systems is still prevalent and hinders the development of energy-oriented applications that can be reused across buildings, limiting the scalability of innovative solutions. Addressing this challenge, our review paper systematically reviews metadata schemas and ontologies that are at the foundation of semantic interoperability necessary to move toward improved building energy operations. The review finds 40 schemas that span different phases of the building life cycle, most of which cover commercial building operations and, in particular, control and monitoring systems. The paper’s deeper review and analysis of five popular schemas identify several gaps in their ability to fully facilitate the work of a building modeler attempting to support three use cases: energy audits, automated fault detection and diagnosis, and optimal control. Our findings demonstrate that building modelers focused on energy use cases will find it difficult, labor intensive, and costly to create, sustain, and use semantic models with existing ontologies. This underscores the significant work still to be done to enable interoperable, usable, and maintainable building models. We make three recommendations for future work by the building modeling and energy communities: a centralized repository with a search engine for relevant schemas, the development of more use cases, and better harmonization and standardization of schemas in collaboration with industry to facilitate their adoption by stakeholders addressing varied energy-focused use cases.

### Dan Bonachea, Amir Kamil, "UPC++ v1.0 Specification, Revision 2021.3.0", Lawrence Berkeley National Laboratory Tech Report, March 31, 2021, LBNL 2001388, doi: 10.25344/S4K881

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.

### Ed Younis, Koushik Sen, Katherine Yelick, Costin Iancu, "QFAST: Quantum Synthesis Using a Hierarchical Continuous Circuit Space", Bulletin of the American Physical Society, March 2021

We present QFAST, a quantum synthesis tool designed to produce short circuits and to scale well in practice. Our contributions are: 1) a novel representation of circuits able to encode placement and topology; 2) a hierarchical approach with an iterative refinement formulation that combines "coarse-grained" fast optimization during circuit structure search with a good, but slower, optimization stage only in the final circuit instantiation. When compared against state-of-the-art techniques, although not always optimal, QFAST can reduce circuits for "time-dependent evolution" algorithms, as used by domain scientists, by 60x in depth. On typical circuits, it provides 4x better depth reduction than the widely used Qiskit and UniversalQ compilers. We also show the composability and tunability of our formulation in terms of circuit depth and running time. For example, we show how to generate shorter circuits by plugging in the best available third party synthesis algorithm at a given hierarchy level. Composability enables portability across chip architectures, which is missing from similar approaches.
QFAST is integrated with Qiskit and available at github.com/bqskit.

### Akel Hashim, Ravi Naik, Alexis Morvan, Jean-Loup Ville, Brad Mitchell, John Mark Kreikebaum, Marc Davis, Ethan Smith, Costin Iancu, Kevin O'Brien, Ian Hincks, Joel Wallman, Joseph V. Emerson, David Ivan Santiago, Irfan Siddiqi, "Scalable Quantum Computing on a Noisy Superconducting Quantum Processor via Randomized Compiling", Bulletin of the American Physical Society, 2021

Coherent errors in quantum hardware severely limit the performance of quantum algorithms in an unpredictable manner, and mitigating their impact is necessary for realizing reliable, large-scale quantum computations. Randomized compiling achieves this goal by converting coherent errors into stochastic noise, dramatically reducing unpredictable errors in quantum algorithms and enabling accurate predictions of aggregate performance via cycle benchmarking estimates. In this work, we demonstrate significant performance gains under randomized compiling for both the four-qubit quantum Fourier transform algorithm and for random circuits of variable depth on a superconducting quantum processor. We also validate solution accuracy using experimentally-measured error rates. Our results demonstrate that randomized compiling can be utilized to maximally-leverage and predict the capabilities of modern-day noisy quantum processors, paving the way forward for scalable quantum computing.

### Dan Bonachea, "GASNet-EX: A High-Performance, Portable Communication Library for Exascale", Berkeley Lab – CS Seminar, March 10, 2021

Partitioned Global Address Space (PGAS) models, pioneered by languages such as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.

GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building on 20 years of lessons learned. We describe several features and enhancements that have been introduced to address the needs of modern runtimes and exploit the hardware capabilities of emerging systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI implementations on current systems. GASNet-EX provides communication services that help to deliver speedups in HPC applications written using the UPC++ library, enabling new science on pre-exascale systems.

### Thijs Steel, Daan Camps, Karl Meerbergen, Raf Vandebril, "A Multishift, Multipole Rational QZ Method with Aggressive Early Deflation", SIAM Journal on Matrix Analysis and Applications, February 19, 2021, 42:753-774, doi: 10.1137/19M1249631

In the article “A Rational QZ Method” by D. Camps, K. Meerbergen, and R. Vandebril [SIAM J. Matrix Anal. Appl., 40 (2019), pp. 943--972], we introduced rational QZ (RQZ) methods. Our theoretical examinations revealed that the convergence of the RQZ method is governed by rational subspace iteration, thereby generalizing the classical QZ method, whose convergence relies on polynomial subspace iteration. Moreover the RQZ method operates on a pencil more general than Hessenberg---upper triangular, namely, a Hessenberg pencil, which is a pencil consisting of two Hessenberg matrices. However, the RQZ method can only be made competitive to advanced QZ implementations by using crucial add-ons such as small bulge multishift sweeps, aggressive early deflation, and optimal packing. In this paper we develop these techniques for the RQZ method. In the numerical experiments we compare the results with state-of-the-art routines for the generalized eigenvalue problem and show that the presented method is competitive in terms of speed and accuracy.

### Daan Camps, Roel Van Beeumen, "Approximate quantum circuit synthesis using block encodings", Physical Review A, November 11, 2020, 102, doi: 10.1103/PhysRevA.102.052411

One of the challenges in quantum computing is the synthesis of unitary operators into quantum circuits with polylogarithmic gate complexity. Exact synthesis of generic unitaries requires an exponential number of gates in general. We propose a novel approximate quantum circuit synthesis technique by relaxing the unitary constraints and interchanging them for ancilla qubits via block encodings. This approach combines smaller block encodings, which are easier to synthesize, into quantum circuits for larger operators. Due to the use of block encodings, our technique is not limited to unitary operators and can be applied for the synthesis of arbitrary operators. We show that operators which can be approximated by a canonical polyadic expression with a polylogarithmic number of terms can be synthesized with polylogarithmic gate complexity with respect to the matrix dimension.

### Katherine A. Yelick, Amir Kamil, Dan Bonachea, Paul H. Hargrove, "UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (SC20)", Tutorial at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), November 10, 2020

UPC++ is a C++ library supporting Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided Remote Memory Access (RMA) and Remote Procedure Calls (RPC), along with future/promise-based asynchrony to express dependencies between asynchronous computations and data movement. UPC++ supports simple, regular data structures as well as more elaborate distributed structures where communication is fine-grained, irregular, or both. UPC++'s support for aggressive asynchrony enables the application to overlap communication to reduce communication wait times, and the GASNet communication layer provides efficient low-overhead RMA/RPC on HPC networks.

This tutorial introduces basic concepts and advanced optimization techniques of UPC++. We discuss the UPC++ memory and execution models and examine basic algorithm implementations. Participants gain hands-on experience incorporating UPC++ features into several application examples. We also examine two irregular applications (metagenomic assembler and multifrontal sparse solver) and describe how they leverage UPC++ features to optimize communication performance.

### John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen,"UPC++ v1.0 Programmer’s Guide, Revision 2020.10.0",Lawrence Berkeley National Laboratory Tech Report,October 2020,LBNL 2001368, doi: 10.25344/S4HG6Q

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

### Drew Paine, Lavanya Ramakrishnan,"Understanding Interactive and Reproducible Computing With Jupyter Tools at Facilities",LBNL Technical Report,October 31, 2020,LBNL-2001355,

Increasingly, Jupyter tools are being adopted and incorporated into High Performance Computing (HPC) and scientific user facilities. Adopting Jupyter tools enables more interactive and reproducible computational work at facilities across data life cycles. As the volume, variety, and scope of data grow, scientists need to be able to analyze and share results in user-friendly ways. Human-centered research highlights design challenges around computational notebooks, and our qualitative user study shifts focus to better characterize how Jupyter tools are being used in HPC and science user facilities today. We conducted twenty-nine interviews, and obtained 103 survey responses from NERSC Jupyter users, to better understand the increasing role of interactive computing tools in DOE-sponsored scientific work. We examine a range of issues that emerge when using and supporting Jupyter in HPC ecosystems, including: how Jupyter is being used by scientists in HPC and user facility ecosystems; how facilities are purposefully supporting Jupyter in their ecosystems; and feedback NERSC users have about the facility’s deployment, including features they indicated would be helpful. We offer a variety of takeaways for staff supporting Jupyter at facilities, Project Jupyter and related open source communities, and funding agencies supporting interactive computing work.

### Dan Bonachea, Amir Kamil,"UPC++ v1.0 Specification, Revision 2020.10.0",Lawrence Berkeley National Laboratory Tech Report,October 30, 2020,LBNL 2001367, doi: 10.25344/S4CS3F

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.
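The future/promise style of asynchrony described above can be illustrated with a small Python analogy (this is not the UPC++ API; the `then` combinator and the `fetch_remote` stand-in are invented to mirror the idea of chaining continuations onto non-blocking operations):

```python
from concurrent.futures import ThreadPoolExecutor, Future

def then(fut, fn):
    # Minimal 'then' combinator: schedule fn to run once fut completes,
    # producing a new future -- analogous in spirit to UPC++'s future::then().
    out = Future()
    fut.add_done_callback(lambda f: out.set_result(fn(f.result())))
    return out

def fetch_remote(addr):
    # Stand-in for a non-blocking one-sided get of remote data.
    return {"a1": 41}[addr]

with ThreadPoolExecutor() as pool:
    f1 = pool.submit(fetch_remote, "a1")     # launch the high-latency operation
    f2 = then(f1, lambda v: v + 1)           # continuation: runs when f1 is done
    f3 = then(f2, lambda v: v * 2)           # a small graph of dependent steps
    result = f3.result()

print(result)  # 84
```

The point of the pattern is that dependencies form an explicit graph, so each step fires as soon as its predecessor's latency is hidden, rather than blocking eagerly.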

### Marc G. Davis, Ethan Smith, Ana Tudor, Koushik Sen, Irfan Siddiqi, Costin Iancu,"Towards Optimal Topology Aware Quantum Circuit Synthesis",2020 IEEE International Conference on Quantum Computing and Engineering (QCE),Denver, CO, USA,IEEE,October 12, 2020,doi: 10.1109/QCE49297.2020.00036

We present an algorithm for compiling arbitrary unitaries into a sequence of gates native to a quantum processor. As CNOT gates are error-prone on Noisy Intermediate-Scale Quantum (NISQ) devices for the foreseeable future, our A*-inspired algorithm minimizes their count while accounting for connectivity. We discuss the formulation of synthesis as a search problem as well as an algorithm to find solutions. For a workload of circuits with complexity appropriate for the NISQ era, we produce solutions well within the best upper bounds published in the literature and match or exceed hand-tuned implementations, as well as other existing synthesis alternatives. In particular, when comparing against state-of-the-art available synthesis packages we show a 2.4× average (up to 5.3×) reduction in CNOT count. We also show how to re-target the algorithm for a different chip topology and native gate set while obtaining similar-quality results. We believe that tools like ours can facilitate algorithmic exploration and guide gate set discovery for quantum processor designers, as well as being useful for optimization in the quantum compilation tool-chain.
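The structure of such an A*-inspired search can be sketched generically (this is an illustrative skeleton, not the authors' synthesis code; in the paper's setting a state would be a partial circuit, `expand` would append native gates, and the step cost would count error-prone CNOTs, whereas here we search a toy integer state space):

```python
import heapq

def a_star(start, is_goal, expand, heuristic):
    # Frontier entries: (f = g + h, g, state, path); best-known cost per state.
    frontier = [(heuristic(start), 0, start, [start])]
    best = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, g
        for nxt, step_cost in expand(state):
            ng = g + step_cost
            if ng < best.get(nxt, float("inf")):
                best[nxt] = ng
                heapq.heappush(frontier, (ng + heuristic(nxt), ng, nxt, [*path, nxt]))
    return None, float("inf")

# Toy instance: reach 10 from 1 using +1 (cost 1) or *2 (cost 1).
path, cost = a_star(
    1,
    lambda s: s == 10,
    lambda s: [(s + 1, 1), (s * 2, 1)],
    lambda s: 0,  # trivially admissible heuristic (degenerates to Dijkstra)
)
print(cost)  # 4, via 1 -> 2 -> 4 -> 5 -> 10
```

The quality of the heuristic is what keeps the frontier tractable; with an uninformative heuristic, as here, A* reduces to uniform-cost search.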

### Daan Camps, Thomas Mach, Raf Vandebril, David Watkins,"On pole-swapping algorithms for the eigenvalue problem",ETNA - Electronic Transactions on Numerical Analysis,September 18, 2020,52:480-508,doi: 10.1553/etna_vol52s480

Pole-swapping algorithms, which are generalizations of the QZ algorithm for the generalized eigenvalue problem, are studied. A new modular (and therefore more flexible) convergence theory that applies to all pole-swapping algorithms is developed. A key component of all such algorithms is a procedure that swaps two adjacent eigenvalues in a triangular pencil. An improved swapping routine is developed, and its superiority over existing methods is demonstrated by a backward error analysis and numerical tests. The modularity of the new convergence theory and the generality of the pole-swapping approach shed new light on bi-directional chasing algorithms, optimally packed shifts, and bulge pencils, and allow the design of novel algorithms.

### Drew Paine, Devarshi Ghoshal, Lavanya Ramakrishnan,"Experiences with a Flexible User Research Process to Build Data Change Tools",Journal of Open Research Software,September 1, 2020,doi: 10.5334/jors.284

Scientific software development processes are understood to be distinct from commercial software development practices due to uncertain and evolving states of scientific knowledge. Sustaining these software products is a recognized challenge, but under-examined is the usability and usefulness of such tools to their scientific end users. User research is a well-established set of techniques (e.g., interviews, mockups, usability tests) applied in commercial software projects to develop foundational, generative, and evaluative insights about products and the people who use them. Currently these approaches are not commonly applied and discussed in scientific software development work. The use of user research techniques in scientific environments can be challenging due to the nascent, fluid problem spaces of scientific work, varying scope of projects and their user communities, and funding/economic constraints on projects.

In this paper, we reflect on our experiences undertaking a multi-method user research process in the Deduce project. The Deduce project is investigating data change to develop metrics, methods, and tools that will help scientists make decisions around data change. There is a lack of common terminology since the concept of systematically measuring and managing data change is under explored in scientific environments. To bridge this gap we conducted user research that focuses on user practices, needs, and motivations to help us design and develop metrics and tools for data change. This paper contributes reflections and the lessons we have learned from our experiences. We offer key takeaways for scientific software project teams to effectively and flexibly incorporate similar processes into their projects.

### Oguz Selvitopi*, Saliya Ekanayake*, Giulia Guidi, Georgios Pavlopoulos, Ariful Azad, Aydın Buluç,"Distributed Many-to-Many Protein Sequence Alignment Using Sparse Matrices",Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’20),2020,

(*: joint first authors)

### Patricia Gonzalez-Guerrero, Tommy Tracy II, Xinfei Guo, Rahul Sreekumar, Marzieh Lenjani, Kevin Skadron, Mircea R Stan,"Towards on-node Machine Learning for Ultra-low-power Sensors Using Asynchronous Σ Δ Streams",Journal on Emerging Technologies in Computing Systems (JETC),August 26, 2020,doi: 10.1145/3404975

We propose a novel architecture to enable low-power, complex on-node data processing for the next generation of sensors for the internet of things (IoT), smartdust, or edge intelligence. Our architecture combines near-analog-memory-computing (NAM) and asynchronous-computing-with-streams (ACS), eliminating the need for ADCs. ACS enables the ultra-low-power, massive computational resources required to execute on-node complex Machine Learning (ML) algorithms, while NAM addresses the memory wall that represents a common bottleneck for ML and other complex functions. In ACS an analog value is mapped to an asynchronous stream that can take one of two logic levels (vh, vl). This stream-based data representation enables area/power-efficient computing units such as a multiplier implemented as an AND gate, yielding power savings of ∼90% compared to digital approaches. Generating streams for NAM and ACS in a brute-force manner, using analog-to-digital converters (ADCs) and digital-to-stream converters, would sky-rocket the power-latency-energy cost, making the approach impractical. Our NAM-ACS architecture eliminates these expensive conversions, enabling an end-to-end data-path that processes asynchronous streams. We tailor the NAM-ACS architecture for random forest (RaF), an ML algorithm chosen for its ability to classify using a reduced number of features. Simulations show that our NAM-ACS architecture enables 75% power savings compared with a single ADC, obtaining a classification accuracy of 85% using an RaF-inspired algorithm.
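The key property behind the AND-gate multiplier can be demonstrated in a few lines of Python (an illustrative software model, not the paper's circuit): encode each value in [0, 1] as the duty cycle of a random bit stream, and the AND of two independent streams has a duty cycle approximating the product.

```python
import random

def to_stream(p, n, rng):
    # Encode value p in [0,1] as a random bit stream with duty cycle p.
    return [rng.random() < p for _ in range(n)]

rng = random.Random(0)   # seeded for reproducibility
n = 100_000
a = to_stream(0.6, n, rng)
b = to_stream(0.5, n, rng)

# The "multiplier": a single AND per stream bit.
product_stream = [x and y for x, y in zip(a, b)]
estimate = sum(product_stream) / n
print(estimate)  # close to 0.6 * 0.5 = 0.30
```

The accuracy improves with stream length at a statistical rate of O(1/sqrt(n)), which is the usual trade-off of stream-based (stochastic) computing: tiny logic in exchange for longer streams.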

### Drew Paine, Devarshi Ghoshal, Lavanya Ramakrishnan,"Investigating Scientific Data Change with User Research Methods",August 20, 2020,LBNL-2001347,

Scientific datasets are continually expanding and changing due to fluctuations with instruments, quality assessment and quality control processes, and modifications to software pipelines. Datasets include minimal information about these changes or their effects, requiring scientists to manually assess modifications through a number of labor-intensive and ad-hoc steps. The Deduce project is investigating data change to develop metrics, methods, and tools that will help scientists systematically identify and make decisions around data changes. Currently, there is a lack of understanding, and of common practices, for identifying and evaluating changes in datasets, since systematically measuring and managing data change is under-explored in scientific work. We are conducting user research to address this need by exploring scientists' conceptualizations, behaviors, needs, and motivations when dealing with changing datasets. Our user research utilizes multiple methods to produce foundational, generative insights and to evaluate research products produced by our team. In this paper, we detail our user research process and outline our findings about data change that emerge from our studies. Our work illustrates how scientific software teams can push beyond just usability-testing user interfaces or tools to better probe the underlying ideas they are developing solutions to address.

### Matthew Li, Nicolas Chan, Viraat Chandra, Krishna Muriki,"Cluster Usage Policy Enforcement Using Slurm Plugins and an HTTP API",Practice and Experience in Advanced Research Computing,New York, NY, USA,Association for Computing Machinery,July 26, 2020,232–238,doi: 10.1145/3311790.3397341

Managing and limiting cluster resource usage is a critical task for computing clusters with a large number of users. By enforcing usage limits, cluster managers are able to ensure fair availability for all users, bill users accordingly, and prevent the abuse of cluster resources. As this is such a common problem, there are naturally many existing solutions. However, to allow for greater control over usage accounting and submission behavior in Slurm, we present a system composed of: a web API which exposes accounting data; Slurm plugins that communicate with a REST-like HTTP implementation of that API; and client tools that use it to report usage. Key advantages of our system include a customizable resource accounting formula based on job parameters, preemptive blocking of user jobs at submission time, project-level and user-level resource limits, and support for the development of other web and command-line clients that query the extensible web API. We deployed this system on Berkeley Research Computing’s institutional cluster, Savio, allowing us to automatically collect and store accounting data, and thereby easily enforce our cluster usage policy.
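The core policy check such a system performs can be sketched as follows (the function names and the accounting formula here are invented for illustration and are not Savio's actual rules): compute a job's charge in service units from its parameters, and preemptively block submission when the project would exceed its allocation.

```python
def service_units(cpus, hours, partition_rate):
    """Charge = CPUs x wall-clock hours x a per-partition rate (illustrative)."""
    return cpus * hours * partition_rate

def may_submit(project_usage, project_limit, job_su):
    """Block, at submission time, any job that would push the project over its limit."""
    return project_usage + job_su <= project_limit

job = service_units(cpus=32, hours=2, partition_rate=1.0)   # 64 SUs
print(may_submit(project_usage=950, project_limit=1000, job_su=job))  # False
```

In the paper's architecture this check lives in a Slurm job-submit plugin, with the usage totals fetched from the HTTP accounting API rather than held locally.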

### Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick,UPC++: An Asynchronous RMA/RPC Library for Distributed C++ Applications (ALCF'20),Argonne Leadership Computing Facility (ALCF) Webinar Series,May 27, 2020,

UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided RMA communication and Remote Procedure Calls (RPC), along with futures and promises. These constructs enable the programmer to express dependencies between asynchronous computations and data movement. UPC++ supports the implementation of simple, regular data structures as well as more elaborate distributed data structures where communication is fine-grained, irregular, or both. The library’s support for asynchrony enables the application to aggressively overlap and schedule communication and computation to reduce wait times.

UPC++ is highly portable and runs on platforms from laptops to supercomputers, with native implementations for HPC interconnects. As a C++ library, it interoperates smoothly with existing numerical libraries and on-node programming models (e.g., OpenMP, CUDA).

In this webinar, hosted by DOE’s Exascale Computing Project and the ALCF, we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through basic algorithm implementations. We will also look at irregular applications and show how they can take advantage of UPC++ features to optimize their performance.

ALCF'20 Event page

ALCF'20 Video recording

### John Bachan, Scott B. Baden, Dan Bonachea, Max Grossman, Paul H. Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen,"UPC++ v1.0 Programmer’s Guide, Revision 2020.3.0",Lawrence Berkeley National Laboratory Tech Report,March 2020,LBNL 2001269, doi: 10.25344/S4P88Z

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

### John Bachan, Dan Bonachea, Amir Kamil,"UPC++ v1.0 Specification, Revision 2020.3.0",Lawrence Berkeley National Laboratory Tech Report,March 12, 2020,LBNL 2001268, doi: 10.25344/S4T01S

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The key communication facilities in UPC++ are one-sided Remote Memory Access (RMA) and Remote Procedure Call (RPC). All communication operations are syntactically explicit and default to non-blocking; asynchrony is managed through the use of futures, promises and continuation callbacks, enabling the programmer to construct a graph of operations to execute asynchronously as high-latency dependencies are satisfied. A global pointer abstraction provides system-wide addressability of shared memory, including host and accelerator memories. The parallelism model is primarily process-based, but the interface is thread-safe and designed to allow efficient and expressive use in multi-threaded applications. The interface is designed for extreme scalability throughout, and deliberately avoids design features that could inhibit scalability.

### Muammar El Khatib, Wibe De Jong,Feature Extraction Using Semi-Supervised Deep Learning,APS March 2020,March 5, 2020,

Features are defined as measurable properties that characterize observed phenomena and represent a key part of machine learning (ML) algorithms. In materials sciences, ML has successfully accelerated atomistic simulations using man-engineered features for tasks such as energy or atomic force predictions. These features fulfill physics constraints such as rotational and translational invariance, uniqueness, and locality (the sum of local contributions reconstructs a global quantity). However, these ML models are known to perform poorly when operating out of the training-set regime because the features are not representative of the underlying structure of the data. This could be improved if features are extracted with advanced hybrid architectures, e.g., a variational autoencoder that is trained with physics constraints introduced with an external task and a loss function. We will explore how the use of semi-supervised learning techniques can be a powerful tool for the extraction of features for atomistic simulations. All results shown herein can be reproduced with ML4Chem: a free software package for machine learning in chemistry and materials sciences.

### Marzieh Lenjani, Patricia Gonzalez, Elaheh Sadredini, Shuangchen Li, Yuan Xie, Ameen Akel, Sean Eilert, Mircea R Stan, Kevin Skadron,"Fulcrum: a simplified control and access mechanism toward flexible and practical in-situ accelerators",International Symposium on High Performance Computer Architecture (HPCA),San Diego, CA, USA,IEEE,February 22, 2020,doi: 10.1109/HPCA47549.2020.00052

In-situ approaches process data very close to the memory cells, in the row buffer of each subarray. This minimizes data movement costs and affords parallelism across subarrays. However, current in-situ approaches are limited to only row-wide bitwise (or few-bit) operations applied uniformly across the row buffer. They impose a significant overhead of multiple row activations for emulating 32-bit addition and multiplications using bitwise operations and cannot support operations with data dependencies or based on predicates. Moreover, with current peripheral logic, communication among subarrays is inefficient, and with typical data layouts, bits in a word are not physically adjacent. The key insight of this work is that in-situ, single-word ALUs outperform in-situ, parallel, row-wide, bitwise ALUs by reducing the number of row activations and enabling new operations and optimizations. Our proposed lightweight access and control mechanism, Fulcrum, sequentially feeds data into the single-word ALU and enables operations with data dependencies and operations based on a predicate. For algorithms that require communication among subarrays, we augment the peripheral logic with broadcasting capabilities and a previously-proposed method for low-cost inter-subarray data movement. The sequential processor also enables overlapping of broadcasting and computation, and reuniting bits that are not physically adjacent. In order to realize true subarray-level parallelism, we introduce a lightweight column-selection mechanism through shifting one-hot encoded values. This technique enables independent column selection in each subarray. We integrate Fulcrum with Compute Express Link (CXL), a new interconnect standard.
Fulcrum with one memory stack delivers (i) on average (up to) 23.4 (76) times speedup over a server-class GPU, NVIDIA P100, with three stacks of HBM2 memory, (ii) 70 (228) times speedup per memory stack over the GPU, and (iii) 19 (178.9) times speedup per memory stack over an ideal model of the GPU, which only accounts for the overhead of data movement.
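The shift-based one-hot column selection can be modeled in a few lines (a toy software model invented for illustration, not the hardware design): instead of a full column decoder, each subarray keeps a one-hot register and shifts it to walk the row buffer one word at a time, feeding each word sequentially into a single-word ALU.

```python
row_buffer = [3, 1, 4, 1, 5, 9, 2, 6]          # words latched in a row buffer
select = [1] + [0] * (len(row_buffer) - 1)     # one-hot register: column 0 selected

acc = 0
for _ in range(len(row_buffer)):
    # Gated read: only the word under the one-hot bit reaches the ALU.
    word = sum(w for w, s in zip(row_buffer, select) if s)
    acc += word                                # single-word ALU operation (here: add)
    select = [0] + select[:-1]                 # shift the one-hot selector right

print(acc)  # 31, the running sum of the row
```

Because the selector is just a shift register, each subarray can advance its own one-hot pattern independently, which is what enables true subarray-level parallelism at low area cost.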

### Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick,UPC++: A PGAS/RPC Library for Asynchronous Exascale Communication in C++ (ECP'20),Tutorial at Exascale Computing Project (ECP) Annual Meeting 2020,February 6, 2020,

UPC++ is a C++ library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. The UPC++ API offers low-overhead one-sided RMA communication and Remote Procedure Calls (RPC), along with futures and promises. These constructs enable the programmer to express dependencies between asynchronous computations and data movement. UPC++ supports the implementation of simple, regular data structures as well as more elaborate distributed data structures where communication is fine-grained, irregular, or both. The library’s support for asynchrony enables the application to aggressively overlap and schedule communication and computation to reduce wait times.

UPC++ is highly portable and runs on platforms from laptops to supercomputers, with native implementations for HPC interconnects. As a C++ library, it interoperates smoothly with existing numerical libraries and on-node programming models (e.g., OpenMP, CUDA).

In this tutorial we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through basic algorithm implementations. We will also look at irregular applications and show how they can take advantage of UPC++ features to optimize their performance.

ECP'20 Event page

### J Corbino, J Castillo,"High-order mimetic finite-difference operators satisfying the extended Gauss divergence theorem",Journal of Computational and Applied Mathematics,2020,doi: 10.1016/j.cam.2019.06.042

We present high-order mimetic finite-difference operators that satisfy the extended Gauss theorem. These operators have the same order of accuracy in the interior and at the boundary, no free parameters and optimal bandwidth. They are defined over staggered grids, using weighted inner products with a diagonal norm. We present several examples to demonstrate that mimetic finite-difference schemes using these operators produce excellent results.
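A minimal 1D staggered-grid example (a second-order sketch of our own, not the paper's high-order operators) shows the kind of discrete Gauss/integration-by-parts identity that mimetic divergence D and gradient G preserve exactly: sum v·(Du)·h + sum (Gv)·u·h equals the boundary flux term.

```python
# Staggered grid on [0,1]: u lives at the N+1 nodes/faces, v at the N cell centers.
N, h = 8, 1.0 / 8
u = [(j * h) ** 2 for j in range(N + 1)]
v = [(i + 0.5) * h for i in range(N)]

Du = [(u[i + 1] - u[i]) / h for i in range(N)]        # discrete divergence at centers
Gv = [(v[j] - v[j - 1]) / h for j in range(1, N)]     # discrete gradient at interior nodes

lhs = sum(vi * d * h for vi, d in zip(v, Du)) \
    + sum(g * u[j] * h for g, j in zip(Gv, range(1, N)))
rhs = v[-1] * u[-1] - v[0] * u[0]                     # discrete boundary term

print(abs(lhs - rhs) < 1e-12)  # True: the telescoping sums cancel exactly
```

The mimetic program is to extend this exact discrete identity to high order, with matching interior/boundary accuracy and weighted (diagonal-norm) inner products, which is what the paper's operators achieve.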

### A Boada, J Corbino, J Castillo,"High-order mimetic difference simulation of unsaturated flow using Richards equation",Mathematics in Applied Sciences and Engineering,2020,doi: 10.5206/mase/10874

The vadose zone is the portion of the subsurface above the water table and its pore space usually contains air and water. Due to the presence of infiltration, erosion, plant growth, microbiota, contaminant transport, aquifer recharge, and discharge to surface water, it is crucial to predict the transport rate of water and other substances within this zone. However, flow in the vadose zone has many complications as the parameters that control it are extremely sensitive to the saturation of the media, leading to a nonlinear problem. This flow is referred to as unsaturated flow and is governed by Richards equation. Analytical solutions for this equation exist only for simplified cases, so most practical situations require a numerical solution. Nevertheless, the nonlinear nature of Richards equation introduces challenges that cause numerical solutions for this problem to be computationally expensive and, in some cases, unreliable. High-order mimetic finite-difference operators are discrete analogs of the continuous differential operators and have been extensively used in the fields of fluid and solid mechanics. In this work, we present a numerical approach involving high-order mimetic operators along with a Newton root-finding algorithm for the treatment of the nonlinear component. A fully implicit time-discretization scheme is used to deal with the problem's stiffness.
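The Newton linearization at the heart of such a solver can be sketched on a scalar equation (the paper solves the full discretized PDE system; this toy reduction, with an invented nonlinearity, only shows the shape of the iteration):

```python
import math

def newton(f, df, x0, tol=1e-12, max_iter=50):
    # Classic Newton root-finding: repeatedly solve the linearized problem.
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Toy nonlinearity standing in for the saturation-dependent terms:
# solve exp(x) - 2 = 0, whose exact root is ln(2).
root = newton(lambda x: math.exp(x) - 2, lambda x: math.exp(x), x0=1.0)
print(abs(root - math.log(2)) < 1e-10)  # True
```

In the PDE setting, f becomes the residual of the fully implicit time step and df its Jacobian, so each Newton iteration requires a linear solve rather than a scalar division.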

### Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Kathy Yelick,UPC++ Tutorial (NERSC Dec 2019),National Energy Research Scientific Computing Center (NERSC),December 16, 2019,

This event was a repeat of the tutorial delivered on November 1, but with the restoration of the hands-on component which was omitted due to uncertainty surrounding the power outage at NERSC.

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. UPC++ provides mechanisms for low-overhead one-sided communication, moving computation to data through remote-procedure calls, and expressing dependencies between asynchronous computations and data movement. It is particularly well-suited for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces are designed to be composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds.

In this tutorial we introduced basic concepts and advanced optimization techniques of UPC++. We discussed the UPC++ memory and execution models and walked through implementing basic algorithms in UPC++. We also discussed irregular applications and how to take advantage of UPC++ features to optimize their performance. The tutorial included hands-on exercises with basic UPC++ constructs. Registrants were given access to run their UPC++ exercises on NERSC’s Cori (currently the #14 fastest computer in the world).

NERSC Dec 2019 Event page

### Marc Grau Davis, Ethan Smith, Ana Tudor, Koushik Sen, Irfan Siddiqi, Costin Iancu,"Heuristics for Quantum Compiling with a Continuous Gate Set",3rd International Workshop on Quantum Compilation as part of the International Conference On Computer Aided Design 2019,December 5, 2019,

We present an algorithm for compiling arbitrary unitaries into a sequence of gates native to a quantum processor. As accurate CNOT gates are hard to realize on Noisy Intermediate-Scale Quantum (NISQ) devices for the foreseeable future, our A*-inspired algorithm attempts to minimize their count while accounting for connectivity. We discuss the search strategy together with metrics to expand the solution frontier. For a workload of circuits with complexity appropriate for the NISQ era, we produce solutions well within the best upper bounds published in the literature and match or exceed hand-tuned implementations, as well as other existing synthesis alternatives. In particular, when comparing against state-of-the-art available synthesis packages we show a 2.4x average (up to 5.3x) reduction in CNOT count. We also show how to re-target the algorithm for a different chip topology and native gate set while obtaining similar-quality results. We believe that empirical tools like ours can facilitate algorithmic exploration and gate set discovery for quantum processor designers, as well as providing useful optimization blocks within the quantum compilation tool-chain.

### Paul H. Hargrove, Dan Bonachea,"Efficient Active Message RMA in GASNet Using a Target-Side Reassembly Protocol (Extended Abstract)",IEEE/ACM Parallel Applications Workshop, Alternatives To MPI+X (PAW-ATM),Lawrence Berkeley National Laboratory Technical Report,November 17, 2019,LBNL 2001238, doi: 10.25344/S4PC7M

GASNet is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models on future exascale machines. This paper investigates strategies for efficient implementation of GASNet’s “AM Long” API that couples an RMA (Remote Memory Access) transfer with an Active Message (AM) delivery.
We discuss several network-level protocols for AM Long and propose a new target-side reassembly protocol. We present a microbenchmark evaluation on the Cray XC Aries network hardware. The target-side reassembly protocol on this network improves AM Long end-to-end latency by up to 33%, and the effective bandwidth by up to 49%, while also enabling asynchronous source completion that drastically reduces injection overheads.
The improved AM Long implementation for Aries is available in GASNet-EX release v2019.9.0 and later.

### Dan Bonachea, Paul H. Hargrove,"GASNet-EX: A High-Performance, Portable Communication Library for Exascale",LNCS 11882: Proceedings of Languages and Compilers for Parallel Computing (LCPC'18),edited by Hall M., Sundar H.,November 2019,11882:138-158,doi: 10.1007/978-3-030-34627-0_11

Partitioned Global Address Space (PGAS) models, typified by such languages as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.

GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building upon over 15 years of lessons learned. We describe and evaluate several features and enhancements that have been introduced to address the needs of modern client systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI-3 implementations on current HPC systems.

### Christine T. Wolf, Julia Bullard, Stacy Wood, Amelia Acker, Drew Paine, Charlotte P. Lee,"Mapping the “How” of Collaborative Action: Research Methods for Studying Contemporary Sociotechnical Processes",CSCW '19: Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing,November 10, 2019,doi: 10.1145/3311957.3359441

Process has been a concern since the beginning of CSCW. Developments in sociotechnical landscapes raise new challenges for studying processes (e.g., massive online communities bringing together vast crowds; Big Data technologies connecting many through the flow of data). This re-opens questions about how we study, document, conceptualize, and design to support processes in complex, contemporary sociotechnical systems. This one-day workshop will bring together researchers to discuss the CSCW community’s unique focus and methodological toolkit for studying process and workflow; provide a collaborative space for the improvement and extension of research projects within this space; and catalyze a network of scholars with expertise and interest in addressing challenging methodological questions around studying process in contemporary, sociotechnical systems.

### Marzieh Lenjani, Patricia Gonzalez, Elaheh Sadredini, M Arif Rahman, Mircea R Stan,"An overflow-free quantized memory hierarchy in general-purpose processors",International Symposium on Workload Characterization (IISWC),Orlando, FL, USA,IEEE,November 3, 2019,doi: 10.1109/IISWC47752.2019.9042035

Data movement comprises a significant portion of energy consumption and execution time in modern applications. Accelerator designers exploit quantization to reduce the bitwidth of values and reduce the cost of data movement. However, any value that does not fit in the reduced bitwidth results in an overflow (we refer to these values as outliers). Therefore, accelerators use quantization only for applications that are tolerant to overflows. We observe that in most applications the rate of outliers is low and values are often within a narrow range, providing the opportunity to exploit quantization in general-purpose processors. However, a software implementation of quantization in general-purpose processors has three problems. First, the programmer has to manually implement conversions and the additional instructions that quantize and dequantize values, imposing programmer effort and performance overhead. Second, to cover outliers, the bitwidth of the quantized values often becomes greater than or equal to that of the original values. Third, the programmer has to use standard bitwidths; otherwise, extracting non-standard bitwidths (i.e., 1-7, 9-15, and 17-31) for representing narrow integers exacerbates the overhead of software-based quantization. The key idea of this paper is to propose hardware support in the memory hierarchy of general-purpose processors for quantization, which represents values with few and flexible numbers of bits and stores outliers in their original format in a separate space, preventing any overflow. We minimize metadata and the overhead of locating quantized values using a software-hardware interaction that transfers quantization parameters and data layout to hardware. As a result, our approach has three advantages over cache compression techniques: (i) less metadata, (ii) a higher compression ratio for floating-point values and cache blocks with multiple data types, and (iii) lower overhead for locating the compressed blocks.
It delivers on average 1.40/1.45/1.56× speedup and 24/26/30% energy reduction compared to a baseline that uses full-length variables in a 4/8/16-core system. Our approach also provides a 1.23× speedup, in a 4-core system, over state-of-the-art cache compression techniques, and adds only 0.25% area overhead to the baseline processor.
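The core idea (narrow-bitwidth codes, with outliers kept at full width in a separate space so nothing ever overflows) can be illustrated with a toy software sketch. This is our own software-only analogy with hypothetical names, not the paper's hardware mechanism; it reserves the all-ones code as an outlier marker:

```python
def quantize(values, bits):
    """Map integers to `bits`-bit codes. The all-ones code is reserved
    as an outlier marker; outliers keep their original format in a
    separate side table, so no value is ever lost to overflow."""
    marker = (1 << bits) - 1          # reserved "outlier" code
    codes, outliers = [], {}
    for i, v in enumerate(values):
        if 0 <= v < marker:
            codes.append(v)           # fits in the narrow bitwidth
        else:
            codes.append(marker)      # points to the side table
            outliers[i] = v           # full-width original value
    return codes, outliers

def dequantize(codes, outliers, bits):
    """Recover the original values exactly."""
    marker = (1 << bits) - 1
    return [outliers[i] if c == marker else c
            for i, c in enumerate(codes)]
```

With 4-bit codes, a list like `[3, 1, 1000, 2, -5]` stores three values in 4 bits each and only the two outliers (1000 and -5) at full width, round-tripping losslessly.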

### Patricia Gonzalez-Guerrero, Mircea R. Stan,"Asynchronous Stochastic Computing",53rd Asilomar Conference on Signals, Systems, and Computers,Pacific Grove, CA, USA,IEEE,November 3, 2019,doi: 10.1109/IEEECONF44664.2019.9049011

Asynchronous Stochastic Computing (ASC) leverages the advantages of Synchronous Stochastic Computing (SSC) while addressing its drawbacks. In SSC a multiplier is a single AND gate, saving ~90% of power and area compared with a typical 8-bit binary multiplier. The key to SSC's power-area efficiency comes from mapping numbers to streams of 1s and 0s. Despite this efficiency, SSC drawbacks such as long latency, a costly clock distribution network (CDN), and expensive stream generation cause the energy consumption to grow prohibitively large. In this work, we introduce the foundations for ASC using continuous-time Markov chains, and analyze the computing error due to random fluctuations. In ASC, data is mapped to asynchronous continuous-time streams, which yields two advantages over the synchronous counterpart: (1) CDN elimination, and (2) better accuracy. We compare ASC with SSC for three applications: (1) multiplication, (2) an image processing algorithm, gamma correction, and (3) a single layer of a fully-connected artificial neural network (ANN), using a FinFET1X technology. Our Matlab and Spice-level simulations and post-place&route (P&R) reports demonstrate that ASC yields savings of 10%-55%, 33%-44%, and 50% in latency, power, and energy, respectively. These savings make ASC a good candidate to address the ultra-low-power requirements of machine learning for the IoT.
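The AND-gate multiplier mentioned above is easy to see in software: a value in [0, 1] becomes the density of 1s in a random bitstream, and ANDing two independent streams yields a stream whose density is the product. A minimal synchronous-SC sketch (our own illustration; the continuous-time, clockless aspect of ASC is not modeled here):

```python
import random

def to_stream(p, n, rng):
    """Encode a probability p in [0, 1] as an n-bit stochastic stream
    whose density of 1s is p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(a, b, n=100_000, seed=1):
    """Stochastic multiplication: the AND of two independent streams
    has a 1-density of approximately a * b."""
    rng = random.Random(seed)
    sa = to_stream(a, n, rng)
    sb = to_stream(b, n, rng)
    return sum(x & y for x, y in zip(sa, sb)) / n
```

For example, `sc_multiply(0.5, 0.4)` converges toward 0.2 as the stream length grows; the long streams needed for accuracy are exactly the latency drawback the abstract describes.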

### Amir Kamil, John Bachan, Scott B. Baden, Dan Bonachea, Rob Egan, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Kathy Yelick,UPC++ Tutorial (NERSC Nov 2019),National Energy Research Scientific Computing Center (NERSC),November 1, 2019,

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. UPC++ provides mechanisms for low-overhead one-sided communication, moving computation to data through remote-procedure calls, and expressing dependencies between asynchronous computations and data movement. It is particularly well-suited for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces are designed to be composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds.

In this tutorial we will introduce basic concepts and advanced optimization techniques of UPC++. We will discuss the UPC++ memory and execution models and walk through implementing basic algorithms in UPC++. We will also look at irregular applications and how to take advantage of UPC++ features to optimize their performance.

NERSC Nov 2019 Event Page

### Patricia Gonzalez-Guerrero, Tommy Tracy II, Xinfei Guo, Mircea R Stan,"Towards low-power random forest using asynchronous computing with streams",Tenth International Green and Sustainable Computing Conference (IGSC),EEE Computer Society,October 1, 2019,doi: 10.1109/IGSC48788.2019.8957193

We propose a sensor architecture for the internet of things (IoT), smartdust, or edge-intelligence (EI) that combines near-analog-memory (NAM) processing and asynchronous computing with streams (ACS), addressing the need for machine learning (ML) capabilities at low power budgets. In ACS an analog value is mapped to an asynchronous stream that can take one of two values (vh, vl). This stream-based data representation enables area- and power-efficient computing units, such as a multiplier implemented as an AND gate, yielding power savings of 90% compared with digital approaches. However, a major bottleneck for computing on streams, vision sensors, and NAM approaches is the cost of analog-to-digital (ADC) and digital-to-stream-to-digital converters. Our NAM-ACS architecture simplifies the sensor and eliminates the need for these expensive conversions. The architecture is tailored for random forest (Raf), an ML algorithm chosen for its ability to classify using a reduced number of features. Our simulations show that, using an analog-memory array of 256×512, the power consumption of the ACS core combined with the memory interface is comparable with that of an ADC-based memory interface, obtaining an accuracy of 83%.

### John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen,"UPC++ v1.0 Programmer’s Guide, Revision 2019.9.0",Lawrence Berkeley National Laboratory Tech Report,September 2019,LBNL 2001236, doi: 10.25344/S4V30R

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

### John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen,"UPC++ v1.0 Specification, Revision 2019.9.0",Lawrence Berkeley National Laboratory Tech Report,September 14, 2019,LBNL 2001237, doi: 10.25344/S4ZW2C

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
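The future/continuation chaining described above can be loosely illustrated with Python's `concurrent.futures`: a rough analogy only, not the UPC++ API, and `fetch` is a hypothetical stand-in for a one-sided get against remote memory:

```python
from concurrent.futures import ThreadPoolExecutor

STORE = {"a": 2, "b": 3}           # stands in for remote shared memory

def fetch(key):
    """Hypothetical stand-in for a one-sided get (high-latency)."""
    return STORE[key]

with ThreadPoolExecutor(max_workers=3) as pool:
    fa = pool.submit(fetch, "a")   # launch asynchronous "gets"
    fb = pool.submit(fetch, "b")
    # A continuation: its body runs to completion only once both
    # dependencies are satisfied, as in a DAG of chained operations.
    total = pool.submit(lambda: fa.result() + fb.result())
```

In UPC++ the analogous chaining is expressed with futures returned by communication calls and `.then()` continuations, letting work proceed as high-latency dependencies complete.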

### Patricia Gonzalez-Guerrero, Stephen G Wilson, Mircea R Stan,"Error-latency Trade-off for Asynchronous Stochastic Computing with ΣΔ Streams for the IoT",International System-on-Chip Conference (SOCC),Singapore,IEEE,September 3, 2019,doi: 10.1109/SOCC46988.2019.1570548453

Asynchronous stochastic computing (ASC) using continuous-time asynchronous ΣΔ modulators (SC-AΣΔM) has the potential to enable ultra-low-power, on-node machine learning algorithms for the next generation of sensors for the Internet of Things (IoT). As in synchronous stochastic computing (SSC), in SC-AΣΔM complex processing units can be implemented with simple gates because numbers are represented with streams. For example, a multiplier is implemented with an XNOR gate, yielding savings in power and area of 90% compared with the typical binary approach. Previous work demonstrated that SC-AΣΔM leverages SSC's advantages and addresses its drawbacks, achieving significant savings in energy, power, and latency. In this work, we study a theoretical model to determine the fundamental limits of accuracy and computing time for SC-AΣΔM. Since the ΣΔ streams are periodic, the final computing error is non-zero and depends on the period of the input streams. We validate our theoretical model with Spice-level simulations and evaluate the power and energy consumption using a standard FinFET1X technology for two cases: 1) multiplication and 2) gamma correction, an image processing algorithm. Our work determines circuit design guidelines for SC-AΣΔM and shows that multiplication with SC-AΣΔM requires at least 6X less time than SSC. The latency reduction and novel architecture positively impact the overall energy consumption in the IoT node, enabling energy savings of 79% compared with the binary approach.

### Patricia Gonzalez-Guerrero, Mircea R Stan,"Asynchronous Stream Computing for Low Power IoT",International Midwest Symposium on Circuits and Systems (MWSCAS),Dallas, TX, USA,IEEE,August 4, 2019,doi: 10.1109/MWSCAS.2019.8885388

Asynchronous circuits have many advantages over their synchronous counterparts in terms of robustness to parameter variations, wide supply-voltage ranges, and potentially low power from not needing a clock, yet their promise has not yet translated into commercial success due to several issues related to design methodologies and the need for handshake signals. Stochastic computing is another processing paradigm that has shown promise of low power and extremely compact circuits, but it has yet to become a commercial success, mainly because of the need for a fast clock to generate the random streams. The Asynchronous Stream Processing circuits described in this paper combine the best features of asynchronous circuits (lack of clock, robustness) with the best features of stochastic computing (processing on streams) to enable extremely compact, low-power IoT sensing nodes that can finally fulfill the promise of smart dust, another concept that was ahead of its time and has yet to achieve commercial success.

### J. Choi, A. Sim,Data reduction methods, systems and devices,U.S. Patent No. 10,366,078,2019,

U.S. Patent No. 10,366,078, “DATA REDUCTION METHODS, SYSTEMS, AND DEVICES”, LBNL IB2013-133.

### John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, Hadia Ahmed,"UPC++: A High-Performance Communication Framework for Asynchronous Computation",33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19),Rio de Janeiro, Brazil,IEEE,May 2019,doi: 10.25344/S4V88H

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC).
We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x.
UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.

### Francois P. Hamon, Martin Schreiber, Michael L. Minion,"Parallel-in-Time Multi-Level Integration of the Shallow-Water Equations on the Rotating Sphere",April 12, 2019,

Submitted to Journal of Computational Physics

### John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen,"UPC++ Programmer's Guide, v1.0-2019.3.0",Lawrence Berkeley National Laboratory Tech Report,March 2019,LBNL 2001191, doi: 10.25344/S4F301

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

### John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen,"UPC++ Specification v1.0, Draft 10",Lawrence Berkeley National Laboratory Tech Report,March 15, 2019,LBNL 2001192, doi: 10.25344/S4JS30

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, that enable the programmer to express ownership information for improving locality, one-sided communication, both put/get and RPC, futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

### Patricia Gonzalez-Guerrero, Xinfei Guo, Mircea R Stan,"ASC-FFT: Area-efficient low-latency FFT design based on asynchronous stochastic computing",10th Latin American Symposium on Circuits & Systems (LASCAS),Armenia, Colombia,IEEE,February 24, 2019,doi: 10.1109/LASCAS.2019.8667599

Asynchronous Stochastic Computing (ASC) is a new paradigm that addresses the drawbacks of Synchronous Stochastic Computing (SSC), namely expensive stochastic number generation (SNG) and long latency, by using continuous-time streams (CTS). To go beyond the basic operations of addition and multiplication in ASC we need to incorporate a memory element. Although for SSC the natural memory element is a clocked flip-flop, using the same approach with unsynchronized data leads to unacceptably large error. In this paper, we propose to use a capacitor embedded in a feedback loop as the ASC memory element. Based on this idea, we design a low-error asynchronous adder that stores the carry information in the capacitor. Our adder enables the implementation of more complex computation logic. As an example, we implement an asynchronous stochastic Fast Fourier Transform (ASC-FFT) using a FinFET1X technology. The proposed adder requires 76% and 24% less hardware cost than conventional and SSC adders, respectively. Moreover, the ASC-FFT shows 3X less latency than SSC-FFT approaches and significant improvements in latency and area over conventional FFT architectures, with no degradation of the computation accuracy as measured by the FFT Signal-to-Noise Ratio (SNR).


### Daniel F. Martin, Stephen L. Cornford, Antony J. Payne,"Millennial‐scale Vulnerability of the Antarctic Ice Sheet to Regional Ice Shelf Collapse",Geophysical Research Letters,January 9, 2019,doi: 10.1029/2018gl081229

Abstract:

The Antarctic Ice Sheet (AIS) remains the largest uncertainty in projections of future sea level rise. A likely climate‐driven vulnerability of the AIS is thinning of floating ice shelves resulting from surface‐melt‐driven hydrofracture or incursion of relatively warm water into subshelf ocean cavities. The resulting melting, weakening, and potential ice‐shelf collapse reduces shelf buttressing effects. Upstream ice flow accelerates, causing thinning, grounding‐line retreat, and potential ice sheet collapse. While high‐resolution projections have been performed for localized Antarctic regions, full‐continent simulations have typically been limited to low‐resolution models. Here we quantify the vulnerability of the entire present‐day AIS to regional ice‐shelf collapse on millennial timescales treating relevant ice flow dynamics at the necessary ∼1km resolution. Collapse of any of the ice shelves dynamically connected to the West Antarctic Ice Sheet (WAIS) is sufficient to trigger ice sheet collapse in marine‐grounded portions of the WAIS. Vulnerability elsewhere appears limited to localized responses.

Plain Language Summary:

The biggest uncertainty in near‐future sea level rise (SLR) comes from the Antarctic Ice Sheet. Antarctic ice flows in relatively fast‐moving ice streams. At the ocean, ice flows into enormous floating ice shelves which push back on their feeder ice streams, buttressing them and slowing their flow. Melting and loss of ice shelves due to climate changes can result in faster‐flowing, thinning and retreating ice leading to accelerated rates of global sea level rise. To learn where Antarctica is vulnerable to ice‐shelf loss, we divided it into 14 sectors, applied extreme melting to each sector's floating ice shelves in turn, then ran our ice flow model 1000 years into the future for each case. We found three levels of vulnerability. The greatest vulnerability came from attacking any of the three ice shelves connected to West Antarctica, where much of the ice sits on bedrock lying below sea level. Those dramatic responses contributed around 2m of sea level rise. The second level came from four other sectors, each with a contribution between 0.5‐1m. The remaining sectors produced little to no contribution. We examined combinations of sectors, determining that sectors behave independently of each other for at least a century.

### BA Brock, Y Chen, J Yan, J Owens, A Buluç, K Yelick,"RDMA vs. RPC for implementing distributed data structures",2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019,January 1, 2019,17--22,doi: 10.1109/IA349570.2019.00009

Distributed data structures are key to implementing scalable applications for scientific simulations and data analysis. In this paper we look at two implementation styles for distributed data structures: remote direct memory access (RDMA) and remote procedure call (RPC). We focus on operations that require individual accesses to remote portions of a distributed data structure, e.g., accessing a hash table bucket or distributed queue, rather than global operations in which all processors collectively exchange information. We look at the trade-offs between the two styles through microbenchmarks and a performance model that approximates the cost of each. The RDMA operations have direct hardware support in the network and therefore lower latency and overhead, while the RPC operations are more expressive but higher cost and can suffer from lack of attentiveness from the remote side. We also run experiments to compare the real-world performance of RDMA- and RPC-based data structure operations with the predicted performance to evaluate the accuracy of our model, and show that while the model does not always precisely predict running time, it allows us to choose the best implementation in the examples shown. We believe this analysis will assist developers in designing data structures that will perform well on current network architectures, as well as network architects in providing better support for this class of distributed data structures.
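The trade-off the paper models can be sketched with a toy latency model. All constants and the functional form here are illustrative assumptions of ours, not the paper's measured parameters:

```python
ALPHA = 2e-6   # assumed one-way network latency, seconds
CPU   = 1e-6   # assumed remote CPU time to run an RPC handler, seconds

def rdma_cost(round_trips):
    """RDMA: executed in network hardware with low overhead, but a
    data-structure operation may need several dependent round trips
    (e.g. read a hash-table bucket pointer, then write the bucket)."""
    return round_trips * 2 * ALPHA

def rpc_cost(attentiveness_delay=0.0):
    """RPC: one round trip plus remote CPU time, plus any delay before
    the remote process polls the network (its "attentiveness")."""
    return 2 * ALPHA + CPU + attentiveness_delay
```

Under these assumed constants, a single-round-trip RDMA access beats an RPC, a two-round-trip dependent RDMA sequence loses to an attentive RPC, and an inattentive remote side (say, 1 ms before it polls) makes RPC far slower, mirroring the qualitative trade-offs the paper evaluates.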

### Beytullah Yildiz, Kesheng Wu, Suren Byna, Arie Shoshani,"Parallel membership queries on very large scientific data sets using bitmap indexes",Concurrency and Computation: Practice and Experience,January 1, 2019,31:e5157,

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating‐point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word‐Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.
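The bitmap-index approach to membership queries can be sketched in a few lines, using Python integers as uncompressed bit vectors (this is our minimal illustration; production systems add compression schemes such as Word-Aligned Hybrid):

```python
def build_bitmap_index(values):
    """One bitmap per distinct attribute value; bit r of a bitmap is
    set when row r holds that value."""
    index = {}
    for row, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << row)
    return index

def membership_query(index, wanted):
    """Rows whose value is a member of `wanted`: OR together the
    bitmaps of the queried values, then enumerate the set bits."""
    mask = 0
    for v in wanted:
        mask |= index.get(v, 0)
    return [row for row in range(mask.bit_length()) if (mask >> row) & 1]
```

For a column `["ca", "ny", "ca", "tx"]`, querying membership in `{"ca", "tx"}` reduces to a bitwise OR of two bitmaps, which is what makes the query fast and easy to parallelize across partitions of the data.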

### Kesheng Wu, Alex Sim, Jonathan Wang, Seongwook Hwangbo,Methods, systems, and devices for accurate signal timing of power component events,2019,

US Patent app no. 20190138371, “Methods, systems, and devices for accurate signal timing of power component events”

### B Brock, A Buluç, K Yelick,"BCL: A cross-platform distributed data structures library",ACM International Conference Proceeding Series,January 2019,doi: 10.1145/3337821.3337912

One-sided communication is a useful paradigm for irregular parallel applications, but most one-sided programming environments, including MPI's one-sided interface and PGAS programming languages, lack application-level libraries to support these applications. We present the Berkeley Container Library, a set of generic, cross-platform, high-performance data structures for irregular applications, including queues, hash tables, Bloom filters and more. BCL is written in C++ using an internal DSL called the BCL Core that provides one-sided communication primitives such as remote get and remote put operations. The BCL Core has backends for MPI, OpenSHMEM, GASNet-EX, and UPC++, allowing BCL data structures to be used natively in programs written using any of these programming environments. Along with our internal DSL, we present the BCL ObjectContainer abstraction, which allows BCL data structures to transparently serialize complex data types while maintaining efficiency for primitive types. We also introduce the set of BCL data structures and evaluate their performance across a number of high-performance computing systems, demonstrating that BCL programs are competitive with hand-optimized code, even while hiding many of the underlying details of message aggregation, serialization, and synchronization.
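The ObjectContainer idea, transparent serialization of complex types while primitives stay cheap, can be sketched in plain Python. This toy encoding is our own analogy, not BCL's C++ mechanism:

```python
import pickle
import struct

def serialize(value):
    """Fixed-width primitives are stored raw (here, a 1-byte tag plus
    4 bytes for a 32-bit int); anything else falls back to a general
    serializer (here, pickle)."""
    if type(value) is int and -2**31 <= value < 2**31:
        return b"i" + struct.pack("<i", value)
    return b"p" + pickle.dumps(value)

def deserialize(buf):
    """Dispatch on the tag byte to undo serialize()."""
    if buf[:1] == b"i":
        return struct.unpack("<i", buf[1:])[0]
    return pickle.loads(buf[1:])
```

A small int costs 5 bytes with no general-purpose serializer involved, while a nested dict round-trips through the fallback path; the point is that supporting complex types does not tax the common primitive case.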

### Paul H. Hargrove, Dan Bonachea,"GASNet-EX Performance Improvements Due to Specialization for the Cray Aries Network",Parallel Applications Workshop, Alternatives To MPI (PAW-ATM),Dallas, Texas, USA,IEEE,November 16, 2018,23-33,doi: 10.25344/S44S38

GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models on future exascale machines. This paper reports on the improvements in performance observed on Cray XC-series systems due to enhancements made to the GASNet-EX software. These enhancements, known as "specializations", primarily consist of replacing network-independent implementations of several recently added features with implementations tailored to the Cray Aries network. Performance gains from specialization include (1) Negotiated-Payload Active Messages improve bandwidth of a ping-pong test by up to 14%, (2) Immediate Operations reduce running time of a synthetic benchmark by up to 93%, (3) non-bulk RMA Put bandwidth is increased by up to 32%, (4) Remote Atomic performance is 70% faster than the reference on a point-to-point test and allows a hot-spot test to scale robustly, and (5) non-contiguous RMA interfaces see up to 8.6x speedups for an intra-node benchmark and 26% for inter-node. These improvements are all available in GASNet-EX version 2018.3.0 and later.

### Scott B. Baden, Paul H. Hargrove, Hadia Ahmed, John Bachan, Dan Bonachea, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen,"UPC++ and GASNet-EX: PGAS Support for Exascale Applications and Runtimes",The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'18) Research Poster,November 2018,

Lawrence Berkeley National Lab is developing a programming system to support HPC application development using the Partitioned Global Address Space (PGAS) model. This work is driven by the emerging need for adaptive, lightweight communication in irregular applications at exascale. We present an overview of UPC++ and GASNet-EX, including examples and performance results.

GASNet-EX is a portable, high-performance communication library, leveraging hardware support to efficiently implement Active Messages and Remote Memory Access (RMA). UPC++ provides higher-level abstractions appropriate for PGAS programming such as: one-sided communication (RMA), remote procedure call, locality-aware APIs for user-defined distributed objects, and robust support for asynchronous execution to hide latency. Both libraries have been redesigned relative to their predecessors to meet the needs of exascale computing. While both libraries continue to evolve, the system already demonstrates improvements in microbenchmarks and application proxies.

### Patricia Gonzalez-Guerrero, Xinfei Guo, Mircea Stan,"SC-SD: Towards low power stochastic computing using sigma delta streams",International Conference on Rebooting Computing (ICRC),McLean, VA, USA,IEEE,November 7, 2018,doi: 10.1109/ICRC.2018.8638611

Processing data using Stochastic Computing (SC) requires only ~7% of the area and power of the typical binary approach. However, SC has two major drawbacks that eclipse any area and power savings. First, it takes ~99% more time to finish a computation when compared with the binary approach, since data is represented as streams of bits. Second, the Linear Feedback Shift Registers (LFSRs) required to generate the stochastic streams add to the power and area of the overall SC-LFSR system. These drawbacks result in similar or higher area, power, and energy numbers when compared with the binary counterpart. In this work, we address these drawbacks by applying SC directly on Pulse Density Modulated (PDM) streams. Most modern Systems on Chip (SoCs) already include Analog-to-Digital Converters (ADCs). The core of ΣΔ-ADCs is the ΣΔ modulator, whose output is a PDM stream. Our approach (SC-SD) simplifies the system hardware in two ways: first, we drop the filter stage at the ADC; second, we replace the costly Stochastic Number Generators (SNGs) with ΣΔ modulators. To further lower the system complexity, we adopt an Asynchronous ΣΔ Modulator (AΣΔM) architecture. We design and simulate the AΣΔM using an industry-standard 1×FinFET technology with foundry models. (In modern technologies the node number does not refer to any one feature in the process, and foundries use slightly different conventions; we use 1x to denote the 14/16nm FinFET nodes offered by the foundry.) We achieve power savings of 81% in the SNG compared to the LFSR approach. To evaluate how these area and power savings scale to more complex applications, we implement gamma correction, a popular image processing algorithm. For this application, our simulations show that SC-SD can save 98%-11% in total system latency and 50%-38% in power consumption when compared with the SC-LFSR approach or the binary counterpart.

### Dan Bonachea, Paul H. Hargrove,"GASNet-EX: A High-Performance, Portable Communication Library for Exascale",Languages and Compilers for Parallel Computing (LCPC'18),Salt Lake City, Utah, USA,October 11, 2018,LBNL 2001174, doi: 10.25344/S4QP4W

Partitioned Global Address Space (PGAS) models, typified by such languages as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity.

GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building upon over 15 years of lessons learned. We describe and evaluate several features and enhancements that have been introduced to address the needs of modern client systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI-3 implementations on current HPC systems.

### John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen,"UPC++ Programmer's Guide, v1.0-2018.9.0",Lawrence Berkeley National Laboratory Tech Report,September 2018,LBNL 2001180, doi: 10.25344/S49G6V

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
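
The "explicit and asynchronous by default" remote-access model described above can be sketched in plain Python; `GlobalPtr` and `rget` here are hypothetical stand-ins (loosely modeled on UPC++'s global pointers and rget), not the actual UPC++ API:

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical stand-ins for illustration only -- not the UPC++ API.
class GlobalPtr:
    """A handle naming memory owned by some (simulated) remote process."""
    def __init__(self, owner_rank, value):
        self.owner_rank = owner_rank
        self._value = value

pool = ThreadPoolExecutor(max_workers=4)

def rget(gptr):
    """Explicit remote read: returns a future immediately, not the value."""
    def fetch():
        time.sleep(0.01)  # simulated network latency
        return gptr._value
    return pool.submit(fetch)

g = GlobalPtr(owner_rank=1, value=42)
fut = rget(g)                  # communication starts; caller keeps working
local_work = sum(range(100))   # overlap computation with communication
print(fut.result())            # block only when the value is actually needed
```

The point of the model: because every remote access is a visible call that returns a future, the cost of communication is explicit in the source, and overlap with local computation is the default rather than an optimization.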

### John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen,"UPC++ Specification v1.0, Draft 8",Lawrence Berkeley National Laboratory Tech Report,September 26, 2018,LBNL 2001179, doi: 10.25344/S45P4X

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.
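
The future/continuation chaining described above (completion handling via callbacks, forming a DAG of asynchronous operations) can be mimicked with a tiny Python sketch; `then` here is illustrative, loosely inspired by UPC++'s continuation facility, not the actual API:

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def then(fut, fn):
    """Attach a continuation: run fn on fut's result once it is ready."""
    return pool.submit(lambda: fn(fut.result()))

# Chain a small pipeline: fetch -> scale -> offset, each stage running
# only once its high-latency predecessor has completed.
fetch = pool.submit(lambda: 10)        # stands in for a remote get
scaled = then(fetch, lambda x: x * 3)
final = then(scaled, lambda x: x + 1)
print(final.result())  # 31
```

Each `then` call records data dependence without blocking the caller, which is exactly what lets a runtime schedule the whole chain as latencies resolve.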

### Dan Bonachea, Paul Hargrove,"GASNet-EX Performance Improvements Due to Specialization for the Cray Aries Network (tech report version)",Lawrence Berkeley National Laboratory Tech Report,March 27, 2018,LBNL 2001134, doi: 10.2172/1430690

This document is a deliverable for milestone STPM17-6 of the Exascale Computing Project, delivered by WBS 2.3.1.14. It reports on the improvements in performance observed on Cray XC-series systems due to enhancements made to the GASNet-EX software. These enhancements, known as “specializations”, primarily consist of replacing network-independent implementations of several recently added features with implementations tailored to the Cray Aries network. Performance gains from specialization include (1) Negotiated-Payload Active Messages improve bandwidth of a ping-pong test by up to 14%, (2) Immediate Operations reduce running time of a synthetic benchmark by up to 93%, (3) non-bulk RMA Put bandwidth is increased by up to 32%, (4) Remote Atomic performance is 70% faster than the reference on a point-to-point test and allows a hot-spot test to scale robustly, and (5) non-contiguous RMA interfaces see up to 8.6x speedups for an intra-node benchmark and 26% for inter-node. These improvements are available in the GASNet-EX 2018.3.0 release.

### John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Bryce Lelbach, Brian Van Straalen,"UPC++ Specification v1.0, Draft 6",Lawrence Berkeley National Laboratory Tech Report,March 26, 2018,LBNL 2001135, doi: 10.2172/1430689

UPC++ is a C++11 library providing classes and functions that support Partitioned Global Address Space (PGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

### John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian Van Straalen,"UPC++ Programmer’s Guide, v1.0-2018.3.0",Lawrence Berkeley National Laboratory Tech Report,March 2018,LBNL 2001136, doi: 10.2172/1430693

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

### Xinfei Guo, Vaibhav Verma, Patricia Gonzalez-Guerrero, Mircea R Stan,"When “things” get older: exploring circuit aging in IoT applications",International Symposium on Quality Electronic Design (ISQED),Santa Clara, CA, USA,IEEE,March 13, 2018,doi: 10.1109/ISQED.2018.8357304

The Internet of Things (IoT) brings a paradigm where humans and “things” are connected. The reliability of these devices becomes extremely critical. Circuit aging has become a limiting factor in technology scaling and a significant challenge in designing IoT systems for reliability-critical applications. As IoT becomes a general-purpose technology that is beginning to adopt advanced process nodes, it is necessary to understand how, and at what level, aging affects different categories of IoT applications. Since aging is highly dependent on operating conditions and switching activities, this paper classifies IoT applications based on aging-related metrics and studies aging using foundry-provided FinFET aging models. We show that for many IoT applications, aging will indeed add to the already tight design margin. As the expected chip lifetime of IoT devices becomes much longer and the failure-tolerance requirements of these applications become much stricter, we conclude that aging needs to be considered in the full design cycle and that IoT lifetime estimation needs to incorporate aging as an important factor. We also present application-specific solutions to mitigate circuit aging in IoT systems.

### Vincent Dumont,"Tests of Fundamental Physics Using Quasar Spectroscopy",PhD Thesis,2018,

Quasar absorption lines are used extensively in astrophysics to place constraints on cosmological models. In this thesis, we focus on the measurement of two important parameters in cosmology, the electromagnetic coupling constant, or fine-structure constant, α ≡ e²/(4πε₀ℏc), and the primordial deuterium-to-hydrogen ratio, D/H. Any cosmological variation of α would cause its value to differ in the early stages of the Universe, thereby impacting the production of the primordial light elements during Big Bang Nucleosynthesis. We provide updated Δα/α measurements from 280 absorption systems previously published in the literature, using the non-linear least-squares Voigt-profile fitting program VPFIT10. We also investigate the impact of long-range wavelength-scale distortions on those measurements. We find that long-range distortions are unlikely to explain the 4.1σ evidence for an α dipole reported in the literature, even though they do affect the α-dipole significance. The above work led us to examine the kinematics of each absorption system in our sample. We report a correlation between the complexity of the velocity structure of Damped Lyman-α systems and the apparent position of the Lyman limit break in quasar spectra. We develop a new technique, based on this correlation, to identify suitable Damped Lyman-α systems for D/H measurements.
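
As a quick numeric check of the definition above, evaluating α from the CODATA values of the constants recovers the familiar ≈ 1/137:

```python
import math

# CODATA 2018 values (SI units)
e = 1.602176634e-19      # elementary charge, C
eps0 = 8.8541878128e-12  # vacuum permittivity, F/m
hbar = 1.054571817e-34   # reduced Planck constant, J*s
c = 2.99792458e8         # speed of light, m/s

alpha = e**2 / (4 * math.pi * eps0 * hbar * c)
print(alpha, 1 / alpha)  # ~0.0072974, ~137.036
```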

### John Bachan, Dan Bonachea, Paul H Hargrove, Steve Hofmeyr, Mathias Jacquelin, Amir Kamil, Brian van Straalen, Scott B Baden,"The UPC++ PGAS library for Exascale Computing",Proceedings of the Second Annual PGAS Applications Workshop (PAW17),November 13, 2017,doi: 10.1145/3144779.3169108

We describe UPC++ V1.0, a C++11 library that supports APGAS programming. UPC++ targets distributed data structures where communication is irregular or fine-grained. The key abstractions are global pointers, asynchronous programming via RPC, and futures. Global pointers incorporate ownership information useful in optimizing for locality. Futures capture data readiness state, are useful for scheduling and also enable the programmer to chain operations to execute asynchronously as high-latency dependencies become satisfied, via continuations. The interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and closely resemble those used in modern C++. Communication in UPC++ runs at close to hardware speeds by utilizing the low-overhead GASNet-EX communication library.

### N. Sanderson, E. Shugerman, S. Molnar, J. Meiss, E. Bradley,"Computational Topology Techniques for Characterizing Time-Series Data",Advances in Intelligent Data Analysis XVI 16th International Symposium, IDA 2017, London, UK, October 26–28, 2017, Proceedings,October 2017,pp.284-296,doi: 10.1007/978-3-319-68765-0_24

Topological data analysis (TDA), while abstract, allows a characterization of time-series data obtained from nonlinear and complex dynamical systems. Though it is surprising that such an abstract measure of structure—counting pieces and holes—could be useful for real-world data, TDA lets us compare different systems, and even do membership testing or change-point detection. However, TDA is computationally expensive and involves a number of free parameters. This complexity can be obviated by coarse-graining, using a construct called the witness complex. The parametric dependence gives rise to the concept of persistent homology: how shape changes with scale. Its results allow us to distinguish time-series data from different systems—e.g., the same note played on different musical instruments.
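
The "counting pieces" idea at a given scale can be sketched with a union-find over pairwise distances, i.e., 0-dimensional homology of a Vietoris-Rips-style complex. This is a toy illustration of how component count changes with scale, not the coarse-grained witness-complex construction the paper uses:

```python
import math

def components_at_scale(points, eps):
    """Count connected pieces when points within distance eps are joined."""
    parent = list(range(len(points)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i, p in enumerate(points):
        for j, q in enumerate(points[:i]):
            if math.dist(p, q) <= eps:
                parent[find(i)] = find(j)  # union the two pieces
    return len({find(i) for i in range(len(points))})

# Two well-separated clusters: one piece per cluster at a small scale,
# a single piece once the scale exceeds the gap between them.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5)]
print(components_at_scale(pts, 0.2))  # 2
print(components_at_scale(pts, 8.0))  # 1
```

Tracking how this count changes as eps sweeps from small to large is precisely the 0-dimensional case of persistent homology: "how shape changes with scale."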

### Patricia Gonzalez-Guerrero, Mircea Stan,"Ultra-low-power dual-phase latch based digital accelerator for continuous monitoring of wheezing episodes",SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S),Burlingame, CA, USA,IEEE,October 16, 2017,doi: 10.1109/S3S.2017.8308752

We designed an ultra-low-power accelerator for the calculation of the Short Time Fourier Transform (STFT) optimized for wheezing detection. The low power consumption of our accelerator relies on optimizations at different stages of the design process. Post-layout simulations show that at the minimum energy point our accelerator consumes 3.3 pJ/cycle at 0.5 V and 163 kHz. We compare the energy consumption of our implementation with its flip-flop version. Simulations show that we can save up to 50% in energy consumption for a latch-based design vs. a flip-flop-based design, making dual-phase latch-based implementations excellent candidates for ultra-low-power devices.
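
For reference, the STFT the accelerator computes is a windowed DFT applied per frame of the audio signal; a minimal software sketch of one frame (plain Python with a rectangular window, not the accelerator's fixed-point datapath):

```python
import cmath
import math

def stft_frame_mag(frame):
    """Magnitude spectrum of one (rectangular-windowed) frame via a direct DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

# A 64-sample frame containing a pure tone at bin 5: the spectrum peaks
# at index 5 -- the kind of spectral feature a wheeze detector would
# track across successive frames.
n = 64
frame = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
mags = stft_frame_mag(frame)
print(mags.index(max(mags)))  # 5
```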

### John Bachan, Scott Baden, Dan Bonachea, Paul Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Brian van Straalen,"UPC++ Programmer’s Guide, v1.0-2017.9",Lawrence Berkeley National Laboratory Tech Report,September 2017,LBNL 2001065, doi: 10.2172/1398522

UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

### John Bachan, Scott Baden, Dan Bonachea, Paul H. Hargrove, Steven Hofmeyr, Khaled Ibrahim, Mathias Jacquelin, Amir Kamil, Bryce Lelbach, Brian Van Straalen,"UPC++ Specification v1.0, Draft 4",Lawrence Berkeley National Laboratory Tech Report,September 27, 2017,LBNL 2001066, doi: 10.2172/1398521

UPC++ is a C++11 library providing classes and functions that support Asynchronous Partitioned Global Address Space (APGAS) programming. We are revising the library under the auspices of the DOE’s Exascale Computing Project, to meet the needs of applications requiring PGAS support. UPC++ is intended for implementing elaborate distributed data structures where communication is irregular or fine-grained. The UPC++ interfaces for moving non-contiguous data and handling memories with different optimal access methods are composable and similar to those used in conventional C++. The UPC++ programmer can expect communication to run at close to hardware speeds. The key facilities in UPC++ are global pointers, which enable the programmer to express ownership information for improving locality; one-sided communication, both put/get and RPC; and futures and continuations. Futures capture data readiness state, which is useful in making scheduling decisions, and continuations provide for completion handling via callbacks. Together, these enable the programmer to chain together a DAG of operations to execute asynchronously as high-latency dependencies become satisfied.

### Xinfei Guo, Vaibhav Verma, Patricia Gonzalez-Guerrero, Sergiu Mosanu, Mircea R Stan,"Back to the future: Digital circuit design in the finfet era",Journal of Low Power Electronics,September 1, 2017,doi: 10.1166/jolpe.2017.1489

It has been almost a decade since FinFET devices were introduced to full production; they allowed scaling below 20 nm, thus helping to extend Moore's law by a precious decade, with another decade likely as scaling continues to 5 nm and below. Due to superior electrical parameters and a unique structure, these 3-D transistors offer significant performance improvements and power reduction compared to planar CMOS devices. As we enter the sub-10 nm era, FinFETs have become dominant in most high-end products; as the transition from planar to FinFET technologies is still ongoing, it is important for digital circuit designers to understand the challenges and opportunities brought in by the new technology characteristics. In this paper, we study these aspects from the device to the circuit level, and we make detailed comparisons across multiple technology nodes ranging from conventional bulk, to advanced planar technology nodes such as Fully Depleted Silicon-on-Insulator (FDSOI), to FinFETs. In the simulations we used both state-of-the-art industry-standard models for current nodes and predictive models for future nodes. Our study shows that besides the performance and power benefits, FinFET devices show a significant reduction of short-channel effects and extremely low leakage, and many of their electrical characteristics are close to ideal, as in old long-channel technology nodes; FinFETs seem to have put scaling back on track! However, the combination of new device structures, double/multi-patterning, many more complex rules, and unique thermal/reliability behaviors is creating new technical challenges. Moving forward, FinFETs still offer a bright future and are an indispensable technology for a wide range of applications, from high-end performance-critical computing to energy-constrained mobile applications and smart Internet-of-Things (IoT) devices.

### Dan Bonachea, Paul Hargrove,"GASNet Specification, v1.8.1",Lawrence Berkeley National Laboratory Tech Report,August 31, 2017,LBNL 2001064, doi: 10.2172/1398512

GASNet is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages and libraries such as UPC, UPC++, Co-Array Fortran, Legion, Chapel, and many others. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet stands for "Global-Address Space Networking".