USENIX ATC '24 and OSDI '24
Friday July 12, 2024 2:55pm - 3:20pm PDT
Zhen Xie, Binghamton University; Murali Emani, Argonne National Laboratory; Xiaodong Yu, Stevens Institute of Technology; Dingwen Tao, Indiana University; Xin He, Xidian University; Pengfei Su, University of California, Merced; Keren Zhou, George Mason University; Venkatram Vishwanath, Argonne National Laboratory

For an extended period, graphics processing units (GPUs) have stood as the exclusive choice for training deep neural network (DNN) models. Over time, to serve growing demands in a more targeted manner, various artificial intelligence-specific hardware platforms, referred to as AI accelerators, have been vigorously developed, aiming to provide more efficient DNN acceleration solutions. However, these solutions are heterogeneous and thus complicate accelerator selection. Given a DNN model and a training objective, such as throughput or price-performance ratio, it remains challenging to arrive at the optimal decision among the many options because of high reimplementation costs and unpredictable performance.

To tackle this challenge, we propose Centimani, a performance predictor that accurately and rapidly predicts DNN training throughput on various AI accelerators, thereby facilitating the accelerator selection process. To achieve this goal, we first analyze typical AI accelerators and draw observations that abstract AI accelerator designs and guide our performance modeling approach. In particular, we construct a memory estimation model and decoupled performance models to select the most appropriate batch size and predict the execution time of DNN training. We validate our approach by applying Centimani to six common DNN models on four typical AI accelerators. Results show that Centimani predicts throughput with an average accuracy of 93.1% on single-device training and 90.4% on multiple-device training, so the optimal accelerator for the user's training objective can be identified.
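The selection step described above can be illustrated with a minimal sketch. It is not Centimani's implementation: it assumes that a batch size and a per-step execution time have already been predicted for each accelerator, and it only shows how a training objective (throughput or price-performance) could then pick among candidates. The names `Candidate`, `select`, the accelerator labels, and all numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str              # hypothetical accelerator label
    batch_size: int        # assumed output of a memory estimation step
    step_time_s: float     # assumed predicted time for one training step
    price_per_hour: float  # hypothetical rental price in dollars


def throughput(c: Candidate) -> float:
    """Training samples processed per second."""
    return c.batch_size / c.step_time_s


def price_performance(c: Candidate) -> float:
    """Training samples processed per dollar."""
    return throughput(c) * 3600.0 / c.price_per_hour


OBJECTIVES = {
    "throughput": throughput,
    "price_performance": price_performance,
}


def select(candidates: list[Candidate], objective: str) -> Candidate:
    """Return the candidate that maximizes the chosen objective."""
    return max(candidates, key=OBJECTIVES[objective])


# Made-up numbers, not measurements from the paper.
candidates = [
    Candidate("accel_A", batch_size=64, step_time_s=0.5, price_per_hour=4.0),
    Candidate("accel_B", batch_size=32, step_time_s=0.4, price_per_hour=2.0),
]

print(select(candidates, "throughput").name)
print(select(candidates, "price_performance").name)
```

With these illustrative inputs the two objectives disagree: the faster device wins on throughput while the cheaper one wins on samples per dollar, which is why the objective has to be stated before a selection is made.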

https://www.usenix.org/conference/atc24/presentation/xie
Grand Ballroom EF