USENIX ATC '24 and OSDI '24
Wednesday July 10, 2024 12:00pm - 12:25pm PDT
Hao Wu, Yue Yu, and Junxiao Deng, Huazhong University of Science and Technology; Shadi Ibrahim, Inria; Song Wu and Hao Fan, Huazhong University of Science and Technology and Jinyinhu Laboratory; Ziyue Cheng, Huazhong University of Science and Technology; Hai Jin, Huazhong University of Science and Technology and Jinyinhu Laboratory

The dynamic workloads and latency sensitivity of DNN inference drive a trend toward exploiting serverless computing for scalable DNN inference serving. Usually, GPUs are spatially partitioned to serve multiple co-located functions. However, existing serverless inference systems isolate functions in separate monolithic GPU runtimes (e.g., CUDA contexts), which are too heavyweight for short-lived, fine-grained functions, leading to high startup latency, a large memory footprint, and expensive inter-function communication. In this paper, we present StreamBox, a new lightweight GPU sandbox for serverless inference workflows. StreamBox unleashes the potential of GPU streams and efficiently realizes them for serverless inference by implementing fine-grained, auto-scaling memory management, allowing transparent and efficient intra-GPU communication across functions, and enabling PCIe bandwidth sharing among concurrent streams. Our evaluations over real-world workloads show that StreamBox reduces the GPU memory footprint by up to 82% and improves throughput by 6.7X compared to state-of-the-art serverless inference systems.
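To make the stream-based idea in the abstract concrete, here is a minimal CUDA sketch (illustrative only, not StreamBox's actual API or code): two hypothetical inference stages run as kernels on separate streams within one shared CUDA context and hand data off through a shared device buffer, so the second stage reads the first stage's output in place rather than paying for a separate context and inter-process copies.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-ins for two serverless inference functions.
__global__ void produce(float *buf) { buf[threadIdx.x] = (float)threadIdx.x; }
__global__ void consume(float *buf) { buf[threadIdx.x] *= 2.0f; }

int main() {
    const int N = 32;
    float *buf = nullptr;
    cudaMalloc(&buf, N * sizeof(float));     // device buffer shared by both stages

    cudaStream_t fnA, fnB;                   // one stream per "function"
    cudaStreamCreate(&fnA);
    cudaStreamCreate(&fnB);

    cudaEvent_t handoff;
    cudaEventCreate(&handoff);

    produce<<<1, N, 0, fnA>>>(buf);          // stage A runs on its own stream
    cudaEventRecord(handoff, fnA);
    cudaStreamWaitEvent(fnB, handoff, 0);    // B waits only on A's completion event
    consume<<<1, N, 0, fnB>>>(buf);          // B reads A's output in place:
                                             // no device-to-host copy, no IPC

    cudaStreamSynchronize(fnB);

    float out[N];
    cudaMemcpy(out, buf, sizeof(out), cudaMemcpyDeviceToHost);
    printf("out[5] = %.1f\n", out[5]);

    cudaEventDestroy(handoff);
    cudaStreamDestroy(fnA);
    cudaStreamDestroy(fnB);
    cudaFree(buf);
    return 0;
}
```

By contrast, placing each function in its own CUDA context would require creating and tearing down a full context per function and moving the intermediate buffer across the context boundary, which is the startup-latency and communication cost the paper targets.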

https://www.usenix.org/conference/atc24/presentation/wu-hao
Grand Ballroom CD
