USENIX ATC '24 and OSDI '24
Wednesday July 10, 2024 12:00pm - 12:25pm PDT
Hao Wu, Yue Yu, and Junxiao Deng, Huazhong University of Science and Technology; Shadi Ibrahim, Inria; Song Wu and Hao Fan, Huazhong University of Science and Technology and Jinyinhu Laboratory; Ziyue Cheng, Huazhong University of Science and Technology; Hai Jin, Huazhong University of Science and Technology and Jinyinhu Laboratory

The dynamic workloads and latency sensitivity of DNN inference drive a trend toward exploiting serverless computing for scalable DNN inference serving. Usually, GPUs are spatially partitioned to serve multiple co-located functions. However, existing serverless inference systems isolate functions in separate monolithic GPU runtimes (e.g., CUDA contexts), which are too heavyweight for short-lived, fine-grained functions, leading to high startup latency, a large memory footprint, and expensive inter-function communication. In this paper, we present StreamBox, a new lightweight GPU sandbox for serverless inference workflows. StreamBox unleashes the potential of GPU streams and efficiently realizes them for serverless inference by implementing fine-grained, auto-scaling memory management, allowing transparent and efficient intra-GPU communication across functions, and enabling PCIe bandwidth sharing among concurrent streams. Our evaluations over real-world workloads show that StreamBox reduces the GPU memory footprint by up to 82% and improves throughput by 6.7X compared to state-of-the-art serverless inference systems.
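To make the stream-based idea in the abstract concrete, here is a minimal CUDA sketch (illustrative only, not StreamBox's actual API or code): two hypothetical inference stages run as kernels on separate streams within one shared CUDA context and hand data off through a shared device buffer, so the second stage reads the first stage's output in place rather than paying for a separate context and inter-process copies.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-ins for two serverless inference functions.
__global__ void produce(float *buf) { buf[threadIdx.x] = (float)threadIdx.x; }
__global__ void consume(float *buf) { buf[threadIdx.x] *= 2.0f; }

int main() {
    const int N = 32;
    float *buf = nullptr;
    cudaMalloc(&buf, N * sizeof(float));     // device buffer shared by both stages

    cudaStream_t fnA, fnB;                   // one stream per "function"
    cudaStreamCreate(&fnA);
    cudaStreamCreate(&fnB);

    cudaEvent_t handoff;
    cudaEventCreate(&handoff);

    produce<<<1, N, 0, fnA>>>(buf);          // stage A runs on its own stream
    cudaEventRecord(handoff, fnA);
    cudaStreamWaitEvent(fnB, handoff, 0);    // B waits only on A's completion event
    consume<<<1, N, 0, fnB>>>(buf);          // B reads A's output in place:
                                             // no device-to-host copy, no IPC

    cudaStreamSynchronize(fnB);

    float out[N];
    cudaMemcpy(out, buf, sizeof(out), cudaMemcpyDeviceToHost);
    printf("out[5] = %.1f\n", out[5]);

    cudaEventDestroy(handoff);
    cudaStreamDestroy(fnA);
    cudaStreamDestroy(fnB);
    cudaFree(buf);
    return 0;
}
```

By contrast, placing each function in its own CUDA context would require creating and tearing down a full context per function and moving the intermediate buffer across the context boundary, which is the startup-latency and communication cost the paper targets.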

https://www.usenix.org/conference/atc24/presentation/wu-hao
Grand Ballroom CD
