About this Event
1520 Middle Drive, Knoxville, TN 37996
https://www.eecs.utk.edu/
Tensor Offloading: GPU Memory-Efficient Execution Paradigm for Large AI Models
Abstract
Given the growth of AI model sizes and the scarcity of GPU hardware, training and deploying AI models is often constrained by GPU memory capacity. By using CPU memory as an extension of GPU memory, tensor offloading provides a cost-effective way to save GPU memory and enable larger AI models. However, this execution paradigm introduces extra data movement and faces a series of challenges, such as the promptness and granularity of tensor migration, load balancing, and tensor coherence. In this talk, we present our efforts to address these challenges using tensor/computation co-offloading and a learning-based approach. Our work enables industry-quality transformer models with tens of billions of parameters to run on a single GPU. This is a 10x increase in model size compared to popular frameworks such as PyTorch, and it requires no model changes from data scientists and does not sacrifice computational efficiency. Our work has been integrated into Microsoft DeepSpeed and is now used across the industry to democratize access to large AI models.
Biography
Dong Li is an associate professor of Electrical Engineering and Computer Science at the University of California, Merced. Previously, he was a research scientist at Oak Ridge National Laboratory (ORNL), where he studied computer architecture and programming models for next-generation supercomputer systems. Li earned his PhD in computer science from Virginia Tech. His research focuses on high-performance computing (HPC) and maintains strong relevance to computer systems. Li received an ORNL/CSMD Distinguished Contributor Award in 2013, a CAREER Award from the National Science Foundation in 2016, a Facebook Faculty Research Award in 2021, an Oracle Research Award in 2022, and an Amazon Research Award in 2025. His SC'14 paper was a best paper finalist, and his ASPLOS'21 paper won the Distinguished Artifact Award. He was also the lead PI for the NVIDIA CUDA Research Center at UC Merced. He is an associate editor for IEEE Transactions on Parallel and Distributed Systems (TPDS).