Lunch will be served at 11:45 AM.
Multi-GPU systems have gained significant popularity in modern computing. While employing multiple GPUs intuitively offers aggregated memory capacity and combined computational parallelism, the delivered performance rarely keeps pace with the increase in GPU count. Scalability is severely limited by several factors, such as inefficient address translation, non-uniform memory accesses, and inter-GPU communication overheads. Consequently, critical questions remain unaddressed: how to design multi-GPU computing architectures, and how to harness multi-GPU advantages in emerging applications? In this talk, I will share my research on maximizing the potential of multi-GPU computing. First, I will discuss our work on short-circuiting page table walks to mitigate the address translation wall. Next, I will introduce our lightweight invalidation approach to reduce page migration overhead. Finally, I will present our work on efficient multi-tenancy in multi-instance GPUs through sharing-aware sub-entry TLBs. Looking ahead, I will outline my vision for next-generation computing systems, including harnessing GPU advantages for LLM inference, efficient GPU virtualization, and scalable heterogeneous systems.
Bingyao Li is a final-year PhD student at the University of Pittsburgh, advised by Dr. Xulong Tang. Her research interests lie broadly in advanced computer architecture, high-performance computing, and emerging parallel applications, with a focus on the GPU ecosystem from architecture to applications. Her work has been recognized at top-tier computer architecture conferences, with multiple papers at MICRO, HPCA, and DAC. More information can be found at her website: https://libingyao.github.io.