Duke Computer Science Colloquium

From Mixing Time to Sample Complexity: Elucidating the Design Space of Score-Based Losses

November 4
Speaker(s): Andrej Risteski, Carnegie Mellon

Lunch

Lunch will be served at 11:45 AM.

Abstract 

Score-based losses have emerged as a more computationally appealing alternative to maximum likelihood for fitting (probabilistic) generative models with an intractable likelihood (for example, energy-based models and diffusion models). What is gained by forgoing maximum likelihood is a tractable gradient-based training algorithm. What is lost is less clear: in particular, since maximum likelihood is asymptotically optimal in terms of statistical efficiency, how suboptimal are score-based losses?
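[As a point of reference for this tradeoff (an illustrative equation added here, not taken from the talk): for an energy-based model $p_\theta(x) \propto \exp(-E_\theta(x))$ with an intractable normalizing constant, Hyvärinen's classical score-matching loss

$$ J(\theta) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[ \tfrac{1}{2}\,\big\|\nabla_x \log p_\theta(x)\big\|^2 \;+\; \operatorname{tr}\!\big(\nabla_x^2 \log p_\theta(x)\big) \right] $$

equals the expected squared score error $\tfrac{1}{2}\,\mathbb{E}\,\|\nabla_x \log p_\theta(x) - \nabla_x \log p_{\mathrm{data}}(x)\|^2$ up to an additive constant independent of $\theta$. Since $\nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)$, the partition function drops out, which is what makes the loss tractable to optimize by gradient methods.]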
I will survey an emerging connection between the statistical efficiency of broad families of generalized score losses and the algorithmic efficiency of a natural inference-time algorithm: namely, the mixing time of a suitable diffusion, driven by the score, that can be used to draw samples from the model. This “dictionary” allows us to elucidate the design space of score losses with good statistical behavior by “translating” techniques for speeding up Markov chain convergence (e.g., preconditioning and lifting). I will also touch upon a parallel story for learning discrete probability distributions, in which the role of score-based losses is played by masked-prediction-like losses, along with some implications for designing non-autoregressive generative models for text. Finally, time permitting, I will make some remarks on co-designing pre-training and inference-time procedures in foundation models in light of recent interest in inference-time scaling laws.
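[To make the inference-time side concrete, the sketch below (added here, not from the talk) implements unadjusted Langevin dynamics, the prototypical diffusion that draws samples using only the score; the generalized diffusions, preconditioning, and lifting mentioned above refine this basic scheme. The function name langevin_sample and its parameters are illustrative, not from the referenced papers.

import numpy as np

def langevin_sample(score_fn, x0, step=1e-3, n_steps=10_000, rng=None):
    # Unadjusted Langevin dynamics:
    #   x_{t+1} = x_t + step * score(x_t) + sqrt(2 * step) * N(0, I)
    # If score_fn is the (learned) score of a density p, the iterates
    # approximately mix to p; this mixing time is the algorithmic
    # quantity the "dictionary" relates to statistical efficiency.
    rng = rng if rng is not None else np.random.default_rng()
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        x = x + step * score_fn(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    return x

# Example: the score of a standard Gaussian is -x, so samples land near N(0, I).
sample = langevin_sample(lambda x: -x, x0=np.zeros(2))]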
Based in part on https://arxiv.org/abs/2210.00726, https://arxiv.org/abs/2306.09332, https://arxiv.org/abs/2306.01993, and https://arxiv.org/abs/2407.21046.

Speaker Bio

Andrej Risteski is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. He received his PhD from the Computer Science Department at Princeton University. His research interests lie at the intersection of machine learning, statistics, and theoretical computer science, spanning topics such as (probabilistic) generative models, algorithmic tools for learning and inference, representation and self-supervised learning, out-of-distribution generalization, and applications of neural approaches to natural language processing and scientific domains. More broadly, the goal of his research is a principled, mathematical understanding of statistical and algorithmic phenomena and problems arising in modern machine learning. He is a recipient of an Amazon Research Award, a Google Research Scholar Award, and an NSF CAREER Award.

Web Link

https://duke.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d1dc59dc-d…

Contact

Rong Ge