Duke Computer Science Seminar

Structure in High-Dimensional Reinforcement Learning

Friday, April 3
Speaker(s): Saket Tiwari, PhD, University of California, Santa Barbara

Bio

Saket Tiwari is a postdoctoral researcher at the University of California, Santa Barbara, in the REAL AI Lab. He received his PhD from Brown University under the supervision of Prof. George Konidaris. His work focuses on developing a new effective theory of deep RL in continuous environments, combining elements of optimal control, geometry, and optimization theory. Prior to his PhD, he obtained a master’s degree from the University of Massachusetts Amherst and a bachelor’s degree from IIT Bombay.

Abstract

Deep reinforcement learning (RL) for control from pixels still relies heavily on discretized models, where agents choose from finitely many actions and receive finitely many observations. This limits our ability to design better optimizers and architectures, and to understand modern RL algorithms at a deeper level. In this talk, I will introduce and analyze a new optimizer for RL agents with high-dimensional observations and overparameterized neural network function approximators. To do this, we study an episodic actor-critic algorithm for the continuous-time linear-quadratic regulator problem, in which the agent receives high-dimensional observations generated from low-dimensional latent states. To model overparameterized neural networks while keeping the analysis tractable, we use deep linear networks whose weight matrices have mutually orthonormal columns, meaning they lie on the Stiefel manifold. Under these assumptions, we show that the learning dynamics of the high-dimensional problem are analogous to optimization in the underlying low-dimensional latent state space, revealing a hidden structure. We support this theory with experiments on pixel-based robotic control tasks trained end-to-end from reward, without auxiliary representation losses, showing that proximal policy optimization (PPO) with Stiefel-manifold optimization outperforms PPO with Adam. I will close by proposing a new metaphor for RL: one that is continuous in states, actions, and time.
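To give a concrete feel for the Stiefel-manifold constraint mentioned in the abstract, here is a minimal illustrative sketch (not the speaker's actual optimizer): after an ordinary gradient step, a weight matrix can be retracted back onto the Stiefel manifold, i.e. restored to having mutually orthonormal columns, via a QR decomposition. The matrix shapes and the `qr_retraction` helper are assumptions for illustration only.

```python
import numpy as np

def qr_retraction(W):
    """Retract a tall matrix onto the Stiefel manifold
    (mutually orthonormal columns) via a reduced QR decomposition."""
    Q, R = np.linalg.qr(W)
    # Fix column signs (make diag(R) nonnegative) so the retraction is unique.
    Q = Q * np.sign(np.sign(np.diag(R)) + 0.5)
    return Q

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3))        # hypothetical 8x3 layer weight matrix
grad = rng.standard_normal((8, 3))     # hypothetical gradient from the RL loss

# One "manifold-constrained" update: Euclidean step, then retraction.
W_new = qr_retraction(W - 0.01 * grad)

# Columns are now orthonormal: W_new.T @ W_new == I (up to float error).
print(np.allclose(W_new.T @ W_new, np.eye(3)))
```

The retraction is the simplest way to keep iterates on the manifold; Riemannian optimizers additionally project the gradient onto the tangent space before stepping, but the QR step above already conveys the constraint the talk's analysis relies on.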

Zoom Link: https://duke.zoom.us/j/97206204162?pwd=zyUwHF2DZa4SMTm9N1zNvyE9rbuEcC.1