Master's Thesis Proposal
[Advisor: Dr. Evangelos Theodorou]
"Integrating Perception into Safe Differentiable Control"
Tuesday, December 13
Weber SS&T 200
A great challenge exists at the intersection of perception and controls: integrating the uncertainty present in perception-based state and obstacle estimation into safe control and trajectory optimization. This thesis proposes a model-based learning framework with a policy defined by a safe differentiable optimal controller. We will leverage many of the ideas of the world model, an unsupervised reinforcement learning technique that has achieved human-level or better-than-human performance on many Atari games. Specifically, we intend to train a variational autoencoder, a common unsupervised image processing technique, to learn a latent space representation that is decoded into a form on which a safety function can be defined, such as a depth map or occupancy grid. We will learn the dynamics of this latent space, as well as a mapping from the latent space directly to a safety function, to provide a differentiable controller with information on how the agent and the environment change over time as a function of the control actions. The controller will have the safety function embedded into the dynamics using barrier states. The barrier state (BaS) and its discrete counterpart (DBaS) are recently developed methods of embedding the safety of a system into the dynamics, providing greater safety information than penalty methods, a regularizing effect, and safety guarantees for complex dynamical systems in environments with many obstacles. Tolerant discrete barrier states (TDBaS) approximate the safety guarantees of DBaS while improving exploration, allowing for unsafe initial state trajectories, and providing several parameters that can be intuitively tuned for any application. This thesis explores how differentiable trajectory optimization can learn these TDBaS safety parameters given safety uncertainty in a reinforcement learning setting with limited supervision.
Toward this end, we will explore a variety of strategies and structures for the encoder-decoder network, the dynamics network, the safety function network, and the differentiable controller, such as Parametric Differentiable Dynamic Programming (PDDP), Pontryagin Differentiable Programming (PDP), Barrier Nets, and Differentiable MPC. We will test this framework in simulation and, if time allows, on hardware in the Indoor Flight Laboratory or Robotarium.
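As a rough illustration of the discrete barrier state idea described above, the sketch below appends a barrier value to the state so that safety evolves as part of the dynamics. Everything specific here is an assumption made for illustration and is not the formulation this thesis will adopt: the single-integrator dynamics, the circular obstacle, the inverse barrier B(h) = 1/h, and the function names are all hypothetical. The tolerant variant (TDBaS) and any learned components are not shown.

```python
import numpy as np

def dynamics(x, u, dt=0.1):
    """Hypothetical 2D single integrator: x is position, u is a velocity command."""
    return x + dt * u

def safety(x, center=np.array([1.0, 0.0]), radius=0.5):
    """Hypothetical safety function: h(x) > 0 outside a circular obstacle."""
    return np.sum((x - center) ** 2) - radius ** 2

def barrier(h):
    """Inverse barrier B(h) = 1/h (an assumed choice); grows as h approaches 0 from above."""
    return 1.0 / h

def augmented_step(x_aug, u):
    """Propagate the augmented state [x; beta], with beta_{k+1} = B(h(f(x_k, u_k)))."""
    x_next = dynamics(x_aug[:-1], u)
    beta_next = barrier(safety(x_next))
    return np.append(x_next, beta_next)

# Roll out a constant command that drives the agent toward the obstacle.
x0 = np.array([0.0, 0.0])
x_aug = np.append(x0, barrier(safety(x0)))
u = np.array([1.0, 0.0])

trajectory = [x_aug]
for _ in range(4):
    x_aug = augmented_step(x_aug, u)
    trajectory.append(x_aug)
trajectory = np.array(trajectory)

# The last column is the barrier state; it rises as the agent nears the obstacle.
print(trajectory[:, -1])
```

Because the barrier value is an ordinary state of the augmented system, a trajectory optimizer that penalizes it sees how safety changes along the whole trajectory, rather than only a pointwise penalty at each state.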
• Prof. Evangelos Theodorou – School of Aerospace Engineering (advisor)
• Prof. Kyriakos G. Vamvoudakis – School of Aerospace Engineering
• Prof. Patricio Vela – School of Electrical and Computer Engineering