Actor-Critic Method for Solving High Dimensional Hamilton-Jacobi-Bellman type PDEs
Jianfeng Lu
In this talk, we will discuss a numerical approach to solving high dimensional Hamilton-Jacobi-Bellman (HJB) type elliptic partial differential equations (PDEs). The HJB PDEs are reformulated as optimal control problems and tackled with an actor-critic framework inspired by reinforcement learning, based on neural network parametrizations of the value and control functions. Within the actor-critic framework, we employ a policy gradient approach to improve the control, while for the value function we derive a variance-reduced least-squares temporal difference method using stochastic calculus. We will also discuss convergence analysis for the actor-critic method, in particular the policy gradient method for solving stochastic optimal control. Joint work with Jiequn Han (Flatiron Institute) and Mo Zhou (Duke University).
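
To make the two ingredients concrete, below is a minimal sketch of an actor-critic loop for a discounted stochastic control problem of this general kind. It is not the speakers' implementation: the dynamics, quadratic running cost, discount rate, network sizes, and the plain (non-variance-reduced) least-squares temporal difference loss are all illustrative assumptions made for exposition.

import torch
import torch.nn as nn

dim, dt, horizon, batch = 10, 0.01, 50, 256
beta = 1.0  # assumed discount rate for the infinite-horizon (elliptic) cost

def mlp(out_dim):
    return nn.Sequential(nn.Linear(dim, 64), nn.Tanh(),
                         nn.Linear(64, 64), nn.Tanh(),
                         nn.Linear(64, out_dim))

actor = mlp(dim)   # control (policy) network: R^d -> R^d
critic = mlp(1)    # value network: R^d -> R
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

def running_cost(x, a):
    # illustrative quadratic cost; the actual cost is set by the HJB PDE
    return x.pow(2).sum(-1) + a.pow(2).sum(-1)

for step in range(1000):
    x = torch.randn(batch, dim)  # sample initial states

    # Critic update: least-squares temporal difference loss on one
    # Euler-Maruyama step of the controlled SDE dX = a dt + sqrt(2) dW.
    # (The talk derives a variance-reduced variant via stochastic calculus.)
    with torch.no_grad():
        a = actor(x)
        x_next = x + a * dt + (2 * dt) ** 0.5 * torch.randn_like(x)
        target = running_cost(x, a) * dt + (1 - beta * dt) * critic(x_next).squeeze(-1)
    td = critic(x).squeeze(-1) - target
    critic_loss = td.pow(2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    # Actor update: policy gradient, differentiating the simulated
    # discounted cost through the dynamics, with the critic as a
    # bootstrapped estimate of the tail cost.
    xs = x.clone()
    total = torch.zeros(batch)
    disc = 1.0
    for t in range(horizon):
        a = actor(xs)
        total = total + disc * running_cost(xs, a) * dt
        xs = xs + a * dt + (2 * dt) ** 0.5 * torch.randn(batch, dim)
        disc *= 1 - beta * dt
    total = total + disc * critic(xs).squeeze(-1)
    actor_loss = total.mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()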