Overview of Deep Reinforcement Learning Methods
Prof. Steven L. Brunton
Slide at 19:38
Summary (AI generated)
This passage discusses the actor-critic policy update. The critic learns a value function via temporal-difference (TD) learning, and the resulting TD error serves as the signal for updating the actor's policy. The method thus combines value-based and policy-gradient information in a single learning scheme.
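The update described above can be sketched in a minimal tabular setting. The following is an illustrative example, not code from the lecture: the environment (a hypothetical two-state chain), the learning rates, and all variable names are assumptions chosen for clarity. The critic maintains a value table updated by the TD error, and the same TD error scales the actor's policy-gradient step.

```python
import math, random

random.seed(0)

# Hypothetical 2-state chain MDP for illustration:
# in state 0, action 1 reaches a terminal state with reward +1;
# action 0 stays in state 0 with reward 0.
N_STATES, N_ACTIONS = 2, 2
gamma, alpha_v, alpha_pi = 0.9, 0.1, 0.1

V = [0.0] * N_STATES                                   # critic: state-value table
theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # actor: policy logits

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def step(s, a):
    # Returns (next_state, reward, done) for the toy chain above.
    if s == 0 and a == 1:
        return 1, 1.0, True
    return 0, 0.0, False

for episode in range(500):
    s, done, t = 0, False, 0
    while not done and t < 20:
        probs = softmax(theta[s])
        a = random.choices(range(N_ACTIONS), weights=probs)[0]
        s2, r, done = step(s, a)
        target = r if done else r + gamma * V[s2]
        delta = target - V[s]          # TD error from the critic
        V[s] += alpha_v * delta        # critic update toward the TD target
        for b in range(N_ACTIONS):     # actor update: delta * grad log pi
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += alpha_pi * delta * grad
        s, t = s2, t + 1

print(softmax(theta[0]))  # policy at state 0 should come to favor action 1
```

After training, the softmax over `theta[0]` places most of its probability on the rewarding action, showing how the critic's TD error steers the actor without the actor ever seeing rewards directly.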
One deep-network instance of this idea is the advantage actor-critic network. The actor is a deep policy network with weights θ, while the critic is a deep dueling Q network that assesses the quality of taking an action in a given state. The dueling architecture splits the quality function Q(s, a) into a state-value term V(s) and the advantage A(s, a) of taking action a in state s.
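The dueling split can be shown with a small numeric sketch. This is an assumed illustration, not the lecture's implementation: in practice `v` and `adv` come from two heads of a shared deep network, and the mean advantage is subtracted so the decomposition into value and advantage is identifiable.

```python
# Dueling decomposition (sketch): assemble per-action Q-values for one state
# from a scalar value estimate v and a list of per-action advantages adv:
#   Q(s, a) = V(s) + (A(s, a) - mean_b A(s, b))
def dueling_q(v, adv):
    mean_adv = sum(adv) / len(adv)
    return [v + a - mean_adv for a in adv]

q = dueling_q(2.0, [0.5, -0.5, 0.0])
print(q)  # [2.5, 1.5, 2.0]
```

Note that the value term shifts all actions' Q-values equally, while the advantages only reorder them, which is exactly the separation the dueling critic exploits.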