Overview of Deep Reinforcement Learning Methods
Prof. Steven L. Brunton
Slide at 19:38
Summary (AI generated)
This passage discusses the actor-critic policy update. The critic learns a value function via temporal-difference (TD) learning, and the resulting TD error serves as the signal for updating the actor's policy. The method thus combines value-based and policy-gradient information in a single learning scheme.
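The update described above can be sketched in a minimal tabular setting. The following is an illustrative example, not code from the lecture: the environment (a hypothetical two-state chain), the learning rates, and all variable names are assumptions chosen for clarity. The critic maintains a value table updated by the TD error, and the same TD error scales the actor's policy-gradient step.

```python
import math, random

random.seed(0)

# Hypothetical 2-state chain MDP for illustration:
# in state 0, action 1 reaches a terminal state with reward +1;
# action 0 stays in state 0 with reward 0.
N_STATES, N_ACTIONS = 2, 2
gamma, alpha_v, alpha_pi = 0.9, 0.1, 0.1

V = [0.0] * N_STATES                                   # critic: state-value table
theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # actor: policy logits

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def step(s, a):
    # Returns (next_state, reward, done) for the toy chain above.
    if s == 0 and a == 1:
        return 1, 1.0, True
    return 0, 0.0, False

for episode in range(500):
    s, done, t = 0, False, 0
    while not done and t < 20:
        probs = softmax(theta[s])
        a = random.choices(range(N_ACTIONS), weights=probs)[0]
        s2, r, done = step(s, a)
        target = r if done else r + gamma * V[s2]
        delta = target - V[s]          # TD error from the critic
        V[s] += alpha_v * delta        # critic update toward the TD target
        for b in range(N_ACTIONS):     # actor update: delta * grad log pi
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += alpha_pi * delta * grad
        s, t = s2, t + 1

print(softmax(theta[0]))  # policy at state 0 should come to favor action 1
```

After training, the softmax over `theta[0]` places most of its probability on the rewarding action, showing how the critic's TD error steers the actor without the actor ever seeing rewards directly.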
One deep-network instance of this idea is the advantage actor-critic network. The actor is a deep policy network with weights θ, while the critic is a deep dueling Q network that assesses the quality of taking an action in a given state. The dueling architecture splits the quality function Q(s, a) into a state-value term V(s) and the advantage A(s, a) of taking action a in state s.
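The dueling split can be shown with a small numeric sketch. This is an assumed illustration, not the lecture's implementation: in practice `v` and `adv` come from two heads of a shared deep network, and the mean advantage is subtracted so the decomposition into value and advantage is identifiable.

```python
# Dueling decomposition (sketch): assemble per-action Q-values for one state
# from a scalar value estimate v and a list of per-action advantages adv:
#   Q(s, a) = V(s) + (A(s, a) - mean_b A(s, b))
def dueling_q(v, adv):
    mean_adv = sum(adv) / len(adv)
    return [v + a - mean_adv for a in adv]

q = dueling_q(2.0, [0.5, -0.5, 0.0])
print(q)  # [2.5, 1.5, 2.0]
```

Note that the value term shifts all actions' Q-values equally, while the advantages only reorder them, which is exactly the separation the dueling critic exploits.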