Overview of Deep Reinforcement Learning Methods
Prof. Steven L. Brunton
Slide at 21:03
Summary (AI generated)
In this transcript, the speaker discusses policy iteration and policy-gradient optimization. Policy gradients can be faster than policy-search techniques that do not use gradient information, but they require the policy to be parameterized (by θ) so that a derivative can be taken. The speaker then introduces the actor-critic method: a Q network (the critic) learns the quality function, while a policy network (the actor) is updated using the policy gradient. This combines value-based and policy-based optimization, in contrast to pure Q-learning, where the Q function is updated from Q information alone and the policy is extracted from it separately. The speaker finds this hybrid approach a particularly elegant way to optimize policies.
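The actor-critic idea described above can be sketched in a few lines. The following is a minimal illustrative example on a toy two-armed bandit (the bandit, reward values, and learning rates are assumptions, not from the lecture): the critic maintains Q estimates updated from observed rewards, and the actor is a softmax policy with parameters θ updated by a policy-gradient step weighted by the critic's Q value.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])  # assumed toy rewards: arm 1 is better
theta = np.zeros(2)                # actor parameters (the theta being differentiated)
Q = np.zeros(2)                    # critic's quality estimates
alpha_actor, alpha_critic = 0.1, 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = true_means[a] + 0.1 * rng.standard_normal()

    # Critic: value-based update, move Q[a] toward the observed reward
    Q[a] += alpha_critic * (r - Q[a])

    # Actor: policy-gradient step, weighted by the critic's Q estimate
    # (for a softmax policy, grad log pi(a) = one_hot(a) - pi)
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta += alpha_actor * grad_log_pi * Q[a]

print(softmax(theta))  # policy should come to favor the higher-reward arm
```

In a full deep RL setting, both the critic's Q and the actor's policy would be neural networks trained on transitions from the environment; this sketch keeps them as small arrays so the two interleaved updates are easy to see.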