Overview of Deep Reinforcement Learning Methods

Prof. Steven L. Brunton

Slide at 17:05

DEEP Q-LEARNING
Q_new(s_k, a_k) = Q_old(s_k, a_k) + α (r_k + γ max_a Q(s_{k+1}, a) − Q_old(s_k, a_k))
Q(s, a) ≈ Q(s, a, θ)
PARAMETERIZE Q FUNCTION WITH NN
ADVANTAGE NETWORK
Q(s, a, θ) = V(s, θ₁) + A(s, a, θ₂)
DEEP DUELING Q NETWORK (DDQN)

Summary (AI generated)

A variation of deep Q-learning is the deep dueling Q network, or dueling deep Q network (DDQN). This architecture writes the quality function as the sum of two networks: a value network V(s, θ₁) that depends only on the current state, and an advantage network A(s, a, θ₂) that estimates the advantage of taking a particular action in that state. It is useful when the differences in quality between actions are subtle. The value network accounts for the part of Q that is explained by the state alone, while the advantage network captures the additional effect of each action.
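The decomposition Q(s, a, θ) = V(s, θ₁) + A(s, a, θ₂) can be illustrated with a minimal sketch. It assumes each head is a single linear layer acting directly on the state vector; the names `linear` and `dueling_q_values`, the layer sizes, and the random weights are all hypothetical and stand in for the trained neural networks described in the lecture.

```python
import random


def linear(weights, bias, x):
    """Return W x + b, where `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def dueling_q_values(state, theta1, theta2):
    """Return Q(s, a) = V(s, theta1) + A(s, a, theta2) for every action a.

    theta1 = (W_v, b_v): hypothetical linear value head with one output.
    theta2 = (W_a, b_a): hypothetical linear advantage head with one
    output per action.
    """
    value = linear(*theta1, state)[0]      # scalar V(s)
    advantages = linear(*theta2, state)    # list of A(s, a), one per action
    return [value + adv for adv in advantages]


# Usage with untrained, randomly initialised heads (illustrative only).
rng = random.Random(0)
n_state, n_actions = 4, 3
theta1 = ([[rng.uniform(-1, 1) for _ in range(n_state)]], [0.0])
theta2 = ([[rng.uniform(-1, 1) for _ in range(n_state)]
           for _ in range(n_actions)], [0.0] * n_actions)

state = [0.5, -0.2, 0.1, 0.9]
q = dueling_q_values(state, theta1, theta2)
greedy_action = max(range(n_actions), key=lambda a: q[a])
```

Because V(s) is added to every action's score, the ranking of actions in a given state is decided entirely by the advantage head. Published dueling implementations typically also subtract the mean advantage so that V and A are uniquely determined; that step is left out here to match the formula on the slide.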

Another important concept in reinforcement learning is actor-critic learning, which combines the best of policy-based and value-based learning. There are two learners: the actor, which represents the policy and tries to improve it, and the critic, which learns an estimate of the value function and uses it to critique the actor's policy.

One simple way to implement actor-critic learning is to update the actor with the policy gradient algorithm, with each update to the policy parameters scaled by information from the critic's estimate of the value function.
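A minimal sketch of this idea follows, under several assumptions that are not in the summary: the critic's signal is taken to be the one-step temporal-difference error r + γ V(s') − V(s), the actor is a tabular softmax policy, and the environment is a hypothetical four-state chain in which moving right eventually reaches a rewarding terminal state. All function names, step sizes, and the environment itself are illustrative.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action, n_states):
    """Hypothetical chain: action 1 moves right, action 0 moves left.

    Reaching the last state gives reward 1 and ends the episode.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def train_actor_critic(n_states=4, episodes=500, alpha_actor=0.1,
                       alpha_critic=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    prefs = [[0.0, 0.0] for _ in range(n_states)]  # actor: policy parameters
    values = [0.0] * n_states                      # critic: estimate of V(s)
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            probs = softmax(prefs[state])
            action = 0 if rng.random() < probs[0] else 1
            nxt, reward, done = step(state, action, n_states)

            # Critic: one-step TD error (assumed form of the critic's signal).
            target = reward + (0.0 if done else gamma * values[nxt])
            td_error = target - values[state]
            values[state] += alpha_critic * td_error

            # Actor: policy gradient step scaled by the TD error.
            # For a softmax policy, d log pi(a|s) / d pref[b] = 1[a == b] - pi(b|s).
            for b in range(2):
                grad = (1.0 if b == action else 0.0) - probs[b]
                prefs[state][b] += alpha_actor * td_error * grad

            state = nxt
            steps += 1
    return prefs, values


prefs, values = train_actor_critic()
```

When an action leads to a better outcome than the critic predicted, the TD error is positive and the actor raises that action's probability; when the outcome is worse than predicted, the probability is lowered.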