Overview of Deep Reinforcement Learning Methods
Prof. Steven L. Brunton
Slide at 21:03
Summary (AI generated)
In this transcript, the speaker discusses policy iteration and policy-gradient optimization. Policy gradients can be faster than policy-search techniques that do not use gradient information, but they require the policy to be parameterized (by θ) so that a derivative can be taken. The speaker then introduces the actor-critic method: a Q network (the critic) learns the quality function, while a policy network (the actor) is updated using the policy gradient. This combines value-based and policy-based optimization, in contrast to pure Q-learning, where the Q function is updated from Q information alone and the policy is extracted from it separately. The speaker finds this hybrid approach a particularly elegant way to optimize policies.
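The actor-critic idea described above can be sketched in a few lines. The following is a minimal illustrative example on a toy two-armed bandit (the bandit, reward values, and learning rates are assumptions, not from the lecture): the critic maintains Q estimates updated from observed rewards, and the actor is a softmax policy with parameters θ updated by a policy-gradient step weighted by the critic's Q value.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])  # assumed toy rewards: arm 1 is better
theta = np.zeros(2)                # actor parameters (the theta being differentiated)
Q = np.zeros(2)                    # critic's quality estimates
alpha_actor, alpha_critic = 0.1, 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = true_means[a] + 0.1 * rng.standard_normal()

    # Critic: value-based update, move Q[a] toward the observed reward
    Q[a] += alpha_critic * (r - Q[a])

    # Actor: policy-gradient step, weighted by the critic's Q estimate
    # (for a softmax policy, grad log pi(a) = one_hot(a) - pi)
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta += alpha_actor * grad_log_pi * Q[a]

print(softmax(theta))  # policy should come to favor the higher-reward arm
```

In a full deep RL setting, both the critic's Q and the actor's policy would be neural networks trained on transitions from the environment; this sketch keeps them as small arrays so the two interleaved updates are easy to see.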