Overview of Deep Reinforcement Learning Methods - presented by Prof. Steven L. Brunton

Slide at 14:01: DEEP Q-LEARNING. Parameterize the Q function with a neural network, Q(s, a) ≈ Q(s, a, θ).
Summary (AI generated)

In solving this problem, I will write down the cost function involved: the neural network loss used when building a deep Q-learner. The loss function the network tries to minimize is the expectation of the squared temporal-difference error, L(θ) = E[(r + γ max_{a'} Q(s', a', θ) − Q(s, a, θ))²]. The Q function is parameterized by θ, and the neural network uses stochastic gradient descent with backpropagation to optimize those parameters, giving the best possible Q function, i.e., the one that minimizes the temporal-difference error.
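A minimal sketch of this loss, assuming a tiny linear model Q(s, a; θ) = θ[a] · s standing in for a deep network; the states, actions, reward, and learning rate below are made-up illustration values, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.99
theta = rng.normal(size=(n_actions, n_states))  # parameters of Q(s, a; theta)

def q(state, theta):
    """Q-values for every action in a given state (one row of theta per action)."""
    return theta @ state

def td_loss_and_grad(theta, s, a, r, s_next):
    """Squared temporal-difference error and its gradient w.r.t. theta,
    with the bootstrap target treated as a fixed constant."""
    target = r + gamma * np.max(q(s_next, theta))  # TD target
    delta = target - q(s, theta)[a]                # TD error
    grad = np.zeros_like(theta)
    grad[a] = -2.0 * delta * s                     # d(delta^2) / d(theta[a])
    return delta ** 2, grad

# One stochastic gradient descent step on a single transition (s, a, r, s').
s, s_next = rng.normal(size=n_states), rng.normal(size=n_states)
a, r, lr = 0, 1.0, 0.05
loss_before, grad = td_loss_and_grad(theta, s, a, r, s_next)
theta = theta - lr * grad
```

In a full deep Q-learner the gradient step is handled by backpropagation through the network rather than the hand-written gradient used here.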

There is strong evidence that biological learners are also minimizing this temporal-difference error at some level of their neurological hardware. The Q-learning update can be turned into a loss function, and a neural network can then optimize those parameters, which is a powerful approach. This was demonstrated in DeepMind's Atari video game playing, where a deep Q-learner with convolutional layers learned to play directly from pixel space.