
Tutorial 3: Learning to Act, Q-Learning


In this tutorial, you will learn how to act in the more realistic setting of sequential decisions, formalized by Markov Decision Processes (MDPs). In a sequential decision problem, the actions executed in one state not only may yield immediate rewards (as in a bandit problem) but may also affect which states are experienced next (unlike a bandit problem). Each individual action can therefore influence all future rewards. Making decisions in this setting thus requires evaluating each action in terms of its expected cumulative future reward.
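One way to make "expected cumulative future reward" precise (following the standard reinforcement learning convention, not notation specific to this tutorial) is via the discounted return $G_t$ and the action value $Q(s, a)$:

$$
G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad Q(s, a) = \mathbb{E}\left[ G_t \mid s_t = s,\ a_t = a \right],
$$

where $0 \le \gamma < 1$ is a discount factor that weights near-term rewards more heavily than distant ones.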

Topics covered in this lesson
  • What grid worlds are and how they help in evaluating simple reinforcement learning agents
  • The basics of the Q-learning algorithm for estimating action values
  • How the exploration-exploitation trade-off, reviewed in the bandit case, carries over to the sequential decision setting (all three topics are sketched in code after this list)
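Below is a minimal, self-contained sketch of how these pieces fit together: tabular Q-learning with epsilon-greedy exploration on a small deterministic grid world. The specifics (the 4x4 grid, the reward of 1 at the goal, the hyperparameter values) are illustrative assumptions, not this tutorial's actual environment.

```python
import numpy as np

# Minimal tabular Q-learning on a hypothetical 4x4 grid world.
# The environment details (grid size, goal reward, hyperparameters)
# are illustrative assumptions, not this tutorial's actual setup.

n_rows, n_cols = 4, 4
n_states, n_actions = n_rows * n_cols, 4  # actions: up, down, left, right
goal = n_states - 1                       # bottom-right cell

def step(state, action):
    """Deterministic grid dynamics; reaching the goal yields reward 1."""
    r, c = divmod(state, n_cols)
    if action == 0:   r = max(r - 1, 0)           # up
    elif action == 1: r = min(r + 1, n_rows - 1)  # down
    elif action == 2: c = max(c - 1, 0)           # left
    else:             c = min(c + 1, n_cols - 1)  # right
    next_state = r * n_cols + c
    return next_state, float(next_state == goal), next_state == goal

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for episode in range(500):
    state, done = 0, False  # every episode starts in the top-left cell
    while not done:
        if rng.random() < epsilon:                # explore: random action
            action = int(rng.integers(n_actions))
        else:                                     # exploit: greedy, ties broken randomly
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state, reward, done = step(state, action)
        # Q-learning target bootstraps from the best value of the next state
        target = reward + gamma * Q[next_state].max() * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1).reshape(n_rows, n_cols))  # greedy action per cell
```

With random tie-breaking in the greedy step, early episodes behave like a random walk until the goal reward propagates backwards through the Q-table; afterwards the printed greedy policy should point each cell toward the bottom-right corner.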

Prerequisites
  • Experience with the Python programming language
