
Tutorial 3: Learning to Act, Q-Learning


In this tutorial, you will learn how to act in the more realistic setting of sequential decisions, formalized by Markov Decision Processes (MDPs). In a sequential decision problem, the actions executed in one state not only may yield immediate rewards (as in a bandit problem) but may also affect which states are experienced next (unlike a bandit problem). Each individual action can therefore influence all future rewards. Making decisions in this setting thus requires evaluating each action in terms of its expected cumulative future reward.
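One way to make "expected cumulative future reward" precise (following the standard reinforcement learning convention, not notation specific to this tutorial) is via the discounted return $G_t$ and the action value $Q(s, a)$:

$$
G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad Q(s, a) = \mathbb{E}\left[ G_t \mid s_t = s,\ a_t = a \right],
$$

where $0 \le \gamma < 1$ is a discount factor that weights near-term rewards more heavily than distant ones.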

Topics covered in this lesson
  • What grid worlds are and how they help in evaluating simple reinforcement learning agents
  • The basics of the Q-learning algorithm for estimating action values
  • How the exploration-exploitation trade-off, reviewed in the bandit case, carries over to the sequential decision setting (all three topics are sketched in code after this list)
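Below is a minimal, self-contained sketch of how these pieces fit together: tabular Q-learning with epsilon-greedy exploration on a small deterministic grid world. The specifics (the 4x4 grid, the reward of 1 at the goal, the hyperparameter values) are illustrative assumptions, not this tutorial's actual environment.

```python
import numpy as np

# Minimal tabular Q-learning on a hypothetical 4x4 grid world.
# The environment details (grid size, goal reward, hyperparameters)
# are illustrative assumptions, not this tutorial's actual setup.

n_rows, n_cols = 4, 4
n_states, n_actions = n_rows * n_cols, 4  # actions: up, down, left, right
goal = n_states - 1                       # bottom-right cell

def step(state, action):
    """Deterministic grid dynamics; reaching the goal yields reward 1."""
    r, c = divmod(state, n_cols)
    if action == 0:   r = max(r - 1, 0)           # up
    elif action == 1: r = min(r + 1, n_rows - 1)  # down
    elif action == 2: c = max(c - 1, 0)           # left
    else:             c = min(c + 1, n_cols - 1)  # right
    next_state = r * n_cols + c
    return next_state, float(next_state == goal), next_state == goal

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for episode in range(500):
    state, done = 0, False  # every episode starts in the top-left cell
    while not done:
        if rng.random() < epsilon:                # explore: random action
            action = int(rng.integers(n_actions))
        else:                                     # exploit: greedy, ties broken randomly
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state, reward, done = step(state, action)
        # Q-learning target bootstraps from the best value of the next state
        target = reward + gamma * Q[next_state].max() * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1).reshape(n_rows, n_cols))  # greedy action per cell
```

With random tie-breaking in the greedy step, early episodes behave like a random walk until the goal reward propagates backwards through the Q-table; afterwards the printed greedy policy should point each cell toward the bottom-right corner.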

Prerequisites
  • Experience with the Python programming language
