
Reinforcement Learning

Level
Beginner

Neuromatch Academy aims to introduce traditional and emerging tools of computational neuroscience to trainees. It is appropriate for a student population ranging from undergraduates to faculty in academic settings, as well as industry professionals. In addition to teaching the technical details of computational methods, Neuromatch Academy also provides a curriculum centered on modern neuroscience concepts taught by leading professors, along with explicit instruction on how and why to apply models.


This course provides an introduction to the features of a Reinforcement Learning (RL) system, general methods for predicting state values, an overview of the control problem in RL, and a brief introduction to function approximation and deep RL.

Course Features
Lectures
Videos
Tutorials
Suggested reading
Recordings of question and answer sessions
Discussion forum on Neurostars.org
Lessons of this Course
Lesson 1
Duration: 39:12

This lecture provides an introduction to a variety of topics in reinforcement learning.

Lesson 2
Duration: 6:57

This tutorial shows how to estimate state-value functions in a classical conditioning paradigm using Temporal Difference (TD) learning, and how to examine TD errors at the presentation of the conditioned and unconditioned stimuli (CS and US) under different CS-US contingencies. These exercises will give you an understanding of both how reward prediction errors (RPEs) behave in classical conditioning and what we should expect to see if dopamine represents a "canonical" model-free RPE.
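
As a rough illustration of the kind of model this tutorial covers, the sketch below (an assumed setup, not the course's notebook) runs TD(0) learning over states indexed by time since CS onset, with the US delivered at a fixed, hypothetical delay. After training, the prediction error at the US shrinks toward zero while a positive error appears at CS onset.

```python
import numpy as np

# Minimal TD(0) sketch of Pavlovian conditioning (illustrative assumptions only).
# States index "time since CS onset"; the US (reward) arrives a fixed delay later.
# Pre-CS value is held at 0, so learning moves the prediction error from US to CS.

N_STATES, US_STATE = 10, 8          # hypothetical: US delivered 8 steps after CS onset
ALPHA, GAMMA, N_TRIALS = 0.2, 1.0, 500

V = np.zeros(N_STATES + 1)          # value of each post-CS state (last index = terminal)

for _ in range(N_TRIALS):
    deltas = np.zeros(N_STATES)
    for t in range(N_STATES):
        r = 1.0 if t == US_STATE else 0.0        # reward delivered only at the US
        delta = r + GAMMA * V[t + 1] - V[t]      # TD error = reward prediction error
        V[t] += ALPHA * delta                    # TD(0) update
        deltas[t] = delta
    cs_delta = GAMMA * V[0] - 0.0                # RPE at CS onset (pre-CS value kept at 0)

print(f"RPE at US: {deltas[US_STATE]:.3f}   RPE at CS onset: {cs_delta:.3f}")
```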

Lesson 3
Duration: 6:55

In this tutorial, you will use 'bandits' to understand the fundamentals of how a policy interacts with the learning algorithm in reinforcement learning.
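
The sketch below illustrates that interaction under assumed details (the arm probabilities, epsilon value, and update rule are hypothetical, not the course's exact exercises): an epsilon-greedy policy chooses arms while incremental sample-average updates estimate each arm's value.

```python
import numpy as np

# Minimal multi-armed bandit sketch (illustrative assumptions only): the policy
# decides which arm to sample, and the learning rule updates the sampled arm's value.

rng = np.random.default_rng(0)
TRUE_MEANS = np.array([0.2, 0.5, 0.8])   # hypothetical Bernoulli reward probabilities
EPSILON, N_STEPS = 0.1, 2000

Q = np.zeros(len(TRUE_MEANS))            # value estimate per arm
counts = np.zeros(len(TRUE_MEANS))       # number of pulls per arm

for _ in range(N_STEPS):
    # Policy: explore with probability epsilon, otherwise pick the best-looking arm
    if rng.random() < EPSILON:
        a = int(rng.integers(len(Q)))
    else:
        a = int(np.argmax(Q))
    r = float(rng.random() < TRUE_MEANS[a])      # Bernoulli reward
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]               # incremental sample-average update

print("Estimated arm values:", np.round(Q, 2))
```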

Lesson 4
Duration: 11:16

In this tutorial, you will learn how to act in the more realistic setting of sequential decisions, formalized by Markov Decision Processes (MDPs). In a sequential decision problem, the actions executed in one state not only may lead to immediate rewards (as in a bandit problem) but may also affect the states experienced next (unlike a bandit problem). Each individual action may therefore affect all future rewards. Making decisions in this setting thus requires considering each action in terms of its expected cumulative future reward.
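
A minimal sketch of this idea, under assumed details (a hypothetical 1-D chain environment and tabular Q-learning rather than the course's exact exercises), is shown below: an action's value reflects not just its immediate reward but the discounted rewards reachable from the state it leads to.

```python
import numpy as np

# Minimal tabular Q-learning sketch on a hypothetical chain MDP (illustrative only):
# moving right eventually reaches a rewarded terminal state, so the value of an
# action depends on the states it leads to, not only on its immediate reward.

rng = np.random.default_rng(0)
N_STATES, GOAL = 6, 5                    # states 0..5; reward only on reaching state 5
ALPHA, GAMMA, EPSILON, N_EPISODES = 0.5, 0.9, 0.1, 500

Q = np.zeros((N_STATES, 2))              # actions: 0 = left, 1 = right

for _ in range(N_EPISODES):
    s = 0
    while s != GOAL:
        a = int(rng.integers(2)) if rng.random() < EPSILON else int(np.argmax(Q[s]))
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, GOAL)
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: Q(s, a) tracks expected cumulative (discounted) reward
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print("Value of moving right from each state:", np.round(Q[:, 1], 2))
```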

Lesson 5
Duration: 9:10

In this tutorial, you will implement one of the simplest model-based reinforcement learning algorithms, Dyna-Q. You will understand what a world model is, how it can improve the agent's policy, and the situations in which model-based algorithms are more advantageous than their model-free counterparts.
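
For orientation, here is a minimal Dyna-Q sketch under assumed details (the chain environment, planning budget, and hyperparameters are hypothetical, not the course notebook): alongside direct Q-learning updates, the agent stores observed transitions in a simple world model and replays remembered state-action pairs for extra planning updates.

```python
import numpy as np

# Minimal Dyna-Q sketch (illustrative assumptions only): direct RL updates from real
# experience, plus extra updates replayed from a learned deterministic world model.

rng = np.random.default_rng(0)
N_STATES, GOAL = 6, 5                    # hypothetical chain: states 0..5, goal at 5
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
N_EPISODES, K_PLANNING = 100, 10

Q = np.zeros((N_STATES, 2))              # actions: 0 = left, 1 = right
model = {}                               # world model: (s, a) -> (reward, next state)

def step(s, a):
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, GOAL)
    return (1.0 if s_next == GOAL else 0.0), s_next

for _ in range(N_EPISODES):
    s = 0
    while s != GOAL:
        a = int(rng.integers(2)) if rng.random() < EPSILON else int(np.argmax(Q[s]))
        r, s_next = step(s, a)
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])   # direct RL
        model[(s, a)] = (r, s_next)                                    # update model
        for _ in range(K_PLANNING):                                    # planning steps
            (ps, pa), (pr, ps_next) = list(model.items())[int(rng.integers(len(model)))]
            Q[ps, pa] += ALPHA * (pr + GAMMA * np.max(Q[ps_next]) - Q[ps, pa])
        s = s_next

print("Learned values for moving right:", np.round(Q[:, 1], 2))
```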

Lesson 6
Duration: 33:25

This lecture highlights up-and-coming issues in the neuroscience of reinforcement learning.