Attention and the Transformer
This tutorial covers advanced concepts of energy-based models. The lecture is part of the Associative Memories module of the Deep Learning Course at CDS, a course covering the latest techniques in deep learning and representation learning, focusing on supervised and unsupervised deep learning, embedding methods, metric learning, convolutional and recurrent nets, with applications to computer vision, natural language understanding, and speech recognition.
Chapters:
00:00 – Welcome to class
00:15 – Listening to YouTube from the terminal
00:36 – Summarising papers with @Notion
01:45 – Reading papers collaboratively
03:15 – Attention! Self / cross, hard / soft
06:44 – Use cases: set encoding!
12:10 – Self-attention
28:45 – Key-value store
29:32 – Queries, keys, and values → self-attention
39:49 – Queries, keys, and values → cross-attention
45:27 – Implementation details
48:11 – The Transformer: an encoder-predictor-decoder architecture
54:59 – The Transformer encoder
56:47 – The Transformer “decoder” (which is an encoder-predictor-decoder module)
1:01:49 – Jupyter Notebook and PyTorch implementation of a Transformer encoder (see the sketch after this list)
1:10:51 – Goodbye :)
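
For reference, here is a minimal, single-head PyTorch sketch of the queries/keys/values self-attention and Transformer-encoder ideas listed in the chapters above. It is not the notebook used in the lecture (which relies on multi-head attention); the class names `SelfAttention` and `EncoderBlock` and the default sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention with learned q/k/v projections."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)  # (batch, seq, seq)
        attn = scores.softmax(dim=-1)                           # soft attention weights
        return attn @ v                                         # weighted sum of values


class EncoderBlock(nn.Module):
    """One Transformer encoder block: self-attention and a feed-forward net,
    each wrapped in a residual connection followed by LayerNorm."""

    def __init__(self, d_model: int, d_ff: int = 256):
        super().__init__()
        self.attn = SelfAttention(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.norm1(x + self.attn(x))  # residual around attention
        x = self.norm2(x + self.ff(x))    # residual around feed-forward
        return x


if __name__ == "__main__":
    x = torch.randn(2, 10, 64)            # 2 sequences, 10 tokens, d_model = 64
    print(EncoderBlock(64)(x).shape)      # torch.Size([2, 10, 64])
```

Swapping `SelfAttention` for `nn.MultiheadAttention` and stacking several such blocks gives the full encoder discussed in the lecture.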