98 - Analyzing Information Flow In Transformers, With Elena Voita

37 minutes
The podcast is currently on hiatus. For more ac…

Description

6 years ago
What function do the different attention heads serve in
multi-headed attention models? In this episode, Lena describes how
to use attribution methods to assess the importance and
contribution of different heads in several tasks, and describes a
gating mechanism that, combined with an auxiliary loss, prunes the
number of effective heads. Then, we discuss Lena’s work on
studying the evolution of representations of individual tokens in
Transformer models.

Lena’s homepage: https://lena-voita.github.io/

Blog posts:
https://lena-voita.github.io/posts/acl19_heads.html
https://lena-voita.github.io/posts/emnlp19_evolution.html

Papers:
https://arxiv.org/abs/1905.09418
https://arxiv.org/abs/1909.01380
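As a rough illustration of the head-gating idea, here is a minimal pure-Python sketch. It assumes the gates follow the Hard Concrete distribution, a stochastic relaxation of the L0 penalty (Louizos et al., 2018), which as we understand it is the relaxation used in the 1905.09418 paper. The constants, the function names (`sample_gate`, `deterministic_gate`, `prob_open`, `l0_penalty`, `gated_heads`) and the toy `log_alpha` values are illustrative assumptions, not code from the episode or the paper. Each head has a learnable parameter `log_alpha`, its output is scaled by a gate in [0, 1], and the auxiliary loss is the expected number of heads whose gate is non-zero.

```python
import math
import random

# Hard Concrete stretch/temperature constants commonly used with this
# relaxation (Louizos et al., 2018); assumed values, not from the episode.
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_gate(log_alpha: float, rng: random.Random) -> float:
    """Draw a noisy gate in [0, 1] for one head (training time)."""
    u = rng.uniform(1e-6, 1.0 - 1e-6)  # keep u away from 0 and 1 to avoid log(0)
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    s_bar = s * (ZETA - GAMMA) + GAMMA  # stretch the sample to (GAMMA, ZETA)
    return min(1.0, max(0.0, s_bar))  # hard clip: exact 0s and 1s become possible


def deterministic_gate(log_alpha: float) -> float:
    """Noise-free gate value used at test time."""
    s_bar = sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, s_bar))


def prob_open(log_alpha: float) -> float:
    """P(gate != 0): differentiable surrogate for one head's L0 contribution."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def l0_penalty(log_alphas: list[float]) -> float:
    """Auxiliary loss term: expected number of heads left open."""
    return sum(prob_open(a) for a in log_alphas)


def gated_heads(head_outputs: list[list[float]], gates: list[float]) -> list[float]:
    """Scale each head's output by its gate, then concatenate (before W^O)."""
    out: list[float] = []
    for head, gate in zip(head_outputs, gates):
        out.extend(gate * x for x in head)
    return out


if __name__ == "__main__":
    log_alphas = [3.0, 0.0, -3.0]  # hypothetical per-head gate parameters
    rng = random.Random(0)
    print("train gates:", [sample_gate(a, rng) for a in log_alphas])
    print("test gates: ", [deterministic_gate(a) for a in log_alphas])
    print("L0 penalty: ", l0_penalty(log_alphas))
```

During training one would add `lam * l0_penalty(log_alphas)` to the task loss (`lam` being a hypothetical trade-off weight); at test time, heads whose `deterministic_gate` is exactly 0.0 can be dropped. The hard clipping is the key design choice: unlike a plain sigmoid gate, it lets gates reach exactly zero, so the pruning is exact rather than approximate.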
