98 - Analyzing Information Flow In Transformers, With Elena Voita
37 minutes
Description
6 years ago
What function do the different attention heads serve in
multi-headed attention models? In this episode, Lena describes how
to use attribution methods to assess the importance and
contribution of different heads in several tasks, and describes a
gating mechanism to prune the number of effective heads used when
combined with an auxiliary loss. Then, we discuss Lena’s work on
studying the evolution of representations of individual tokens in
Transformer models. Lena’s homepage: https://lena-voita.github.io/
Blog posts: https://lena-voita.github.io/posts/acl19_heads.html
https://lena-voita.github.io/posts/emnlp19_evolution.html Papers:
https://arxiv.org/abs/1905.09418 https://arxiv.org/abs/1909.01380
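As a rough illustration of the pruning idea discussed in the episode, here is a minimal NumPy sketch of per-head gating: each attention head's output is multiplied by a scalar gate, and an auxiliary penalty (a simplified stand-in for the stochastic hard-concrete relaxation used in the paper) counts how many gates remain open. The function names, the sigmoid relaxation, and the temperature parameter are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def gate_heads(head_outputs, logits, temperature=0.5):
    """Scale each head's output by a sigmoid-relaxed gate in [0, 1].

    head_outputs: array of shape (batch, n_heads, seq_len, head_dim)
    logits: one learnable scalar per head; very negative logits close
            the gate, effectively pruning that head.
    """
    gates = 1.0 / (1.0 + np.exp(-logits / temperature))
    return head_outputs * gates.reshape(1, -1, 1, 1), gates

def l0_penalty(gates):
    """Auxiliary loss term: the (soft) number of open gates.

    Adding this to the task loss pushes unneeded gates toward zero.
    """
    return gates.sum()
```

During training, the penalty is weighted and added to the main objective, so heads whose contribution does not pay for their penalty are gradually switched off; the paper finds that many heads can be pruned this way with little loss in quality.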