98 - Analyzing Information Flow In Transformers, With Elena Voita
37 minutes
Description
6 years ago
What function do the different attention heads serve in
multi-headed attention models? In this episode, Lena describes how
to use attribution methods to assess the importance and
contribution of different heads in several tasks, and describes a
gating mechanism to prune the number of effective heads used when
combined with an auxiliary loss. Then, we discuss Lena’s work on
studying the evolution of representations of individual tokens in
Transformer models. Lena’s homepage: https://lena-voita.github.io/
Blog posts: https://lena-voita.github.io/posts/acl19_heads.html
https://lena-voita.github.io/posts/emnlp19_evolution.html Papers:
https://arxiv.org/abs/1905.09418 https://arxiv.org/abs/1909.01380
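As a rough illustration of the pruning idea discussed in the episode, here is a minimal NumPy sketch of per-head gating: each attention head's output is multiplied by a scalar gate, and an auxiliary penalty (a simplified stand-in for the stochastic hard-concrete relaxation used in the paper) counts how many gates remain open. The function names, the sigmoid relaxation, and the temperature parameter are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def gate_heads(head_outputs, logits, temperature=0.5):
    """Scale each head's output by a sigmoid-relaxed gate in [0, 1].

    head_outputs: array of shape (batch, n_heads, seq_len, head_dim)
    logits: one learnable scalar per head; very negative logits close
            the gate, effectively pruning that head.
    """
    gates = 1.0 / (1.0 + np.exp(-logits / temperature))
    return head_outputs * gates.reshape(1, -1, 1, 1), gates

def l0_penalty(gates):
    """Auxiliary loss term: the (soft) number of open gates.

    Adding this to the task loss pushes unneeded gates toward zero.
    """
    return gates.sum()
```

During training, the penalty is weighted and added to the main objective, so heads whose contribution does not pay for their penalty are gradually switched off; the paper finds that many heads can be pruned this way with little loss in quality.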