128 - Dynamic Benchmarking, with Douwe Kiela

47 minutes

Description

1 year ago
We discussed adversarial dataset construction and dynamic
benchmarking in this episode with Douwe Kiela, a research scientist
at Facebook AI Research who has been working on a dynamic
benchmarking platform called Dynabench. Dynamic benchmarking tries
to address the issue of many recent datasets getting solved with
little progress being made towards solving the corresponding tasks.
The idea is to involve models in the data collection loop to
encourage humans to provide data points that are hard for those
models, thereby continuously collecting harder datasets. We
discussed the details of this approach, and some potential caveats.
We also discussed dynamic leaderboards, a recent addition to
Dynabench that rank systems based on their utility given specific
use cases.

Papers discussed in this episode:

1. Dynabench: Rethinking Benchmarking in NLP
(https://www.semanticscholar.org/paper/Dynabench%3A-Rethinking-Benchmarking-in-NLP-Kiela-Bartolo/77a096d80eb4dd4ccd103d1660c5a5498f7d026b)
2. Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
(https://www.semanticscholar.org/paper/Dynaboard%3A-An-Evaluation-As-A-Service-Platform-for-Ma-Ethayarajh/d25bb256e5b69f769a429750217b0d9ec1cf4d86)
3. Adversarial NLI: A New Benchmark for Natural Language Understanding
(https://www.semanticscholar.org/paper/Adversarial-NLI%3A-A-New-Benchmark-for-Natural-Nie-Williams/9d87300892911275520a4f7a5e5abf4f1c002fec)
4. DynaSent: A Dynamic Benchmark for Sentiment Analysis
(https://www.semanticscholar.org/paper/DynaSent%3A-A-Dynamic-Benchmark-for-Sentiment-Potts-Wu/284dfcf7f25ca87b2db235c6cdc848b4143d3923)

Douwe Kiela's webpage: https://douwekiela.github.io/

The hosts for this episode are Pradeep Dasigi and Alexis Ross.
