We discussed adversarial dataset construction and dynamic
benchmarking in this episode with Douwe Kiela, a research scientist
at Facebook AI Research who has been working on a dynamic
benchmarking platform called Dynabench. Dynamic benchmarking tries
to address a recurring problem: many recent datasets get solved
quickly, while comparatively little progress is made on the
underlying tasks they were meant to measure. The idea is to put
models in the data collection loop and encourage human annotators
to write examples that those models get wrong, so that each round
of collection yields a harder dataset. We discussed the details of
this approach and some of its potential caveats.
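To make the model-in-the-loop idea concrete, here is a minimal sketch. It rests on several assumptions. The toy keyword "model", the "retraining by memorization" step, and every name in it (ToySentimentModel, collect_round, dynamic_collection) are hypothetical and are not part of Dynabench. The only thing it tries to show is the loop itself: keep the annotator submissions the current model gets wrong, then update the model on them before the next round.

```python
from dataclasses import dataclass, field


@dataclass
class ToySentimentModel:
    """Hypothetical stand-in for the model in the loop (not a Dynabench API)."""
    positive: set = field(default_factory=lambda: {"good", "great"})
    negative: set = field(default_factory=lambda: {"bad", "awful"})
    memory: dict = field(default_factory=dict)  # examples it was "retrained" on

    def predict(self, text: str) -> str:
        # Crude retraining: recall the gold label of examples seen in training.
        if text in self.memory:
            return self.memory[text]
        words = text.lower().split()
        score = sum(w in self.positive for w in words) - sum(w in self.negative for w in words)
        return "positive" if score >= 0 else "negative"

    def train(self, examples):
        self.memory.update(examples)


def collect_round(model, submissions):
    """Keep only the (text, label) submissions the current model gets wrong."""
    return [(text, label) for text, label in submissions if model.predict(text) != label]


def dynamic_collection(model, rounds):
    """Run several rounds: collect model-fooling examples, then retrain on them."""
    dataset, fooled_per_round = [], []
    for submissions in rounds:
        fooled = collect_round(model, submissions)
        dataset.extend(fooled)
        fooled_per_round.append(len(fooled))
        model.train(fooled)  # the next round targets the updated model
    return dataset, fooled_per_round


# Simulated annotator submissions (made-up examples for illustration only).
rounds = [
    [("a great movie", "positive"),
     ("not good at all", "negative"),
     ("bad in the best way", "positive")],
    [("not good at all", "negative"),
     ("hardly great", "negative")],
]
model = ToySentimentModel()
dataset, fooled_per_round = dynamic_collection(model, rounds)
print(fooled_per_round)  # [2, 1]
```

In the second round, "not good at all" no longer fools the updated model, so only the new, harder example is kept. This illustrates, in a very simplified form, how the collected data shifts toward what the current model cannot yet handle.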
We also discussed dynamic leaderboards, a recent addition to
Dynabench, which rank systems by their utility for specific use
cases.

Papers discussed in this episode:
1. Dynabench: Rethinking Benchmarking in NLP
2. Dynaboard: An Evaluation-As-A-Service Platform for Holistic
   Next-Generation Benchmarking
3. Adversarial NLI: A New Benchmark for Natural Language
4. DynaSent: A Dynamic Benchmark for Sentiment Analysis

Douwe Kiela's webpage: https://douwekiela.github.io/
The hosts for this episode are Pradeep Dasigi and Alexis Ross.
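The use-case-based ranking behind the dynamic leaderboards discussed above can be illustrated with a minimal sketch. It assumes a simple weighted sum of min-max normalized metrics. This is not claimed to be Dynaboard's actual scoring formula. The function name rank_for_use_case, the metrics, the weights, and the system numbers are all hypothetical and made up purely for illustration. The sketch shows how the same systems can rank differently once a use case says which metrics matter more.

```python
def rank_for_use_case(systems, weights, higher_is_better):
    """Rank systems by a use-case-weighted sum of min-max normalized metrics.

    systems:          system name -> {metric name: raw value}
    weights:          metric name -> importance for this use case (hypothetical)
    higher_is_better: metric name -> True if larger raw values are better
    """
    scores = {name: 0.0 for name in systems}
    total = sum(weights.values())
    for metric, weight in weights.items():
        values = [m[metric] for m in systems.values()]
        lo, hi = min(values), max(values)
        for name, metrics in systems.items():
            if hi == lo:
                norm = 1.0  # every system ties on this metric
            elif higher_is_better[metric]:
                norm = (metrics[metric] - lo) / (hi - lo)
            else:
                norm = (hi - metrics[metric]) / (hi - lo)
            scores[name] += weight * norm / total
    # Highest weighted score first.
    return sorted(scores, key=scores.get, reverse=True)


# Made-up numbers, for illustration only.
systems = {
    "large": {"accuracy": 0.90, "latency_ms": 120},
    "medium": {"accuracy": 0.88, "latency_ms": 60},
    "small": {"accuracy": 0.84, "latency_ms": 20},
}
better = {"accuracy": True, "latency_ms": False}
print(rank_for_use_case(systems, {"accuracy": 0.9, "latency_ms": 0.1}, better))
# ['large', 'medium', 'small']
print(rank_for_use_case(systems, {"accuracy": 0.2, "latency_ms": 0.8}, better))
# ['small', 'medium', 'large']
```

Min-max normalization is used here only so that metrics on different scales, such as accuracy and milliseconds, can be combined. Other aggregation schemes would serve the same illustrative purpose.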

