Construct Validity Checklist

Informed by our systematic review, we provide eight recommendations to ensure the construct validity of your benchmark.

Define the phenomenon

Measure only the phenomenon

Construct a representative dataset for the task

Acknowledge limitations of reusing datasets

Prepare for contamination

Use statistical methods to compare models

Conduct an error analysis

Justify construct validity
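The checklist does not prescribe a specific statistical test. As one possible illustration of "Use statistical methods to compare models", the sketch below computes a paired bootstrap confidence interval for the accuracy difference between two models scored on the same benchmark items. The paired bootstrap is our choice of technique, not the checklist's, and the function name `paired_bootstrap_ci` and its parameters are hypothetical.

```python
import random
from typing import Sequence


def paired_bootstrap_ci(
    correct_a: Sequence[int],
    correct_b: Sequence[int],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Hypothetical helper: return (observed A - B accuracy gap, lower, upper).

    correct_a[i] and correct_b[i] are 1 if model A / model B answered
    benchmark item i correctly, else 0. Both lists must cover the same items.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(correct_a)
    # Per-item differences preserve the pairing: both models saw item i.
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed = sum(diffs) / n

    rng = random.Random(seed)
    boot = []
    for _ in range(n_resamples):
        # Resample items with replacement and recompute the mean gap.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        boot.append(sum(sample) / n)
    boot.sort()
    # Percentile interval: take the alpha/2 and 1 - alpha/2 quantiles.
    lower = boot[int((alpha / 2) * n_resamples)]
    upper = boot[int((1 - alpha / 2) * n_resamples) - 1]
    return observed, lower, upper


if __name__ == "__main__":
    # Made-up scores for illustration: A solves 80/100 items, B solves 70/100.
    scores_a = [1] * 80 + [0] * 20
    scores_b = [1] * 70 + [0] * 30
    gap, lo, hi = paired_bootstrap_ci(scores_a, scores_b)
    print(f"A - B accuracy: {gap:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

If the interval excludes zero, the gap is unlikely to be explained by item sampling alone; if it straddles zero, reporting one model as "better" is hard to justify. Resampling per-item differences, rather than each model's scores independently, accounts for the fact that both models were evaluated on identical items.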