Reproducibility Checklist

Use this checklist before sharing a benchmark result, supplementary archive, or manuscript table.

Required run information

  • Manifest file or manifest copy from the generated run directory.
  • run_id.
  • lrdbench version.
  • Git commit hash, when run from a repository checkout.
  • Python version and platform from manifest/environment.json.
  • Random seeds from the manifest.
  • Benchmark mode and target estimand.

Required outputs

  • tables/run_summary.csv.
  • tables/per_stratum_metrics.csv.
  • tables/leaderboard.csv when a leaderboard is reported.
  • tables/estimator_metadata.csv.
  • tables/failures.csv and tables/failure_map.csv.
  • raw/records.csv, raw/estimates.csv, and raw/metrics.csv.
  • artefacts/artefact_index.csv.
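
Before packaging an archive, the file list above can be checked mechanically. The sketch below assumes the listed paths are relative to the generated run directory; `missing_outputs` and its parameters are hypothetical names, and the check only tests for file presence, so it does not replace `lrdbench validate-output`.

```python
from pathlib import Path

# Relative paths copied from the "Required outputs" list.
REQUIRED_OUTPUTS = [
    "tables/run_summary.csv",
    "tables/per_stratum_metrics.csv",
    "tables/estimator_metadata.csv",
    "tables/failures.csv",
    "tables/failure_map.csv",
    "raw/records.csv",
    "raw/estimates.csv",
    "raw/metrics.csv",
    "artefacts/artefact_index.csv",
]

# Only required when a leaderboard is reported.
LEADERBOARD_OUTPUT = "tables/leaderboard.csv"


def missing_outputs(run_dir, leaderboard_reported=False):
    """Return the checklist paths that are not regular files under run_dir."""
    expected = list(REQUIRED_OUTPUTS)
    if leaderboard_reported:
        expected.append(LEADERBOARD_OUTPUT)
    root = Path(run_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

A call such as `missing_outputs("reports/<run_id>", leaderboard_reported=True)`, with the placeholder replaced by the actual run directory, returns an empty list when every listed file is present.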

Checks to run

lrdbench validate path/to/manifest.yaml
lrdbench run path/to/manifest.yaml
lrdbench validate-output reports/<run_id>
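
When these checks are scripted, it helps to stop at the first failing step. The sketch below wraps the three commands above in Python. It assumes that each command signals failure with a non-zero exit code and that the run_id is known before the run starts; neither is confirmed here, and `check_commands` and `run_checks` are hypothetical helper names.

```python
import subprocess


def check_commands(manifest_path, run_id):
    """Build the three checklist commands as argument lists."""
    return [
        ["lrdbench", "validate", str(manifest_path)],
        ["lrdbench", "run", str(manifest_path)],
        ["lrdbench", "validate-output", f"reports/{run_id}"],
    ]


def run_checks(manifest_path, run_id, runner=subprocess.run):
    """Run the commands in order and stop at the first non-zero exit code.

    Returns the argument list of the failing command, or None if all pass.
    The runner parameter exists so the function can be tested without lrdbench.
    """
    for argv in check_commands(manifest_path, run_id):
        completed = runner(argv)
        if completed.returncode != 0:
            return argv
    return None
```

Passing the commands as argument lists rather than one shell string avoids quoting problems when the manifest path contains spaces.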

For code changes, run the relevant test subset before reporting results:

pytest tests/integration/test_smoke_run.py tests/integration/test_custom_estimator_workflow.py

Reporting guidance

State whether results are ground-truth, stress-test, or observational. Report leaderboard rules and component weights whenever ranks are shown. For observational data, avoid accuracy language unless external truth is available and declared in the benchmark design.