Leaderboard Submission Policy

Leaderboards are accepted only as summaries of reproducible benchmark runs. A submitted rank applies to the declared suite, benchmark mode, and metrics; it is not a universal ranking of estimators.

Required Materials

A leaderboard submission must include:

  • the manifest or packaged suite name;
  • the lrdbench version or Git commit;
  • the output contract version;
  • manifest/environment.json;
  • artefacts/artefact_index.csv;
  • tables/leaderboard.csv;
  • raw metric exports under raw/;
  • confirmation that lrdbench validate-output <run_root> passed.
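Before running the official check, a quick local pre-flight can catch missing files. The sketch below is a hypothetical helper, not part of lrdbench. It assumes the listed paths are relative to the run root and that raw/ must contain at least one file. It does not replace lrdbench validate-output <run_root>, which remains the required check.

```python
from pathlib import Path

# Required files from the list above, assumed to be relative to the run root.
REQUIRED_FILES = (
    "manifest/environment.json",
    "artefacts/artefact_index.csv",
    "tables/leaderboard.csv",
)


def missing_materials(run_root):
    """Return required items absent from run_root (hypothetical pre-flight helper)."""
    root = Path(run_root)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    raw_dir = root / "raw"
    # Assumption: raw/ must exist and hold at least one metric export.
    if not raw_dir.is_dir() or not any(raw_dir.iterdir()):
        missing.append("raw/")
    return missing
```

An empty return value means only that the listed files are present. The output contract check still decides whether the run is valid.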

Eligible Runs

Use tracked public suites for comparable public submissions:

  • public_small_* for quick checks;
  • public_medium_* for comparable public benchmark submissions.

Custom manifests are welcome for discussion, but they should be labelled as custom and should not be mixed with canonical public-suite leaderboards.
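When assembling a leaderboard from several runs, it can help to label each suite and flag any attempt to mix custom runs with public-suite runs. The sketch below is a hypothetical helper. It assumes the suite name alone identifies a tracked public suite. That is a simplification, because a modified manifest could reuse a public name, so the label is a hint rather than proof.

```python
from fnmatch import fnmatchcase


def suite_category(suite_name):
    """Label a suite by the public prefixes listed above; anything else is custom."""
    if fnmatchcase(suite_name, "public_small_*"):
        return "public_small"
    if fnmatchcase(suite_name, "public_medium_*"):
        return "public_medium"
    return "custom"


def mixes_custom_with_public(suite_names):
    """True if custom suites appear alongside public suites in one leaderboard."""
    categories = {suite_category(name) for name in suite_names}
    return "custom" in categories and len(categories) > 1
```

A leaderboard made entirely of custom runs is not flagged here. It should still carry the custom label.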

Estimator Requirements

Each estimator must declare:

  • name and version;
  • family;
  • target estimand;
  • assumptions;
  • parameter settings;
  • uncertainty behaviour;
  • known failure modes.

Third-party estimators should link to implementation code and tests. Invalid estimates, missing uncertainty, and warnings must remain visible in exported outputs.
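One way to keep the declarations complete is to hold them in a single record and check it before submission. The dataclass and field names below are hypothetical and mirror the list above. They are not lrdbench's own metadata schema. In this design, None or a blank string means "not declared". An explicit empty list or dict, such as an estimator with no tunable parameters, counts as declared.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class EstimatorMetadata:
    """Hypothetical record of the required estimator declarations."""
    name: Optional[str] = None
    version: Optional[str] = None
    family: Optional[str] = None
    target_estimand: Optional[str] = None
    assumptions: Optional[List[str]] = None
    parameter_settings: Optional[Dict[str, object]] = None
    uncertainty_behaviour: Optional[str] = None
    known_failure_modes: Optional[List[str]] = None


def missing_declarations(meta):
    """Return the names of fields that are undeclared (None or blank strings)."""
    missing = []
    for f in fields(meta):
        value = getattr(meta, f.name)
        # Empty lists/dicts are treated as explicit "none" declarations.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing
```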

Interpretation Rules

  • Report component metrics alongside composite ranks.
  • Do not compare estimators that target incompatible estimands unless the manifest explicitly defines the comparison.
  • Do not use observational-mode leaderboards as evidence of ground-truth accuracy.
  • Treat stress-test leaderboards as robustness summaries for declared contaminations only.
  • Include failure and missing-output rates when reporting results.
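Failure and missing-output rates are straightforward to compute once each estimator run has an outcome label. The sketch below assumes hypothetical status labels "ok", "failed", and "missing". It does not reflect the actual schema of the raw/ exports. Rates use all attempted runs as the denominator, so failed runs are counted rather than silently dropped.

```python
from collections import Counter, defaultdict


def outcome_rates(records):
    """Per-estimator failure and missing-output rates.

    records: iterable of (estimator, status) pairs, where status is one of the
    hypothetical labels "ok", "failed", or "missing".
    """
    counts = defaultdict(Counter)
    for estimator, status in records:
        counts[estimator][status] += 1
    rates = {}
    for estimator, c in counts.items():
        total = sum(c.values())  # every attempted run, including failures
        rates[estimator] = {
            "n": total,
            "failure_rate": c["failed"] / total,
            "missing_rate": c["missing"] / total,
        }
    return rates
```

These rates belong next to the composite rank and component metrics in the reported table, not in a separate appendix.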

Review Criteria

Submissions may be rejected or marked non-comparable when:

  • required artefacts are missing;
  • the output contract check fails;
  • estimator metadata is incomplete;
  • a custom manifest is presented as a canonical public-suite result;
  • interpretation claims exceed the benchmark mode or metric definitions.