Leaderboard Submission Policy
Leaderboards are accepted only as summaries of reproducible benchmark runs. A submitted rank is not a universal estimator ranking.
Required Materials
A leaderboard submission must include:
- the manifest or packaged suite name;
- the lrdbench version or Git commit;
- the output contract version;
- manifest/environment.json;
- artefacts/artefact_index.csv;
- tables/leaderboard.csv;
- raw metric exports under raw/;
- confirmation that lrdbench validate-output <run_root> passed.
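Before packaging a submission, it can help to confirm that the listed files are actually present. The following is a minimal sketch, assuming the listed paths are relative to the run root; the function name missing_materials is hypothetical and is not part of lrdbench.

```python
from pathlib import Path

# Relative paths taken from the required-materials list; treating them as
# relative to the run root is an assumption of this sketch.
REQUIRED_FILES = (
    "manifest/environment.json",
    "artefacts/artefact_index.csv",
    "tables/leaderboard.csv",
)
REQUIRED_DIRS = ("raw",)


def missing_materials(run_root):
    """Return required files that are absent, plus raw/ if absent or empty."""
    root = Path(run_root)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    for rel in REQUIRED_DIRS:
        directory = root / rel
        # An existing but empty raw/ directory holds no metric exports.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(rel + "/")
    return missing
```

This only checks that the files exist. It does not inspect their contents and is not a substitute for running lrdbench validate-output on the run root.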
Eligible Runs
Use tracked public suites for comparable public submissions:
- public_small_* for quick checks;
- public_medium_* for comparable public benchmark submissions.
Custom manifests are welcome for discussion, but they should be labelled as custom and should not be mixed with canonical public-suite leaderboards.
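The separation between public suites and custom manifests can be sketched as a name-based check. This is an illustration only: the tier labels and both function names are hypothetical, and a matching name prefix does not by itself prove that a manifest is one of the tracked public suites.

```python
from fnmatch import fnmatchcase

# Patterns come from the policy text; the tier labels are hypothetical.
SUITE_TIERS = (
    ("public_small_*", "quick-check"),
    ("public_medium_*", "comparable"),
)


def suite_tier(suite_name):
    """Map a suite name to a tier label, defaulting to 'custom'."""
    for pattern, tier in SUITE_TIERS:
        if fnmatchcase(suite_name, pattern):
            return tier
    return "custom"


def mixes_custom_and_public(suite_names):
    """Return True when one leaderboard would combine custom and public suites."""
    tiers = {suite_tier(name) for name in suite_names}
    return "custom" in tiers and len(tiers) > 1
```

A leaderboard built from suites for which mixes_custom_and_public returns True would need to be split, with the custom part labelled as custom.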
Estimator Requirements
Each estimator must declare:
- name and version;
- family;
- target estimand;
- assumptions;
- parameter settings;
- uncertainty behaviour;
- known failure modes.
Third-party estimators should link to implementation code and tests. Invalid estimates, missing uncertainty, and warnings must remain visible in exported outputs.
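One way to keep a declaration complete is to record it as a structured object and list anything left unset. The sketch below is hypothetical: the class, its field names, and the helper are illustrative and do not describe an lrdbench schema. A value of None means "not declared", so an estimator with no tunable parameters can still declare an empty settings mapping.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EstimatorDeclaration:
    # Hypothetical field names, one per item the policy asks for.
    name: Optional[str] = None
    version: Optional[str] = None
    family: Optional[str] = None
    target_estimand: Optional[str] = None
    assumptions: Optional[list] = None
    parameter_settings: Optional[dict] = None
    uncertainty: Optional[str] = None
    known_failure_modes: Optional[list] = None


def undeclared_fields(declaration):
    """Return names of fields that are None or blank strings."""
    undeclared = []
    for f in fields(declaration):
        value = getattr(declaration, f.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            undeclared.append(f.name)
    return undeclared
```

Treating an explicitly empty list or mapping as declared is a design choice of this sketch: it lets an author state "no known failure modes" rather than leave the item blank.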
Interpretation Rules
- Report component metrics alongside composite ranks.
- Do not compare estimators that target incompatible estimands unless the manifest explicitly defines the comparison.
- Do not use observational-mode leaderboards as evidence of ground-truth accuracy.
- Treat stress-test leaderboards as robustness summaries for declared contaminations only.
- Include failure and missing-output rates when reporting results.
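Failure and missing-output rates are only meaningful when the denominator counts every attempted run, not just the successful ones. A minimal sketch, assuming each run is a dictionary with a hypothetical "status" key taking the values "ok", "failed", or "missing" (this record format is illustrative and is not the lrdbench output contract):

```python
def outcome_rates(records):
    """Return failure and missing-output rates over all attempted runs."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarise")
    failed = sum(1 for r in records if r["status"] == "failed")
    missing = sum(1 for r in records if r["status"] == "missing")
    return {
        "n_runs": total,
        # Both rates share the same denominator: every attempted run.
        "failure_rate": failed / total,
        "missing_output_rate": missing / total,
    }
```

For example, four attempted runs with one failure and one missing output give a failure rate of 0.25 and a missing-output rate of 0.25, to be reported next to any metric computed on the two successful runs.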
Review Criteria
Submissions may be rejected or marked non-comparable when:
- required artefacts are missing;
- the output contract check fails;
- estimator metadata is incomplete;
- a custom manifest is presented as a canonical public-suite result;
- interpretation claims exceed the benchmark mode or metric definitions.