Reproducibility Checklist
Use this checklist before sharing a benchmark result, supplementary archive, or manuscript table.
Required run information
- Manifest file or manifest copy from the generated run directory.
- run_id.
- lrdbench version.
- Git commit when run from a repository checkout.
- Python version and platform from manifest/environment.json.
- Random seeds from the manifest.
- Benchmark mode and target estimand.
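
As an illustration of the environment fields listed above, the following is a minimal sketch of how they could be gathered by hand. It is not part of lrdbench: `collect_run_info`, `package_version`, and `git_commit` are hypothetical helpers, and the distribution name `lrdbench` passed to `importlib.metadata` is an assumption. The generated manifest/environment.json remains the authoritative record.

```python
import json
import platform
import subprocess
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def git_commit(repo_dir="."):
    """Return the current commit hash, or None outside a Git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # Git is not installed, or repo_dir is not a usable directory.
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def collect_run_info(run_id, seeds, repo_dir="."):
    """Gather the checklist fields (hypothetical helper, not an lrdbench API)."""
    return {
        "run_id": run_id,
        # Assumption: the package is installed under the name "lrdbench".
        "lrdbench_version": package_version("lrdbench"),
        "git_commit": git_commit(repo_dir),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "seeds": list(seeds),
    }


if __name__ == "__main__":
    print(json.dumps(collect_run_info("example-run", [0, 1, 2]), indent=2))
```

Fields that cannot be determined are reported as `None` rather than guessed, so a missing Git commit or package version stays visible in the shared record.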
Required outputs
- tables/run_summary.csv.
- tables/per_stratum_metrics.csv.
- tables/leaderboard.csv when a leaderboard is reported.
- tables/estimator_metadata.csv.
- tables/failures.csv and tables/failure_map.csv.
- raw/records.csv, raw/estimates.csv, and raw/metrics.csv.
- artefacts/artefact_index.csv.
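
A quick way to confirm that an archive contains the files listed above is a presence check such as the sketch below. It assumes the paths are relative to the run directory; `missing_outputs` is a hypothetical helper, and it only tests that each file exists, so it does not replace the project's own output validation.

```python
from pathlib import Path

# Files the checklist requires for every run, relative to the run directory.
ALWAYS_REQUIRED = (
    "tables/run_summary.csv",
    "tables/per_stratum_metrics.csv",
    "tables/estimator_metadata.csv",
    "tables/failures.csv",
    "tables/failure_map.csv",
    "raw/records.csv",
    "raw/estimates.csv",
    "raw/metrics.csv",
    "artefacts/artefact_index.csv",
)

# Required only when a leaderboard is reported.
LEADERBOARD_FILE = "tables/leaderboard.csv"


def missing_outputs(run_dir, leaderboard_reported=False):
    """Return the required relative paths that are not files under run_dir."""
    expected = list(ALWAYS_REQUIRED)
    if leaderboard_reported:
        expected.append(LEADERBOARD_FILE)
    root = Path(run_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_outputs.py <run directory>
    missing = missing_outputs(sys.argv[1], leaderboard_reported=True)
    for rel in missing:
        print(f"missing: {rel}")
    sys.exit(1 if missing else 0)
```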
Checks to run
lrdbench validate path/to/manifest.yaml
lrdbench run path/to/manifest.yaml
lrdbench validate-output reports/<run_id>
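
The three commands above can also be chained from a script so that a failure stops the sequence. The sketch below is one way to do that; `check_commands` and `run_checks` are hypothetical names, and it assumes that each lrdbench command exits with a non-zero status on failure, which should be confirmed before relying on it.

```python
import subprocess


def check_commands(manifest_path, run_dir):
    """Build the three checklist commands as argument lists, in order."""
    return [
        ["lrdbench", "validate", str(manifest_path)],
        ["lrdbench", "run", str(manifest_path)],
        ["lrdbench", "validate-output", str(run_dir)],
    ]


def run_checks(manifest_path, run_dir):
    """Run each command in turn, stopping at the first failure.

    Assumption: a failed check yields a non-zero exit status, which
    subprocess.run(check=True) turns into CalledProcessError.
    """
    for argv in check_commands(manifest_path, run_dir):
        subprocess.run(argv, check=True)
```

The run directory is passed in explicitly because it depends on the run_id of the generated run.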
For code changes, run the relevant test subset before reporting results:
pytest tests/integration/test_smoke_run.py tests/integration/test_custom_estimator_workflow.py
Reporting guidance
State whether results are ground-truth, stress-test, or observational. Report leaderboard rules and component weights whenever ranks are shown. For observational data, avoid accuracy language unless external truth is available and declared in the benchmark design.