Public Small Outputs
This page records the expected output shape for the tracked public_small_* suites. Generated
reports are intentionally ignored by Git; these notes provide a clean-clone verification target.
Expected Artefacts
Each suite should produce:
- html/report.html;
- tables/run_summary.csv;
- tables/per_stratum_metrics.csv;
- tables/leaderboard.csv;
- tables/estimator_metadata.csv;
- tables/failures.csv;
- tables/failure_map.csv;
- tables/benchmark_uncertainty.csv when the manifest enables benchmark uncertainty;
- tables/estimator_disagreement.csv when disagreement metrics are requested;
- tables/scale_window_sensitivity.csv when variant-sensitivity metrics are requested;
- tables/stress_metrics.csv for stress-test suites;
- manifest/environment.json;
- artefacts/artefact_index.csv;
- raw result-store tables under raw/.
Some optional CSVs may be present but empty when their metric family is not requested. For example,
canonical public-small runs create an empty scale_window_sensitivity.csv.
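The list above can be checked mechanically after a run. The following is a minimal sketch, assuming all of a run's outputs sit under one directory; the `run_dir` argument and the `missing_artefacts` helper are hypothetical names for illustration and are not part of lrdbench. Only the unconditionally expected files are treated as required, so the conditional CSVs are left out.

```python
from pathlib import Path

# Files this page lists unconditionally for every suite.
REQUIRED = [
    "html/report.html",
    "tables/run_summary.csv",
    "tables/per_stratum_metrics.csv",
    "tables/leaderboard.csv",
    "tables/estimator_metadata.csv",
    "tables/failures.csv",
    "tables/failure_map.csv",
    "manifest/environment.json",
    "artefacts/artefact_index.csv",
]


def missing_artefacts(run_dir):
    """Return the required relative paths that are absent under run_dir.

    run_dir is an assumed location for one run's outputs; adjust it to
    wherever your manifest writes results.
    """
    root = Path(run_dir)
    missing = [rel for rel in REQUIRED if not (root / rel).is_file()]
    # raw/ holds the raw result-store tables; only its presence is checked.
    if not (root / "raw").is_dir():
        missing.append("raw/")
    return missing
```

An empty return value means every always-expected artefact is present. Optional files such as scale_window_sensitivity.csv are deliberately not checked, since they may be absent or empty depending on the manifest.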
Local Reference Runs
The following runs were produced locally on 2026-04-26 with PYTHONPATH=src python -m
lrdbench.cli.main run <manifest>.
| Suite | Run ID | Per-stratum rows | Benchmark uncertainty rows | Disagreement rows | Failure rows | Leaderboard rows | Suite-specific rows |
|---|---|---|---|---|---|---|---|
| public_small_canonical_ground_truth | 2fa8e0ca-6153-48e1-af31-418fcc8fdd80 | 112 | 24 | 40 | 21 | 3 | 0 sensitivity |
| public_small_stress_contamination | 460126b8-a619-4e76-ad78-c7b24c4bd195 | 424 | 144 | 148 | 84 | 3 | 180 stress |
| public_small_null_false_positive | e6b97def-bfdb-4496-8d11-fc4bf9e2aac8 | 75 | 15 | 52 | 14 | 3 | 0 sensitivity |
| public_small_sensitivity_disagreement | ac062328-1caf-4075-919e-144342ea89c8 | 156 | 36 | 112 | 50 | 6 | 42 sensitivity |
The row counts above exclude CSV headers. Different package versions may change numeric values or row counts; such changes should be intentional and reflected in the changelog.
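Because the counts exclude headers, a plain line count of a CSV will be off by one. The sketch below shows one way to count data rows for comparison with the table; `count_data_rows` is a hypothetical helper, not an lrdbench function, and it assumes each CSV has a single header row.

```python
import csv
from pathlib import Path


def count_data_rows(csv_path):
    """Count CSV records after the header row; an empty file counts as 0."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row if the file has one
        # Blank lines parse as empty lists and are not counted as records.
        return sum(1 for row in reader if row)
```

For the canonical ground-truth suite, for example, the result for tables/per_stratum_metrics.csv would be compared against 112. An optional CSV that is present but empty, or that holds only a header, yields 0.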
Verification Commands
lrdbench validate configs/suites/public_small_canonical_ground_truth.yaml
lrdbench validate configs/suites/public_small_stress_contamination.yaml
lrdbench validate configs/suites/public_small_null_false_positive.yaml
lrdbench validate configs/suites/public_small_sensitivity_disagreement.yaml
Run suites one at a time when checking generated reports:
lrdbench run configs/suites/public_small_canonical_ground_truth.yaml