Architecture

This page describes the internal structure of lrdbench for contributors and advanced users. If you only want to run benchmarks, the Quickstart is the better starting point.

The benchmark loop

Every benchmark run follows the same orchestration path:

Manifest (YAML)
    │
    ▼
Record materialisation  ←  generators  |  observational sources
    │
    ▼
Optional ML training  ←  ml_training block (data-driven estimators only)
    │
    ▼
Estimation  ←  estimator registry (+ optional third-party plugins)
    │
    ▼
Evaluation  ←  mode-specific evaluator (ground_truth / stress_test / observational)
    │
    ▼
Leaderboards  ←  weighted-rank builder
    │
    ▼
Result store  ←  CSV files under reports/<run_id>/
    │
    ▼
Reports  ←  HTML / CSV / LaTeX / figures

BenchmarkRunner.run() in runner.py is the single entry point that wires these stages together. Each stage is implemented by a dedicated module so that individual pieces can be tested and replaced in isolation.
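The wiring can be pictured as a chain of stage callables that pass a shared context along. The sketch below is not lrdbench's implementation: `run_pipeline`, `materialise`, and `estimate` are hypothetical stand-ins that only illustrate how a single entry point can drive stages that remain individually testable.

```python
from typing import Any, Callable

# Hypothetical stand-in for the stage wiring; none of these names are lrdbench APIs.
Stage = Callable[[dict[str, Any]], dict[str, Any]]


def run_pipeline(manifest: dict[str, Any], stages: list[tuple[str, Stage]]) -> dict[str, Any]:
    """Pass a shared context through each stage in order, recording what ran."""
    context: dict[str, Any] = {"manifest": manifest, "completed": []}
    for name, stage in stages:
        context = stage(context)
        context["completed"].append(name)
    return context


def materialise(ctx: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for record materialisation: one toy record per replicate.
    ctx["records"] = [f"rec-{i}" for i in range(ctx["manifest"]["replicates"])]
    return ctx


def estimate(ctx: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for estimation: pair every record with every estimator name.
    ctx["estimates"] = [(r, e) for r in ctx["records"] for e in ctx["manifest"]["estimators"]]
    return ctx


result = run_pipeline(
    {"replicates": 2, "estimators": ["a", "b"]},
    [("materialise", materialise), ("estimate", estimate)],
)
```

Because every stage takes and returns the same context shape, a single stage can be swapped for a test double without touching the others.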

Key modules

| Module | Responsibility |
| --- | --- |
| `cli.main` | Argparse front-end (`run`, `validate`, `list-*`, `validate-output`). |
| `manifest` | Loads and parses YAML manifests into `BenchmarkManifest` dataclasses. |
| `runner` | `BenchmarkRunner`: orchestrates the full loop and collects `plugin_provenance`. |
| `execution` | `collect_fit_jobs` and `run_fit_jobs`: manage the `(record × estimator)` grid, optional thread pools, and on-disk estimate caches. |
| `evaluator` | `GroundTruthEvaluator`, `StressTestEvaluator` (shares the ground-truth logic), and `ObservationalEvaluator`: compute metrics from records and estimates. |
| `leaderboard` | `WeightedRankLeaderboardBuilder`: composes metric columns into ranked rows. |
| `reporter` | `SimpleHtmlCsvReporter`: renders HTML, CSV, LaTeX, and matplotlib figures. |
| `result_store` | `CsvResultStore`: persists raw records, estimates, metrics, leaderboards, and artefacts as CSV/JSON. |
| `registries` | `EstimatorRegistry`, `GeneratorRegistry`, `ContaminationRegistry`: look-up tables for pluggable components. |
| `plugin_loader` | Safe, failure-transparent loading of third-party estimator plugins via environment variables. |
| `schema` | Immutable dataclasses (`SeriesRecord`, `EstimateResult`, `MetricSpec`, …) that form the public data contract. |
| `interfaces` | Abstract base classes (`BaseEstimator`, `BaseGenerator`, `BaseContamination`, …) that define extension points. |
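To make the `(record × estimator)` grid concrete, the sketch below builds the Cartesian product of records and estimators, skips jobs that are already cached, and optionally fans the rest out to a thread pool. It is an illustration only: `collect_jobs` and `run_jobs` are hypothetical names rather than the signatures of `collect_fit_jobs` and `run_fit_jobs`, and a plain dict stands in for the on-disk estimate cache.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def collect_jobs(records, estimators):
    """Hypothetical helper: one job per (record, estimator) pair."""
    return list(product(records, estimators))


def run_jobs(jobs, fit, cache, max_workers=None):
    """Run only the jobs missing from the cache, then return results for all jobs."""
    pending = [job for job in jobs if job not in cache]
    if max_workers:
        # Optional thread pool; pool.map preserves the order of `pending`.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(lambda job: fit(*job), pending))
    else:
        results = [fit(*job) for job in pending]
    cache.update(zip(pending, results))
    return {job: cache[job] for job in jobs}


calls = []


def toy_fit(record, estimator):
    # Toy "estimate" that also records every invocation.
    calls.append((record, estimator))
    return len(record) + len(estimator)


cache = {}  # stand-in for an on-disk estimate cache
jobs = collect_jobs(["r1", "r2"], ["dfa", "whittle"])
first = run_jobs(jobs, toy_fit, cache, max_workers=2)
second = run_jobs(jobs, toy_fit, cache)  # served entirely from the cache
```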

Extension points

Adding a generator

1. Subclass BaseGenerator and implement family, version, and generate().
2. Register it in defaults.build_default_generator_registry().
3. Reference the family name in a manifest source.generator_grid block.
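The steps above might look roughly like the following. This is a sketch under stated assumptions: the `BaseGenerator` defined here is a local stand-in (the real base class in `interfaces` may use different signatures and return a `SeriesRecord` rather than a list), `WhiteNoiseGenerator` is a made-up family, and a plain dict replaces the generator registry.

```python
import random
from abc import ABC, abstractmethod


class BaseGenerator(ABC):
    """Local stand-in for lrdbench's BaseGenerator; the real interface may differ."""

    family: str
    version: str

    @abstractmethod
    def generate(self, params: dict, seed: int) -> list[float]:
        """Return one synthetic series for the given parameters and seed."""


class WhiteNoiseGenerator(BaseGenerator):
    family = "white_noise"  # hypothetical family name a manifest would reference
    version = "0.1.0"

    def generate(self, params: dict, seed: int) -> list[float]:
        rng = random.Random(seed)  # seeded so identical inputs give identical series
        sigma = params.get("sigma", 1.0)
        return [rng.gauss(0.0, sigma) for _ in range(params["n"])]


# A dict keyed by family name stands in for the generator registry.
registry = {WhiteNoiseGenerator.family: WhiteNoiseGenerator()}
series = registry["white_noise"].generate({"n": 5, "sigma": 2.0}, seed=42)
repeat = registry["white_noise"].generate({"n": 5, "sigma": 2.0}, seed=42)
```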

Adding a contamination operator

1. Subclass BaseContamination and implement name, family, version, and apply().
2. Register it in defaults.build_default_contamination_registry().
3. Reference the operator name in a manifest contamination.operators block.
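A contamination operator following these steps could be sketched as below. The `BaseContamination` class here is a local stand-in with an assumed `apply(values, params, seed)` signature, and `AdditiveSpikes` with its `count` and `magnitude` parameters is invented for illustration; consult `interfaces` for the actual contract.

```python
import random
from abc import ABC, abstractmethod


class BaseContamination(ABC):
    """Local stand-in for lrdbench's BaseContamination; the real interface may differ."""

    name: str
    family: str
    version: str

    @abstractmethod
    def apply(self, values: list[float], params: dict, seed: int) -> list[float]:
        """Return a contaminated copy of `values`."""


class AdditiveSpikes(BaseContamination):
    name = "additive_spikes"  # hypothetical operator name a manifest would reference
    family = "outliers"
    version = "0.1.0"

    def apply(self, values: list[float], params: dict, seed: int) -> list[float]:
        rng = random.Random(seed)
        out = list(values)  # copy so the clean parent series is left untouched
        n_spikes = min(params["count"], len(out))
        # Add a fixed offset at `n_spikes` distinct, randomly chosen positions.
        for idx in rng.sample(range(len(out)), n_spikes):
            out[idx] += params["magnitude"]
        return out


clean = [0.0] * 10
dirty = AdditiveSpikes().apply(clean, {"count": 3, "magnitude": 5.0}, seed=1)
```

Returning a copy rather than mutating the input keeps the clean parent available, which matters when the contaminated record has to refer back to it.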

Adding an estimator

1. Subclass BaseEstimator and implement spec and fit().
2. Register it in defaults.build_default_estimator_registry() or use the third-party plugin workflow.
3. Declare it in a manifest estimators block with name, family, target_estimand, and optional params.
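As a rough illustration of the first step, the sketch below defines a toy estimator with a `spec` and a `fit()` method. Everything here is assumed for the example: `EstimatorSpec` and `BaseEstimator` are local stand-ins, `fit()` returns a bare float rather than an `EstimateResult`, and the lag-1 autocorrelation estimand is chosen only because it is short to compute.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EstimatorSpec:
    """Hypothetical spec mirroring the manifest fields name, family, target_estimand."""

    name: str
    family: str
    target_estimand: str


class BaseEstimator(ABC):
    """Local stand-in for lrdbench's BaseEstimator; the real interface may differ."""

    spec: EstimatorSpec

    @abstractmethod
    def fit(self, values: list[float], params: Optional[dict] = None) -> float:
        """Return a point estimate for the series."""


class Lag1AutocorrEstimator(BaseEstimator):
    spec = EstimatorSpec("lag1_autocorr", "toy", "lag1_autocorrelation")

    def fit(self, values: list[float], params: Optional[dict] = None) -> float:
        mean = sum(values) / len(values)
        centred = [v - mean for v in values]
        denom = sum(c * c for c in centred)
        if denom == 0:
            return 0.0  # constant series: autocorrelation is undefined, report 0
        # Sum of products of neighbouring centred values over the total sum of squares.
        return sum(a * b for a, b in zip(centred, centred[1:])) / denom


estimator = Lag1AutocorrEstimator()
alternating = estimator.fit([1.0, -1.0, 1.0, -1.0])
constant = estimator.fit([2.0, 2.0, 2.0])
```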

For a complete walkthrough, see Adding estimators.

Provenance and reproducibility

Every synthetic record carries a ProvenanceRecord with:

- a stable record_id (SHA-1 hash of manifest id, family, parameters, and replicate index),
- the generator seed derived deterministically from the manifest's global_seed,
- timestamps and software version metadata.

Contaminated records extend this history via contamination_history, preserving the clean parent id, operator name, parameters, and severity label.
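One way to obtain a stable record_id and a deterministic per-record seed is sketched below. The hashed fields follow the description above, but the exact serialisation (canonical JSON) and the seed derivation (SHA-256 of the global seed and record id, truncated to 32 bits) are assumptions made for this example and not necessarily what lrdbench does.

```python
import hashlib
import json


def stable_record_id(manifest_id: str, family: str, params: dict, replicate: int) -> str:
    """SHA-1 over a canonical JSON encoding, so dict key order does not change the id."""
    payload = json.dumps(
        [manifest_id, family, params, replicate],
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def derive_seed(global_seed: int, record_id: str) -> int:
    """Assumed scheme: hash the global seed with the record id, keep 32 bits."""
    digest = hashlib.sha256(f"{global_seed}:{record_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


rid_a = stable_record_id("demo", "fgn", {"H": 0.7, "n": 1024}, replicate=0)
rid_b = stable_record_id("demo", "fgn", {"n": 1024, "H": 0.7}, replicate=0)
rid_c = stable_record_id("demo", "fgn", {"H": 0.7, "n": 1024}, replicate=1)
seed = derive_seed(12345, rid_a)
```

Canonicalising before hashing is the design point: without `sort_keys=True`, two manifests that list the same parameters in a different order would yield different ids.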

Every run writes manifest/environment.json containing Python version, platform, package versions, seed policy, and execution settings. This makes a report self-describing: given the manifest, the package version, and the data sources, the run should be bitwise reproducible.
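A snapshot of this kind can be assembled with the standard library alone. The sketch below is illustrative: the JSON key names, the `collect_environment` and `write_environment` helpers, and the example seed policy and execution settings are all hypothetical, and only the manifest/environment.json location is taken from the text above.

```python
import json
import platform
import tempfile
from importlib import metadata
from pathlib import Path


def collect_environment(packages: list[str], seed_policy: dict, execution: dict) -> dict:
    """Gather interpreter, platform, and package versions (key names are assumed)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record absence instead of failing the run
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "package_versions": versions,
        "seed_policy": seed_policy,
        "execution": execution,
    }


def write_environment(run_root: Path, env: dict) -> Path:
    """Write the snapshot to <run_root>/manifest/environment.json."""
    target = run_root / "manifest" / "environment.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(env, indent=2, sort_keys=True), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    env = collect_environment(
        ["surely-not-installed-pkg-xyz"],
        seed_policy={"global_seed": 12345},
        execution={"max_workers": 1},
    )
    path = write_environment(Path(tmp), env)
    loaded = json.loads(path.read_text(encoding="utf-8"))
```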

Output contract

The framework enforces a machine-readable output contract (configs/contracts/public_output_contract.json). lrdbench validate-output <run_root> checks that:

- all required files exist,
- all required CSV columns are present,
- the contract version matches the expected schema.

Any change that adds, removes, or renames output columns must update this contract and the contract version.
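The three checks can be sketched as a small function. This is not the code behind lrdbench validate-output, and the contract layout used here (a `version` string plus a `files` mapping from relative path to required columns) is an assumption for illustration; the real public_output_contract.json may be structured differently.

```python
import csv
import tempfile
from pathlib import Path


def check_outputs(run_root: Path, contract: dict, expected_version: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the run conforms."""
    problems = []
    if contract.get("version") != expected_version:
        problems.append(f"contract version {contract.get('version')!r} != {expected_version!r}")
    for rel_path, columns in contract.get("files", {}).items():
        path = run_root / rel_path
        if not path.is_file():
            problems.append(f"missing file: {rel_path}")
            continue
        # Only the header row is needed to verify column presence.
        with path.open(newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        for column in columns:
            if column not in header:
                problems.append(f"{rel_path}: missing column '{column}'")
    return problems


# Hypothetical contract: file names and columns are made up for this example.
toy_contract = {
    "version": "1",
    "files": {
        "metrics.csv": ["record_id", "metric", "value"],
        "leaderboard.csv": ["estimator", "rank"],
    },
}

with tempfile.TemporaryDirectory() as tmp:
    run_root = Path(tmp)
    (run_root / "metrics.csv").write_text("record_id,metric\nr1,bias\n", encoding="utf-8")
    problems = check_outputs(run_root, toy_contract, expected_version="1")
    stale = check_outputs(run_root, toy_contract, expected_version="2")
```

Collecting every problem instead of stopping at the first one gives a contributor the full list of contract violations in a single pass.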