
Adding Estimators

This guide shows the shortest path for adding and testing an estimator outside the built-in registry.

For built-in supervised baselines such as MLRandomForest, MLSVR, MLCNN, and MLLSTM, see Data-driven estimators. Those estimators are already registered and use the manifest-level ml_training protocol instead of custom registry enrolment.

1. Implement BaseEstimator

Create a BaseEstimator subclass that stores its EstimatorSpec, exposes it through a spec property, and returns an EstimateResult from fit, including for inputs it cannot handle.

from __future__ import annotations

import time
import numpy as np

from lrdbench.interfaces import BaseEstimator
from lrdbench.schema import EstimateResult, EstimatorSpec, SeriesRecord


class MyEstimator(BaseEstimator):
    VERSION = "0.1.0"

    def __init__(self, spec: EstimatorSpec) -> None:
        self._spec = spec

    @property
    def spec(self) -> EstimatorSpec:
        return self._spec

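    def _invalid(self, record: SeriesRecord, t0: float, reason: str) -> EstimateResult:
        # Invalid results carry no point estimate and a machine-readable reason.
        return EstimateResult(
            record_id=record.record_id,
            estimator_name=self.spec.name,
            point=None,
            runtime_seconds=time.perf_counter() - t0,
            valid=False,
            failure_reason=reason,
            estimator_version=self.VERSION,
        )

    def fit(self, record: SeriesRecord) -> EstimateResult:
        t0 = time.perf_counter()
        x = np.asarray(record.values, dtype=float)
        # Minimum length; kept in sync by hand with params min_n in the spec.
        if x.size < 32:
            return self._invalid(record, t0, "insufficient_signal_for_my_estimator")
        # Toy statistic for illustration only: var(x) / (var(x) + var(diff(x))), clipped to [0, 1].
        denom = np.var(x) + np.var(np.diff(x))
        if not np.isfinite(denom) or denom == 0.0:
            # Constant or non-finite input would otherwise produce NaN from 0/0.
            return self._invalid(record, t0, "degenerate_series_for_my_estimator")
        point = float(np.clip(np.var(x) / denom, 0.0, 1.0))
        return EstimateResult(
            record_id=record.record_id,
            estimator_name=self.spec.name,
            point=point,
            runtime_seconds=time.perf_counter() - t0,
            valid=True,
            diagnostics={"example_only": True},
            estimator_version=self.VERSION,
        )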


def build_my_estimator(spec: EstimatorSpec) -> MyEstimator:
    return MyEstimator(spec)
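
Because the statistic in fit is plain NumPy, you can sanity-check it without lrdbench. The sketch below copies the variance ratio into a standalone function (the name `toy_variance_ratio` is hypothetical, not part of lrdbench) that returns None for series that are too short or degenerate, and compares white noise with a random walk. For uncorrelated noise var(diff(x)) is about 2 * var(x), so the ratio sits near 1/3; for a random walk var(x) dominates and the ratio approaches 1. This only shows that the toy value moves with persistence; it is not a calibrated Hurst estimate.

```python
import numpy as np


def toy_variance_ratio(values, min_n: int = 32):
    """Hypothetical standalone version of the example variance-ratio statistic.

    Returns None where an estimator should report an invalid result.
    """
    x = np.asarray(values, dtype=float)
    if x.size < min_n:
        return None  # too short to say anything
    denom = np.var(x) + np.var(np.diff(x))
    if not np.isfinite(denom) or denom == 0.0:
        return None  # constant or non-finite input would give 0/0 or NaN
    return float(np.clip(np.var(x) / denom, 0.0, 1.0))


rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)  # no memory: var(diff) is about 2 * var(x)
walk = np.cumsum(noise)            # strong persistence: var(x) >> var(diff)

print(round(toy_variance_ratio(noise), 2))  # close to 0.33
print(round(toy_variance_ratio(walk), 2))   # close to 1.0
```

A value near 1 for almost any trending series is one reason the example marks its output with diagnostics={"example_only": True}.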

2. Smoke Test the Estimator
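
Build a spec with the lrdbench.testing helpers, then run one fit on a short deterministic series:

import numpy as np

from lrdbench.testing import estimator_spec, smoke_fit_estimator

# build_my_estimator is the factory from step 1; import it from your own module.

spec = estimator_spec(
    name="MyEstimator",
    family="external",
    target_estimand="hurst_scaling_proxy",
    assumptions=("finite_variance",),
    params={"min_n": 32},
)

out = smoke_fit_estimator(
    build_my_estimator(spec),
    np.sin(np.linspace(0.0, 12.0, 128)),  # 128 samples, above the 32-sample minimum
    min_value=0.0,  # the example statistic is clipped to [0, 1]
    max_value=1.0,
)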

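Also test the invalid path: a series shorter than the 32-sample minimum should come back invalid, with a failure reason that names the problem: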

from lrdbench.testing import assert_invalid_estimate, synthetic_series_record

out = build_my_estimator(spec).fit(synthetic_series_record([1.0, 2.0]))
assert_invalid_estimate(out, reason_contains="insufficient_signal")
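
The two paths above suggest a simple contract for results: a valid result carries a finite point, and an invalid one carries no point and a non-empty failure reason. The sketch below encodes that reading as a checker over a hypothetical `ResultStub` dataclass that reuses a subset of EstimateResult's field names. It is not part of lrdbench, and the real assert_invalid_estimate may check more or different things.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultStub:
    """Hypothetical stand-in with a subset of EstimateResult's fields."""
    record_id: str
    estimator_name: str
    point: Optional[float]
    valid: bool
    failure_reason: Optional[str] = None


def contract_violations(result: ResultStub) -> list[str]:
    """Return human-readable problems; an empty list means the result looks consistent."""
    problems = []
    if result.valid:
        if result.point is None or not math.isfinite(result.point):
            problems.append("valid result must have a finite point")
    else:
        if result.point is not None:
            problems.append("invalid result should not carry a point")
        if not result.failure_reason:
            problems.append("invalid result needs a failure_reason")
    return problems


ok = ResultStub("r1", "MyEstimator", point=0.42, valid=True)
bad = ResultStub("r2", "MyEstimator", point=0.5, valid=False)
print(contract_violations(ok))   # []
print(contract_violations(bad))  # ['invalid result should not carry a point', 'invalid result needs a failure_reason']
```

Returning a list of problems rather than raising on the first one makes it easy to report every inconsistency in a batch of results at once.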

3. Register Programmatically

from lrdbench.registries import EstimatorRegistry
from lrdbench.runner import BenchmarkRunner

registry = EstimatorRegistry()
registry.register("MyEstimator", build_my_estimator)

runner = BenchmarkRunner(estimators=registry)
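
Conceptually, registration maps a name to a factory that receives a spec and returns an estimator. The sketch below shows that pattern with a hypothetical `SimpleRegistry`; it is not lrdbench's EstimatorRegistry, whose internals and duplicate-name handling may differ, and it uses plain dicts in place of EstimatorSpec so it runs on its own.

```python
from typing import Any, Callable, Dict

# A factory takes a spec (a plain dict here) and returns an estimator object.
Factory = Callable[[Dict[str, Any]], Any]


class SimpleRegistry:
    """Hypothetical name -> factory registry illustrating the pattern."""

    def __init__(self) -> None:
        self._factories: Dict[str, Factory] = {}

    def register(self, name: str, factory: Factory) -> None:
        # Refuse silent overwrites so two estimators cannot share a name.
        if name in self._factories:
            raise ValueError(f"estimator {name!r} is already registered")
        self._factories[name] = factory

    def build(self, spec: Dict[str, Any]) -> Any:
        # The spec's name selects the factory; the factory receives the whole spec.
        name = spec["name"]
        if name not in self._factories:
            raise KeyError(f"no estimator registered as {name!r}")
        return self._factories[name](spec)


registry = SimpleRegistry()
registry.register("MyEstimator", lambda spec: ("MyEstimator instance", spec["params"]))
print(registry.build({"name": "MyEstimator", "params": {"min_n": 32}}))
# ('MyEstimator instance', {'min_n': 32})
```

Keying on the spec's name is why the manifest entry and the registered name have to agree.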

The manifest estimator entry should use the same name:

estimators:
  - name: MyEstimator
    family: external
    target_estimand: hurst_scaling_proxy
    assumptions: [finite_variance]
    supports_ci: false
    supports_diagnostics: true
    params:
      min_n: 32
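
A mismatch between manifest names and registered names is easy to miss. A minimal pre-flight check, assuming the manifest has already been loaded into a dict (for example with a YAML parser) and that you can list the names you registered, compares the two; `unregistered_estimators` is a hypothetical helper, not an lrdbench function.

```python
from typing import Any, Iterable, Mapping


def unregistered_estimators(manifest: Mapping[str, Any], registered: Iterable[str]) -> list[str]:
    """Return manifest estimator names with no matching registry entry, in manifest order."""
    known = set(registered)
    return [
        entry["name"]
        for entry in manifest.get("estimators", [])
        if entry["name"] not in known
    ]


# Dict equivalent of the YAML entry above, plus a misspelled second entry.
manifest = {
    "estimators": [
        {"name": "MyEstimator", "family": "external", "params": {"min_n": 32}},
        {"name": "MyEstimater", "family": "external", "params": {}},
    ]
}

print(unregistered_estimators(manifest, ["MyEstimator"]))  # ['MyEstimater']
```

Running a check like this before a long benchmark fails fast on typos instead of partway through a run.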

4. Contributor Expectations

Before proposing an estimator for the built-in registry, include:

  • a clear target estimand;
  • assumptions and operating regime;
  • parameter defaults;
  • invalid-input behavior;
  • diagnostics, if available;
  • uncertainty behavior, if available;
  • at least one smoke test and one invalid-input test;
  • a short note on known failure modes.

For public benchmark comparisons, report the manifest, the estimator metadata, the output contract version, and the generated artefacts/artefact_index.csv.