Frequently Asked Questions

Installation and dependencies

lrdbench run fails with ModuleNotFoundError: No module named 'matplotlib'

Install the reporting extras:

pip install "lrdbench[reports]"

If you use data-driven estimators (Random Forest, SVR, CNN, LSTM), install the ml and nn extras together with reports:

pip install "lrdbench[ml,nn,reports]"

See Installation for the full extras matrix.

Manifest errors

Manifest validation error: unknown top-level manifest keys

lrdbench validates manifests strictly. Only the keys listed in Benchmark protocol are allowed at the top level. Common mistakes:

  • Typos like estimator instead of estimators.
  • Putting estimator-specific keys (e.g. min_scale) at the top level instead of under params.

Run lrdbench validate my_manifest.yaml to see the exact offending key.
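To illustrate the kind of strict check described above, here is a minimal sketch in plain Python. It is not lrdbench's validator: the allowed-key set is an assumption built only from top-level keys that appear in examples on this page (the authoritative list is in Benchmark protocol), and unknown_top_level_keys is a hypothetical helper.

```python
import difflib

# Assumption: illustrative subset of allowed top-level keys, taken from the
# examples on this page. The real list lives in Benchmark protocol.
ALLOWED_TOP_LEVEL = {"mode", "source", "estimators", "execution"}


def unknown_top_level_keys(manifest: dict) -> dict:
    """Map each unrecognised top-level key to its closest allowed key, or None."""
    report = {}
    for key in manifest:
        if key not in ALLOWED_TOP_LEVEL:
            close = difflib.get_close_matches(key, sorted(ALLOWED_TOP_LEVEL), n=1)
            report[key] = close[0] if close else None
    return report


manifest = {
    "mode": "observational",
    "estimator": [{"name": "DFA"}],  # typo: should be "estimators"
    "min_scale": 4,                  # belongs under an estimator's params
}

for key, hint in unknown_top_level_keys(manifest).items():
    suffix = f"; did you mean {hint!r}?" if hint else ""
    print(f"unknown top-level key {key!r}{suffix}")
```

A typo such as estimator gets a suggestion, while a misplaced estimator parameter such as min_scale has no close match and needs to be moved under params by hand.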

estimator 'X' must declare target_estimand

Every estimator entry in a manifest must include target_estimand. This is a deliberate design choice: the framework refuses to guess what an estimator is trying to measure. Example:

estimators:
  - name: DFA
    family: temporal
    target_estimand: hurst_scaling_proxy
    params:
      min_scale: 4
      max_scale: 64

Estimation failures

My estimator returns all invalid / NaN results

Check the signal length against the estimator's minimum requirements. In the result store, read tables/failures.csv to see per-estimator invalid counts.

Common causes:

  • Short series: Many estimators need at least 64–128 samples. The aggregation estimators (AbsoluteMoment, Variance, VarianceResidual) and wavelet estimators are especially sensitive to short records.
  • Constant or zero-variance series: RS and spectral estimators return invalid when the standard deviation is near zero.
  • All-NaN input: Observational loaders drop NaNs; if the result is empty, every estimator will fail.
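The three causes above can be screened for before launching a run. The following is a rough sketch using only the standard library, not part of lrdbench: preflight is a hypothetical helper, min_len=128 is an assumed threshold taken from the upper end of the range quoted above, and std_tol is an arbitrary tolerance.

```python
import math
import statistics


def preflight(series, min_len=128, std_tol=1e-12):
    """Return reasons why a series is likely to produce invalid estimates.

    min_len and std_tol are assumed values, not lrdbench defaults.
    """
    # Mimic a loader that drops NaNs (and, here, infinities as well).
    finite = [x for x in series if math.isfinite(x)]
    if not finite:
        return ["no finite samples remain after dropping NaNs"]

    problems = []
    if len(finite) < min_len:
        problems.append(f"only {len(finite)} finite samples (< {min_len})")
    if statistics.pstdev(finite) <= std_tol:
        problems.append("standard deviation is near zero")
    return problems


print(preflight([float("nan")] * 10))                 # all-NaN input
print(preflight([1.0] * 200))                         # constant series
print(preflight([float(i % 7) for i in range(200)]))  # passes all checks
```

An empty list means none of the three conditions was detected; it does not guarantee that every estimator will succeed, since individual estimators may have stricter requirements.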

Why do bootstrap confidence intervals look very wide?

The default block length is max(4, n // 10), where n is the series length. For long-memory series this is a pragmatic compromise, but it may be too short for very persistent processes or too long for short records. You can override it per estimator:

estimators:
  - name: DFA
    family: temporal
    target_estimand: hurst_scaling_proxy
    params:
      n_bootstrap: 200
      bootstrap_block_len: 32

See Benchmark protocol for more on uncertainty blocks.
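To see what the default formula gives for a particular record length before choosing an override, the arithmetic can be tabulated directly. This sketch only evaluates the formula quoted above; default_block_len is a hypothetical helper, and the "whole blocks" column assumes a resample is assembled from consecutive blocks of that length, which may not match the package's exact bootstrap scheme.

```python
def default_block_len(n: int) -> int:
    """Default block length as quoted in this FAQ: max(4, n // 10)."""
    return max(4, n // 10)


for n in (30, 256, 1024, 10_000):
    b = default_block_len(n)
    # Assumption: a resample is tiled from blocks of length b.
    print(f"n={n:>6}  block_len={b:>5}  whole blocks per resample={n // b}")
```

For the longer records the default leaves about ten whole blocks per resample regardless of n, so an explicit bootstrap_block_len is the main lever if that trade-off does not suit your data.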

Reproducibility

How do I know if my run reproduced correctly?

Use the output contract validator:

lrdbench validate-output reports/<run_id>

This checks that all required files and columns are present. To reproduce a run, keep the manifest, the package version, and the global seed together; every run records the exact versions in manifest/environment.json inside the report directory.

Can I re-use estimates from a previous run?

Yes. Enable the estimate cache in the manifest:

execution:
  estimate_cache_dir: .lrdbench_cache
  cache_read: true
  cache_write: true

The cache key is a hash of the series values, estimator name, and parameter schema, so identical inputs will skip re-computation.
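As an illustration of why any change to the data or parameters causes a cache miss, here is a minimal content-hash sketch. It is not lrdbench's key derivation: the function name, the choice of SHA-256, and the byte layout are all assumptions made for the example.

```python
import hashlib
import json
import struct


def cache_key(series, estimator_name, params):
    """Illustrative content hash over values, estimator name, and parameters.

    Hypothetical scheme for explanation only; the real key format may differ.
    """
    h = hashlib.sha256()
    # Exact little-endian float64 bytes of the series values.
    h.update(struct.pack(f"<{len(series)}d", *series))
    h.update(b"\x00" + estimator_name.encode("utf-8"))
    # sort_keys makes the key independent of parameter ordering.
    h.update(b"\x00" + json.dumps(params, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


x = [0.1, 0.2, 0.3]
k1 = cache_key(x, "DFA", {"min_scale": 4, "max_scale": 64})
k2 = cache_key(x, "DFA", {"max_scale": 64, "min_scale": 4})  # same params, reordered
k3 = cache_key(x, "DFA", {"min_scale": 8, "max_scale": 64})  # one param changed
print(k1 == k2, k1 == k3)  # True False
```

With a scheme like this, even a small change to the input series or to one parameter yields a different key, so the estimate is recomputed rather than read from the cache.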

Customisation

How do I add my own estimator without forking the repository?

Use the third-party plugin workflow. Set the LRD_BENCH_ESTIMATOR_PLUGIN environment variable to the dotted import path of your Python module:

export LRD_BENCH_ESTIMATOR_PLUGIN=my_package.my_estimators
lrdbench run my_manifest.yaml

See Third-party estimator workflow for details.

Can I benchmark on my own CSV data?

Yes. Use observational mode with a csv_series_index source:

mode: observational
source:
  type: csv_series_index
  series:
    - file: data/sensor_1.csv
      column: amplitude
      record_id: sensor_1

See Observational data tutorial and examples/quickstart_observational.py.