Deborah.DeborahCore

Deborah.DeborahCore (Module)
module DeborahCore

Deborah.DeborahCore — Bias-corrected ML pipeline.

Deborah.DeborahCore provides the end-to-end workflow for training regression models on trace-like observables, applying bias correction, and materializing study-ready artifacts (vectorized X/Y bundles and summary tables). It orchestrates configuration parsing, path/name construction, dataset partitioning, feature/target preparation, multiple ML backends (Ridge/Lasso/LightGBM variants), and result writing/printing.
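
To make the idea of bias correction concrete, the following is a minimal sketch of one common estimator: train on TR, measure the mean residual on BC, and add it to the mean prediction on UL. It is not Deborah's implementation. The closed-form ridge solver stands in for the MLJ.jl models, and every name and number here (ridge_fit, predict, the toy data) is a hypothetical chosen for illustration.

```julia
using LinearAlgebra, Statistics

# Closed-form ridge regression: solves (X'X + λI) w = X'y.
# Illustrative stand-in for the MLJ.jl-based models; not Deborah's code.
ridge_fit(X, y; λ=1e-3) = (X' * X + λ * I) \ (X' * y)

# Hypothetical toy data: y = 2x + 1. The model below has no intercept,
# so its predictions are systematically biased.
x_tr = collect(1.0:10.0);  y_tr = 2.0 .* x_tr .+ 1.0   # TR: training set
x_bc = collect(11.0:15.0); y_bc = 2.0 .* x_bc .+ 1.0   # BC: bias-correction set
x_ul = collect(16.0:20.0)                              # UL: unlabeled set

w = ridge_fit(reshape(x_tr, :, 1), y_tr)               # train on TR only
predict(x) = reshape(x, :, 1) * w

# Mean residual on BC estimates the systematic bias of the model.
bias = mean(y_bc .- predict(x_bc))

# Bias-corrected estimate of the mean observable over the unlabeled set.
corrected_mean = mean(predict(x_ul)) + bias
```

Because BC is never used for training, the residual measured there is an independent estimate of the model's systematic error, which is the reason the labeled set is split in two.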

Scope & Responsibilities

  • Configuration & paths: parse a single TOML into strongly-typed structs and construct reproducible analysis paths and filenames.
  • Data preparation: split labeled data into training/bias-correction sets; vectorize X/Y bundles for ML; emit LB/TR/BC/UL artifacts for inspection and reuse.
  • Model training: run baseline and machine-learning training sequences. Ridge and Lasso are provided via JuliaAI/MLJ.jl; LightGBM is available through either JuliaAI/MLJ.jl or PyCall.jl. The JuliaAI/MLJ.jl-based LightGBM branch (internally referred to as MiddleGBM) additionally supports optional hyperparameter scanning/tuning and per-split evaluation (TR/BC/UL).
  • Outputs & reporting: write machine predictions (flattened/matrix forms), residual plots (MiddleGBM option), and jackknife/bootstrap summaries for downstream tools.
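
As an illustration of the configuration step, here is a minimal sketch of parsing a TOML document into typed structs and deriving a deterministic output path from it, using Julia's TOML standard library. The schema, the struct names (SplitConfig, PipelineConfig), the keys, and the directory layout are all assumptions made for this example; Deborah's actual configuration keys and path scheme may differ.

```julia
using TOML

# Hypothetical schema; not Deborah's real configuration layout.
struct SplitConfig
    labeled_percent::Int    # share of all samples that carry labels
    training_percent::Int   # share of the labeled set used for training
end

struct PipelineConfig
    model::String
    split::SplitConfig
end

# Parse TOML text and convert the untyped Dict into typed structs.
function parse_config(text::AbstractString)
    raw = TOML.parse(text)
    s = raw["split"]
    return PipelineConfig(raw["model"],
                          SplitConfig(s["labeled_percent"], s["training_percent"]))
end

# Build a path that depends only on the configuration, so the same
# settings always map to the same location (hypothetical naming scheme).
analysis_dir(cfg::PipelineConfig) =
    joinpath("results",
             "$(cfg.model)_LB$(cfg.split.labeled_percent)_TR$(cfg.split.training_percent)")

cfg = parse_config("""
model = "Ridge"

[split]
labeled_percent = 20
training_percent = 50
""")

dir = analysis_dir(cfg)
```

Converting to structs up front means a missing or mistyped key fails at load time rather than deep inside a training run.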

Key Components

Public API (typical entry points)

Minimal Usage

julia> using Deborah
julia> run_DeborahWizard()
julia> run_Deborah("config_Deborah.toml")

Notes

  • Splits: labeled set (LB) is partitioned into TR (training) and BC (bias correction) according to LBP and TRP; UL is the remaining unlabeled set.
  • Shapes: ML expects flattened vectors; helper utils convert between flattened and $N_\text{cnf} \times N_\text{src}$ matrices for analysis and file dumps.
  • MiddleGBM (LightGBM): can auto-produce learning curves and residual plots; ensure plotting dependencies are available when jobid === nothing.
  • PyCall.jl backend: requires Python LightGBM available to the PyCall.jl environment.
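
The split and shape notes above can be sketched as follows. This is an illustrative example only: split_labeled, to_matrix, and to_flat are hypothetical helpers, train_fraction is a stand-in parameter rather than Deborah's LBP/TRP settings, and the column-major layout with configurations as rows is an assumption.

```julia
using Random

# Partition labeled indices into TR (training) and BC (bias correction).
# `train_fraction` is a hypothetical parameter for this sketch.
function split_labeled(idx::AbstractVector{<:Integer}, train_fraction::Real;
                       rng=Random.default_rng())
    shuffled = shuffle(rng, idx)
    n_tr = round(Int, train_fraction * length(idx))
    return shuffled[1:n_tr], shuffled[n_tr+1:end]
end

# Convert a flat vector into an N_cnf × N_src matrix (column-major order,
# one row per configuration), and back again.
to_matrix(v::AbstractVector, n_cnf::Integer) = reshape(v, n_cnf, :)
to_flat(M::AbstractMatrix) = vec(M)

# Ten labeled samples, 60% for training, seeded for reproducibility.
tr, bc = split_labeled(1:10, 0.6; rng=MersenneTwister(1))

# Twelve flattened values viewed as 4 configurations × 3 sources.
flat = collect(1.0:12.0)
M = to_matrix(flat, 4)
```

Passing an explicit seeded rng keeps the TR/BC partition reproducible across runs, which matters when results from different models are compared on the same split.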

See Also
