Deborah.DeborahCore.MLSequenceMiddleGBM

Deborah.DeborahCore.MLSequenceMiddleGBM.ml_sequence_MiddleGBM — Method

ml_sequence_MiddleGBM(; 
    model_tag::String,
    X_data::Dict{String, NamedTuple},
    Y_tr_vec::Vector{T},
    Y_bc_vec::Vector{T},
    Y_ul_vec::Vector{T},
    Y_lb_vec::Vector{T},
    tr_conf_arr::Vector{Int},
    bc_conf_arr::Vector{Int},
    ul_conf_arr::Vector{Int},
    partition::DatasetPartitioner.DatasetPartitionInfo,
    X_list::Vector{String},
    paths::PathConfigBuilderDeborah.DeborahPathConfig,
    jobid::Union{Nothing, String}
) -> Tuple{Any, Dict{Symbol, Matrix}} where T<:Real

Train and evaluate a LightGBM model with the JuliaAI/MLJ.jl framework. Compared with Deborah.DeborahCore.MLSequenceLightGBM.ml_sequence_LightGBM, this variant adds hyperparameter tuning and extra diagnostics.

This MiddleGBM variant:

- scans simple learning curves on the training set,
- performs random-search tuning over num_iterations, min_data_in_leaf, and learning_rate,
- logs the best-found hyperparameters and L2 scores to a TOML info file, and
- optionally saves residual plots for the training, bias-correction, and unlabeled sets when jobid === nothing.
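The tuning and logging steps listed above can be illustrated with a generic random-search loop. This is a minimal sketch, not the MiddleGBM implementation: the search ranges, the log-uniform learning-rate draw, the toy score, and the TOML table name middlegbm_best are all assumptions. MLJ.jl ships its own random-search tuning strategy; this sketch deliberately uses only Julia's Random and TOML standard libraries so that it runs stand-alone.

```julia
using Random, TOML

# Hypothetical search space; the ranges actually used by MiddleGBM are not documented here.
const SEARCH_SPACE = (
    num_iterations   = 50:50:500,
    min_data_in_leaf = 5:5:50,
    learning_rate    = (0.01, 0.3),  # (lo, hi), sampled log-uniformly
)

# Draw one candidate hyperparameter configuration.
function sample_params(rng::AbstractRNG)
    lo, hi = SEARCH_SPACE.learning_rate
    η = exp(log(lo) + rand(rng) * (log(hi) - log(lo)))
    return (num_iterations   = rand(rng, SEARCH_SPACE.num_iterations),
            min_data_in_leaf = rand(rng, SEARCH_SPACE.min_data_in_leaf),
            learning_rate    = η)
end

# Keep the candidate with the lowest score (e.g. a cross-validated L2 loss).
function random_search(score, n_trials::Int; rng::AbstractRNG = MersenneTwister(0))
    best = sample_params(rng)
    best_score = score(best)
    for _ in 2:n_trials
        cand = sample_params(rng)
        s = score(cand)
        if s < best_score
            best, best_score = cand, s
        end
    end
    return best, best_score
end

# Serialize the winner and its L2 score as TOML (hypothetical table name).
function best_params_toml(params::NamedTuple, l2::Real)
    table = Dict{String,Any}(string(k) => v for (k, v) in pairs(params))
    table["l2_score"] = Float64(l2)
    io = IOBuffer()
    TOML.print(io, Dict("middlegbm_best" => table))
    return String(take!(io))
end

# Usage with a toy score standing in for a real cross-validated L2 loss.
toy_score(p) = abs(log(p.learning_rate / 0.1)) + p.num_iterations / 1_000
best, l2 = random_search(toy_score, 50)
print(best_params_toml(best, l2))
```

Sampling the learning rate on a log scale is a common choice because its useful values span orders of magnitude; whether MiddleGBM does the same is not stated in this reference.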

As in the base LightGBM pipeline, it returns true and predicted $Y$ matrices for the training, bias-correction, and unlabeled sets, plus the true $Y$ for the full labeled set. These can be used for downstream bias-corrected ML estimation and multi-ensemble analyses.

Keyword Arguments

- model_tag::String : Short model identifier.
- X_data::Dict{String, NamedTuple} : Input feature dictionary. Each key maps to a NamedTuple with vectors for the :tr, :bc, :ul, and :lb sets.
- Y_tr_vec::Vector{T} : Target vector for the training set.
- Y_bc_vec::Vector{T} : Target vector for the bias-correction set.
- Y_ul_vec::Vector{T} : Target vector for the unlabeled set.
- Y_lb_vec::Vector{T} : Target vector for the full labeled set.
- tr_conf_arr::Vector{Int} : Row-wise configuration index mapping for the training set.
- bc_conf_arr::Vector{Int} : Row-wise configuration index mapping for the bias-correction set.
- ul_conf_arr::Vector{Int} : Row-wise configuration index mapping for the unlabeled set.
- partition::DatasetPartitioner.DatasetPartitionInfo : Configuration and counts for dataset partitioning.
- X_list::Vector{String} : Ordered list of feature names to use.
- paths::PathConfigBuilderDeborah.DeborahPathConfig : Path strings for saving logs and analysis output.
- jobid::Union{Nothing, String} : Optional job tag for logging. If jobid is nothing, additional diagnostic plots are written.
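To make the X_data layout concrete, the sketch below builds the feature table for one split as a NamedTuple of columns in X_list order, the same kind of object that run_MiddleGBM_learning_curves takes as features::NamedTuple (a NamedTuple of vectors is a valid Tables.jl column table). The helper split_table and the feature names feat_a and feat_b are hypothetical; the source does not show how MiddleGBM assembles its tables internally.

```julia
# Hypothetical helper: assemble the feature table for one split (:tr, :bc, :ul, :lb)
# as a NamedTuple of columns, ordered as in X_list.
function split_table(X_data::AbstractDict{String,<:NamedTuple},
                     X_list::Vector{String}, split::Symbol)
    fields = Tuple(Symbol.(X_list))
    cols = Tuple(getproperty(X_data[name], split) for name in X_list)
    return NamedTuple{fields}(cols)
end

# Toy data with two hypothetical feature names.
X_data = Dict(
    "feat_a" => (tr = [1.0, 2.0], bc = [3.0], ul = [4.0, 5.0, 6.0], lb = [1.0, 2.0, 3.0]),
    "feat_b" => (tr = [0.1, 0.2], bc = [0.3], ul = [0.4, 0.5, 0.6], lb = [0.1, 0.2, 0.3]),
)
X_tr = split_table(X_data, ["feat_b", "feat_a"], :tr)
```

The helper accepts any AbstractDict{String,<:NamedTuple} because a Dict literal of concrete NamedTuples is not a Dict{String, NamedTuple} under Julia's invariant type parameters.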

Returns

Tuple{Any, Dict{Symbol, Matrix}}
- mach : Trained JuliaAI/MLJ.jl machine wrapping the tuned LightGBM model (or nothing if training is skipped).
- Y_mats : Dictionary of dense matrices with keys:
  - :Y_tr → true $Y$ on the training set
  - :Y_bc → true $Y$ on the bias-correction set
  - :Y_ul → true $Y$ on the unlabeled set
  - :Y_lb → true $Y$ on the full labeled set
  - :YP_tr → predicted $Y$ on the training set
  - :YP_bc → predicted $Y$ on the bias-correction set
  - :YP_ul → predicted $Y$ on the unlabeled set
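As an illustration of the downstream use of Y_mats, one common bias-correction form (assumed here, not confirmed by this reference) adds the mean residual on the bias-correction set to the mean prediction on the unlabeled set, $\overline{Y} \approx \langle \hat{Y} \rangle_{\mathrm{ul}} + \langle Y - \hat{Y} \rangle_{\mathrm{bc}}$. The helper bias_corrected_mean is hypothetical and assumes single-column matrices; Deborah's actual estimator may use the configuration mappings or multiple columns differently.

```julia
using Statistics

# Hypothetical downstream helper (not part of MLSequenceMiddleGBM): estimate
# the mean of Y as <YP>_ul + <Y - YP>_bc, assuming single-column matrices.
function bias_corrected_mean(Y_mats::AbstractDict{Symbol,<:AbstractMatrix})
    yp_ul = vec(Y_mats[:YP_ul])
    residual_bc = vec(Y_mats[:Y_bc]) .- vec(Y_mats[:YP_bc])
    return mean(yp_ul) + mean(residual_bc)
end

# Toy matrices: predictions on ul average 2.0; predictions on bc are 0.5 too low.
Y_mats = Dict(
    :YP_ul => reshape([1.0, 2.0, 3.0], :, 1),
    :Y_bc  => reshape([1.5, 2.5], :, 1),
    :YP_bc => reshape([1.0, 2.0], :, 1),
)
est = bias_corrected_mean(Y_mats)  # 2.0 + 0.5 = 2.5
```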

source

Deborah.DeborahCore.MLSequenceMiddleGBM.run_MiddleGBM_learning_curves — Function

run_MiddleGBM_learning_curves(
    model::LGBMRegressor,
    features::NamedTuple,
    targets::Vector{Float64},
    res_dir::String,
    outsuffix::String,
    jobid::Union{Nothing, String}=nothing
) -> Nothing

Generate and save learning-curve plots for the given LightGBM model by sweeping num_iterations, learning_rate, and min_data_in_leaf.

For each hyperparameter, this function evaluates model performance by cross-validation over a range of values and saves the results as cropped PDF plots in res_dir.
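The cross-validated sweep described above can be sketched generically: fix the folds with a seed, then compute the mean L2 loss for each value of one hyperparameter. This is a minimal sketch under stated assumptions, not the function's internals (which rely on MLJ.jl and LightGBM): the helpers kfold_indices, cv_l2, and learning_curve are hypothetical, and ridge regression with penalty λ is swapped in for LightGBM so the example runs with standard libraries only.

```julia
using LinearAlgebra, Random, Statistics

# Split 1:n into k roughly equal folds after a seeded shuffle.
function kfold_indices(n::Int, k::Int, rng::AbstractRNG)
    perm = randperm(rng, n)
    return [perm[i:k:n] for i in 1:k]
end

# Mean L2 (squared-error) loss over k folds for one model.
# fit_predict(Xtrain, ytrain, Xtest) must return predictions for Xtest.
function cv_l2(fit_predict, X::AbstractMatrix, y::AbstractVector; k::Int = 5, seed::Int = 0)
    n = length(y)
    errs = Float64[]
    for test in kfold_indices(n, k, MersenneTwister(seed))
        train = setdiff(1:n, test)
        ŷ = fit_predict(X[train, :], y[train], X[test, :])
        push!(errs, mean(abs2, ŷ .- y[test]))
    end
    return mean(errs)
end

# Sweep one hyperparameter; reusing the seed keeps the folds fixed across values.
learning_curve(make_model, values, X, y; kw...) =
    [cv_l2(make_model(v), X, y; kw...) for v in values]

# Stand-in model: ridge regression with penalty λ (not LightGBM).
ridge(λ) = (Xtr, ytr, Xte) -> Xte * ((Xtr' * Xtr + λ * I) \ (Xtr' * ytr))

# Usage on noiseless linear data: the loss grows as the penalty over-shrinks.
rng = MersenneTwister(1)
X = randn(rng, 60, 3)
y = X * [1.0, -2.0, 0.5]
curve = learning_curve(ridge, [0.0, 10.0, 1000.0], X, y)
```

Plotting curve against the swept values gives one learning-curve panel; the documented function produces one such PDF per hyperparameter.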

Arguments

- model::LGBMRegressor : JuliaAI/MLJ.jl-compatible model object.
- features::NamedTuple : Feature table used for model training.
- targets::Vector{Float64} : Corresponding target values for regression.
- res_dir::String : Directory in which to save the resulting plots.
- outsuffix::String : Filename suffix to differentiate output.
- jobid::Union{Nothing, String} : Optional job ID for structured logging; plots are produced only when it is nothing.

Output Files

PDF plots are saved to res_dir with filenames:
- optimal_boosting_stage_$(outsuffix).pdf
- optimal_learning_rate_$(outsuffix).pdf
- optimal_min_data_in_leaf_$(outsuffix).pdf

Notes

- Plotting is performed only when jobid === nothing.
- When plotting, the PyPlot.jl $\LaTeX$ style is configured via Deborah.Rebekah.PyPlotLaTeX.set_pyplot_latex_style.
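The output naming scheme and the jobid gate can be summarized in a few lines. The helper learning_curve_pdf_paths is hypothetical; it only mirrors the filenames and the jobid === nothing rule documented above and does not produce any plots.

```julia
# Hypothetical helper mirroring the documented filenames and the
# jobid === nothing gate; returns no paths when plotting would be skipped.
function learning_curve_pdf_paths(res_dir::String, outsuffix::String,
                                  jobid::Union{Nothing,String} = nothing)
    jobid === nothing || return String[]
    stems = ("optimal_boosting_stage", "optimal_learning_rate", "optimal_min_data_in_leaf")
    return [joinpath(res_dir, "$(stem)_$(outsuffix).pdf") for stem in stems]
end

paths_local = learning_curve_pdf_paths("results", "run1")          # three PDF paths
paths_batch = learning_curve_pdf_paths("results", "run1", "job42") # empty
```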

Returns

Nothing

source

Deborah.DeborahCore.MLSequenceMiddleGBM.save_MiddleGBM_plot — Method

save_MiddleGBM_plot(
    predictions::AbstractVector, 
    ground_truth::AbstractVector, 
    filepath::String,
    res_dir::String
) -> Nothing

Generate and save a residual plot comparing predictions to ground truth values.

This function computes the relative residuals:

\[ \frac{\text{prediction} - \text{truth}}{\text{truth}}\]

for each configuration index and plots them to show the error trend across configurations. The plot is saved to filepath; if it is a PDF, pdfcrop is applied automatically to remove extra margins.
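The residual formula above is a single broadcast in Julia. This sketch covers only the numerical part and the PDF check. The helper names relative_residuals and needs_pdfcrop are hypothetical, and the plotting and pdfcrop call are omitted.

```julia
# Relative residuals (prediction - truth) / truth, one per configuration index.
function relative_residuals(predictions::AbstractVector, ground_truth::AbstractVector)
    length(predictions) == length(ground_truth) ||
        throw(DimensionMismatch("predictions and ground_truth have different lengths"))
    # Entries with truth == 0 give ±Inf or NaN; filter them beforehand if that can occur.
    return (predictions .- ground_truth) ./ ground_truth
end

# Documented trimming rule: only PDF outputs are passed to pdfcrop.
needs_pdfcrop(filepath::AbstractString) = endswith(filepath, ".pdf")

r = relative_residuals([1.1, 1.8, 3.0], [1.0, 2.0, 3.0])  # plotted against eachindex(r)
```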

Arguments

- predictions::AbstractVector : Model predictions for a given target set.
- ground_truth::AbstractVector : True target values corresponding to the predictions.
- filepath::String : Full file path (including filename and extension) for the saved plot, typically ending in .pdf or .png.
- res_dir::String : Directory where auxiliary output and logs may be stored.

Output

A residual plot saved to the specified filepath.

Notes

- Uses PyPlot.jl for plotting.
- If filepath ends in .pdf, pdfcrop is run automatically to trim whitespace.
- If the output directory (res_dir) does not exist, it should be created before calling this function.
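Because the output directory must already exist, a caller can create it with Base's mkpath, which is a no-op for directories that already exist. The helper prepare_output and the filename residual_tr.pdf are hypothetical illustrations, not part of the module.

```julia
# Create res_dir and the parent directory of filepath if they are missing.
function prepare_output(filepath::AbstractString, res_dir::AbstractString)
    mkpath(res_dir)
    parent = dirname(filepath)
    isempty(parent) || mkpath(parent)  # dirname("plot.pdf") is "", so skip it
    return filepath
end

root = mktempdir()
res_dir = joinpath(root, "results")
fp = prepare_output(joinpath(res_dir, "residual_tr.pdf"), res_dir)
```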

Returns

Nothing

source