Deborah.DeborahCore.MLSequenceMiddleGBM

Deborah.DeborahCore.MLSequenceMiddleGBM.ml_sequence_MiddleGBM (Method)
ml_sequence_MiddleGBM(; 
    model_tag::String,
    X_data::Dict{String, NamedTuple},
    Y_tr_vec::Vector{T},
    Y_bc_vec::Vector{T},
    Y_ul_vec::Vector{T},
    Y_lb_vec::Vector{T},
    tr_conf_arr::Vector{Int},
    bc_conf_arr::Vector{Int},
    ul_conf_arr::Vector{Int},
    partition::DatasetPartitioner.DatasetPartitionInfo,
    X_list::Vector{String},
    paths::PathConfigBuilderDeborah.DeborahPathConfig,
    jobid::Union{Nothing, String}
) -> Tuple{Any, Dict{Symbol, Matrix}} where T<:Real

Train and evaluate a LightGBM model using the JuliaAI/MLJ.jl framework, with additional hyperparameter tuning and diagnostics compared to Deborah.DeborahCore.MLSequenceLightGBM.ml_sequence_LightGBM.

This MiddleGBM variant:

  • scans simple learning curves on the training set,
  • performs random-search tuning over num_iterations, min_data_in_leaf, and learning_rate,
  • logs the best-found hyperparameters and L2 scores to a TOML info file, and
  • optionally saves residual plots for the training, bias-correction, and unlabeled sets when jobid === nothing.

As in the base LightGBM pipeline, it generates predicted $Y$ matrices for training, bias correction, unlabeled, and labeled sets, which can be used for downstream bias-corrected ML estimation and multi-ensemble analyses.
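
The random-search step described above can be illustrated in plain Julia. This is only a sketch of the general technique, not the tuning code used by ml_sequence_MiddleGBM: the search ranges, the helper names (SPACE, sample_candidate, random_search), and the toy_l2 scoring function are all assumptions made for illustration, and the real pipeline scores candidates with a fitted LightGBM model rather than a closed-form expression.

```julia
using Random

# Hypothetical search space; the ranges actually used by the package are not documented here.
const SPACE = (
    num_iterations   = 50:50:500,
    min_data_in_leaf = 5:5:50,
    learning_rate    = (0.01, 0.3),   # bounds for log-uniform sampling
)

# Draw one candidate hyperparameter set from the search space.
function sample_candidate(rng::AbstractRNG)
    lo, hi = SPACE.learning_rate
    lr = exp(log(lo) + rand(rng) * (log(hi) - log(lo)))
    return (
        num_iterations   = rand(rng, SPACE.num_iterations),
        min_data_in_leaf = rand(rng, SPACE.min_data_in_leaf),
        learning_rate    = lr,
    )
end

# Evaluate n_draws random candidates and keep the one with the lowest score.
function random_search(score, n_draws::Int; rng = MersenneTwister(0))
    best, best_score = nothing, Inf
    for _ in 1:n_draws
        cand = sample_candidate(rng)
        s = score(cand)
        if s < best_score
            best, best_score = cand, s
        end
    end
    return best, best_score
end

# Toy stand-in for a cross-validated L2 loss (an assumption for illustration only).
toy_l2(c) = (log(c.learning_rate) - log(0.05))^2 + (c.num_iterations - 300)^2 / 1e5

best, best_score = random_search(toy_l2, 25)
```

Sampling the learning rate log-uniformly is a common choice because useful values tend to span orders of magnitude; whether the package does the same is not stated in this reference.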

Keyword Arguments

  • model_tag::String : Short model identifier.
  • X_data::Dict{String, NamedTuple} : Input feature dictionary. Each key maps to a NamedTuple with vectors for :tr, :bc, :ul, :lb.
  • Y_tr_vec::Vector{T} : Target vector for the training set.
  • Y_bc_vec::Vector{T} : Target vector for the bias-correction set.
  • Y_ul_vec::Vector{T} : Target vector for the unlabeled set.
  • Y_lb_vec::Vector{T} : Target vector for the full labeled set.
  • tr_conf_arr::Vector{Int} : Row-wise configuration index mapping for the training set.
  • bc_conf_arr::Vector{Int} : Row-wise configuration index mapping for the bias-correction set.
  • ul_conf_arr::Vector{Int} : Row-wise configuration index mapping for the unlabeled set.
  • partition::DatasetPartitioner.DatasetPartitionInfo : Configuration and counts for dataset partitioning.
  • X_list::Vector{String} : Ordered list of feature names to be used.
  • paths::PathConfigBuilderDeborah.DeborahPathConfig : Contains path strings for saving logs and analysis output.
  • jobid::Union{Nothing, String} : Optional job tag for logging. If nothing, additional diagnostic plots are written.

Returns

  • Tuple{Any, Dict{Symbol, Matrix}}
    • mach : Trained JuliaAI/MLJ.jl machine wrapping the tuned LightGBM model (or nothing if training is skipped).
    • Y_mats : Dictionary of dense matrices with keys:
      • :Y_tr → true $Y$ on the training set
      • :Y_bc → true $Y$ on the bias-correction set
      • :Y_ul → true $Y$ on the unlabeled set
      • :Y_lb → true $Y$ on the full labeled set
      • :YP_tr → predicted $Y$ on the training set
      • :YP_bc → predicted $Y$ on the bias-correction set
      • :YP_ul → predicted $Y$ on the unlabeled set
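
As an illustration of how the returned dictionary might be consumed, the sketch below computes a mean squared (L2) error per partition from a mock Y_mats. The matrix shapes and values are invented, the l2 helper is hypothetical, and only keys documented above are used; the layout of the real matrices may differ.

```julia
using Statistics

# Mock Y_mats using the documented keys; shapes and values are made up.
Y_mats = Dict{Symbol, Matrix}(
    :Y_tr  => [1.0 2.0; 3.0 4.0],
    :Y_bc  => [5.0 6.0],
    :Y_ul  => [7.0 8.0],
    :Y_lb  => [1.0 2.0; 3.0 4.0; 5.0 6.0],
    :YP_tr => [1.1 1.9; 3.2 3.8],
    :YP_bc => [5.0 6.5],
    :YP_ul => [7.0 8.0],
)

# Mean squared error between a true and a predicted matrix (hypothetical helper).
l2(Y, YP) = mean(abs2, Y .- YP)

# Pair each :Y_* entry with its :YP_* counterpart for the three predicted sets.
scores = Dict(s => l2(Y_mats[Symbol(:Y_, s)], Y_mats[Symbol(:YP_, s)])
              for s in (:tr, :bc, :ul))
```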
Deborah.DeborahCore.MLSequenceMiddleGBM.run_MiddleGBM_learning_curves (Function)
run_MiddleGBM_learning_curves(
    model::LGBMRegressor,
    features::NamedTuple,
    targets::Vector{Float64},
    res_dir::String,
    outsuffix::String,
    jobid::Union{Nothing, String}=nothing
) -> Nothing

Generate and save learning curve plots for the given LightGBM model with hyperparameter sweeps for num_iterations, learning_rate, and min_data_in_leaf.

This function evaluates model performance via cross-validation across a range of hyperparameter values, and saves the results as cropped PDF plots in res_dir.
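
The idea of scoring a hyperparameter sweep by cross-validation can be sketched without any ML dependency. Everything here is an assumption for illustration: kfold_indices and cv_l2 are hypothetical helpers, the "model" is a shrunken training mean controlled by λ rather than a LightGBM regressor, and the fold scheme used by run_MiddleGBM_learning_curves is not documented in this reference.

```julia
using Statistics

# Split 1:n into k contiguous folds (hypothetical helper).
function kfold_indices(n::Int, k::Int)
    bounds = round.(Int, range(0, stop = n, length = k + 1))
    return [(bounds[i] + 1):bounds[i + 1] for i in 1:k]
end

# Toy model: predict the training mean shrunk by 1 / (1 + λ),
# then average the held-out squared error over the folds.
function cv_l2(y::Vector{Float64}, λ::Float64; k::Int = 3)
    n = length(y)
    losses = Float64[]
    for test_idx in kfold_indices(n, k)
        train_idx = setdiff(1:n, test_idx)
        ŷ = mean(y[train_idx]) / (1 + λ)
        push!(losses, mean(abs2, y[test_idx] .- ŷ))
    end
    return mean(losses)
end

# Sweep one hyperparameter and collect (value, CV score) pairs for plotting.
y = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
λ_grid = [0.0, 0.1, 1.0]
curve = [(λ, cv_l2(y, λ)) for λ in λ_grid]
```

Each (value, score) pair corresponds to one point on a learning curve; the real function repeats this kind of sweep for num_iterations, learning_rate, and min_data_in_leaf.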

Arguments

  • model::LGBMRegressor : JuliaAI/MLJ.jl-compatible model object.
  • features::NamedTuple : Feature table (NamedTuple) used for model training.
  • targets::Vector{Float64} : Corresponding target values for regression.
  • res_dir::String : Directory in which to save the resulting plots.
  • outsuffix::String : Filename suffix to differentiate output.
  • jobid::Union{Nothing, String} : Optional job ID for structured logging.

Output Files

  • PDF plots are saved to res_dir with filenames:
    • optimal_boosting_stage_$(outsuffix).pdf
    • optimal_learning_rate_$(outsuffix).pdf
    • optimal_min_data_in_leaf_$(outsuffix).pdf
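
A caller that wants to check for these files afterwards could rebuild the expected paths from the documented filename pattern. The res_dir and outsuffix values below are placeholders, and the variable names are hypothetical.

```julia
res_dir   = "results"   # placeholder directory
outsuffix = "demo"      # placeholder suffix

# Filename stems taken from the list above.
stems = ("optimal_boosting_stage", "optimal_learning_rate", "optimal_min_data_in_leaf")

# Full paths the function is documented to write.
expected = [joinpath(res_dir, "$(stem)_$(outsuffix).pdf") for stem in stems]

# Paths that are not present on disk yet.
missing_plots = filter(!isfile, expected)
```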

Returns

  • Nothing
Deborah.DeborahCore.MLSequenceMiddleGBM.save_MiddleGBM_plot (Method)
save_MiddleGBM_plot(
    predictions::AbstractVector, 
    ground_truth::AbstractVector, 
    filepath::String,
    res_dir::String
) -> Nothing

Generate and save a residual plot comparing predictions to ground truth values.

This function computes the relative residuals:

\[ \frac{\text{prediction} - \text{truth}}{\text{truth}}\]

for each configuration index and visualizes the error trend. The resulting plot is saved at the given filepath. If the file is a PDF, pdfcrop is automatically applied to remove extra margins.
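
The relative residual above is an elementwise operation, which a minimal sketch makes concrete. The relative_residuals name and the sample values are illustrative assumptions, not part of the package.

```julia
# Elementwise (prediction - truth) / truth, matching the formula above.
relative_residuals(predictions, ground_truth) =
    (predictions .- ground_truth) ./ ground_truth

predictions  = [1.1, 1.8, 3.3]
ground_truth = [1.0, 2.0, 3.0]

r = relative_residuals(predictions, ground_truth)   # approximately [0.1, -0.1, 0.1]
```

With floating-point inputs, any entry of ground_truth equal to zero yields Inf or NaN in this sketch; whether save_MiddleGBM_plot guards against that case is not stated here.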

Arguments

  • predictions::AbstractVector : Model predictions for a given target set.
  • ground_truth::AbstractVector : True target values corresponding to the predictions.
  • filepath::String : Full file path (including filename and extension) to save the plot. Should typically end with .pdf or .png.
  • res_dir::String : Directory where auxiliary output/logging may be stored.

Output

  • A residual plot saved to the specified filepath.

Notes

  • Uses PyPlot.jl for plotting.
  • If filepath ends in .pdf, pdfcrop is run automatically to trim whitespace.
  • If the output directory (res_dir) does not exist, it should be created before calling this function.
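
The preparation implied by these notes might look like the following sketch: create the output directory, build the target path, and check whether a pdfcrop executable is on the PATH. The directory, the residual_tr.pdf filename, and the case-insensitive extension test are assumptions; the function's own checks may differ.

```julia
# Placeholder output directory inside a temporary location.
res_dir = joinpath(mktempdir(), "plots")
mkpath(res_dir)   # creates the directory and any missing parents

# Hypothetical output filename.
filepath = joinpath(res_dir, "residual_tr.pdf")

# Cropping applies only to PDF output and requires the external pdfcrop tool.
is_pdf      = endswith(lowercase(filepath), ".pdf")
has_pdfcrop = Sys.which("pdfcrop") !== nothing

if is_pdf && !has_pdfcrop
    @warn "pdfcrop not found on PATH; PDF margins may not be trimmed"
end
```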

Returns

  • Nothing