Deborah.DeborahCore.MLSequenceMiddleGBM
Deborah.DeborahCore.MLSequenceMiddleGBM.ml_sequence_MiddleGBM — Method
ml_sequence_MiddleGBM(;
model_tag::String,
X_data::Dict{String, NamedTuple},
Y_tr_vec::Vector{T},
Y_bc_vec::Vector{T},
Y_ul_vec::Vector{T},
Y_lb_vec::Vector{T},
tr_conf_arr::Vector{Int},
bc_conf_arr::Vector{Int},
ul_conf_arr::Vector{Int},
partition::DatasetPartitioner.DatasetPartitionInfo,
X_list::Vector{String},
paths::PathConfigBuilderDeborah.DeborahPathConfig,
jobid::Union{Nothing, String}
) -> Tuple{Any, Dict{Symbol, Matrix}} where T<:Real
Train and evaluate a LightGBM model using the JuliaAI/MLJ.jl framework, with additional hyperparameter tuning and diagnostics compared to Deborah.DeborahCore.MLSequenceLightGBM.ml_sequence_LightGBM.
This MiddleGBM variant:
- scans simple learning curves on the training set,
- performs random-search tuning over num_iterations, min_data_in_leaf, and learning_rate,
- logs the best-found hyperparameters and L2 scores to a TOML info file, and
- optionally saves residual plots for the training, bias-correction, and unlabeled sets when jobid === nothing.
As in the base LightGBM pipeline, it generates predicted $Y$ matrices for training, bias correction, unlabeled, and labeled sets, which can be used for downstream bias-corrected ML estimation and multi-ensemble analyses.
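The random-search step listed above can be illustrated with a minimal sketch. It assumes MLJ.jl and LightGBM.jl are installed, and it uses MLJ's TunedModel with RandomSearch. The toy data, the range bounds, the fold count, and n = 20 are all hypothetical choices made for illustration and are not taken from the Deborah source, whose internal tuning settings may differ.

```julia
using MLJ

# Assumption: LightGBM.jl is installed so that MLJ can load its regressor.
LGBMRegressor = @load LGBMRegressor pkg=LightGBM verbosity=0

# Hypothetical toy data standing in for one feature table and its targets.
X = (x1 = rand(200), x2 = rand(200))
y = 2 .* X.x1 .- X.x2 .+ 0.05 .* randn(200)

model = LGBMRegressor()

# Search ranges for the three tuned hyperparameters (illustrative bounds).
r_iter = range(model, :num_iterations; lower=50, upper=500)
r_leaf = range(model, :min_data_in_leaf; lower=5, upper=50)
r_lr   = range(model, :learning_rate; lower=0.01, upper=0.3, scale=:log)

# Random search scored by cross-validated L2 loss.
tuned = TunedModel(
    model=model,
    tuning=RandomSearch(),
    range=[r_iter, r_leaf, r_lr],
    resampling=CV(nfolds=5),
    measure=l2,
    n=20,
)

mach = machine(tuned, X, y)
fit!(mach, verbosity=0)

# Best hyperparameter combination found by the search.
best = fitted_params(mach).best_model
```

After fitting, best.num_iterations, best.min_data_in_leaf, and best.learning_rate hold the selected values, which is the kind of information the function is described as logging to its TOML info file.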
Keyword Arguments
- model_tag::String: Short model identifier.
- X_data::Dict{String, NamedTuple}: Input feature dictionary. Each key maps to a NamedTuple with vectors for :tr, :bc, :ul, :lb.
- Y_tr_vec::Vector{T}: Target vector for the training set.
- Y_bc_vec::Vector{T}: Target vector for the bias-correction set.
- Y_ul_vec::Vector{T}: Target vector for the unlabeled set.
- Y_lb_vec::Vector{T}: Target vector for the full labeled set.
- tr_conf_arr::Vector{Int}: Row-wise configuration index mapping for the training set.
- bc_conf_arr::Vector{Int}: Row-wise configuration index mapping for the bias-correction set.
- ul_conf_arr::Vector{Int}: Row-wise configuration index mapping for the unlabeled set.
- partition::DatasetPartitioner.DatasetPartitionInfo: Configuration and counts for dataset partitioning.
- X_list::Vector{String}: Ordered list of feature names to be used.
- paths::PathConfigBuilderDeborah.DeborahPathConfig: Contains path strings for saving logs and analysis output.
- jobid::Union{Nothing, String}: Optional job tag for logging. If nothing, additional diagnostic plots are written.
Returns
Tuple{Any, Dict{Symbol, Matrix}}
- mach: Trained JuliaAI/MLJ.jl machine wrapping the tuned LightGBM model (or nothing if training is skipped).
- Y_mats: Dictionary of dense matrices with keys:
  - :Y_tr → true $Y$ on the training set
  - :Y_bc → true $Y$ on the bias-correction set
  - :Y_ul → true $Y$ on the unlabeled set
  - :Y_lb → true $Y$ on the full labeled set
  - :YP_tr → predicted $Y$ on the training set
  - :YP_bc → predicted $Y$ on the bias-correction set
  - :YP_ul → predicted $Y$ on the unlabeled set
Deborah.DeborahCore.MLSequenceMiddleGBM.run_MiddleGBM_learning_curves — Function
run_MiddleGBM_learning_curves(
model::LGBMRegressor,
features::NamedTuple,
targets::Vector{Float64},
res_dir::String,
outsuffix::String,
jobid::Union{Nothing, String}=nothing
) -> Nothing
Generate and save learning curve plots for the given LightGBM model with hyperparameter sweeps for num_iterations, learning_rate, and min_data_in_leaf.
This function evaluates model performance via cross-validation across a range of hyperparameter values, and saves the results as cropped PDF plots in res_dir.
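A cross-validated sweep of this kind can be sketched with MLJ's learning_curve. This is only an illustration under stated assumptions: MLJ.jl and LightGBM.jl are installed, the toy data stand in for features and targets, and the range bounds, resolution, and fold count are hypothetical. Whether the function itself calls learning_curve is not stated in this documentation.

```julia
using MLJ

# Assumption: LightGBM.jl is installed so that MLJ can load its regressor.
LGBMRegressor = @load LGBMRegressor pkg=LightGBM verbosity=0

# Hypothetical toy data standing in for `features` and `targets`.
features = (x1 = rand(200), x2 = rand(200))
targets  = 2 .* features.x1 .- features.x2 .+ 0.05 .* randn(200)

model = LGBMRegressor()
mach  = machine(model, features, targets)

# Sweep a single hyperparameter; the bounds are illustrative only.
r = range(model, :num_iterations; lower=10, upper=200)
curve = learning_curve(mach; range=r, resolution=10,
                       resampling=CV(nfolds=5), measure=l2)

# `parameter_values` holds the swept values, `measurements` the CV scores.
best_idx = argmin(curve.measurements)
best_num_iterations = curve.parameter_values[best_idx]
```

The same pattern applies to learning_rate and min_data_in_leaf by changing the field passed to range; plotting curve.parameter_values against curve.measurements gives one learning curve per hyperparameter.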
Arguments
- model::LGBMRegressor: JuliaAI/MLJ.jl-compatible model object.
- features::NamedTuple: Feature table (NamedTuple) used for model training.
- targets::Vector{Float64}: Corresponding target values for regression.
- res_dir::String: Directory in which to save the resulting plots.
- outsuffix::String: Filename suffix to differentiate output.
- jobid::Union{Nothing, String}: Optional job ID for structured logging.
Output Files
- PDF plots are saved to res_dir with filenames:
  - optimal_boosting_stage_$(outsuffix).pdf
  - optimal_learning_rate_$(outsuffix).pdf
  - optimal_min_data_in_leaf_$(outsuffix).pdf
Notes
- Plotting is performed only when jobid === nothing.
- When plotting, PyPlot.jl + $\LaTeX$ style is configured via Deborah.Rebekah.PyPlotLaTeX.set_pyplot_latex_style.
Returns
Nothing
Deborah.DeborahCore.MLSequenceMiddleGBM.save_MiddleGBM_plot — Method
save_MiddleGBM_plot(
predictions::AbstractVector,
ground_truth::AbstractVector,
filepath::String,
res_dir::String
) -> Nothing
Generate and save a residual plot comparing predictions to ground truth values.
This function computes the relative residuals:
\[ \frac{\text{prediction} - \text{truth}}{\text{truth}}\]
for each configuration index and visualizes the error trend. The resulting plot is saved at the given filepath. If the file is a PDF, pdfcrop is automatically applied to remove extra margins.
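The relative residual defined above can be computed elementwise in plain Julia. The sketch below is illustrative only: relative_residuals is a hypothetical helper name, not a function exported by Deborah, and the sample vectors are made up.

```julia
# Relative residuals (prediction - truth) / truth, computed elementwise.
# `relative_residuals` is a hypothetical helper, not part of Deborah.
function relative_residuals(predictions::AbstractVector, ground_truth::AbstractVector)
    length(predictions) == length(ground_truth) ||
        throw(DimensionMismatch("predictions and ground_truth must have equal length"))
    return (predictions .- ground_truth) ./ ground_truth
end

predictions  = [1.1, 1.9, 3.3]   # hypothetical model output
ground_truth = [1.0, 2.0, 3.0]   # hypothetical true targets
res = relative_residuals(predictions, ground_truth)
```

Note that a ground-truth entry equal to zero makes the corresponding floating-point residual Inf or NaN, so such entries may need separate handling before plotting.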
Arguments
- predictions::AbstractVector: Model predictions for a given target set.
- ground_truth::AbstractVector: True target values corresponding to the predictions.
- filepath::String: Full file path (including filename and extension) to save the plot. Should typically end with .pdf or .png.
- res_dir::String: Directory where auxiliary output/logging may be stored.
Output
- A residual plot saved to the specified filepath.
Notes
- Uses PyPlot.jl for plotting.
- If filepath ends in .pdf, pdfcrop is run automatically to trim whitespace.
- If the output directory (res_dir) does not exist, it should be created before calling this function.
Returns
Nothing