Deborah.DeborahCore.MLInputPreparer

Deborah.DeborahCore.MLInputPreparer.MLInputBundle — Type

struct MLInputBundle{T<:Real}

Container for machine learning input and target data used in the Deborah.DeborahCore pipeline.

This struct stores the full feature set X_data, the target vectors Y_*_vec for each partition (training set, bias-correction set, unlabeled set, and labeled set), the raw target matrix Y_df, and the configuration index arrays used to assemble them.

Type Parameters

T<:Real : Element type of all target vectors and matrices (typically Float64).

Fields

X_data::Dict{String, NamedTuple} : Preprocessed feature dictionary, keyed by filename.
Y_df::Matrix{T} : Original raw $Y$ matrix ($N_{\text{cfg}} \times N_{\text{src}}$).
Y_tr_vec::Vector{T} : Flattened $Y$ vector for training set.
Y_bc_vec::Vector{T} : Flattened $Y$ vector for bias-correction set.
Y_ul_vec::Vector{T} : Flattened $Y$ vector for unlabeled set.
Y_lb_vec::Vector{T} : Flattened $Y$ vector for labeled set.
conf_arr::Vector{Int} : Mapping from global row index to configuration index.
tr_conf_arr::Vector{Int} : Row indices used for training set $Y$.
bc_conf_arr::Vector{Int} : Row indices used for bias-correction set $Y$.
ul_conf_arr::Vector{Int} : Row indices used for unlabeled set $Y$.
lb_conf_arr::Vector{Int} : Row indices used for labeled set $Y$.
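
As a rough illustration of how the index arrays relate to the target vectors, the sketch below uses a hypothetical stand-in struct, ToyBundle, that mirrors only a subset of the documented fields. It assumes that each *_conf_arr holds row indices into Y_df and that "flattened" means the column-major vec of the selected rows. Neither detail is confirmed by the docstring, and the helper flatten_rows is likewise invented for this example.

```julia
# Hypothetical stand-in for MLInputBundle (not the package's definition);
# only a subset of the documented fields is mirrored.
struct ToyBundle{T<:Real}
    Y_df::Matrix{T}
    Y_tr_vec::Vector{T}
    Y_bc_vec::Vector{T}
    tr_conf_arr::Vector{Int}
    bc_conf_arr::Vector{Int}
end

# Assumption: a partition's target vector is the column-major flattening
# of the rows of Y_df selected by that partition's index array.
flatten_rows(Y::Matrix{T}, rows::Vector{Int}) where {T<:Real} = vec(Y[rows, :])

Y_df = Float64[1 10; 2 20; 3 30; 4 40]   # 4 configurations x 2 sources
tr_rows = [1, 2]                          # rows assigned to the training set
bc_rows = [4]                             # rows assigned to the bias-correction set

bundle = ToyBundle(
    Y_df,
    flatten_rows(Y_df, tr_rows),
    flatten_rows(Y_df, bc_rows),
    tr_rows,
    bc_rows,
)
```

Under these assumptions, bundle.Y_tr_vec contains the two training rows of the first source followed by the same rows of the second source.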


Deborah.DeborahCore.MLInputPreparer.prepare_ML_inputs — Method

prepare_ML_inputs(
    partition::DatasetPartitioner.DatasetPartitionInfo, 
    X_file_list::Vector{String}, 
    Y_file::String, 
    paths::PathConfigBuilderDeborah.DeborahPathConfig; 
    jobid::Union{Nothing, String}=nothing,
    dump::Bool=false, 
    read_column_X::Vector{Int},
    read_column_Y::Int,
    index_column::Int
) -> MLInputBundle

Load and organize all machine learning input data from raw .dat files.

This function loads the target data (Y_file) and the input feature files in X_file_list, extracts one column from each (selected by read_column_Y and read_column_X, respectively), and applies the dataset partitioning defined by the partition object. It returns the features and targets for each partition in an MLInputBundle suitable for training and evaluation.

Arguments

partition::DatasetPartitioner.DatasetPartitionInfo : Struct defining how configuration indices are split into lb, tr, bc, and ul sets.
X_file_list::Vector{String} : List of feature file names, e.g., ["plaq.dat", "rect.dat"].
Y_file::String : Name of the target file to be used as Y.
paths::PathConfigBuilderDeborah.DeborahPathConfig : Struct with directory and filename conventions used for reading/writing data.
jobid::Union{Nothing, String} = nothing : Optional identifier used for structured logging or job tracking.
dump::Bool = false : Whether to save the preprocessed X feature vectors to disk.
read_column_X::Vector{Int} : A vector of $1$-based column indices, one for each feature file in X_file_list. read_column_X[i] selects the column read from file X_file_list[i].
read_column_Y::Int : $1$-based column index to extract from the Y_file.
index_column::Int : $1$-based column index from which to read the configuration indices in the Y_file. If set to 0, configuration indices are auto-generated as 1:N_cnf.
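
The column conventions above can be sketched on an in-memory table instead of a .dat file. The helper pick_target is hypothetical and not part of Deborah; it only illustrates 1-based column selection and the index_column = 0 fallback as described in the argument list. The pairing of read_column_X with X_file_list is shown with a plain Dict, which is also just an illustration.

```julia
# Hypothetical helper, not part of Deborah: applies the documented
# column conventions to an in-memory table.
function pick_target(table::Matrix{Float64}, read_column_Y::Int, index_column::Int)
    n_rows = size(table, 1)
    # index_column == 0 means there is no index column: fall back to 1:N
    conf = index_column == 0 ? collect(1:n_rows) : Int.(table[:, index_column])
    y = table[:, read_column_Y]   # 1-based column holding the target values
    return conf, y
end

# Toy table: column 1 holds configuration indices, column 2 holds targets.
table = [100.0 0.5; 110.0 0.7; 120.0 0.9]

conf_a, y_a = pick_target(table, 2, 1)   # indices read from column 1
conf_b, y_b = pick_target(table, 2, 0)   # indices auto-generated

# read_column_X[i] applies to X_file_list[i]; the values here are made up.
X_file_list = ["plaq.dat", "rect.dat"]
read_column_X = [2, 1]
column_for = Dict(zip(X_file_list, read_column_X))
```

Both calls return the same target column; only the source of the configuration indices differs.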

Returns

Deborah.DeborahCore.MLInputPreparer.MLInputBundle Composite struct containing:
- X_data::Dict{String, NamedTuple} → partitioned input feature vectors (:lb, :tr, :bc, :ul)
- Y_lb_vec, Y_tr_vec, Y_bc_vec, Y_ul_vec → target vectors for each partition
- lb_conf_arr, tr_conf_arr, bc_conf_arr, ul_conf_arr → configuration index arrays for each partition
