Deborah.DeborahCore.MLInputPreparer
Deborah.DeborahCore.MLInputPreparer.MLInputBundle — Type
struct MLInputBundle{T<:Real}
Container for machine learning input and target data used in the Deborah.DeborahCore pipeline.
This struct stores the full feature set X_data, target vectors Y_*_vec for each partition (training set, bias correction set, unlabeled set, and labeled set), the raw data matrix Y_df, and the corresponding configuration index arrays used to assemble them.
Type Parameters
T<:Real: Element type of all target vectors and matrices (typically Float64).
Fields
- X_data::Dict{String, NamedTuple}: Preprocessed feature dictionary, keyed by filename.
- Y_df::Matrix{T}: Original raw $Y$ matrix ($N_{\text{cfg}} \times N_{\text{src}}$).
- Y_tr_vec::Vector{T}: Flattened $Y$ vector for training set.
- Y_bc_vec::Vector{T}: Flattened $Y$ vector for bias-correction set.
- Y_ul_vec::Vector{T}: Flattened $Y$ vector for unlabeled set.
- Y_lb_vec::Vector{T}: Flattened $Y$ vector for labeled set.
- conf_arr::Vector{Int}: Mapping from global row index to configuration index.
- tr_conf_arr::Vector{Int}: Row indices used for training set $Y$.
- bc_conf_arr::Vector{Int}: Row indices used for bias-correction set $Y$.
- ul_conf_arr::Vector{Int}: Row indices used for unlabeled set $Y$.
- lb_conf_arr::Vector{Int}: Row indices used for labeled set $Y$.
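To make the field layout concrete, here is a minimal sketch of a struct that mirrors the documented fields, filled with toy data. `MLInputBundleSketch` and `flatten` are hypothetical names used only for illustration; the real type is defined in Deborah.DeborahCore.MLInputPreparer and may differ. The sketch also assumes that "flattened" means a column-major `vec` of the selected rows and that the labeled set is the union of the training and bias-correction rows.

```julia
# Hypothetical mirror of the documented fields (not the package's source).
struct MLInputBundleSketch{T<:Real}
    X_data::Dict{String, NamedTuple}
    Y_df::Matrix{T}
    Y_tr_vec::Vector{T}
    Y_bc_vec::Vector{T}
    Y_ul_vec::Vector{T}
    Y_lb_vec::Vector{T}
    conf_arr::Vector{Int}
    tr_conf_arr::Vector{Int}
    bc_conf_arr::Vector{Int}
    ul_conf_arr::Vector{Int}
    lb_conf_arr::Vector{Int}
end

# Toy target matrix: 6 configurations, 1 source.
Y_df = reshape(collect(1.0:6.0), 6, 1)
conf_arr = collect(1:6)

# Assumed partition: labeled = training + bias correction; the rest is unlabeled.
tr, bc, ul = [1, 2], [3, 4], [5, 6]
lb = vcat(tr, bc)

# Assumption: "flattened" is the column-major vec of the selected rows.
flatten(rows) = vec(Y_df[rows, :])

# One toy feature, partitioned with the same row indices.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
X_data = Dict{String, NamedTuple}(
    "plaq.dat" => (lb = x[lb], tr = x[tr], bc = x[bc], ul = x[ul]),
)

bundle = MLInputBundleSketch(
    X_data, Y_df,
    flatten(tr), flatten(bc), flatten(ul), flatten(lb),
    conf_arr, tr, bc, ul, lb,
)
```

The type parameter `T` is inferred from `Y_df` and the target vectors, so `bundle` here is an `MLInputBundleSketch{Float64}`.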
Deborah.DeborahCore.MLInputPreparer.prepare_ML_inputs — Method
prepare_ML_inputs(
    partition::DatasetPartitioner.DatasetPartitionInfo,
    X_file_list::Vector{String},
    Y_file::String,
    paths::PathConfigBuilderDeborah.DeborahPathConfig;
    jobid::Union{Nothing, String}=nothing,
    dump::Bool=false,
    read_column_X::Vector{Int},
    read_column_Y::Int,
    index_column::Int
) -> MLInputBundle
Load and organize all machine learning input data from raw .dat files.
This function loads the target data (Y_file) and a list of input feature files (X_file_list), extracts specific columns from each using read_column_X and read_column_Y, applies dataset partitioning according to the partition object, and returns the labeled and unlabeled splits of features and targets in a structured format suitable for training and evaluation.
Arguments
- partition::DatasetPartitioner.DatasetPartitionInfo: Struct defining how configuration indices are split into lb, tr, bc, and ul sets.
- X_file_list::Vector{String}: List of feature file names, e.g., ["plaq.dat", "rect.dat"].
- Y_file::String: Name of the target file to be used as Y.
- paths::PathConfigBuilderDeborah.DeborahPathConfig: Struct with directory and filename conventions used for reading/writing data.
- jobid::Union{Nothing, String}: Optional identifier used for structured logging or job tracking.
- dump::Bool = false: Whether to save preprocessed X feature vectors into disk files.
- read_column_X::Vector{Int}: A vector of $1$-based column indices, one for each feature file in X_file_list. read_column_X[i] is used to select a column from file X_file_list[i].
- read_column_Y::Int: $1$-based column index to extract from the Y_file.
- index_column::Int: $1$-based column index from which to read the configuration indices in the Y_file. If set to 0, configuration indices will be auto-generated as 1:N_cnf.
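The column-selection rules above can be sketched on in-memory matrices. The helpers `conf_indices` and `select_features` are hypothetical and exist only in this example; they illustrate the documented semantics of index_column and read_column_X and are not the package's implementation, which reads from .dat files.

```julia
# Hypothetical helper: read configuration indices from column `index_column`,
# or auto-generate 1:N_cnf when index_column == 0.
function conf_indices(raw::AbstractMatrix{<:Real}, index_column::Int)
    N_cnf = size(raw, 1)
    index_column == 0 && return collect(1:N_cnf)
    return Int.(raw[:, index_column])
end

# Hypothetical helper: read_column_X[i] selects one column from the i-th table.
function select_features(tables::AbstractVector{<:AbstractMatrix{<:Real}},
                         read_column_X::Vector{Int})
    length(tables) == length(read_column_X) ||
        throw(ArgumentError("need one column index per feature file"))
    return [tables[i][:, read_column_X[i]] for i in eachindex(tables)]
end

# Toy stand-ins for the parsed contents of Y_file, "plaq.dat", and "rect.dat".
Y_raw = [10.0 0.51; 20.0 0.52; 30.0 0.53]
plaq  = [10.0 0.1; 20.0 0.2; 30.0 0.3]
rect  = [10.0 1.0 7.0; 20.0 2.0 8.0; 30.0 3.0 9.0]

from_file = conf_indices(Y_raw, 1)   # indices taken from column 1
generated = conf_indices(Y_raw, 0)   # auto-generated 1:N_cnf
features  = select_features([plaq, rect], [2, 3])
```

Here `from_file` holds the indices stored in the first column of the target table, while `generated` falls back to consecutive row numbers. Each entry of `features` is a single column taken from the matching table.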
Returns
Deborah.DeborahCore.MLInputPreparer.MLInputBundle: Composite struct containing:
- X_dict::Dict{String, NamedTuple} → partitioned input feature vectors (:lb, :tr, :bc, :ul)
- Y_lb, Y_tr, Y_bc, Y_ul → target label vectors
- configuration index arrays for each group
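As a rough illustration of the partitioned feature dictionary, the sketch below builds one entry with the keys :lb, :tr, :bc, and :ul from a single feature column. `partition_feature` and the index vectors are hypothetical; the actual splitting is driven by the DatasetPartitionInfo object, whose fields are not described here.

```julia
# Hypothetical helper: split one feature column into the four documented groups.
function partition_feature(x::AbstractVector, lb, tr, bc, ul)
    return (lb = x[lb], tr = x[tr], bc = x[bc], ul = x[ul])
end

# Toy feature column with six configurations.
x_plaq = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# Assumed row indices for each group (illustrative only).
lb_rows, tr_rows, bc_rows, ul_rows = [1, 2, 3, 4], [1, 2], [3, 4], [5, 6]

X_dict = Dict{String, NamedTuple}(
    "plaq.dat" => partition_feature(x_plaq, lb_rows, tr_rows, bc_rows, ul_rows),
)

# Each group is then reachable by file name and partition key.
x_train = X_dict["plaq.dat"].tr
```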