Deborah.DeborahCore.FeaturePipeline
Deborah.DeborahCore.FeaturePipeline.build_namedtuple_splitset — Function
build_namedtuple_splitset(
    X_data::Dict{String, NamedTuple},
    split::String,
    key_order::Vector{String},
    jobid::Union{Nothing, String} = nothing
) -> NamedTuple
Construct a NamedTuple of feature vectors corresponding to a given data split.
This function extracts the specified data split (tr, bc, ul, or lb) from each entry in the input feature dictionary and combines the results into a column-indexed NamedTuple, suitable as input to JuliaAI/MLJ.jl models.
Arguments
- X_data::Dict{String, NamedTuple}: A dictionary mapping input keys to 4-way split feature tuples (lb, tr, bc, ul).
- split::String: Which dataset split to extract (tr, bc, ul, lb).
- key_order::Vector{String}: The order of keys to use when building columns.
- jobid::Union{Nothing, String}: Optional identifier string for logging or debugging purposes.
Returns
NamedTuple: A named tuple with keys :Column1, :Column2, ... containing the feature vectors for the specified split.
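The extraction described above can be sketched roughly as follows. This is an illustration only, not the package's implementation: the helper name sketch_splitset is hypothetical, the jobid argument is omitted, and the sketch assumes that columns are numbered in the order given by key_order. The dictionary type is written as Dict{String,<:NamedTuple} here so that a concretely typed dictionary is accepted.

```julia
# Hypothetical sketch; not the Deborah.jl source.
function sketch_splitset(X_data::Dict{String,<:NamedTuple},
                         split::String,
                         key_order::Vector{String})
    split_sym = Symbol(split)                       # "tr" -> :tr
    # Column names :Column1, :Column2, ... follow the order of key_order (assumed).
    col_names = Tuple(Symbol("Column", i) for i in eachindex(key_order))
    # Pick the requested split out of each feature's 4-way tuple.
    cols = Tuple(getproperty(X_data[k], split_sym) for k in key_order)
    return NamedTuple{col_names}(cols)
end

# Toy input with the same shape as the documented 4-way split tuples.
X_data = Dict(
    "plaq.dat" => (lb = [1.0, 2.0], tr = [1.0], bc = [2.0], ul = [3.0, 4.0]),
    "rect.dat" => (lb = [5.0, 6.0], tr = [5.0], bc = [6.0], ul = [7.0, 8.0]),
)

X_tr = sketch_splitset(X_data, "tr", ["plaq.dat", "rect.dat"])
```

A NamedTuple of equal-length vectors is a column table in the Tables.jl sense, which is presumably why this layout is convenient for MLJ.jl.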
Deborah.DeborahCore.FeaturePipeline.run_feature_pipeline — Method
run_feature_pipeline(
    read_column_X::Vector{Int},
    keys::Vector{String},
    path::String,
    conf_arr::Vector{Int},
    partition::DatasetPartitioner.DatasetPartitionInfo,
    paths::PathConfigBuilderDeborah.DeborahPathConfig;
    dump::Bool=true,
    jobid::Union{Nothing, String}=nothing
) -> Dict{String, NamedTuple{(:lb, :tr, :bc, :ul), NTuple{4, Vector{Float64}}}}
Run the feature preprocessing pipeline across multiple input feature files.
For each feature key (e.g., "plaq.dat", "rect.dat"), this function:
- Loads the corresponding raw .dat file as a matrix,
- Extracts the column specified in read_column_X[i],
- Partitions the data into four groups (lb, tr, bc, ul),
- Optionally dumps each partition to disk.
This version allows specifying a separate column index for each input feature file.
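The per-file steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: load_dat and sketch_pipeline are hypothetical names, the files are assumed to be whitespace-delimited numeric text, and a plain NamedTuple of row-index vectors (idx) stands in for DatasetPartitionInfo, whose actual fields are not shown in this page. The conf_arr, paths, dump, and jobid arguments are left out.

```julia
# Hypothetical sketch of the per-file steps; not the Deborah.jl source.

# Read a whitespace-delimited numeric file into a Matrix{Float64} (assumed file format).
function load_dat(file::AbstractString)
    rows = [parse.(Float64, split(line)) for line in eachline(file) if !isempty(strip(line))]
    return permutedims(reduce(hcat, rows))   # one matrix row per file line
end

# `idx` stands in for DatasetPartitionInfo: row indices for each group (assumed layout).
function sketch_pipeline(read_column_X::Vector{Int}, keys::Vector{String},
                         path::String, idx::NamedTuple)
    out = Dict{String,NamedTuple{(:lb, :tr, :bc, :ul),NTuple{4,Vector{Float64}}}}()
    for (i, key) in enumerate(keys)
        M = load_dat(joinpath(path, key))      # 1. load the raw .dat file
        col = M[:, read_column_X[i]]           # 2. extract the 1-based column for this file
        out[key] = (lb = col[idx.lb], tr = col[idx.tr],   # 3. partition the rows
                    bc = col[idx.bc], ul = col[idx.ul])
    end
    return out                                 # 4. dumping to disk is omitted here
end

# Toy data: two small feature files in a temporary directory.
dir = mktempdir()
write(joinpath(dir, "plaq.dat"), "1 0.50\n2 0.51\n3 0.52\n4 0.53\n")
write(joinpath(dir, "rect.dat"), "1 0.30 9\n2 0.31 9\n3 0.32 9\n4 0.33 9\n")

idx = (lb = [1, 2], tr = [1], bc = [2], ul = [3, 4])
X = sketch_pipeline([2, 2], ["plaq.dat", "rect.dat"], dir, idx)
```

Because read_column_X carries one entry per file, files with different column layouts (two columns in plaq.dat, three in rect.dat in this toy data) can be handled in the same call.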
Arguments
- read_column_X::Vector{Int}: A vector of 1-based column indices, one for each feature in keys. read_column_X[i] is used to extract a column from the feature file keys[i]. Each column is treated as an independent scalar input feature.
- keys::Vector{String}: List of feature file base names, such as ["plaq.dat", "rect.dat"].
- path::String: Directory path containing the raw .dat feature files.
- conf_arr::Vector{Int}: Configuration indices associated with rows in the feature files.
- partition::DatasetPartitioner.DatasetPartitionInfo: Struct containing index vectors that define the four partitions:
  - lb: labeled set
  - tr: training set
  - bc: bias correction set
  - ul: unlabeled set
- paths::PathConfigBuilderDeborah.DeborahPathConfig: Struct containing global path settings, such as .analysis_dir and .overall_name.
- dump::Bool = true: If true, each split feature vector is saved to disk as a .dat file.
- jobid::Union{Nothing, String}: Optional identifier string for logging or debugging purposes.
Returns
Dict{String, NamedTuple{(:lb, :tr, :bc, :ul), NTuple{4, Vector{Float64}}}}: Dictionary mapping each feature name to a NamedTuple of split vectors.