Deborah.DeborahCore.BaselineSequence
Deborah.DeborahCore.BaselineSequence.baseline_sequence — Function
baseline_sequence(
ML_inputs::MLInputPreparer.MLInputBundle,
partition::DatasetPartitioner.DatasetPartitionInfo,
paths::PathConfigBuilderDeborah.DeborahPathConfig,
jobid::Union{Nothing, String}=nothing;
read_column_Y::Int,
dump::Bool=true
) -> Dict{Symbol, Matrix{T}} where T<:Real
Construct a baseline prediction sequence in which inputs and targets are identical, i.e., $X = Y = Y^P$, using the specified observable column from the data.
Arguments
- ML_inputs::MLInputPreparer.MLInputBundle: Struct containing:
  - Y_df::Matrix{T}: Full observable matrix ($N_\text{cnf} \times N_\text{src}$).
  - conf_arr::Vector{Int}: Configuration index array.
- partition::DatasetPartitionInfo: Contains partition indices for :lb, :tr, :bc, and :ul.
- paths::DeborahPathConfig: Contains output directory information for saving results.
- jobid::Union{Nothing, String}: Optional identifier for logging or batch tracking.
Keyword Arguments
- read_column_Y::Int: $1$-based column index to extract from Y_df.
- dump::Bool = true: If true, saves all generated Y_* and YP_* vectors to disk.
Returns
- Dict{Symbol, Matrix{T}}: Dictionary containing entries like :Y_bc, :YP_bc, etc., where each is a $1$-column matrix ($N \times 1$) corresponding to a partition.
Behavior
- For each partition tag (e.g., :tr, :ul, etc.), generates:
  - Y_tag: extracted from Y_df[:, read_column_Y]
  - YP_tag: identical copy of Y_tag
- Does not perform any training or randomization – this is a deterministic identity baseline.
- Outputs are structurally identical to those of full ML pipelines, making them suitable for direct performance comparisons.
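The identity baseline described above can be sketched in a few lines of Julia. This is a minimal illustration, not the package's implementation: the function name identity_baseline and the plain Dict used for the partition are hypothetical stand-ins for MLInputBundle and DatasetPartitionInfo, and no files are written.

```julia
# Hypothetical stand-ins for the real Deborah.jl inputs (assumptions for illustration).
Y_df = reshape(collect(1.0:12.0), 6, 2)   # 6 configurations x 2 sources
partition = Dict(:lb => [1, 2, 3, 4], :tr => [1, 2], :bc => [3, 4], :ul => [5, 6])

# Build Y_tag and YP_tag for every partition tag; YP_tag is a copy of Y_tag.
function identity_baseline(
    Y_df::Matrix{T},
    partition::Dict{Symbol, Vector{Int}},
    read_column_Y::Int,
) where {T<:Real}
    out = Dict{Symbol, Matrix{T}}()
    col = Y_df[:, read_column_Y]              # select the observable column
    for (tag, idx) in partition
        Y = reshape(col[idx], :, 1)           # N x 1 matrix for this partition
        out[Symbol("Y_", tag)] = Y
        out[Symbol("YP_", tag)] = copy(Y)     # "prediction" equals the target
    end
    return out
end

result = identity_baseline(Y_df, partition, 2)
```

Because every YP_tag equals its Y_tag, any error metric computed on this output is zero by construction, which is what makes it usable as a reference point.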
Notes
- Internally calls Deborah.Sarah.XYInfoGenerator.gen_X_info, Deborah.DeborahCore.XYMLInfoGenerator.gen_XY_ML_info, Deborah.DeborahCore.XYMLVectorizer.gen_XY_ML, and Deborah.DeborahCore.XYMLVectorizer.mat_XY_ML, using shared logic from ML pipelines.
Deborah.DeborahCore.BaselineSequence.random_sequence — Function
random_sequence(
ML_inputs::MLInputPreparer.MLInputBundle,
partition::DatasetPartitioner.DatasetPartitionInfo,
paths::PathConfigBuilderDeborah.DeborahPathConfig,
ranseed::Int,
jobid::Union{Nothing, String}=nothing;
read_column_Y::Int,
dump::Bool=true
) -> Dict{Symbol, Matrix{T}} where T<:Real
Generate a synthetic prediction sequence using Gaussian noise with jackknife-estimated variance, based on the input observable matrix.
This function is used primarily to create random baseline predictions for machine learning workflows, especially when training is skipped or when testing stochastic behavior.
Arguments
- ML_inputs::MLInputPreparer.MLInputBundle: Struct containing input matrix Y_df::Matrix{T} and configuration array conf_arr::Vector{Int}. Y_df should be 2D ($N_\text{cnf} \times N_\text{src}$); a single column is selected using read_column_Y.
- partition::DatasetPartitionInfo: Struct defining data partitions: lb_idx, tr_idx, bc_idx, ul_idx. These are used to split the data into labeled, training, bias-correction, and unlabeled sets.
- paths::DeborahPathConfig: Configuration for output filenames and directories.
- ranseed::Int: Seed for RNG to ensure reproducibility.
- jobid::Union{Nothing, String}: Optional job identifier for logging (e.g., with Deborah.Sarah.JobLoggerTools.println_benji).
Keyword Arguments
- read_column_Y::Int: $1$-based column index to select from Y_df.
- dump::Bool=true: If true, generated vectors are saved to disk in standard Y_* / YP_* format.
Behavior
- Computes jackknife-based standard deviation per partition using the selected column from Y_df.
- For partitions used in prediction (YP_*), generates Gaussian random values with mean zero and matching standard deviation.
- Partitions not involved in prediction (e.g., :lb, sometimes :bc) receive empty vectors.
- Output is a dictionary with keys :Y_lb, :Y_tr, ..., :YP_tr, :YP_ul, etc.
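The noise-generation step can be illustrated with a short sketch. It assumes that "jackknife-based standard deviation" means the jackknife standard error of the mean, which is one common definition; the package may compute a different quantity. The names jackknife_std and random_baseline are hypothetical, and the seeded MersenneTwister is an assumed choice of RNG.

```julia
using Random, Statistics

# Jackknife standard error of the mean (assumed definition, see lead-in).
# Requires at least two samples, since each leave-one-out mean uses n - 1 points.
function jackknife_std(x::AbstractVector{<:Real})
    n = length(x)
    total = sum(x)
    loo_means = [(total - xi) / (n - 1) for xi in x]   # leave-one-out means
    return sqrt((n - 1) / n * sum(abs2, loo_means .- mean(loo_means)))
end

# Zero-mean Gaussian "predictions" for one partition, scaled by the jackknife estimate.
function random_baseline(y::Vector{Float64}, idx::Vector{Int}, ranseed::Int)
    rng = MersenneTwister(ranseed)                     # fixed seed for reproducibility
    sigma = jackknife_std(y[idx])
    return reshape(sigma .* randn(rng, length(idx)), :, 1)   # N x 1 matrix
end

y = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0]    # hypothetical selected column of Y_df
YP_ul = random_baseline(y, [4, 5, 6], 1234)
```

Calling random_baseline again with the same seed returns the same matrix, which mirrors the role of ranseed in the signature above.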
Returns
- Dict{Symbol, Matrix{T}}: Dictionary mapping tags like :Y_bc, :YP_bc, etc., to corresponding $1$-column matrices ($N \times 1$).