Deborah.DeborahCore.BaselineSequence

Deborah.DeborahCore.BaselineSequence.baseline_sequence — Function

baseline_sequence(
    ML_inputs::MLInputPreparer.MLInputBundle,
    partition::DatasetPartitioner.DatasetPartitionInfo, 
    paths::PathConfigBuilderDeborah.DeborahPathConfig, 
    jobid::Union{Nothing, String}=nothing;
    read_column_Y::Int, 
    dump::Bool=true
) -> Dict{Symbol, Matrix{T}} where T<:Real

Construct a baseline prediction sequence in which inputs, targets, and predictions are identical ($X = Y = Y^P$), all taken from the observable column selected by read_column_Y.

Arguments

- ML_inputs::MLInputPreparer.MLInputBundle: Struct containing:
  - Y_df::Matrix{T}: Full observable matrix ($N_\text{cnf} \times N_\text{src}$).
  - conf_arr::Vector{Int}: Configuration index array.
- partition::DatasetPartitioner.DatasetPartitionInfo: Contains partition indices for :lb, :tr, :bc, and :ul.
- paths::PathConfigBuilderDeborah.DeborahPathConfig: Contains output directory information for saving results.
- jobid::Union{Nothing, String}: Optional identifier for logging or batch tracking.

Keyword Arguments

- read_column_Y::Int: 1-based column index to extract from Y_df.
- dump::Bool=true: If true, saves all generated Y_* and YP_* vectors to disk.

Returns

Dict{Symbol, Matrix{T}}: Dictionary with entries such as :Y_bc and :YP_bc, each a single-column ($N \times 1$) matrix for one partition.

Behavior

- For each partition tag (e.g., :tr, :ul), generates:
  - Y_tag: that partition's rows of Y_df[:, read_column_Y]
  - YP_tag: identical copy of Y_tag
- Performs no training or randomization; this is a deterministic identity baseline.
- Outputs are structurally identical to those of the full ML pipelines, so the two can be compared directly.
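
The behavior above can be illustrated with a minimal, self-contained sketch. It does not call the package: ToyPartition and toy_baseline are hypothetical stand-ins, and the field names (lb_idx, tr_idx, bc_idx, ul_idx) and the choice to emit all four tags are assumptions made for illustration only.

```julia
# Hypothetical stand-in for the package's partition struct; the real
# DatasetPartitionInfo may carry different or additional fields.
struct ToyPartition
    lb_idx::Vector{Int}
    tr_idx::Vector{Int}
    bc_idx::Vector{Int}
    ul_idx::Vector{Int}
end

# Identity baseline (illustrative): for each tag, Y_tag holds the selected
# column restricted to the partition rows, and YP_tag is a copy of Y_tag.
function toy_baseline(Y_df::Matrix{T}, part::ToyPartition;
                      read_column_Y::Int) where {T<:Real}
    col = Y_df[:, read_column_Y]
    out = Dict{Symbol, Matrix{T}}()
    for (tag, idx) in ((:lb, part.lb_idx), (:tr, part.tr_idx),
                       (:bc, part.bc_idx), (:ul, part.ul_idx))
        y = reshape(col[idx], :, 1)        # N×1 matrix for this partition
        out[Symbol(:Y_, tag)]  = y
        out[Symbol(:YP_, tag)] = copy(y)   # prediction equals target
    end
    return out
end

# Six configurations, two observable columns (made-up numbers).
Y_df = [1.0 10.0; 2.0 20.0; 3.0 30.0; 4.0 40.0; 5.0 50.0; 6.0 60.0]
part = ToyPartition([1, 2, 3], [1, 2], [3], [4, 5, 6])
res  = toy_baseline(Y_df, part; read_column_Y = 2)
```

Because every YP_tag equals its Y_tag, any error metric computed on this output is zero by construction, which is what makes it a reference point for trained models.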

Notes

Internally calls Deborah.Sarah.XYInfoGenerator.gen_X_info, Deborah.DeborahCore.XYMLInfoGenerator.gen_XY_ML_info, Deborah.DeborahCore.XYMLVectorizer.gen_XY_ML, and Deborah.DeborahCore.XYMLVectorizer.mat_XY_ML, using shared logic from ML pipelines.


Deborah.DeborahCore.BaselineSequence.random_sequence — Function

random_sequence(
    ML_inputs::MLInputPreparer.MLInputBundle,
    partition::DatasetPartitioner.DatasetPartitionInfo, 
    paths::PathConfigBuilderDeborah.DeborahPathConfig,
    ranseed::Int, 
    jobid::Union{Nothing, String}=nothing; 
    read_column_Y::Int, 
    dump::Bool=true
) -> Dict{Symbol, Matrix{T}} where T<:Real

Generate a synthetic prediction sequence using Gaussian noise with jackknife-estimated variance, based on the input observable matrix.

This function is used primarily to create random baseline predictions for machine learning workflows, especially when training is skipped or when testing stochastic behavior.

Arguments

- ML_inputs::MLInputPreparer.MLInputBundle: Struct containing the input matrix Y_df::Matrix{T} and the configuration array conf_arr::Vector{Int}. Y_df should be 2D ($N_\text{cnf} \times N_\text{src}$); a single column is selected using read_column_Y.
- partition::DatasetPartitioner.DatasetPartitionInfo: Struct defining the data partitions lb_idx, tr_idx, bc_idx, and ul_idx, which split the data into labeled, training, bias-correction, and unlabeled sets.
- paths::PathConfigBuilderDeborah.DeborahPathConfig: Configuration for output filenames and directories.
- ranseed::Int: Seed for the RNG, to ensure reproducibility.
- jobid::Union{Nothing, String}: Optional job identifier for logging (e.g., with Deborah.Sarah.JobLoggerTools.println_benji).

Keyword Arguments

- read_column_Y::Int: 1-based column index to select from Y_df.
- dump::Bool=true: If true, generated vectors are saved to disk in standard Y_*/YP_* format.

Behavior

- Computes a jackknife-based standard deviation per partition from the selected column of Y_df.
- For partitions used in prediction (YP_*), generates Gaussian random values with mean zero and that partition's jackknife standard deviation.
- Partitions not involved in prediction (e.g., :lb, sometimes :bc) receive empty vectors.
- The output is a dictionary with keys :Y_lb, :Y_tr, ..., :YP_tr, :YP_ul.
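
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: jackknife_std and toy_random_prediction are hypothetical helpers, the jackknife estimate is taken to be the standard error of the mean, $\sigma_\text{JK} = \sqrt{\frac{N-1}{N}\sum_i (\bar{y}_{(i)} - \bar{y}_{(\cdot)})^2}$ with leave-one-out means $\bar{y}_{(i)}$, and the seed is assumed to initialize a MersenneTwister.

```julia
using Random
using Statistics

# Jackknife standard error of the mean. Which jackknife quantity the
# package actually uses is an assumption here.
function jackknife_std(y::AbstractVector{<:Real})
    n = length(y)
    n > 1 || return 0.0
    total = sum(y)
    theta = [(total - yi) / (n - 1) for yi in y]   # leave-one-out means
    return sqrt((n - 1) / n * sum(abs2, theta .- mean(theta)))
end

# Zero-mean Gaussian "predictions" scaled by the partition's jackknife spread.
function toy_random_prediction(rng::AbstractRNG, y::AbstractVector{<:Real})
    sigma = jackknife_std(y)
    return reshape(sigma .* randn(rng, length(y)), :, 1)   # N×1 matrix
end

# Made-up values standing in for one partition of the selected column.
y_ul = [1.0, 2.0, 3.0, 4.0, 5.0]

# The same seed yields the same draws, mirroring the role of ranseed.
yp_a = toy_random_prediction(MersenneTwister(42), y_ul)
yp_b = toy_random_prediction(MersenneTwister(42), y_ul)
```

For the sample mean, this jackknife estimate coincides with the usual standard error std(y) / sqrt(N), which gives a quick sanity check on the helper.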

Returns

Dict{Symbol, Matrix{T}}: Dictionary mapping tags such as :Y_bc and :YP_bc to the corresponding single-column ($N \times 1$) matrices.
