Deborah.DeborahCore.BaselineSequence

Deborah.DeborahCore.BaselineSequence.baseline_sequence (Function)
baseline_sequence(
    ML_inputs::MLInputPreparer.MLInputBundle,
    partition::DatasetPartitioner.DatasetPartitionInfo, 
    paths::PathConfigBuilderDeborah.DeborahPathConfig, 
    jobid::Union{Nothing, String}=nothing;
    read_column_Y::Int, 
    dump::Bool=true
) -> Dict{Symbol, Matrix{T}} where T<:Real

Construct a baseline prediction sequence in which inputs, targets, and predictions are identical, i.e., $X = Y = Y^P$, using the specified observable column of Y_df.

Arguments

Keyword Arguments

  • read_column_Y::Int $1$-based column index to extract from Y_df.

  • dump::Bool = true If true, saves all generated Y_* and YP_* vectors to disk.

Returns

  • Dict{Symbol, Matrix{T}} Dictionary containing entries like :Y_bc, :YP_bc, etc., where each value is a $1$-column matrix ($N \times 1$) corresponding to one partition.

Behavior

  • For each partition tag (e.g., :tr, :ul, etc.), generates:
    • Y_tag: extracted from Y_df[:, read_column_Y]
    • YP_tag: identical copy of Y_tag
  • Does not perform any training or randomization; this is a deterministic identity baseline.
  • Outputs are structurally identical to those of full ML pipelines, making them suitable for direct performance comparisons.
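The identity baseline described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the real function reads partitions from `MLInputBundle` and `DatasetPartitionInfo`, whose fields are not documented here, so the hypothetical helper `identity_baseline_sketch` instead takes a plain dictionary mapping each partition tag to that partition's rows of Y_df as a matrix.

```julia
# Hypothetical stand-in for baseline_sequence: `Y_by_tag` maps a partition tag
# (e.g. :tr, :ul) to that partition's rows of Y_df, already converted to a matrix.
function identity_baseline_sketch(Y_by_tag::Dict{Symbol,Matrix{T}},
                                  read_column_Y::Int) where {T<:Real}
    out = Dict{Symbol,Matrix{T}}()
    for (tag, Y) in Y_by_tag
        # 1-based column selection, reshaped from a length-N vector to an N×1 matrix
        y = reshape(Y[:, read_column_Y], :, 1)
        out[Symbol("Y_", tag)]  = y          # e.g. :Y_tr
        out[Symbol("YP_", tag)] = copy(y)    # identity "prediction": YP = Y
    end
    return out
end

# Usage: two toy partitions, selecting the second column.
Y_by_tag = Dict(:tr => [1.0 10.0; 2.0 20.0; 3.0 30.0],
                :ul => [4.0 40.0; 5.0 50.0])
res = identity_baseline_sketch(Y_by_tag, 2)
```

Because `YP_tag` equals `Y_tag` exactly, any error metric computed on this output is zero, which is what makes it a useful reference point when the same metrics are applied to a trained model's predictions.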

Notes

Deborah.DeborahCore.BaselineSequence.random_sequence (Function)
random_sequence(
    ML_inputs::MLInputPreparer.MLInputBundle,
    partition::DatasetPartitioner.DatasetPartitionInfo, 
    paths::PathConfigBuilderDeborah.DeborahPathConfig,
    ranseed::Int, 
    jobid::Union{Nothing, String}=nothing; 
    read_column_Y::Int, 
    dump::Bool=true
) -> Dict{Symbol, Matrix{T}} where T<:Real

Generate a synthetic prediction sequence using Gaussian noise with jackknife-estimated variance, based on the input observable matrix.

This function is primarily used to create random baseline predictions for machine learning workflows, especially when training is skipped or when testing stochastic behavior.
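For reference, the standard leave-one-out jackknife variance of an estimator $\theta$ over $N$ samples $y_1, \dots, y_N$ is given below. Which estimator the function applies it to is not stated here; if $\theta$ is the sample mean, the result reduces to $s^2/N$, where $s^2$ is the unbiased sample variance.

```math
\hat\theta_{(i)} = \theta(y_1,\dots,y_{i-1},y_{i+1},\dots,y_N), \qquad
\bar\theta = \frac{1}{N}\sum_{i=1}^{N}\hat\theta_{(i)}, \qquad
\sigma_{\mathrm{JK}}^2 = \frac{N-1}{N}\sum_{i=1}^{N}\left(\hat\theta_{(i)} - \bar\theta\right)^2
```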

Arguments

Keyword Arguments

  • read_column_Y::Int $1$-based column index to select from Y_df.

  • dump::Bool=true If true, generated vectors are saved to disk in standard Y_*/YP_* format.

Behavior

  • Computes a jackknife-based standard deviation for each partition from the selected column of Y_df.
  • For partitions used in prediction (YP_*), generates Gaussian random values with zero mean and a standard deviation equal to that partition's jackknife estimate.
  • Partitions not involved in prediction (e.g., :lb, sometimes :bc) receive empty vectors.
  • Output is a dictionary with keys :Y_lb, :Y_tr, ..., :YP_tr, :YP_ul, etc.
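The behavior listed above can be sketched as below. This is an assumption-laden illustration, not the package's code: the helper names `jackknife_std_of_mean` and `random_baseline_sketch`, the `predicted_tags` keyword, the choice of the sample mean as the jackknifed estimator, and the use of `MersenneTwister(ranseed)` are all hypothetical. Non-predicted partitions are given empty `0×1` YP matrices, which is one reading of "receive empty vectors".

```julia
using Random
using Statistics

# Leave-one-out jackknife error of the sample mean (assumed estimator).
function jackknife_std_of_mean(y::AbstractVector{<:Real})
    N = length(y)
    N < 2 && throw(ArgumentError("jackknife needs at least 2 samples"))
    total = sum(y)
    loo = [(total - y[i]) / (N - 1) for i in 1:N]   # leave-one-out means
    m = mean(loo)
    return sqrt((N - 1) / N * sum((loo .- m) .^ 2))
end

# Hypothetical stand-in for random_sequence on plain per-partition matrices.
function random_baseline_sketch(Y_by_tag::Dict{Symbol,Matrix{Float64}},
                                read_column_Y::Int, ranseed::Int;
                                predicted_tags = (:tr, :ul))
    rng = MersenneTwister(ranseed)
    out = Dict{Symbol,Matrix{Float64}}()
    # Iterate in sorted tag order so a given seed always yields the same draws.
    for tag in sort!(collect(keys(Y_by_tag)))
        y = Y_by_tag[tag][:, read_column_Y]
        out[Symbol("Y_", tag)] = reshape(y, :, 1)
        if tag in predicted_tags
            σ = jackknife_std_of_mean(y)
            # Zero-mean Gaussian noise scaled to the jackknife estimate
            out[Symbol("YP_", tag)] = reshape(σ .* randn(rng, length(y)), :, 1)
        else
            out[Symbol("YP_", tag)] = Matrix{Float64}(undef, 0, 1)  # empty
        end
    end
    return out
end

# Usage with toy data; :lb is treated as a non-predicted partition.
Y_by_tag = Dict(:lb => [1.0 2.0; 3.0 4.0],
                :tr => [1.0 0.5; 2.0 1.5; 3.0 2.5; 4.0 3.5],
                :ul => [5.0 4.0; 6.0 5.0; 7.0 8.0])
res = random_baseline_sketch(Y_by_tag, 2, 1234)
```

Passing an explicit RNG seeded from `ranseed` (rather than reseeding the global RNG) keeps the draws reproducible without side effects on other code.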

Returns

  • Dict{Symbol, Matrix{T}} Dictionary mapping tags like :Y_bc, :YP_bc, etc., to corresponding 1-column matrices ($N \times 1$).