Deborah.DeborahEsther.EstherDependencyManager
Deborah.DeborahEsther.EstherDependencyManager.ensure_TrM_exists — Function
ensure_TrM_exists(
toml_path::String,
jobid::Union{Nothing, String}=nothing
) -> Nothing
Ensures that all required $\text{Tr} \, M^{-n} \; (n=1,2,3,4)$ data files exist for each input group. If any expected file is missing, the function automatically invokes EstherDependencyManager.run_Deborah_from_Esther to regenerate the missing outputs.
Arguments
- toml_path::String: Path to the TOML configuration file specifying input features, models, and output options.
- jobid::Union{Nothing, String}: Optional job ID string used for logging.
Behavior
- Parses the configuration and checks for the presence of output files associated with each TrMi group.
- Each TrMi group ($\text{Tr} \, M^{-n} \; (n=1,2,3,4)$) consists of (X, Y, model) triplets.
- For each group, it verifies the existence of files such as Y_info, Y_bc, YP_bc, etc.
- If any required file is missing, EstherDependencyManager.run_Deborah_from_Esther is called to regenerate the outputs.
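The check-then-regenerate pattern described in the Behavior list can be sketched roughly as follows. This is a minimal illustration only, not the package's implementation: the helper names `missing_files` and `ensure_files`, the `.dat` extensions, and the callback standing in for run_Deborah_from_Esther are all assumptions made for the example.

```julia
# Hypothetical helpers; names and file layout are illustrative only.

# Return the subset of `paths` that do not exist on disk.
missing_files(paths::Vector{String}) = filter(!isfile, paths)

# Call `regenerate()` only if at least one expected file is absent.
# Returns `true` when regeneration was triggered, `false` otherwise.
function ensure_files(regenerate::Function, paths::Vector{String})
    absent = missing_files(paths)
    isempty(absent) && return false
    regenerate()          # stand-in for run_Deborah_from_Esther
    return true
end

# Usage: the file names below are placeholders, not the real output names.
dir = mktempdir()
expected = [joinpath(dir, "Y_info.dat"), joinpath(dir, "Y_bc.dat")]
triggered = ensure_files(expected) do
    foreach(touch, expected)   # pretend the workflow produced the files
end
```

A second call with the same `expected` list would return `false`, since every file now exists.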
Returns
Nothing: This is a side-effect function that ensures required files exist or are created.
Deborah.DeborahEsther.EstherDependencyManager.generate_toml_dict — Method
generate_toml_dict(
location::String,
ensemble::String,
analysis_header::String,
X::Vector{String},
Y::String,
model::String,
read_column_X::Vector{Int},
read_column_Y::Int,
index_column::Int,
LBP::Int,
TRP::Int,
IDX_shift::Int,
dump_X::Bool,
ranseed::Int,
N_bs::Int,
blk_size::Int,
method::String,
bin_size::Int,
abbreviation::Dict{String,String},
use_abbreviation::Bool
) -> Dict
Construct a TOML-compatible configuration dictionary for the Deborah.DeborahCore workflow.
Arguments
- location::String: Base directory for outputs.
- ensemble::String: Ensemble identifier (e.g., "L8T4b1.60k13570").
- analysis_header::String: Analysis folder prefix (e.g., "analysis").
- X::Vector{String}: Input feature keys.
- Y::String: Target observable key.
- model::String: Model type (e.g., "LightGBM", "Lasso").
- read_column_X::Vector{Int}: $1$-based value-column indices for each X file.
- read_column_Y::Int: $1$-based value-column index for the Y file.
- index_column::Int: $1$-based column index of configuration IDs in files.
- LBP::Int: Label group ID (label partition parameter).
- TRP::Int: Training group ID (training partition parameter).
- IDX_shift::Int: Index offset/shift used by Deborah.DeborahCore.
- dump_X::Bool: Whether to dump input matrices.
- ranseed::Int: Random seed for bootstrap.
- N_bs::Int: Number of bootstrap replicates.
- blk_size::Int: Block length for block bootstrap ($\ge 1$).
- method::String: Block-bootstrap scheme to encode in TOML:
  - "nonoverlapping": Nonoverlapping Block Bootstrap (NBB)
  - "moving": Moving Block Bootstrap (MBB)
  - "circular": Circular Block Bootstrap (CBB)
- bin_size::Int: Jackknife bin size.
- abbreviation::Dict{String,String}: Abbreviation map for input encoding.
- use_abbreviation::Bool: Use abbreviations in paths/filenames if true.
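To make the three method values concrete, the sketch below shows how the admissible block start positions typically differ between the schemes for a series of length `N`. It follows the generic textbook definitions of NBB, MBB, and CBB and is not taken from Deborah.DeborahCore; the function names `block_starts`, `block_indices`, and `resample_indices` are hypothetical.

```julia
using Random

# Generic textbook sketch; NOT the Deborah.DeborahCore implementation.

# Admissible 1-based block start positions for a series of length `N`.
function block_starts(method::String, N::Int, blk_size::Int)
    1 <= blk_size <= N || throw(ArgumentError("need 1 <= blk_size <= N"))
    if method == "nonoverlapping"
        return collect(1:blk_size:(N - blk_size + 1))   # disjoint blocks
    elseif method == "moving"
        return collect(1:(N - blk_size + 1))            # every sliding window
    elseif method == "circular"
        return collect(1:N)                             # windows may wrap around
    else
        throw(ArgumentError("unknown method: $method"))
    end
end

# Indices covered by the block starting at `start`; `mod1` wraps past `N` back to 1.
block_indices(start::Int, blk_size::Int, N::Int) =
    mod1.(start:(start + blk_size - 1), N)

# One bootstrap replicate: draw blocks with replacement, then truncate to length `N`.
function resample_indices(rng::AbstractRNG, method::String, N::Int, blk_size::Int)
    starts = block_starts(method, N, blk_size)
    idx = Int[]
    for _ in 1:cld(N, blk_size)
        append!(idx, block_indices(rand(rng, starts), blk_size, N))
    end
    return idx[1:N]
end
```

With `N = 10` and `blk_size = 3`, the nonoverlapping scheme allows starts 1, 4, 7; the moving scheme allows starts 1 through 8; and the circular scheme allows all ten starts, with a block starting at 9 covering indices 9, 10, 1.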
Returns
Dict: A nested, TOML-ready dictionary including (at minimum) sections/keys for:
- data/paths (location, ensemble, analysis_header, abbreviations),
- IO columns (read_column_X, read_column_Y, index_column, IDX_shift, dump_X),
- bootstrap (ranseed, N_bs, blk_size, method),
- jackknife (bin_size),
- partitions (LBP, TRP),
- model (model, X, Y).
Notes
- All column indices are $1$-based.
- method must be one of the three literals above; invalid values should be rejected by the caller.
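As a rough illustration of what a nested, TOML-ready dictionary looks like, the sketch below builds a small Dict and round-trips it through Julia's TOML standard library. The section names ("data", "bootstrap", "jackknife", "model") and all values other than the ensemble example are placeholders assumed for this example; the actual layout produced by generate_toml_dict may differ.

```julia
using TOML

# Section names and values are hypothetical placeholders; the real layout
# produced by generate_toml_dict may differ.
cfg = Dict{String,Any}(
    "data" => Dict{String,Any}(
        "location" => "./results",
        "ensemble" => "L8T4b1.60k13570",
        "analysis_header" => "analysis",
    ),
    "bootstrap" => Dict{String,Any}(
        "ranseed" => 1234,
        "N_bs" => 1000,
        "blk_size" => 1,
        "method" => "nonoverlapping",
    ),
    "jackknife" => Dict{String,Any}("bin_size" => 1),
    "model" => Dict{String,Any}(
        "model" => "LightGBM",
        "X" => ["x_feature.dat"],
        "Y" => "y_target.dat",
    ),
)

toml_text = sprint(TOML.print, cfg)   # serialize the Dict to a TOML string
roundtrip = TOML.parse(toml_text)     # parse it back to confirm it is valid TOML
```

Nesting one Dict per section maps directly onto TOML tables, which is why a plain `Dict{String,Any}` is enough here.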
Deborah.DeborahEsther.EstherDependencyManager.run_Deborah_from_Esther — Function
run_Deborah_from_Esther(
location::String,
ensemble::String,
analysis_header::String,
X::Vector{String},
Y::String,
model::String,
read_column_X::Vector{Int},
read_column_Y::Int,
index_column::Int,
LBP::Int,
TRP::Int,
IDX_shift::Int,
dump_X::Bool,
ranseed::Int,
N_bs::Int,
blk_size::Int,
method::String,
bin_size::Int,
overall_name::String,
abbreviation::Dict{String,String},
use_abbreviation::Bool,
jobid::Union{Nothing, String}=nothing
) -> Nothing
Invoke Deborah.DeborahCore from within Deborah.Esther if required $\text{Tr} \, M^{-n} \; (n=1,2,3,4)$ files are missing. This function generates a temporary TOML configuration from the provided arguments, writes it to disk, and launches the Deborah.DeborahCore workflow.
Arguments
- location::String: Base path for output directory.
- ensemble::String: Ensemble identifier (e.g., "L8T4b1.60k13570").
- analysis_header::String: Analysis name prefix (e.g., "analysis").
- X::Vector{String}: Input feature list.
- Y::String: Target observable key.
- model::String: Model name (e.g., "LightGBM").
- read_column_X::Vector{Int}: $1$-based column indices for values in each X file.
- read_column_Y::Int: $1$-based column index for values in the Y file.
- index_column::Int: $1$-based column index for configuration index.
- LBP::Int: Label group ID.
- TRP::Int: Training group ID.
- IDX_shift::Int: Index offset shift used by Deborah.DeborahCore.
- dump_X::Bool: Whether to dump input matrices.
- ranseed::Int: Random seed for bootstrap.
- N_bs::Int: Number of bootstrap replicates.
- blk_size::Int: Block length used for block bootstrap ($\ge 1$).
- method::String: Block-bootstrap scheme to use:
  - "nonoverlapping": Nonoverlapping Block Bootstrap (NBB).
  - "moving": Moving Block Bootstrap (MBB).
  - "circular": Circular Block Bootstrap (CBB; wrap-around windows).
- bin_size::Int: Jackknife bin size.
- overall_name::String: Unified name tag for output files.
- abbreviation::Dict{String,String}: Abbreviation map for input encoding.
- use_abbreviation::Bool: Whether to use abbreviations in paths/filenames.
- jobid::Union{Nothing,String}: Optional job ID for logging.
Behavior
- Resolves the output directory according to use_abbreviation.
- Saves the TOML as config_Deborah_*.toml under the output_dir.
- Launches Deborah.DeborahCore with the generated configuration.
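The write-then-launch steps above might look roughly like the following sketch. The helper `write_and_launch`, its `tag` argument (which fills the wildcard part of the file name), and the `launch` callback standing in for the Deborah.DeborahCore entry point are all assumptions for illustration; the real function resolves these from its own arguments.

```julia
using TOML

# Hypothetical helper: writes `cfg` as TOML into `output_dir`, then hands the
# resulting path to `launch`. `tag` is an assumed stand-in for the wildcard
# part of config_Deborah_*.toml.
function write_and_launch(launch::Function, cfg::AbstractDict,
                          output_dir::String, tag::String)
    mkpath(output_dir)                                  # create directory if needed
    toml_path = joinpath(output_dir, "config_Deborah_$(tag).toml")
    open(toml_path, "w") do io
        TOML.print(io, cfg)                             # serialize the configuration
    end
    launch(toml_path)                                   # stand-in for starting Deborah.DeborahCore
    return toml_path
end

# Usage with a dummy launcher that only records the path it was given.
cfg = Dict("bootstrap" => Dict("N_bs" => 100, "method" => "moving"))
launched = String[]
path = write_and_launch(cfg, mktempdir(), "demo") do p
    push!(launched, p)
end
```

Passing the launcher as a callback keeps the file-writing step testable without running the full workflow.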
Returns
Nothing: side-effecting helper.
Notes
- method must be one of "nonoverlapping", "moving", "circular"; invalid values should raise an error before launching.
- If existing Deborah outputs are present and valid, this function should be a no-op (caller-dependent).
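One way a caller could enforce the restriction on method before launching is sketched below. The names `VALID_METHODS` and `validate_method` are hypothetical, and the choice of `ArgumentError` is an assumption; the package may validate differently.

```julia
# Hypothetical validation helper; not part of the documented API.
const VALID_METHODS = ("nonoverlapping", "moving", "circular")

# Return `method` unchanged if it is allowed, otherwise throw an ArgumentError.
function validate_method(method::AbstractString)
    if !(method in VALID_METHODS)
        allowed = join(VALID_METHODS, ", ")
        throw(ArgumentError("method must be one of: " * allowed * "; got: " * method))
    end
    return method
end
```

Calling `validate_method(method)` at the top of the launching routine fails fast, before any TOML file is written or any workflow is started.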