Deborah.Sarah.DatasetPartitioner
Deborah.Sarah.DatasetPartitioner.DatasetPartitionInfo — Type
struct DatasetPartitionInfo
Holds full information about how configurations are partitioned into labeled, training, bias-correction, and unlabeled subsets.
Fields
- M_tot::Int: Total number of data rows (N_cnf $\times$ N_src).
- N_cnf::Int: Number of configurations.
- N_src::Int: Number of source vectors per configuration ($1$ if not applicable).
- N_lb::Int: Number of labeled configurations.
- N_tr::Int: Number of training configurations.
- N_lb_src::Int: Total number of labeled rows (N_lb $\times$ N_src).
- N_tr_src::Int: Total number of training rows (N_tr $\times$ N_src).
- N_bc::Int: Number of bias-correction configurations (N_lb - N_tr).
- N_bc_persrc::Int: Number of bias-correction configurations per source.
- N_ul::Int: Number of unlabeled configurations (N_cnf - N_lb).
- N_ul_persrc::Int: Number of unlabeled configurations per source (same as N_ul).
- lb_idx::Vector{Int}: Indices used for the labeled set (length N_lb).
- tr_idx::Vector{Int}: Subset of lb_idx used for training.
- bc_idx::Vector{Int}: Complement of tr_idx within lb_idx.
- ul_idx::Vector{Int}: Complement of lb_idx within all configurations.
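The count fields follow from N_cnf, N_src, N_lb, and N_tr by the relations given in the field descriptions. The sketch below restates those relations as code. The helper name partition_counts is hypothetical and not part of the package; it only illustrates the arithmetic.

```julia
# Hypothetical helper (not part of the package): derive the count fields
# from N_cnf, N_src, N_lb, and N_tr using the documented relations.
function partition_counts(N_cnf::Int, N_src::Int, N_lb::Int, N_tr::Int)
    @assert 0 <= N_tr <= N_lb <= N_cnf "need 0 <= N_tr <= N_lb <= N_cnf"
    return (
        M_tot    = N_cnf * N_src,   # total data rows
        N_lb_src = N_lb * N_src,    # labeled rows over all sources
        N_tr_src = N_tr * N_src,    # training rows over all sources
        N_bc     = N_lb - N_tr,     # bias-correction configurations
        N_ul     = N_cnf - N_lb,    # unlabeled configurations
    )
end

# Example: 100 configurations, 4 sources each, 20 labeled, 12 used for training.
c = partition_counts(100, 4, 20, 12)
# c.M_tot == 400, c.N_lb_src == 80, c.N_tr_src == 48, c.N_bc == 8, c.N_ul == 80
```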
Deborah.Sarah.DatasetPartitioner.find_equivalent_shift — Method
find_equivalent_shift(
lb_idx::Vector{Int},
N_cnf::Int,
N_lb::Int,
shift_given::Int
) -> Union{Int, Nothing}
Given a set of labeled indices, determine which shift amount would generate the same set via modular sampling logic.
Arguments
- lb_idx::Vector{Int}: Reference labeled indices.
- N_cnf::Int: Total number of configurations.
- N_lb::Int: Number of labeled configurations.
- shift_given::Int: Initial shift amount to test first.
Returns
- Int if a matching shift is found, otherwise nothing.
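The search can be pictured with a minimal sketch. The package's actual sampling rule is not shown on this page, so sample_lb below is an assumed stand-in (evenly strided indices rotated modulo N_cnf), and find_equivalent_shift_sketch is a hypothetical name. Only the documented behaviour is taken from the description above: test shift_given first, then return a matching Int or nothing.

```julia
# ASSUMED sampling rule (hypothetical, not taken from the package):
# pick N_lb evenly strided configurations, then rotate them by `shift`
# modulo N_cnf. mod1 keeps the result in the 1-based range 1:N_cnf.
function sample_lb(N_cnf::Int, N_lb::Int, shift::Int)
    @assert 1 <= N_lb <= N_cnf
    stride = N_cnf ÷ N_lb
    base = [1 + (k - 1) * stride for k in 1:N_lb]
    return sort(mod1.(base .+ shift, N_cnf))
end

# Sketch of the search: try `shift_given` first, then every shift in
# 0:(N_cnf - 1); return the first one that reproduces the same index set.
function find_equivalent_shift_sketch(lb_idx::Vector{Int}, N_cnf::Int,
                                      N_lb::Int, shift_given::Int)
    target = sort(lb_idx)
    for s in vcat(shift_given, 0:(N_cnf - 1))
        sample_lb(N_cnf, N_lb, s) == target && return s
    end
    return nothing   # no shift generates this set under the assumed rule
end

s_a = find_equivalent_shift_sketch([1, 3, 5, 7, 9], 10, 5, 1)   # 0
s_b = find_equivalent_shift_sketch([2, 4, 6, 8, 10], 10, 5, 3)  # 3
s_c = find_equivalent_shift_sketch([1, 2, 3, 4, 5], 10, 5, 0)   # nothing
```

Because the rotation is taken modulo N_cnf, several shifts can generate the same set. In the second call, shift_given = 3 already matches, so it is returned without scanning further.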
Deborah.Sarah.DatasetPartitioner.gen_set_idx — Function
gen_set_idx(
N_cnf::Int,
N_lb::Int,
N_tr::Int,
N_bc::Int,
N_ul::Int,
IDX_shift::Int,
jobid::Union{Nothing, String}=nothing
) -> Tuple{Vector{Int}, Vector{Int}, Vector{Int}, Vector{Int}}
Generates index vectors for labeled (lb), training (tr), bias-correction (bc), and unlabeled (ul) configurations in a deterministic but shiftable way.
Arguments
- N_cnf::Int: Total number of configurations.
- N_lb::Int: Number of labeled configurations.
- N_tr::Int: Number of training configurations (subset of the labeled set).
- N_bc::Int: Number of bias-correction configurations.
- N_ul::Int: Number of unlabeled configurations.
- IDX_shift::Int: Configuration shift amount.
- jobid::Union{Nothing, String}: Optional job tag used for logging.
Returns
- Tuple of (lb_idx, tr_idx, bc_idx, ul_idx) as vectors of Int.
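As a rough illustration of how the four index vectors relate, the sketch below builds them under stated assumptions. It is not the package's implementation. The labeled set uses an assumed strided-and-rotated rule, and taking the first N_tr labeled indices for training is likewise an assumption. The name gen_set_idx_sketch is hypothetical, and N_bc and N_ul are derived rather than passed in.

```julia
# Hypothetical sketch (not the package implementation) of a deterministic,
# shiftable partition of configurations 1:N_cnf.
function gen_set_idx_sketch(N_cnf::Int, N_lb::Int, N_tr::Int, IDX_shift::Int)
    @assert 1 <= N_lb <= N_cnf
    @assert 0 <= N_tr <= N_lb
    # ASSUMPTION: labeled indices are evenly strided, then rotated by
    # IDX_shift modulo N_cnf (mod1 keeps them in 1:N_cnf).
    stride = N_cnf ÷ N_lb
    lb_idx = sort([mod1(1 + (k - 1) * stride + IDX_shift, N_cnf) for k in 1:N_lb])
    # ASSUMPTION: the first N_tr labeled indices form the training set.
    tr_idx = lb_idx[1:N_tr]
    # Documented relations: bc is the complement of tr within lb,
    # ul is the complement of lb within all configurations.
    bc_idx = setdiff(lb_idx, tr_idx)
    ul_idx = setdiff(collect(1:N_cnf), lb_idx)
    return lb_idx, tr_idx, bc_idx, ul_idx
end

lb, tr, bc, ul = gen_set_idx_sketch(10, 4, 3, 1)
# lb == [2, 4, 6, 8], tr == [2, 4, 6], bc == [8], ul == [1, 3, 5, 7, 9, 10]
```

Whatever rule selects the indices, the labeled and unlabeled sets together cover every configuration exactly once, and the training and bias-correction sets together make up the labeled set.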