Deborah.Sarah.DatasetPartitioner

Deborah.Sarah.DatasetPartitioner.DatasetPartitionInfo (Type)
struct DatasetPartitionInfo

Holds full information about how configurations are partitioned into labeled, training, bias-correction, and unlabeled subsets.

Fields

  • M_tot::Int : Total number of data rows (N_cnf × N_src).

  • N_cnf::Int : Number of configurations.

  • N_src::Int : Number of source vectors per configuration (1 if not applicable).

  • N_lb::Int : Number of configurations in the labeled set.

  • N_tr::Int : Number of configurations in the training set.

  • N_lb_src::Int : Total size of the labeled set across all sources (N_lb × N_src).

  • N_tr_src::Int : Total size of the training set across all sources (N_tr × N_src).

  • N_bc::Int : Number of configurations in the bias-correction set (N_lb - N_tr).

  • N_bc_persrc::Int : Size of the bias-correction set per source.

  • N_ul::Int : Number of configurations in the unlabeled set (N_cnf - N_lb).

  • N_ul_persrc::Int : Size of the unlabeled set per source (same as N_ul).

  • lb_idx::Vector{Int} : Indices of the labeled set (length N_lb).

  • tr_idx::Vector{Int} : Subset of lb_idx used for training.

  • bc_idx::Vector{Int} : Complement of tr_idx within lb_idx.

  • ul_idx::Vector{Int} : Complement of lb_idx within all configurations.
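
The count fields are tied together by the formulas given in parentheses above. As a minimal sketch of those relations only (the helper name `partition_counts` and the named-tuple return are hypothetical and not part of the package), the derived counts could be computed from four primary inputs like this:

```julia
# Hypothetical helper (not part of DatasetPartitioner): derives the count
# fields using only the relations stated in the field list.
function partition_counts(N_cnf::Int, N_src::Int, N_lb::Int, N_tr::Int)
    @assert N_src >= 1 "N_src must be at least 1"
    @assert 0 <= N_tr <= N_lb <= N_cnf "need 0 <= N_tr <= N_lb <= N_cnf"
    return (
        M_tot    = N_cnf * N_src,  # total data rows
        N_lb_src = N_lb * N_src,   # labeled set across all sources
        N_tr_src = N_tr * N_src,   # training set across all sources
        N_bc     = N_lb - N_tr,    # bias-correction configurations
        N_ul     = N_cnf - N_lb,   # unlabeled configurations
    )
end

# Example values chosen for illustration only.
counts = partition_counts(100, 4, 20, 15)
```

The two assertions encode the assumption that the training set is drawn from the labeled set, which in turn is drawn from all configurations.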

Deborah.Sarah.DatasetPartitioner.find_equivalent_shift (Method)
find_equivalent_shift(
    lb_idx::Vector{Int},
    N_cnf::Int,
    N_lb::Int,
    shift_given::Int
) -> Union{Int, Nothing}

Given a set of labeled indices, determines which shift amount would reproduce the same set under the modular sampling logic.

Arguments

  • lb_idx::Vector{Int} : Reference labeled indices.
  • N_cnf::Int : Total number of configurations.
  • N_lb::Int : Number of labeled configurations.
  • shift_given::Int : Shift amount to test first.

Returns

  • The matching shift as an Int if one is found, otherwise nothing.
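
This page does not spell out the modular sampling rule, so the following is only a sketch of the search idea under a stated assumption: labeled indices are taken with stride `N_cnf ÷ N_lb`, offset by the shift, and wrapped into `1:N_cnf`. Both `sample_labeled` and `find_shift_sketch` are hypothetical names, and the real sampling logic may differ.

```julia
# ASSUMPTION: hypothetical sampling rule, not the package's actual logic.
# Picks N_lb indices with stride N_cnf ÷ N_lb, offset by `shift`,
# wrapped into 1:N_cnf, and returns them sorted.
function sample_labeled(N_cnf::Int, N_lb::Int, shift::Int)
    @assert 1 <= N_lb <= N_cnf "need 1 <= N_lb <= N_cnf"
    stride = N_cnf ÷ N_lb
    return sort([mod(shift + (k - 1) * stride, N_cnf) + 1 for k in 1:N_lb])
end

# Tries `shift_given` first, then every other shift in 0:N_cnf-1.
# Returns the first shift that reproduces `lb_idx`, or `nothing`.
function find_shift_sketch(lb_idx::Vector{Int}, N_cnf::Int, N_lb::Int, shift_given::Int)
    target = sort(lb_idx)
    first_try = mod(shift_given, N_cnf)
    for s in vcat(first_try, setdiff(0:N_cnf-1, [first_try]))
        sample_labeled(N_cnf, N_lb, s) == target && return s
    end
    return nothing
end

# Illustrative calls with made-up index sets.
found   = find_shift_sketch([2, 4, 6, 8, 10], 10, 5, 0)
kept    = find_shift_sketch([1, 3, 5, 7, 9], 10, 5, 2)
missing_shift = find_shift_sketch([1, 2, 3, 4, 5], 10, 5, 0)
```

Under this assumed rule several shifts can produce the same index set, which is why testing `shift_given` first matters: when it already matches, it is returned unchanged.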
Deborah.Sarah.DatasetPartitioner.gen_set_idx (Function)
gen_set_idx(
    N_cnf::Int,
    N_lb::Int,
    N_tr::Int,
    N_bc::Int,
    N_ul::Int,
    IDX_shift::Int,
    jobid::Union{Nothing, String}=nothing
) -> Tuple{Vector{Int}, Vector{Int}, Vector{Int}, Vector{Int}}

Generates index vectors for labeled (lb), training (tr), bias-correction (bc), and unlabeled (ul) configurations in a deterministic but shiftable way.

Arguments

  • N_cnf::Int : Total number of configurations.
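  • N_lb::Int : Number of configurations in the labeled set.
  • N_tr::Int : Number of configurations in the training set (a subset of the labeled set).
  • N_bc::Int : Number of configurations in the bias-correction set.
  • N_ul::Int : Number of configurations in the unlabeled set.
  • IDX_shift::Int : Configuration shift amount.
  • jobid::Union{Nothing, String} : Optional job tag used for logging.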

Returns

  • Tuple of (lb_idx, tr_idx, bc_idx, ul_idx) as vectors of Int.
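
The four returned vectors are expected to satisfy the subset and complement relations listed for `DatasetPartitionInfo`. As a minimal sketch, a hypothetical checker (`check_partition` is not part of the package) could verify them; the example indices are written by hand and are not output of `gen_set_idx`.

```julia
# Hypothetical checker: verifies the documented relations between the
# four index vectors (tr ⊆ lb, bc = lb \ tr, ul = all \ lb).
function check_partition(lb_idx, tr_idx, bc_idx, ul_idx, N_cnf::Int)
    issubset(tr_idx, lb_idx) || return false
    Set(bc_idx) == Set(setdiff(lb_idx, tr_idx)) || return false
    Set(ul_idx) == Set(setdiff(1:N_cnf, lb_idx)) || return false
    return true
end

# Hand-written example indices for N_cnf = 10.
lb = [1, 3, 5, 7, 9]
tr = [1, 3, 5]
bc = [7, 9]
ul = [2, 4, 6, 8, 10]
ok = check_partition(lb, tr, bc, ul, 10)
```

The comparison goes through `Set` so that the check does not depend on the order in which indices are stored.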