Deborah.Rebekah.Comparison

Deborah.Rebekah.Comparison.bhattacharyya_coeff_normalsMethod
bhattacharyya_coeff_normals(
    μa::Float64, 
    σa::Float64,
    μb::Float64, 
    σb::Float64;
    σ_floor::Float64=1e-12
) -> Float64

Compute the Bhattacharyya coefficient ($\mathrm{BC}$) between two normal distributions $\mathcal{N}(\mu_a, \sigma_a^2)$ and $\mathcal{N}(\mu_b, \sigma_b^2)$.

Range

  • $\mathrm{BC} \in [0, 1]$
  • $\mathrm{BC} = 1$: complete overlap (identical distributions)
  • $\mathrm{BC} = 0$: no overlap in the Bhattacharyya sense

Notes

  • Inputs σa, σb are interpreted as $1 \sigma$ standard deviations.
  • To avoid degeneracy when $\sigma \approx 0$, both σa and σb are floored:

\[ \sigma \leftarrow \max(\sigma, \sigma_{\text{floor}})\]

Formula

For two normals,

\[\mathrm{BC}(\mu_a, \sigma_a; \mu_b, \sigma_b) \;=\; \sqrt{ \frac{2\,\sigma_a \sigma_b}{\sigma_a^2 + \sigma_b^2} } \; \exp\!\left( - \frac{(\mu_a - \mu_b)^2}{4(\sigma_a^2 + \sigma_b^2)} \right).\]

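The formula and floor above can be transcribed directly; this is a minimal standalone sketch (the name `bhattacharyya_sketch` is illustrative, not the package source):

```julia
# Direct transcription of the BC formula for two normals, with the σ floor applied.
function bhattacharyya_sketch(μa, σa, μb, σb; σ_floor=1e-12)
    σa, σb = max(σa, σ_floor), max(σb, σ_floor)  # floor both standard deviations
    s2 = σa^2 + σb^2
    sqrt(2 * σa * σb / s2) * exp(-(μa - μb)^2 / (4 * s2))
end
```

Identical distributions give exactly 1, and widely separated means drive the exponential factor toward 0.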
source
Deborah.Rebekah.Comparison.build_bhattacharyya_dictsMethod
build_bhattacharyya_dicts(
    new_dict::Dict{String, Array{Float64,2}},
    keys::Vector{Symbol},
    pred_tags::Vector{Symbol},
    orig_tag::Symbol,
    labels::Vector,
    trains_ext::Vector;
    σ_floor::Float64 = 1e-12,
    also_hellinger::Bool = false
) -> Tuple{Dict{Tuple{String,String}, Array{Float64,2}},
         Union{Nothing, Dict{Tuple{String,String}, Array{Float64,2}}}}

Construct Bhattacharyya-coefficient ($\mathrm{BC}$) matrices, and optionally Hellinger-distance matrices, for each (observable key, prediction tag) pair against a fixed reference tag, following the same naming scheme as build_overlap_and_error_dicts.

Keys are constructed as follows:

  • If key == :Deborah
    • pred = string(tag)
    • orig = "Y:" * string(orig_tag)
  • Else
    • pred = string(key) * ":" * string(tag)
    • orig = string(key) * ":Y:" * string(orig_tag)
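The key scheme above can be sketched as a small helper (the name `make_keys` is illustrative, not part of the package API):

```julia
# Build the (pred, orig) key strings for one (key, tag) pair, as described above.
function make_keys(key::Symbol, tag::Symbol, orig_tag::Symbol)
    if key == :Deborah
        pred = string(tag)
        orig = "Y:" * string(orig_tag)
    else
        pred = string(key) * ":" * string(tag)
        orig = string(key) * ":Y:" * string(orig_tag)
    end
    (pred, orig)
end
```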

For each grid point (ilb, itr):

  • μ_pred = new_dict[pred * ":avg"][ilb, itr]
  • σ_pred = new_dict[pred * ":err"][ilb, itr]
  • μ_orig = new_dict[orig * ":avg"][ilb, itr]
  • σ_orig = new_dict[orig * ":err"][ilb, itr]

and BC = bhattacharyya_coeff_normals(μ_pred, σ_pred, μ_orig, σ_orig).
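The per-pair fill loop can be sketched as follows. This is a hypothetical standalone version: `bc_local` stands in for bhattacharyya_coeff_normals so the example is self-contained, and raw sizes replace the labels/trains_ext vectors used by the real function:

```julia
# Local stand-in for bhattacharyya_coeff_normals (same formula as documented).
function bc_local(μa, σa, μb, σb; σ_floor=1e-12)
    σa, σb = max(σa, σ_floor), max(σb, σ_floor)
    s2 = σa^2 + σb^2
    sqrt(2 * σa * σb / s2) * exp(-(μa - μb)^2 / (4 * s2))
end

# Fill one BC matrix for a single (pred, orig) pair from the ":avg"/":err" arrays.
function bc_matrix(new_dict, pred::String, orig::String, nlb::Int, ntr::Int)
    M = Array{Float64}(undef, nlb, ntr)
    for ilb in 1:nlb, itr in 1:ntr
        μp = new_dict[pred * ":avg"][ilb, itr]
        σp = new_dict[pred * ":err"][ilb, itr]
        μo = new_dict[orig * ":avg"][ilb, itr]
        σo = new_dict[orig * ":err"][ilb, itr]
        M[ilb, itr] = bc_local(μp, σp, μo, σo)
    end
    M
end
```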

Arguments

  • new_dict : Must contain the 2D arrays for ":avg" and ":err" of both pred/orig.
  • keys : Observable keys (e.g., [:Deborah, :cond, :susp, :skew, :kurt]).
  • pred_tags : Prediction tags (e.g., [:Y_P1, :Y_P2]).
  • orig_tag : Reference tag (e.g., :Y_BS).
  • labels : Row axis (LBP-like index).
  • trains_ext : Column axis (TRP-like index).
  • σ_floor : Small $\sigma$ floor to stabilize degenerate cases.
  • also_hellinger : If true, also returns Hellinger matrices.

Returns

  • bc_dict : Dict[(pred, orig)] $\Rightarrow$ 2D Float64 matrix of $\mathrm{BC}$ in [0,1].
  • H_dict : Dict[...] of Hellinger distances in [0,1] (if also_hellinger is true), else nothing.
source
Deborah.Rebekah.Comparison.build_jsd_dictsMethod
build_jsd_dicts(
    new_dict::Dict{String, Array{Float64,2}},
    keys::Vector{Symbol},
    pred_tags::Vector{Symbol},
    orig_tag::Symbol,
    labels::Vector,
    trains_ext::Vector;
    σ_floor::Float64 = 1e-12,
    k::Float64 = 8.0,
    n::Int = 1201
) -> Dict{Tuple{String,String}, Array{Float64,2}}

Construct Jensen-Shannon divergence ($\mathrm{JSD}$, base-2) matrices for each (observable key, prediction tag) pair against a fixed reference tag, following the same naming scheme as build_overlap_and_error_dicts.

Keys are constructed as follows:

  • If key == :Deborah
    • pred = string(tag)
    • orig = "Y:" * string(orig_tag)
  • Else
    • pred = string(key) * ":" * string(tag)
    • orig = string(key) * ":Y:" * string(orig_tag)

For each grid point (ilb, itr):

  • μ_pred = new_dict[pred * ":avg"][ilb, itr]
  • σ_pred = new_dict[pred * ":err"][ilb, itr]
  • μ_orig = new_dict[orig * ":avg"][ilb, itr]
  • σ_orig = new_dict[orig * ":err"][ilb, itr]

and JSD = jsd_normals(μ_pred, σ_pred, μ_orig, σ_orig; ...).

Arguments

  • new_dict : Must contain the 2D arrays for ":avg" and ":err" of both pred/orig.
  • keys : Observable keys (e.g., [:Deborah, :cond, :susp, :skew, :kurt]).
  • pred_tags : Prediction tags (e.g., [:Y_P1, :Y_P2]).
  • orig_tag : Reference tag (e.g., :Y_BS).
  • labels : Row axis (LBP-like index).
  • trains_ext : Column axis (TRP-like index).
  • σ_floor, k, n : Parameters forwarded to jsd_normals.

Returns

  • jsd_dict : Dict[(pred, orig)] $\Rightarrow$ 2D Float64 matrix of $\mathrm{JSD}$ in $[0,1]$ (0 best).
source
Deborah.Rebekah.Comparison.build_overlap_and_error_dictsMethod
build_overlap_and_error_dicts(
    new_dict::Dict{String, Array{Float64,2}},
    keys::Vector{Symbol},
    pred_tags::Vector{Symbol},
    orig_tag::Symbol,
    labels::Vector,
    trains_ext::Vector
) -> Tuple{Dict{Tuple{String,String}, Array{Int,2}}, Dict{Tuple{String,String}, Array{Float64,2}}}

Construct dictionaries containing overlap flags and error ratios between prediction and reference observables.

This function iterates over all combinations of observable keys and prediction tags to compare against a fixed reference. For each pair, it computes an overlap flag (via check_overlap) and an error ratio (via err_ratio) at every grid point.

The results are returned as two dictionaries keyed by (pred, orig) string pairs.

Arguments

  • new_dict: Dictionary of 2D arrays containing average and error values for all observables.
  • keys: Observable types (e.g., :Deborah, :TrM1, etc.).
  • pred_tags: Prediction tags (e.g., :Y_P1, :Y_P2).
  • orig_tag: Tag to use for the reference data (usually :Y_BS).
  • labels: Labeled set index vector for the vertical axis (LBP).
  • trains_ext: Training set index vector for the horizontal axis (TRP).

Returns

  • A tuple of two dictionaries:
    • chk_dict: Maps (pred, orig) → overlap matrix (Int)
    • err_dict: Maps (pred, orig) → error ratio matrix (Float64)
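The shape of the two returned dictionaries can be sketched as below. The tags and matrix sizes are illustrative, and the key scheme is simplified; the real keys follow the scheme documented for build_bhattacharyya_dicts:

```julia
# Hypothetical sketch of the dict assembly: one (pred, orig) entry per
# prediction tag, each holding a labels-by-trains grid.
chk_dict = Dict{Tuple{String,String}, Array{Int,2}}()
err_dict = Dict{Tuple{String,String}, Array{Float64,2}}()
for pred in ["Y_P1", "Y_P2"]                      # illustrative prediction tags
    orig = "Y:Y_BS"                               # illustrative reference key
    chk_dict[(pred, orig)] = zeros(Int, 3, 4)     # filled via check_overlap
    err_dict[(pred, orig)] = zeros(Float64, 3, 4) # filled via err_ratio
end
```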
source
Deborah.Rebekah.Comparison.check_overlapMethod
check_overlap(
    a::Float64, 
    ea::Float64, 
    b::Float64, 
    eb::Float64
) -> Int

Check whether two values $\mu_a \pm \sigma_a$ and $\mu_b \pm \sigma_b$ statistically overlap.

  • Returns 2 if both intervals mutually include each other.
  • Returns 1 if only one direction overlaps.
  • Returns 0 if there is no overlap at all.

Arguments

  • a::Float64 : Central value of the first measurement.
  • ea::Float64 : Uncertainty (error bar) of the first value.
  • b::Float64 : Central value of the second measurement.
  • eb::Float64 : Uncertainty (error bar) of the second value.

Returns

  • Int : Overlap status
    • 0: no overlap
    • 1: one-sided overlap
    • 2: mutual overlap
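One plausible reading of these return values is sketched below, in which each "direction" checks whether the other central value lies inside this measurement's error interval; the actual implementation may differ in boundary handling:

```julia
# Sketch: count how many of the two directional inclusions hold.
function check_overlap_sketch(a, ea, b, eb)
    a_covers_b = abs(a - b) <= ea   # b within [a - ea, a + ea]
    b_covers_a = abs(a - b) <= eb   # a within [b - eb, b + eb]
    Int(a_covers_b) + Int(b_covers_a)
end
```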
source
Deborah.Rebekah.Comparison.check_overlap_type_bMethod
check_overlap_type_b(
    μa::Float64, 
    σa::Float64, 
    μb::Float64; 
    σ_floor::Float64=1e-12
) -> Float64

Return a normalized separation between two central values using the uncertainty of a:

\[d \;\equiv\; \frac{|\mu_a - \mu_b|}{\max(\sigma_a, \sigma_{\text{floor}})}.\]

Interpretation

  • d = 0 : identical central values.
  • d = 1 : μb is one-σ away from μa (measured relative to σa).
  • Larger d means farther separation in units of σa.

Notes

  • This is intentionally asymmetric: it measures distance relative to σa only.
  • A small floor σ_floor avoids division by (near) zero.
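The formula above is a one-line transcription (the name is illustrative, not the package source); note the asymmetry, since only σa enters the denominator:

```julia
# Normalized separation of μb from μa in units of σa, with the σ floor applied.
check_overlap_type_b_sketch(μa, σa, μb; σ_floor=1e-12) = abs(μa - μb) / max(σa, σ_floor)
```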

source
Deborah.Rebekah.Comparison.err_ratioMethod
err_ratio(
    mles_err::Float64, 
    orig_err::Float64
) -> Float64

Compute the relative error ratio between machine-learned and original estimates.

This function returns the ratio of $\sigma_{\text{MLES}} / \sigma_{\text{ORIG}}$, where $\sigma_{\text{MLES}}$ is the estimated error from a machine-learned method, and $\sigma_{\text{ORIG}}$ is the baseline error.

Arguments

  • mles_err::Float64: Machine-learned error estimate.
  • orig_err::Float64: Original (baseline) error estimate.

Returns

  • A Float64 representing the relative error ratio.
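As a sketch, the ratio is a single division (the name is hypothetical). The docstring describes no guard for a zero baseline, so none is added here; Float64 division then yields Inf:

```julia
# Ratio of machine-learned to baseline error, as documented.
err_ratio_sketch(mles_err, orig_err) = mles_err / orig_err
```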
source
Deborah.Rebekah.Comparison.hellinger_from_bcMethod
hellinger_from_bc(bc::Float64) -> Float64

Convert Bhattacharyya coefficient into Hellinger distance.

Range

  • $H \in [0,1]$
  • $H = 0$: best (identical distributions)
  • $H = 1$: worst (maximal separation)

Formula

Given a Bhattacharyya coefficient ($\mathrm{BC}$),

\[H = \sqrt{\,1 - \mathrm{BC}\,}.\]

Here, $\mathrm{BC}$ is clamped into $[0,1]$ before evaluation.
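The conversion with clamping is a one-liner (the standalone name is illustrative):

```julia
# Hellinger distance from a Bhattacharyya coefficient, clamped into [0, 1] first.
hellinger_from_bc_sketch(bc) = sqrt(1 - clamp(bc, 0.0, 1.0))
```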

source
Deborah.Rebekah.Comparison.jsd_normalsMethod
jsd_normals(
    μa::Float64, 
    σa::Float64,
    μb::Float64, 
    σb::Float64;
    σ_floor::Float64 = 1e-12,
    k::Float64 = 8.0,
    n::Int = 2001
) -> Float64

Compute the Jensen-Shannon divergence ($\mathrm{JSD}$), base-2, between two univariate normal distributions $\mathcal{N}(\mu_a,\sigma_a^2)$ and $\mathcal{N}(\mu_b,\sigma_b^2)$ numerically.

Range

  • $\mathrm{JSD}_2 \in [0,1]$
  • $\mathrm{JSD}_2 = 0 \Leftrightarrow$ identical distributions
  • $\mathrm{JSD}_2 = 1 \Leftrightarrow$ maximally different (in the $\mathrm{JSD}$ sense, base-2)

Definition (base-2)

For densities $p,q$ and the mixture $m=\tfrac12(p+q)$,

\[\mathrm{JSD}_2(P\|Q) \;=\; \frac12 \int_{\mathbb{R}} p(x)\,\log_2\!\frac{p(x)}{m(x)}\,dx \;+\; \frac12 \int_{\mathbb{R}} q(x)\,\log_2\!\frac{q(x)}{m(x)}\,dx.\]

Specialization to normals

Let $p = \mathcal{N}(\mu_a,\sigma_a^2)$, $q=\mathcal{N}(\mu_b,\sigma_b^2)$. There is no simple closed-form for $\mathrm{JSD}$ between two general Gaussians, so this routine computes it by numerical quadrature over a finite window that covers both tails.

Integration window

We integrate on

\[[L, R] \;=\; \Big[\min(\mu_a - k\,\sigma_a^{\ast},\, \mu_b - k\,\sigma_b^{\ast}),\; \max(\mu_a + k\,\sigma_a^{\ast},\, \mu_b + k\,\sigma_b^{\ast})\Big],\]

where $\sigma^{\ast} = \max(\sigma,\sigma_{\text{floor}})$. The hyperparameter $k$ controls tail coverage (default $k=8$).

Uniform grid approximation

Using n evenly spaced points $x_1,\dots,x_n$ on $[L,R]$ with spacing $\Delta x$,

\[\widehat{\mathrm{JSD}}_2 \;=\; \Delta x \sum_{i=1}^{n} \frac12\Big(p(x_i)\,\log_2\!\frac{p(x_i)}{m(x_i)} + q(x_i)\,\log_2\!\frac{q(x_i)}{m(x_i)}\Big), \qquad m(x_i)=\tfrac12\big(p(x_i)+q(x_i)\big).\]

Here $p(x)$ and $q(x)$ are the normal p.d.f.'s:

\[\phi(x;\mu,\sigma) =\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\Big(-\tfrac12\big(\tfrac{x-\mu}{\sigma}\big)^2\Big), \quad \sigma \leftarrow \max(\sigma, \sigma_{\text{floor}}).\]

Numerical stability & safeguards

  • Sigma floor: $\sigma \leftarrow \max(\sigma,\sigma_{\text{floor}})$ prevents degeneracy as $\sigma\to 0$.
  • Window fallback: if $R\le L$ due to extreme inputs, a tiny symmetric window around $\tfrac{\mu_a+\mu_b}{2}$ is used.
  • Clamping: the final result is clamped into $[0,1]$ to absorb floating-point round-off.
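The window, grid sum, and safeguards above can be sketched as follows; `jsd_normals_sketch` is a hypothetical standalone version (the exact fallback window width is an assumption), not the package source:

```julia
# Numerical base-2 JSD between two normals via a uniform-grid Riemann sum.
function jsd_normals_sketch(μa, σa, μb, σb; σ_floor=1e-12, k=8.0, n=2001)
    σa, σb = max(σa, σ_floor), max(σb, σ_floor)   # sigma floor
    L = min(μa - k * σa, μb - k * σb)             # integration window [L, R]
    R = max(μa + k * σa, μb + k * σb)
    if R <= L                                      # fallback for extreme inputs
        c = (μa + μb) / 2
        L, R = c - σ_floor, c + σ_floor
    end
    xs = range(L, R; length=n)
    Δx = step(xs)
    ϕ(x, μ, σ) = exp(-((x - μ) / σ)^2 / 2) / (sqrt(2π) * σ)  # normal pdf
    acc = 0.0
    for x in xs
        p, q = ϕ(x, μa, σa), ϕ(x, μb, σb)
        m = (p + q) / 2
        p > 0 && (acc += p * log2(p / m) / 2)      # 0·log 0 treated as 0
        q > 0 && (acc += q * log2(q / m) / 2)
    end
    clamp(acc * Δx, 0.0, 1.0)                      # absorb round-off
end
```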

Parameters

  • μa::Float64, σa::Float64, μb::Float64, σb::Float64 — normal means and $1\sigma$ standard deviations.
  • σ_floor::Float64=1e-12 — lower bound for standard deviations.
  • k::Float64=8.0 — tail coverage in units of $\sigma$.
  • n::Int=2001 — number of grid points (increase for accuracy, at higher cost).

Returns

  • Float64 : $\widehat{\mathrm{JSD}}_2 \in [0,1]$, base-2.

Complexity

  • Time: $\mathcal{O}(n)$ evaluations of pdf and logs.
  • Memory: $\mathcal{O}(1)$ besides the grid iterator.

Notes

  • No dependencies on external packages.
source