Deborah.Rebekah.Comparison
Deborah.Rebekah.Comparison.bhattacharyya_coeff_normals — Method

bhattacharyya_coeff_normals(
    μa::Float64,
    σa::Float64,
    μb::Float64,
    σb::Float64;
    σ_floor::Float64=1e-12
) -> Float64

Compute the Bhattacharyya coefficient ($\mathrm{BC}$) between two normal distributions $\mathcal{N}(\mu_a, \sigma_a^2)$ and $\mathcal{N}(\mu_b, \sigma_b^2)$.
Range
- $\mathrm{BC} \in [0, 1]$
- $\mathrm{BC} = 1$: complete overlap (identical distributions)
- $\mathrm{BC} = 0$: no overlap in the Bhattacharyya sense
Notes
- Inputs `σa` and `σb` are interpreted as $1\sigma$ standard deviations.
- To avoid degeneracy when $\sigma \approx 0$, both `σa` and `σb` are floored:
\[ \sigma \leftarrow \max(\sigma, \sigma_{\text{floor}})\]
Formula
For two normals,
\[\mathrm{BC}(\mu_a, \sigma_a; \mu_b, \sigma_b) \;=\; \sqrt{ \frac{2\,\sigma_a \sigma_b}{\sigma_a^2 + \sigma_b^2} } \; \exp\!\left( - \frac{(\mu_a - \mu_b)^2}{4(\sigma_a^2 + \sigma_b^2)} \right).\]
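The closed form above translates directly into code; a minimal sketch with the same floor applied (`bc_normals` is an illustrative name, not the exported function):

```julia
# Minimal sketch of the BC formula above; not the package implementation.
function bc_normals(μa, σa, μb, σb; σ_floor=1e-12)
    σa = max(σa, σ_floor)    # σ floor, as in the Notes
    σb = max(σb, σ_floor)
    s2 = σa^2 + σb^2
    return sqrt(2 * σa * σb / s2) * exp(-(μa - μb)^2 / (4 * s2))
end

bc_normals(0.0, 1.0, 0.0, 1.0)    # identical normals → 1.0
```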
Deborah.Rebekah.Comparison.build_bhattacharyya_dicts — Method

build_bhattacharyya_dicts(
    new_dict::Dict{String, Array{Float64,2}},
    keys::Vector{Symbol},
    pred_tags::Vector{Symbol},
    orig_tag::Symbol,
    labels::Vector,
    trains_ext::Vector;
    σ_floor::Float64 = 1e-12,
    also_hellinger::Bool = false
) -> Tuple{Dict{Tuple{String,String}, Array{Float64,2}},
           Union{Nothing, Dict{Tuple{String,String}, Array{Float64,2}}}}

Construct Bhattacharyya-coefficient ($\mathrm{BC}$) matrices, and optionally Hellinger-distance matrices, for each (observable key, prediction tag) pair against a fixed reference tag, following the same naming scheme as `build_overlap_and_error_dicts`.
Key construction:
- If `key == :Deborah`:
  - `pred = string(tag)`
  - `orig = "Y:" * string(orig_tag)`
- Otherwise:
  - `pred = string(key) * ":" * string(tag)`
  - `orig = string(key) * ":Y:" * string(orig_tag)`

For each grid point `(ilb, itr)`:
- `μ_pred = new_dict[pred * ":avg"][ilb, itr]`
- `σ_pred = new_dict[pred * ":err"][ilb, itr]`
- `μ_orig = new_dict[orig * ":avg"][ilb, itr]`
- `σ_orig = new_dict[orig * ":err"][ilb, itr]`

and `BC = bhattacharyya_coeff_normals(μ_pred, σ_pred, μ_orig, σ_orig)`.
Arguments
- `new_dict`: Must contain the 2D `":avg"` and `":err"` arrays for both `pred` and `orig`.
- `keys`: Observable keys (e.g., `[:Deborah, :cond, :susp, :skew, :kurt]`).
- `pred_tags`: Prediction tags (e.g., `[:Y_P1, :Y_P2]`).
- `orig_tag`: Reference tag (e.g., `:Y_BS`).
- `labels`: Row axis (LBP-like index).
- `trains_ext`: Column axis (TRP-like index).
- `σ_floor`: Small $\sigma$ floor to stabilize degenerate cases.
- `also_hellinger`: If `true`, also returns Hellinger matrices.
Returns
- `bc_dict`: `Dict` mapping `(pred, orig)` $\Rightarrow$ 2D `Float64` matrix of $\mathrm{BC}$ in `[0, 1]`.
- `H_dict`: `Dict` of Hellinger distances in `[0, 1]` if `also_hellinger` is `true`, else `nothing`.
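The key scheme and per-grid-point loop described above can be sketched as follows (illustrative names; a local `bc` stands in for `bhattacharyya_coeff_normals`, and this is not the package implementation):

```julia
# Stand-in for bhattacharyya_coeff_normals (closed form for two normals).
bc(μa, σa, μb, σb; σ_floor=1e-12) = begin
    sa, sb = max(σa, σ_floor), max(σb, σ_floor)
    s2 = sa^2 + sb^2
    sqrt(2 * sa * sb / s2) * exp(-(μa - μb)^2 / (4 * s2))
end

# Sketch of the (key, tag) loop and ":avg"/":err" lookups.
function build_bc_sketch(new_dict, keys, pred_tags, orig_tag, labels, trains_ext)
    bc_dict = Dict{Tuple{String,String}, Array{Float64,2}}()
    for key in keys, tag in pred_tags
        # Key construction as documented above.
        pred = key == :Deborah ? string(tag) : string(key) * ":" * string(tag)
        orig = key == :Deborah ? "Y:" * string(orig_tag) :
                                 string(key) * ":Y:" * string(orig_tag)
        M = zeros(length(labels), length(trains_ext))
        for ilb in eachindex(labels), itr in eachindex(trains_ext)
            M[ilb, itr] = bc(new_dict[pred * ":avg"][ilb, itr],
                             new_dict[pred * ":err"][ilb, itr],
                             new_dict[orig * ":avg"][ilb, itr],
                             new_dict[orig * ":err"][ilb, itr])
        end
        bc_dict[(pred, orig)] = M
    end
    return bc_dict
end
```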
Deborah.Rebekah.Comparison.build_jsd_dicts — Method

build_jsd_dicts(
    new_dict::Dict{String, Array{Float64,2}},
    keys::Vector{Symbol},
    pred_tags::Vector{Symbol},
    orig_tag::Symbol,
    labels::Vector,
    trains_ext::Vector;
    σ_floor::Float64 = 1e-12,
    k::Float64 = 8.0,
    n::Int = 1201
) -> Dict{Tuple{String,String}, Array{Float64,2}}

Construct Jensen-Shannon divergence ($\mathrm{JSD}$, base-2) matrices for each (observable key, prediction tag) pair against a fixed reference tag, following the same naming scheme as `build_overlap_and_error_dicts`.
Key construction:
- If `key == :Deborah`:
  - `pred = string(tag)`
  - `orig = "Y:" * string(orig_tag)`
- Otherwise:
  - `pred = string(key) * ":" * string(tag)`
  - `orig = string(key) * ":Y:" * string(orig_tag)`

For each grid point `(ilb, itr)`:
- `μ_pred = new_dict[pred * ":avg"][ilb, itr]`
- `σ_pred = new_dict[pred * ":err"][ilb, itr]`
- `μ_orig = new_dict[orig * ":avg"][ilb, itr]`
- `σ_orig = new_dict[orig * ":err"][ilb, itr]`

and `JSD = jsd_normals(μ_pred, σ_pred, μ_orig, σ_orig; ...)`.
Arguments
- `new_dict`: Must contain the 2D `":avg"` and `":err"` arrays for both `pred` and `orig`.
- `keys`: Observable keys (e.g., `[:Deborah, :cond, :susp, :skew, :kurt]`).
- `pred_tags`: Prediction tags (e.g., `[:Y_P1, :Y_P2]`).
- `orig_tag`: Reference tag (e.g., `:Y_BS`).
- `labels`: Row axis (LBP-like index).
- `trains_ext`: Column axis (TRP-like index).
- `σ_floor`, `k`, `n`: Parameters forwarded to `jsd_normals`.
Returns
- `jsd_dict`: `Dict` mapping `(pred, orig)` $\Rightarrow$ 2D `Float64` matrix of $\mathrm{JSD}$ in $[0,1]$ ($0$ is best).
Deborah.Rebekah.Comparison.build_overlap_and_error_dicts — Method

build_overlap_and_error_dicts(
    new_dict::Dict{String, Array{Float64,2}},
    keys::Vector{Symbol},
    pred_tags::Vector{Symbol},
    orig_tag::Symbol,
    labels::Vector,
    trains_ext::Vector
) -> Tuple{Dict{Tuple{String,String}, Array{Int,2}},
           Dict{Tuple{String,String}, Array{Float64,2}}}

Construct dictionaries containing overlap flags and error ratios between prediction and reference observables.
This function iterates over all combinations of observable keys and prediction tags to compare against a fixed reference. For each pair, it computes:
- An overlap matrix (`0`, `1`, `2`) via `check_overlap`
- An error ratio matrix via `err_ratio`
The results are returned as two dictionaries keyed by (pred, orig) string pairs.
Arguments
- `new_dict`: Dictionary of 2D arrays containing average and error values for all observables.
- `keys`: Observable types (e.g., `:Deborah`, `:TrM1`, etc.).
- `pred_tags`: Prediction tags (e.g., `:Y_P1`, `:Y_P2`).
- `orig_tag`: Tag to use for the reference data (usually `:Y_BS`).
- `labels`: Labeled set index vector for the vertical axis (LBP).
- `trains_ext`: Training set index vector for the horizontal axis (TRP).
Returns
- A tuple of two dictionaries:
  - `chk_dict`: Maps `(pred, orig)` → overlap matrix (`Int`)
  - `err_dict`: Maps `(pred, orig)` → error ratio matrix (`Float64`)
Deborah.Rebekah.Comparison.check_overlap — Method

check_overlap(
    a::Float64,
    ea::Float64,
    b::Float64,
    eb::Float64
) -> Int

Check whether two values $\mu_a \pm \sigma_a$ and $\mu_b \pm \sigma_b$ statistically overlap.
- Returns `2` if both intervals mutually include each other.
- Returns `1` if only one direction overlaps.
- Returns `0` if there is no overlap at all.
Arguments
- `a::Float64`: Central value of the first measurement.
- `ea::Float64`: Uncertainty (error bar) of the first value.
- `b::Float64`: Central value of the second measurement.
- `eb::Float64`: Uncertainty (error bar) of the second value.
Returns
- `Int`: Overlap status
  - `0`: no overlap
  - `1`: one-sided overlap
  - `2`: mutual overlap
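One plausible reading of the 0/1/2 classifier, sketched under the assumption that each "direction" tests whether the other central value lies inside this value's error interval (illustrative name, not the package implementation):

```julia
# Hypothetical sketch: count how many of the two ± intervals
# contain the other measurement's central value.
function check_overlap_sketch(a, ea, b, eb)
    a_covers_b = abs(a - b) <= ea    # b's center inside a ± ea
    b_covers_a = abs(a - b) <= eb    # a's center inside b ± eb
    return Int(a_covers_b) + Int(b_covers_a)    # 0, 1, or 2
end
```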
Deborah.Rebekah.Comparison.check_overlap_type_b — Method

check_overlap_type_b(
    μa::Float64,
    σa::Float64,
    μb::Float64;
    σ_floor::Float64=1e-12
) -> Float64

Return a normalized separation between two central values using the uncertainty of `a`:
\[d \;\equiv\; \frac{|\mu_a - \mu_b|}{\max(\sigma_a, \sigma_{\text{floor}})}.\]
Interpretation
- `d = 0`: identical central values.
- `d = 1`: `μb` is one $\sigma$ away from `μa` (measured relative to `σa`).
- Larger `d` means farther separation in units of `σa`.

Notes
- This is intentionally asymmetric: it measures distance relative to `σa` only.
- A small floor `σ_floor` avoids division by (near) zero.
See also
- `bhattacharyya_coeff_normals`: overlap proxy using both variances.
- `check_overlap`: interval-overlap classifier (0/1/2).
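The definition above is a one-liner; a minimal sketch (illustrative name):

```julia
# Normalized separation |μa - μb| in units of σa, with a floor on σa.
check_overlap_type_b_sketch(μa, σa, μb; σ_floor=1e-12) =
    abs(μa - μb) / max(σa, σ_floor)

check_overlap_type_b_sketch(0.0, 1.0, 2.0)    # → 2.0
```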
Deborah.Rebekah.Comparison.err_ratio — Method

err_ratio(
    mles_err::Float64,
    orig_err::Float64
) -> Float64

Compute the relative error ratio between machine-learned and original estimates.
This function returns the ratio of $\sigma_{\text{MLES}} / \sigma_{\text{ORIG}}$, where $\sigma_{\text{MLES}}$ is the estimated error from a machine-learned method, and $\sigma_{\text{ORIG}}$ is the baseline error.
Arguments
- `mles_err::Float64`: Machine-learned error estimate.
- `orig_err::Float64`: Original (baseline) error estimate.
Returns
- A `Float64` representing the relative error ratio.
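As defined, this is a plain ratio; a sketch (illustrative name):

```julia
# σ_MLES / σ_ORIG: values below 1 mean the machine-learned error bar
# is tighter than the baseline one.
err_ratio_sketch(mles_err, orig_err) = mles_err / orig_err

err_ratio_sketch(0.5, 1.0)    # → 0.5
```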
Deborah.Rebekah.Comparison.hellinger_from_bc — Method

hellinger_from_bc(bc::Float64) -> Float64

Convert a Bhattacharyya coefficient into a Hellinger distance.
Range
- $H \in [0,1]$
- $H = 0$: best (identical distributions)
- $H = 1$: worst (maximal separation)
Formula
Given a Bhattacharyya coefficient ($\mathrm{BC}$),
\[H = \sqrt{\,1 - \mathrm{BC}\,}.\]
Here, $\mathrm{BC}$ is clamped into $[0,1]$ before evaluation.
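The conversion above in one line (illustrative name):

```julia
# H = sqrt(1 - BC), with BC clamped into [0, 1] first.
hellinger_from_bc_sketch(bc) = sqrt(1 - clamp(bc, 0.0, 1.0))

hellinger_from_bc_sketch(1.0)    # identical distributions → 0.0
```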
Deborah.Rebekah.Comparison.jsd_normals — Method

jsd_normals(
    μa::Float64,
    σa::Float64,
    μb::Float64,
    σb::Float64;
    σ_floor::Float64 = 1e-12,
    k::Float64 = 8.0,
    n::Int = 2001
) -> Float64

Compute the Jensen-Shannon divergence ($\mathrm{JSD}$, base-2) between two univariate normal distributions $\mathcal{N}(\mu_a,\sigma_a^2)$ and $\mathcal{N}(\mu_b,\sigma_b^2)$ numerically.
Range
- $\mathrm{JSD}_2 \in [0,1]$
- $\mathrm{JSD}_2 = 0 \Leftrightarrow$ identical distributions
- $\mathrm{JSD}_2 = 1 \Leftrightarrow$ maximally different (in the $\mathrm{JSD}$ sense, base-2)
Definition (base-2)
For densities $p,q$ and the mixture $m=\tfrac12(p+q)$,
\[\mathrm{JSD}_2(P\|Q) \;=\; \frac12 \int_{\mathbb{R}} p(x)\,\log_2\!\frac{p(x)}{m(x)}\,dx \;+\; \frac12 \int_{\mathbb{R}} q(x)\,\log_2\!\frac{q(x)}{m(x)}\,dx.\]
Specialization to normals
Let $p = \mathcal{N}(\mu_a,\sigma_a^2)$, $q=\mathcal{N}(\mu_b,\sigma_b^2)$. There is no simple closed-form for $\mathrm{JSD}$ between two general Gaussians, so this routine computes it by numerical quadrature over a finite window that covers both tails.
Integration window
We integrate on
\[[L, R] \;=\; \Big[\min(\mu_a - k\,\sigma_a^{\ast},\, \mu_b - k\,\sigma_b^{\ast}),\; \max(\mu_a + k\,\sigma_a^{\ast},\, \mu_b + k\,\sigma_b^{\ast})\Big],\]
where $\sigma^{\ast} = \max(\sigma,\sigma_{\text{floor}})$. The hyperparameter $k$ controls tail coverage (default $k=8$).
Uniform grid approximation
Using n evenly spaced points $x_1,\dots,x_n$ on $[L,R]$ with spacing $\Delta x$,
\[\widehat{\mathrm{JSD}}_2 \;=\; \Delta x \sum_{i=1}^{n} \frac12\Big(p(x_i)\,\log_2\!\frac{p(x_i)}{m(x_i)} + q(x_i)\,\log_2\!\frac{q(x_i)}{m(x_i)}\Big), \qquad m(x_i)=\tfrac12\big(p(x_i)+q(x_i)\big).\]
Here $p(x)$ and $q(x)$ are the normal p.d.f.'s:
\[\phi(x;\mu,\sigma) =\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\Big(-\tfrac12\big(\tfrac{x-\mu}{\sigma}\big)^2\Big), \quad \sigma \leftarrow \max(\sigma, \sigma_{\text{floor}}).\]
Numerical stability & safeguards
- Sigma floor: $\sigma \leftarrow \max(\sigma,\sigma_{\text{floor}})$ prevents degeneracy as $\sigma\to 0$.
- Window fallback: if $R\le L$ due to extreme inputs, a tiny symmetric window around $\tfrac{\mu_a+\mu_b}{2}$ is used.
- Clamping: the final result is clamped into $[0,1]$ to absorb floating-point round-off.
Parameters
- `μa::Float64`, `σa::Float64`, `μb::Float64`, `σb::Float64`: normal means and $1\sigma$ standard deviations.
- `σ_floor::Float64 = 1e-12`: lower bound for standard deviations.
- `k::Float64 = 8.0`: tail coverage in units of $\sigma$.
- `n::Int = 2001`: number of grid points (increase for accuracy, at higher cost).
Returns
- `Float64`: $\widehat{\mathrm{JSD}}_2 \in [0,1]$, base-2.
Complexity
- Time: $\mathcal{O}(n)$ evaluations of pdf and logs.
- Memory: $\mathcal{O}(1)$ besides the grid iterator.
Notes
- Inputs are interpreted as $1\sigma$ standard deviations; `σ_floor` avoids degeneracy when $\sigma \approx 0$.
- Larger `n` or `k` increases accuracy.
- No dependencies on external packages.
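The uniform-grid quadrature described above can be sketched as follows (illustrative names, not the package implementation):

```julia
# Normal p.d.f. φ(x; μ, σ).
normpdf(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (sqrt(2π) * σ)

# Sketch of the base-2 JSD estimator on a uniform grid.
function jsd_normals_sketch(μa, σa, μb, σb; σ_floor=1e-12, k=8.0, n=2001)
    σa, σb = max(σa, σ_floor), max(σb, σ_floor)    # sigma floor
    L = min(μa - k * σa, μb - k * σb)              # integration window
    R = max(μa + k * σa, μb + k * σb)
    if R <= L                                      # window fallback
        c = (μa + μb) / 2
        L, R = c - 1e-6, c + 1e-6
    end
    xs = range(L, R; length=n)
    Δx = step(xs)
    acc = 0.0
    for x in xs
        p, q = normpdf(x, μa, σa), normpdf(x, μb, σb)
        m = 0.5 * (p + q)
        # Skip terms whose density underflowed to zero (p·log2(p/m) → 0).
        p > 0 && (acc += 0.5 * p * log2(p / m))
        q > 0 && (acc += 0.5 * q * log2(q / m))
    end
    return clamp(acc * Δx, 0.0, 1.0)               # absorb round-off
end
```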