Deborah.Rebekah.Comparison
Deborah.Rebekah.Comparison.bhattacharyya_coeff_normals — Method

bhattacharyya_coeff_normals(
    μa::Float64,
    σa::Float64,
    μb::Float64,
    σb::Float64;
    σ_floor::Float64=1e-12
) -> Float64

Compute the Bhattacharyya coefficient ($\mathrm{BC}$) between two normal distributions $\mathcal{N}(\mu_a, \sigma_a^2)$ and $\mathcal{N}(\mu_b, \sigma_b^2)$.
Range
- $\mathrm{BC} \in [0, 1]$
- $\mathrm{BC} = 1$: complete overlap (identical distributions)
- $\mathrm{BC} = 0$: no overlap in the Bhattacharyya sense
Notes
- Inputs `σa` and `σb` are interpreted as $1\sigma$ standard deviations.
- To avoid degeneracy when $\sigma \approx 0$, both `σa` and `σb` are floored:
\[ \sigma \leftarrow \max(\sigma, \sigma_{\text{floor}})\]
Formula
For two normals,
\[\mathrm{BC}(\mu_a, \sigma_a; \mu_b, \sigma_b) \;=\; \sqrt{ \frac{2\,\sigma_a \sigma_b}{\sigma_a^2 + \sigma_b^2} } \; \exp\!\left( - \frac{(\mu_a - \mu_b)^2}{4(\sigma_a^2 + \sigma_b^2)} \right).\]
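The closed form above translates directly into code; a minimal sketch with the same floor applied (`bc_normals` is an illustrative name, not the exported function):

```julia
# Minimal sketch of the BC formula above; not the package implementation.
function bc_normals(μa, σa, μb, σb; σ_floor=1e-12)
    σa = max(σa, σ_floor)    # σ floor, as in the Notes
    σb = max(σb, σ_floor)
    s2 = σa^2 + σb^2
    return sqrt(2 * σa * σb / s2) * exp(-(μa - μb)^2 / (4 * s2))
end

bc_normals(0.0, 1.0, 0.0, 1.0)    # identical normals → 1.0
```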
Deborah.Rebekah.Comparison.build_bhattacharyya_dicts — Method

build_bhattacharyya_dicts(
    new_dict::Dict{String, Array{Float64,2}},
    keys::Vector{Symbol},
    pred_tags::Vector{Symbol},
    orig_tag::Symbol,
    labels::Vector,
    trains_ext::Vector;
    σ_floor::Float64 = 1e-12,
    also_hellinger::Bool = false
) -> Tuple{Dict{Tuple{String,String}, Array{Float64,2}},
           Union{Nothing, Dict{Tuple{String,String}, Array{Float64,2}}}}

Construct Bhattacharyya-coefficient ($\mathrm{BC}$) matrices, and optionally Hellinger-distance matrices, for each (observable key, prediction tag) pair against a fixed reference tag, following the same naming scheme as `build_overlap_and_error_dicts`.
Key construction:
- If `key == :Deborah`:
  - `pred = string(tag)`
  - `orig = "Y:" * string(orig_tag)`
- Otherwise:
  - `pred = string(key) * ":" * string(tag)`
  - `orig = string(key) * ":Y:" * string(orig_tag)`

For each grid point `(ilb, itr)`:
- `μ_pred = new_dict[pred * ":avg"][ilb, itr]`
- `σ_pred = new_dict[pred * ":err"][ilb, itr]`
- `μ_orig = new_dict[orig * ":avg"][ilb, itr]`
- `σ_orig = new_dict[orig * ":err"][ilb, itr]`

and `BC = bhattacharyya_coeff_normals(μ_pred, σ_pred, μ_orig, σ_orig)`.
Arguments
- `new_dict`: Must contain the 2D `":avg"` and `":err"` arrays for both `pred` and `orig`.
- `keys`: Observable keys (e.g., `[:Deborah, :cond, :susp, :skew, :kurt]`).
- `pred_tags`: Prediction tags (e.g., `[:Y_P1, :Y_P2]`).
- `orig_tag`: Reference tag (e.g., `:Y_BS`).
- `labels`: Row axis (LBP-like index).
- `trains_ext`: Column axis (TRP-like index).
- `σ_floor`: Small $\sigma$ floor to stabilize degenerate cases.
- `also_hellinger`: If `true`, also returns Hellinger matrices.
Returns
- `bc_dict`: `Dict` mapping `(pred, orig)` $\Rightarrow$ 2D `Float64` matrix of $\mathrm{BC}$ in `[0, 1]`.
- `H_dict`: `Dict` of Hellinger distances in `[0, 1]` if `also_hellinger` is `true`, else `nothing`.
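The key scheme and per-grid-point loop described above can be sketched as follows (illustrative names; a local `bc` stands in for `bhattacharyya_coeff_normals`, and this is not the package implementation):

```julia
# Stand-in for bhattacharyya_coeff_normals (closed form for two normals).
bc(μa, σa, μb, σb; σ_floor=1e-12) = begin
    sa, sb = max(σa, σ_floor), max(σb, σ_floor)
    s2 = sa^2 + sb^2
    sqrt(2 * sa * sb / s2) * exp(-(μa - μb)^2 / (4 * s2))
end

# Sketch of the (key, tag) loop and ":avg"/":err" lookups.
function build_bc_sketch(new_dict, keys, pred_tags, orig_tag, labels, trains_ext)
    bc_dict = Dict{Tuple{String,String}, Array{Float64,2}}()
    for key in keys, tag in pred_tags
        # Key construction as documented above.
        pred = key == :Deborah ? string(tag) : string(key) * ":" * string(tag)
        orig = key == :Deborah ? "Y:" * string(orig_tag) :
                                 string(key) * ":Y:" * string(orig_tag)
        M = zeros(length(labels), length(trains_ext))
        for ilb in eachindex(labels), itr in eachindex(trains_ext)
            M[ilb, itr] = bc(new_dict[pred * ":avg"][ilb, itr],
                             new_dict[pred * ":err"][ilb, itr],
                             new_dict[orig * ":avg"][ilb, itr],
                             new_dict[orig * ":err"][ilb, itr])
        end
        bc_dict[(pred, orig)] = M
    end
    return bc_dict
end
```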
Deborah.Rebekah.Comparison.build_jsd_dicts — Method

build_jsd_dicts(
    new_dict::Dict{String, Array{Float64,2}},
    keys::Vector{Symbol},
    pred_tags::Vector{Symbol},
    orig_tag::Symbol,
    labels::Vector,
    trains_ext::Vector;
    σ_floor::Float64 = 1e-12,
    k::Float64 = 8.0,
    n::Int = 1201
) -> Dict{Tuple{String,String}, Array{Float64,2}}

Construct Jensen-Shannon divergence ($\mathrm{JSD}$, base-2) matrices for each (observable key, prediction tag) pair against a fixed reference tag, following the same naming scheme as `build_overlap_and_error_dicts`.
Key construction:
- If `key == :Deborah`:
  - `pred = string(tag)`
  - `orig = "Y:" * string(orig_tag)`
- Otherwise:
  - `pred = string(key) * ":" * string(tag)`
  - `orig = string(key) * ":Y:" * string(orig_tag)`

For each grid point `(ilb, itr)`:
- `μ_pred = new_dict[pred * ":avg"][ilb, itr]`
- `σ_pred = new_dict[pred * ":err"][ilb, itr]`
- `μ_orig = new_dict[orig * ":avg"][ilb, itr]`
- `σ_orig = new_dict[orig * ":err"][ilb, itr]`

and `JSD = jsd_normals(μ_pred, σ_pred, μ_orig, σ_orig; ...)`.
Arguments
- `new_dict`: Must contain the 2D `":avg"` and `":err"` arrays for both `pred` and `orig`.
- `keys`: Observable keys (e.g., `[:Deborah, :cond, :susp, :skew, :kurt]`).
- `pred_tags`: Prediction tags (e.g., `[:Y_P1, :Y_P2]`).
- `orig_tag`: Reference tag (e.g., `:Y_BS`).
- `labels`: Row axis (LBP-like index).
- `trains_ext`: Column axis (TRP-like index).
- `σ_floor`, `k`, `n`: Parameters forwarded to `jsd_normals`.
Returns
- `jsd_dict`: `Dict` mapping `(pred, orig)` $\Rightarrow$ 2D `Float64` matrix of $\mathrm{JSD}$ in $[0,1]$ ($0$ is best).
Deborah.Rebekah.Comparison.build_overlap_and_error_dicts — Method

build_overlap_and_error_dicts(
    new_dict::Dict{String, Array{Float64,2}},
    keys::Vector{Symbol},
    pred_tags::Vector{Symbol},
    orig_tag::Symbol,
    labels::Vector,
    trains_ext::Vector
) -> Tuple{Dict{Tuple{String,String}, Array{Int,2}},
           Dict{Tuple{String,String}, Array{Float64,2}}}

Construct dictionaries containing overlap flags and error ratios between prediction and reference observables.
This function iterates over all combinations of observable keys and prediction tags to compare against a fixed reference. For each pair, it computes:
- An overlap matrix (`0`, `1`, `2`) via `check_overlap`
- An error ratio matrix via `err_ratio`
The results are returned as two dictionaries keyed by (pred, orig) string pairs.
Arguments
- `new_dict`: Dictionary of 2D arrays containing average and error values for all observables.
- `keys`: Observable types (e.g., `:Deborah`, `:TrM1`, etc.).
- `pred_tags`: Prediction tags (e.g., `:Y_P1`, `:Y_P2`).
- `orig_tag`: Tag to use for the reference data (usually `:Y_BS`).
- `labels`: Labeled set index vector for the vertical axis (LBP).
- `trains_ext`: Training set index vector for the horizontal axis (TRP).
Returns
- A tuple of two dictionaries:
  - `chk_dict`: Maps `(pred, orig)` → overlap matrix (`Int`)
  - `err_dict`: Maps `(pred, orig)` → error ratio matrix (`Float64`)
Deborah.Rebekah.Comparison.check_overlap — Method

check_overlap(
    a::Float64,
    ea::Float64,
    b::Float64,
    eb::Float64
) -> Int

Check whether two values $\mu_a \pm \sigma_a$ and $\mu_b \pm \sigma_b$ statistically overlap.
- Returns `2` if both intervals mutually include each other.
- Returns `1` if only one direction overlaps.
- Returns `0` if there is no overlap at all.
Arguments
- `a::Float64`: Central value of the first measurement.
- `ea::Float64`: Uncertainty (error bar) of the first value.
- `b::Float64`: Central value of the second measurement.
- `eb::Float64`: Uncertainty (error bar) of the second value.
Returns
- `Int`: Overlap status
  - `0`: no overlap
  - `1`: one-sided overlap
  - `2`: mutual overlap
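One plausible reading of the 0/1/2 classifier, sketched under the assumption that each "direction" tests whether the other central value lies inside this value's error interval (illustrative name, not the package implementation):

```julia
# Hypothetical sketch: count how many of the two ± intervals
# contain the other measurement's central value.
function check_overlap_sketch(a, ea, b, eb)
    a_covers_b = abs(a - b) <= ea    # b's center inside a ± ea
    b_covers_a = abs(a - b) <= eb    # a's center inside b ± eb
    return Int(a_covers_b) + Int(b_covers_a)    # 0, 1, or 2
end
```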
Deborah.Rebekah.Comparison.check_overlap_type_b — Method

check_overlap_type_b(
    μa::Float64,
    σa::Float64,
    μb::Float64;
    σ_floor::Float64=1e-12
) -> Float64

Return a normalized separation between two central values using the uncertainty of `a`:
\[d \;\equiv\; \frac{|\mu_a - \mu_b|}{\max(\sigma_a, \sigma_{\text{floor}})}.\]
Interpretation
- `d = 0`: identical central values.
- `d = 1`: `μb` is one $\sigma$ away from `μa` (measured relative to `σa`).
- Larger `d` means farther separation in units of `σa`.

Notes
- This is intentionally asymmetric: it measures distance relative to `σa` only.
- A small floor `σ_floor` avoids division by (near) zero.
See also
- `bhattacharyya_coeff_normals`: overlap proxy using both variances.
- `check_overlap`: interval-overlap classifier (0/1/2).
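The definition above is a one-liner; a minimal sketch (illustrative name):

```julia
# Normalized separation |μa - μb| in units of σa, with a floor on σa.
check_overlap_type_b_sketch(μa, σa, μb; σ_floor=1e-12) =
    abs(μa - μb) / max(σa, σ_floor)

check_overlap_type_b_sketch(0.0, 1.0, 2.0)    # → 2.0
```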
Deborah.Rebekah.Comparison.err_ratio — Method

err_ratio(
    mles_err::Float64,
    orig_err::Float64
) -> Float64

Compute the relative error ratio between machine-learned and original estimates.
This function returns the ratio of $\sigma_{\text{MLES}} / \sigma_{\text{ORIG}}$, where $\sigma_{\text{MLES}}$ is the estimated error from a machine-learned method, and $\sigma_{\text{ORIG}}$ is the baseline error.
Arguments
- `mles_err::Float64`: Machine-learned error estimate.
- `orig_err::Float64`: Original (baseline) error estimate.
Returns
- A `Float64` representing the relative error ratio.
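As defined, this is a plain ratio; a sketch (illustrative name):

```julia
# σ_MLES / σ_ORIG: values below 1 mean the machine-learned error bar
# is tighter than the baseline one.
err_ratio_sketch(mles_err, orig_err) = mles_err / orig_err

err_ratio_sketch(0.5, 1.0)    # → 0.5
```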
Deborah.Rebekah.Comparison.hellinger_from_bc — Method

hellinger_from_bc(bc::Float64) -> Float64

Convert a Bhattacharyya coefficient into a Hellinger distance.
Range
- $H \in [0,1]$
- $H = 0$: best (identical distributions)
- $H = 1$: worst (maximal separation)
Formula
Given a Bhattacharyya coefficient ($\mathrm{BC}$),
\[H = \sqrt{\,1 - \mathrm{BC}\,}.\]
Here, $\mathrm{BC}$ is clamped into $[0,1]$ before evaluation.
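The conversion above in one line (illustrative name):

```julia
# H = sqrt(1 - BC), with BC clamped into [0, 1] first.
hellinger_from_bc_sketch(bc) = sqrt(1 - clamp(bc, 0.0, 1.0))

hellinger_from_bc_sketch(1.0)    # identical distributions → 0.0
```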
Deborah.Rebekah.Comparison.jsd_normals — Method

jsd_normals(
    μa::Float64,
    σa::Float64,
    μb::Float64,
    σb::Float64;
    σ_floor::Float64 = 1e-12,
    k::Float64 = 8.0,
    n::Int = 2001
) -> Float64

Compute the Jensen-Shannon divergence ($\mathrm{JSD}$, base-2) between two univariate normal distributions $\mathcal{N}(\mu_a,\sigma_a^2)$ and $\mathcal{N}(\mu_b,\sigma_b^2)$ numerically.
Range
- $\mathrm{JSD}_2 \in [0,1]$
- $\mathrm{JSD}_2 = 0 \Leftrightarrow$ identical distributions
- $\mathrm{JSD}_2 = 1 \Leftrightarrow$ maximally different (in the $\mathrm{JSD}$ sense, base-2)
Definition (base-2)
For densities $p,q$ and the mixture $m=\tfrac12(p+q)$,
\[\mathrm{JSD}_2(P\|Q) \;=\; \frac12 \int_{\mathbb{R}} p(x)\,\log_2\!\frac{p(x)}{m(x)}\,dx \;+\; \frac12 \int_{\mathbb{R}} q(x)\,\log_2\!\frac{q(x)}{m(x)}\,dx.\]
Specialization to normals
Let $p = \mathcal{N}(\mu_a,\sigma_a^2)$, $q=\mathcal{N}(\mu_b,\sigma_b^2)$. There is no simple closed-form for $\mathrm{JSD}$ between two general Gaussians, so this routine computes it by numerical quadrature over a finite window that covers both tails.
Integration window
We integrate on
\[[L, R] \;=\; \Big[\min(\mu_a - k\,\sigma_a^{\ast},\, \mu_b - k\,\sigma_b^{\ast}),\; \max(\mu_a + k\,\sigma_a^{\ast},\, \mu_b + k\,\sigma_b^{\ast})\Big],\]
where $\sigma^{\ast} = \max(\sigma,\sigma_{\text{floor}})$. The hyperparameter $k$ controls tail coverage (default $k=8$).
Uniform grid approximation
Using n evenly spaced points $x_1,\dots,x_n$ on $[L,R]$ with spacing $\Delta x$,
\[\widehat{\mathrm{JSD}}_2 \;=\; \Delta x \sum_{i=1}^{n} \frac12\Big(p(x_i)\,\log_2\!\frac{p(x_i)}{m(x_i)} + q(x_i)\,\log_2\!\frac{q(x_i)}{m(x_i)}\Big), \qquad m(x_i)=\tfrac12\big(p(x_i)+q(x_i)\big).\]
Here $p(x)$ and $q(x)$ are the normal p.d.f.'s:
\[\phi(x;\mu,\sigma) =\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\Big(-\tfrac12\big(\tfrac{x-\mu}{\sigma}\big)^2\Big), \quad \sigma \leftarrow \max(\sigma, \sigma_{\text{floor}}).\]
Numerical stability & safeguards
- Sigma floor: $\sigma \leftarrow \max(\sigma,\sigma_{\text{floor}})$ prevents degeneracy as $\sigma\to 0$.
- Window fallback: if $R\le L$ due to extreme inputs, a tiny symmetric window around $\tfrac{\mu_a+\mu_b}{2}$ is used.
- Clamping: the final result is clamped into $[0,1]$ to absorb floating-point round-off.
Parameters
- `μa::Float64`, `σa::Float64`, `μb::Float64`, `σb::Float64`: normal means and $1\sigma$ standard deviations.
- `σ_floor::Float64 = 1e-12`: lower bound for standard deviations.
- `k::Float64 = 8.0`: tail coverage in units of $\sigma$.
- `n::Int = 2001`: number of grid points (increase for accuracy, at higher cost).
Returns
- `Float64`: $\widehat{\mathrm{JSD}}_2 \in [0,1]$, base-2.
Complexity
- Time: $\mathcal{O}(n)$ evaluations of pdf and logs.
- Memory: $\mathcal{O}(1)$ besides the grid iterator.
Notes
- Inputs are interpreted as $1\sigma$ standard deviations; `σ_floor` avoids degeneracy when $\sigma \approx 0$.
- Larger `n` or `k` increases accuracy.
- No dependencies on external packages.
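The uniform-grid quadrature described above can be sketched as follows (illustrative names, not the package implementation):

```julia
# Normal p.d.f. φ(x; μ, σ).
normpdf(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (sqrt(2π) * σ)

# Sketch of the base-2 JSD estimator on a uniform grid.
function jsd_normals_sketch(μa, σa, μb, σb; σ_floor=1e-12, k=8.0, n=2001)
    σa, σb = max(σa, σ_floor), max(σb, σ_floor)    # sigma floor
    L = min(μa - k * σa, μb - k * σb)              # integration window
    R = max(μa + k * σa, μb + k * σb)
    if R <= L                                      # window fallback
        c = (μa + μb) / 2
        L, R = c - 1e-6, c + 1e-6
    end
    xs = range(L, R; length=n)
    Δx = step(xs)
    acc = 0.0
    for x in xs
        p, q = normpdf(x, μa, σa), normpdf(x, μb, σb)
        m = 0.5 * (p + q)
        # Skip terms whose density underflowed to zero (p·log2(p/m) → 0).
        p > 0 && (acc += 0.5 * p * log2(p / m))
        q > 0 && (acc += 0.5 * q * log2(q / m))
    end
    return clamp(acc * Δx, 0.0, 1.0)               # absorb round-off
end
```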