Deborah.Rahab.HistogramOrigML
Deborah.Rahab.HistogramOrigML.plot_histogram_orig_vs_ml — Method
plot_histogram_orig_vs_ml(
    trace_data::Dict{String, Vector{Vector{Float64}}},
    trace_idx::Int,
    nbins::Int,
    overall_name::String;
    subset::AbstractString="all",
    x_min::Union{Nothing, Real}=nothing,
    x_max::Union{Nothing, Real}=nothing,
    print_clipping::Bool=true,
    clipping_report_prefix::AbstractString="",
    outfile::AbstractString="histogram_bins.dat",
    include_out_of_range_counts_in_file::Bool=true,
    save_file::Bool=false,
    plot_dir::AbstractString=""
) -> Nothing
Plot histograms comparing original vs. ML-predicted trace data for a given observable index, optionally clip the plotting range, optionally save the histogram plot as a cropped PDF (with a heatmap-style filename convention), and always save binned counts to a .dat file.
This function compares two datasets, the original values (OG) and the ML-predicted values (ML), as selected by subset. It renders overlaid histograms on a common bin grid and writes the bin counts to outfile. If x_min and/or x_max are provided, values outside the chosen range are discarded (clipped) before histogramming; the discarded counts are optionally printed and can be written to the .dat output. If save_file=true, the histogram figure is saved to PDF (and cropped via pdfcrop when available) using a rule-based basename that includes subset and overall_name.
Arguments
- trace_data: Dictionary containing vectors of traces, keyed by:
  - "Y_tr": Training set values,
  - "Y_bc": Bias-correction set values,
  - "Y_ul": Unlabeled set values (original),
  - "YP_ul": Unlabeled set values (ML-predicted),
  - "YP_bc": Bias-correction set values (ML-predicted) (required when subset="bc").
- trace_idx: Index of the observable in each trace_data[key] entry. For example, if trace_idx corresponds to $\mathrm{Tr}\,M^{-n}$, then trace_idx=1,2,3,4 typically represent different inverse-trace powers or related observables in your pipeline.
- nbins: Number of histogram bins.
- overall_name: Suffix used to construct the output histogram plot filename when save_file=true, analogous to the heatmap naming convention used elsewhere. (Does not affect outfile.)
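As a rough illustration of the expected input shape, the sketch below builds a synthetic trace_data dictionary. It assumes that trace_data[key][trace_idx] is the vector of samples for one observable, which is how the type signature and the description of trace_idx read; n_obs and make_series are hypothetical names, and every number is made up.

```julia
# Synthetic stand-in for real trace data; all values are invented for illustration.
n_obs = 4  # hypothetical number of observables stored under each key

# Hypothetical helper: one vector of `n` samples per observable, shifted by `offset`.
make_series(n, offset) = [[offset + 0.1 * i + k for i in 1:n] for k in 1:n_obs]

trace_data = Dict{String, Vector{Vector{Float64}}}(
    "Y_tr"  => make_series(5, 0.0),   # training set
    "Y_bc"  => make_series(3, 0.0),   # bias-correction set
    "Y_ul"  => make_series(8, 0.0),   # unlabeled set, original values
    "YP_ul" => make_series(8, 0.05),  # unlabeled set, ML-predicted values
    "YP_bc" => make_series(3, 0.05),  # bias-correction set, ML-predicted values
)

trace_idx = 2
# The combined OG dataset used by subset="all", for one observable.
og_all = vcat(trace_data["Y_tr"][trace_idx],
              trace_data["Y_bc"][trace_idx],
              trace_data["Y_ul"][trace_idx])

# With the package loaded, a call following the signature above might look like:
# plot_histogram_orig_vs_ml(trace_data, trace_idx, 20, "demo"; subset="all")
```

The final call is left commented out because it needs the Deborah package; the rest runs with Base Julia only.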
Keyword Arguments
- subset: Select which subset to compare. Allowed values (case-insensitive):
  - "all": (default) Use the original combined behavior:
    - OG = vcat(Y_tr, Y_bc, Y_ul)
    - ML = vcat(Y_tr, Y_bc, YP_ul)
  - "tr": Training-only comparison:
    - OG = Y_tr
    - ML = Y_tr
  - "bc": Bias-correction-only comparison:
    - OG = Y_bc
    - ML = YP_bc
    (Requires trace_data["YP_bc"] to exist.)
  - "ul": Unlabeled-only comparison:
    - OG = Y_ul
    - ML = YP_ul
- x_min: If provided, sets the lower bound of the histogram $x$-range (values below it are discarded).
- x_max: If provided, sets the upper bound of the histogram $x$-range (values above it are discarded).
- print_clipping: If true, prints a short report of the points discarded by clipping.
- clipping_report_prefix: Optional prefix prepended to clipping report lines (useful in batch logs).
- outfile: Output filename for the tab-delimited bin-count table. This .dat file is always written.
- include_out_of_range_counts_in_file: If true, appends two extra columns (OG_oob, ML_oob) holding the number of discarded points for each dataset (same value repeated per row for convenience).
- save_file: If true, saves the histogram plot as a PDF using a rule-based filename and (if available) runs pdfcrop to produce a cropped PDF.
- plot_dir: Output directory for the histogram plot PDF when save_file=true. If empty, the current directory (".") is used.
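The subset rules can be sketched as a small lookup. This is not the package's code: select_og_ml is a hypothetical helper, and it assumes trace_data[key][trace_idx] yields the sample vector for the chosen observable.

```julia
# Hypothetical helper mirroring the documented `subset` rules.
function select_og_ml(trace_data::Dict{String, Vector{Vector{Float64}}},
                      trace_idx::Int; subset::AbstractString="all")
    col(key) = trace_data[key][trace_idx]  # assumes observable-first indexing
    s = lowercase(subset)                   # allowed values are case-insensitive
    if s == "all"
        return (vcat(col("Y_tr"), col("Y_bc"), col("Y_ul")),
                vcat(col("Y_tr"), col("Y_bc"), col("YP_ul")))
    elseif s == "tr"
        return (col("Y_tr"), col("Y_tr"))
    elseif s == "bc"
        haskey(trace_data, "YP_bc") ||
            throw(ArgumentError("subset=\"bc\" requires trace_data[\"YP_bc\"]"))
        return (col("Y_bc"), col("YP_bc"))
    elseif s == "ul"
        return (col("Y_ul"), col("YP_ul"))
    else
        throw(ArgumentError("unknown subset: $subset"))
    end
end

# Tiny made-up dataset with a single observable.
demo = Dict{String, Vector{Vector{Float64}}}(
    "Y_tr" => [[1.0, 2.0]], "Y_bc" => [[3.0]], "Y_ul" => [[4.0, 5.0]],
    "YP_ul" => [[4.5, 5.5]], "YP_bc" => [[3.5]],
)
og, ml = select_og_ml(demo, 1; subset="UL")
```

Note that for "tr" both datasets are the same vector, so that comparison shows two identical histograms by construction.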
Behavior
- First, selects the datasets to compare based on subset:
  - subset="all":
    - OG = vcat(Y_tr, Y_bc, Y_ul)
    - ML = vcat(Y_tr, Y_bc, YP_ul)
  - subset="tr":
    - OG = Y_tr
    - ML = Y_tr
  - subset="bc":
    - OG = Y_bc
    - ML = YP_bc
  - subset="ul":
    - OG = Y_ul
    - ML = YP_ul
- Determines a common plotting/binning range:
  - If x_min/x_max are both nothing, uses data-driven min/max from both datasets.
  - Otherwise uses the user-specified bound(s) and fills any missing side from the data-driven bound.
- If x_min and/or x_max are provided, values outside [final_min, final_max] are discarded before histogramming. The number of discarded points is computed separately for OG and ML, printed when print_clipping=true, and optionally written to the .dat file (see include_out_of_range_counts_in_file).
- Renders overlaid histograms using a shared bin_edges grid. Legend labels are adjusted by subset:
  - subset="all": legend shows OG and ML.
  - Otherwise: legend shows OG-<SUBSET> and ML-<SUBSET> (e.g., OG-UL, ML-UL).
- If save_file=true, saves the histogram plot as a cropped PDF (when pdfcrop is available) using a heatmap-style basename convention:
  - If trace_idx == 1: histogram_pbp_<subset>_<overall_name>.pdf
  - Else: histogram_trdinv<trace_idx>_<subset>_<overall_name>.pdf
  The PDF is written into plot_dir (or "." if plot_dir is empty).
- Always writes the binned histogram counts to outfile as a tab-delimited text file.
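The range, clipping, and binning steps above can be sketched in Base Julia as follows. clipped_counts is a hypothetical helper, not the package's implementation. The uniform bin_edges grid and the bin-edge convention (left-closed bins, last bin closed on the right) are assumptions, since the documentation does not state them.

```julia
# Hypothetical helper: common range, clipping, and counts on a shared bin grid.
function clipped_counts(og::Vector{Float64}, ml::Vector{Float64}, nbins::Int;
                        x_min::Union{Nothing, Real}=nothing,
                        x_max::Union{Nothing, Real}=nothing)
    # Data-driven bounds from both datasets; a user-supplied bound overrides its side.
    lo = x_min === nothing ? min(minimum(og), minimum(ml)) : Float64(x_min)
    hi = x_max === nothing ? max(maximum(og), maximum(ml)) : Float64(x_max)
    lo < hi || throw(ArgumentError("empty range: [$lo, $hi]"))

    # Discard out-of-range values and count them separately per dataset.
    inrange(v) = lo <= v <= hi
    og_in, ml_in = filter(inrange, og), filter(inrange, ml)
    og_oob = length(og) - length(og_in)
    ml_oob = length(ml) - length(ml_in)

    bin_edges = collect(range(lo, hi; length=nbins + 1))
    width = (hi - lo) / nbins
    function count_bins(values)
        counts = zeros(Int, nbins)
        for v in values
            # Assumed convention: left-closed bins; v == hi falls into the last bin.
            k = clamp(floor(Int, (v - lo) / width) + 1, 1, nbins)
            counts[k] += 1
        end
        return counts
    end
    return (edges=bin_edges, og=count_bins(og_in), ml=count_bins(ml_in),
            og_oob=og_oob, ml_oob=ml_oob)
end

# Made-up data: one OG value above and one ML value below the requested range.
res = clipped_counts([0.0, 0.5, 1.0, 2.5, 9.0], [0.25, 1.5, 2.0, 3.0, -4.0], 3;
                     x_min=0.0, x_max=3.0)
```

Because both datasets are counted on the same edges, the two count vectors can be compared bin by bin. Values that sit exactly on an interior edge may land differently under another edge convention or after floating-point rounding.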
Output
- Displays the histogram inline.
- If save_file=true, writes a histogram plot PDF into plot_dir with a rule-based filename.
- Writes outfile with columns: Bin Min Max OG ML [OG_oob ML_oob]
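For orientation, here is a minimal sketch of the two output artifacts. histogram_pdf_name and write_bin_table are hypothetical helpers, not functions from the package. The number formatting, the 1-based Bin column, and passing subset through unchanged are assumptions; only the column names and the basename rule come from the documentation above.

```julia
# Hypothetical helper reproducing the documented basename rule.
function histogram_pdf_name(trace_idx::Int, subset::AbstractString,
                            overall_name::AbstractString)
    tag = trace_idx == 1 ? "pbp" : "trdinv$(trace_idx)"
    return "histogram_$(tag)_$(subset)_$(overall_name).pdf"
end

# Hypothetical writer for the tab-delimited bin table.
function write_bin_table(io::IO, edges, og, ml; og_oob=nothing, ml_oob=nothing)
    with_oob = og_oob !== nothing && ml_oob !== nothing
    header = ["Bin", "Min", "Max", "OG", "ML"]
    with_oob && append!(header, ["OG_oob", "ML_oob"])
    println(io, join(header, '\t'))
    for k in eachindex(og)
        row = Any[k, edges[k], edges[k + 1], og[k], ml[k]]
        # The out-of-range counts are repeated on every row, as documented.
        with_oob && append!(row, (og_oob, ml_oob))
        println(io, join(row, '\t'))
    end
end

# Write a two-bin table into memory instead of a file.
buf = IOBuffer()
write_bin_table(buf, [0.0, 1.0, 2.0], [2, 1], [1, 2]; og_oob=1, ml_oob=0)
table = String(take!(buf))
```

Writing to an IO argument keeps the sketch testable; a real call would pass a handle from open(outfile, "w").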
Returns
Nothing (side effects: plot display, optional PDF save, and .dat output written).