Implements the Kullback-Leibler Divergence (KLD) calculation between two probability distributions using histograms. The class can detect drift by comparing the divergence to a predefined threshold.
Details
The Kullback-Leibler Divergence (KLD) is a measure of how one probability distribution diverges from a second, expected probability distribution. This class uses histograms to approximate the distributions and calculates the KLD to detect changes over time. If the divergence exceeds a predefined threshold, it signals a detected drift.
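For intuition, the histogram-based divergence KL(P || Q) = sum_i p_i * log(p_i / q_i) can be sketched roughly as below. This is an illustrative approximation only, not the class's internal code: the kl_divergence_sketch function and its shared-breaks binning are assumptions, while the epsilon, base, and bins arguments mirror the public fields described below.

# Illustrative sketch of a histogram-based KL divergence (not the class internals)
kl_divergence_sketch <- function(p_data, q_data, bins = 10, epsilon = 1e-10, base = exp(1)) {
  # Use common break points so both histograms share the same bins
  breaks <- seq(min(c(p_data, q_data)), max(c(p_data, q_data)), length.out = bins + 1)
  p <- hist(p_data, breaks = breaks, plot = FALSE)$counts
  q <- hist(q_data, breaks = breaks, plot = FALSE)$counts
  # Normalise counts to probabilities and add epsilon to avoid log(0)
  p <- p / sum(p) + epsilon
  q <- q / sum(q) + epsilon
  # KL(P || Q) = sum_i p_i * log(p_i / q_i)
  sum(p * log(p / q, base = base))
}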
References
Kullback, S., and Leibler, R.A. (1951). On Information and Sufficiency. Annals of Mathematical Statistics, 22(1), 79-86.
Public fields
epsilon: Value to add to small probabilities to avoid log(0) issues.
base: The base of the logarithm used in the KLD calculation.
bins: Number of bins used for the histogram.
drift_level: The threshold for detecting drift.
drift_detected: Boolean indicating whether drift has been detected.
p: The initial (reference) distribution.
kl_result: The result of the KLD calculation.
Methods
Method new()
Initializes the KLDivergence class.
Usage
KLDivergence$new(epsilon = 1e-10, base = exp(1), bins = 10, drift_level = 0.2)

Examples
set.seed(123) # Setting a seed for reproducibility
initial_data <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
kld <- KLDivergence$new(bins = 10, drift_level = 0.2)
kld$set_initial_distribution(initial_data)
new_data <- c(0.2, 0.2, 0.3, 0.4, 0.4, 0.5, 0.6, 0.7, 0.7, 0.8)
kld$add_distribution(new_data)
kl_result <- kld$get_kl_result()
message(paste("KL Divergence:", kl_result))
#> KL Divergence: 6.00903559691594
if (kld$is_drift_detected()) {
message("Drift detected.")
}
#> Drift detected.
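For comparison, a new sample that closely matches the initial data should yield a divergence below drift_level, so no drift is flagged. This is an illustrative continuation of the example above using the same methods; the exact divergence value depends on the binning.

similar_data <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0) # same values as initial_data
kld2 <- KLDivergence$new(bins = 10, drift_level = 0.2)
kld2$set_initial_distribution(initial_data)
kld2$add_distribution(similar_data)
if (!kld2$is_drift_detected()) {
  message("No drift detected.")
}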
