KLDivergence {datadriftR} | R Documentation
Kullback-Leibler Divergence (KLD) for Change Detection
Description
Implements the Kullback-Leibler Divergence (KLD) calculation between two probability distributions using histograms. The class can detect drift by comparing the divergence to a predefined threshold.
Details
The Kullback-Leibler Divergence (KLD) measures how one probability distribution diverges from a second, reference distribution. This class approximates the two distributions with histograms and computes the KLD between them to detect changes over time. If the divergence exceeds the predefined threshold (drift_level), the detector signals drift.
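The idea described above can be sketched in plain R. This is an illustration only, not the package's internal code: the binning scheme (equal-width bins over the combined range of both samples) is an assumption, and kld_histogram is a hypothetical helper name.

```r
# Illustrative sketch: approximate the KLD between two samples via
# equal-width histograms over their combined range. The binning details
# are assumptions, not necessarily what datadriftR does internally.
kld_histogram <- function(p_sample, q_sample, bins = 10,
                          epsilon = 1e-10, base = exp(1)) {
  breaks <- seq(min(c(p_sample, q_sample)),
                max(c(p_sample, q_sample)), length.out = bins + 1)
  p <- hist(p_sample, breaks = breaks, plot = FALSE)$counts
  q <- hist(q_sample, breaks = breaks, plot = FALSE)$counts
  p <- p / sum(p) + epsilon  # normalise; epsilon guards against log(0)
  q <- q / sum(q) + epsilon
  sum(p * log(p / q, base = base))
}
```

Identical samples yield a divergence of zero, and the result is always non-negative, which is what makes a one-sided threshold test meaningful.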
Public fields
epsilon
Value to add to small probabilities to avoid log(0) issues.
base
The base of the logarithm used in KLD calculation.
bins
Number of bins used for the histogram.
drift_level
The threshold for detecting drift.
drift_detected
Boolean indicating if drift has been detected.
p
Initial distribution.
kl_result
The result of the KLD calculation.
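The epsilon field exists because a zero bin probability makes the log term undefined. A tiny standalone illustration (not package code) of why the guard is needed:

```r
# Why epsilon matters: a zero probability yields 0 * log(0/q) = 0 * -Inf,
# which is NaN in R; adding a tiny epsilon keeps the divergence finite.
p <- c(0.7, 0.3, 0.0)
q <- c(0.5, 0.4, 0.1)
eps <- 1e-10
sum(p * log(p / q))                          # NaN
sum((p + eps) * log((p + eps) / (q + eps)))  # finite
```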
Methods
Public methods
Method new()
Initializes the KLDivergence class.
Usage
KLDivergence$new(epsilon = 1e-10, base = exp(1), bins = 10, drift_level = 0.2)
Arguments
epsilon
Value to add to small probabilities to avoid log(0) issues.
base
The base of the logarithm used in KLD calculation.
bins
Number of bins used for the histogram.
drift_level
The threshold for detecting drift.
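Of these arguments, base only rescales the result: the natural log (the default) reports the divergence in nats, base 2 in bits, and the two differ by a constant factor of log(2). A quick standalone check (example vectors made up for illustration):

```r
# Changing the logarithm base rescales the divergence by a constant:
# bits = nats / log(2).
p <- c(0.5, 0.3, 0.2)
q <- c(0.4, 0.4, 0.2)
kld_nats <- sum(p * log(p / q))
kld_bits <- sum(p * log(p / q, base = 2))
all.equal(kld_bits, kld_nats / log(2))  # TRUE
```

This matters when choosing drift_level: a threshold tuned in nats should be divided by log(2) if you switch to base = 2.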
Method reset()
Resets the internal state of the detector.
Usage
KLDivergence$reset()
Method set_initial_distribution()
Sets the initial distribution.
Usage
KLDivergence$set_initial_distribution(initial_p)
Arguments
initial_p
The initial distribution (per the Examples, a numeric vector of observations from which the reference histogram is built).
Method add_distribution()
Adds a new distribution and calculates the KLD against the stored initial distribution.
Usage
KLDivergence$add_distribution(q)
Arguments
q
The new distribution.
Method calculate_kld()
Calculates the KLD between two distributions.
Usage
KLDivergence$calculate_kld(p, q)
Arguments
p
The initial distribution.
q
The new distribution.
Returns
The KLD value.
Method get_kl_result()
Returns the current KLD result.
Usage
KLDivergence$get_kl_result()
Returns
The current KLD value.
Method is_drift_detected()
Checks if drift has been detected.
Usage
KLDivergence$is_drift_detected()
Returns
TRUE if drift is detected, otherwise FALSE.
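The decision rule documented above is a simple threshold comparison. A hypothetical standalone version (drift_flag is an illustrative name, not the class's internal code):

```r
# Hypothetical helper mirroring the documented behaviour: drift is
# flagged once the KLD exceeds drift_level (default matches new()).
drift_flag <- function(kl_result, drift_level = 0.2) {
  is.finite(kl_result) && kl_result > drift_level
}
drift_flag(0.35)  # TRUE
drift_flag(0.05)  # FALSE
```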
Method clone()
The objects of this class are cloneable with this method.
Usage
KLDivergence$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
References
Kullback, S., and Leibler, R. A. (1951). On Information and Sufficiency. Annals of Mathematical Statistics, 22(1), 79–86.
Examples
library(datadriftR)
set.seed(123) # Setting a seed for reproducibility
initial_data <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
kld <- KLDivergence$new(bins = 10, drift_level = 0.2)
kld$set_initial_distribution(initial_data)
new_data <- c(0.2, 0.2, 0.3, 0.4, 0.4, 0.5, 0.6, 0.7, 0.7, 0.8)
kld$add_distribution(new_data)
kl_result <- kld$get_kl_result()
message(paste("KL Divergence:", kl_result))
if (kld$is_drift_detected()) {
message("Drift detected.")
}