Implements KSWIN (Kolmogorov-Smirnov Windowing), a drift detector that applies the two-sample Kolmogorov-Smirnov (KS) test within a sliding window of streaming data. KSWIN is non-parametric: it compares a sample of recent observations with a sample of older observations from the same window to decide whether both come from the same distribution.
Details
KSWIN detects changes in the underlying distribution of a univariate data stream. It suits settings where the properties of the data may evolve over time, because a change is flagged as soon as the recent observations differ significantly from the older ones, before it affects subsequent data processing.
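The comparison at the heart of the method can be illustrated with base R alone. The snippet below is a minimal sketch, not this class's internals: it uses `stats::ks.test()` on two hand-made samples, and the variable names (`reference`, `recent_same`, `recent_shifted`) are hypothetical.

```r
# Two-sample KS test with base R (stats::ks.test), outside of any window logic.
set.seed(42)
reference      <- rnorm(30, mean = 0, sd = 1)  # stands in for older observations
recent_same    <- rnorm(30, mean = 0, sd = 1)  # recent data, same distribution
recent_shifted <- rnorm(30, mean = 3, sd = 1)  # recent data after a mean shift

alpha <- 0.005  # significance level, matching the default shown in Usage

p_same    <- ks.test(reference, recent_same)$p.value
p_shifted <- ks.test(reference, recent_shifted)$p.value

# A change is flagged when the p-value falls below alpha.
drift_same    <- p_same < alpha
drift_shifted <- p_shifted < alpha
```

With a shift of three standard deviations the two samples barely overlap, so `drift_shifted` is `TRUE`; for samples from the same distribution the p-value usually stays above `alpha`, although a false alarm remains possible at rate roughly `alpha`.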
References
Christoph Raab, Moritz Heusinger, Frank-Michael Schleif, Reactive Soft Prototype Computing for Concept Drift Streams, Neurocomputing, 2020.
Implementation: https://github.com/scikit-multiflow/scikit-multiflow/blob/a7e316d1cc79988a6df40da35312e00f6c4eabb2/src/skmultiflow/drift_detection/kswin.py
Public fields
alpha: Significance level for the KS test.
window_size: Total size of the data window used for testing.
stat_size: Number of data points sampled from the window for the KS test.
window: Current data window used for change detection.
change_detected: Boolean flag indicating whether a change has been detected.
p_value: P-value of the most recent KS test.
Methods
Method new()
Initializes the KSWIN detector with specific settings.
Usage
KSWIN$new(alpha = 0.005, window_size = 100, stat_size = 30, data = NULL)
Method add_element()
Adds a new element to the data window and updates the detection status based on the KS test.
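To make the update step concrete, here is a minimal base R sketch of a KSWIN-style update, loosely following the scikit-multiflow implementation linked under References. It is an illustration under stated assumptions, not the code of this class: `kswin_step()` and the `state` list are hypothetical, and the extra requirement that the KS statistic exceed 0.1 is an assumption taken from that reference implementation.

```r
# Hypothetical helper, NOT part of the KSWIN class documented here.
# `state` is a plain list holding window, alpha, window_size, stat_size.
kswin_step <- function(state, value) {
  state$change_detected <- FALSE
  if (length(state$window) >= state$window_size) {
    # Drop the oldest observation so the window stays bounded.
    state$window <- state$window[-1]
    n <- length(state$window)
    # The most recent stat_size observations form one sample.
    recent <- state$window[(n - state$stat_size + 1):n]
    # The other sample is drawn at random (with replacement) from the older part.
    older <- state$window[seq_len(n - state$stat_size)]
    reference <- older[sample.int(length(older), state$stat_size, replace = TRUE)]
    # Resampling can create ties; suppress the resulting ks.test() warning.
    test <- suppressWarnings(ks.test(reference, recent))
    state$p_value <- test$p.value
    # Assumption: flag drift only if the statistic is also above 0.1.
    if (test$p.value <= state$alpha && test$statistic > 0.1) {
      state$change_detected <- TRUE
      state$window <- recent  # keep only the recent data after a detection
    }
  }
  state$window <- c(state$window, value)
  state
}

# Usage on a stream whose mean shifts after observation 100.
set.seed(1)
x <- c(rnorm(100), rnorm(100, mean = 3))
state <- list(window = numeric(0), alpha = 0.001, window_size = 50,
              stat_size = 20, change_detected = FALSE, p_value = NA_real_)
sketch_idx <- integer(0)
for (i in seq_along(x)) {
  state <- kswin_step(state, x[i])
  if (state$change_detected) sketch_idx <- c(sketch_idx, i)
}
```

Because the reference sample is drawn at random, the exact detection index depends on the seed. Shrinking the window to the most recent observations after a detection means the detector needs `window_size` observations again before it runs the next test.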
Examples
set.seed(123)
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 3, sd = 1))
# High-level interface (returns a data.frame of detections)
detect_drift(x, method = "kswin", alpha = 0.001, window_size = 50, stat_size = 20)
#> index value type
#> 1 110 3.918997 drift
# Online usage (update one observation at a time)
kswin <- KSWIN$new(alpha = 0.001, window_size = 50, stat_size = 20)
drift_idx <- integer()
for (i in seq_along(x)) {
  kswin$add_element(x[i])
  if (kswin$detected_change()) {
    drift_idx <- c(drift_idx, i)
  }
}
drift_idx
#> [1] 109
