This class implements HDDM_A, a drift detection method based on Hoeffding's bounds that uses adaptive windows to detect changes in the mean of a data stream. It processes observations one at a time, makes no assumption about the underlying distribution, and can detect both increases and decreases in the process mean.
Details
HDDM_A adapts to changes in the data stream by adjusting its internal windows to track the points at which the process mean was lowest and highest. It raises a warning or signals a drift when the current mean departs significantly from these reference points.
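The idea of comparing the current mean against an earlier reference point can be sketched with Hoeffding's inequality. The following is a minimal illustration only, assuming observations bounded in [0, 1]; the helper names `hoeffding_bound` and `mean_increased` are hypothetical, and the exact margin used by this class may differ.

```r
# Hoeffding bound for the mean of n observations in [0, 1]:
# with probability at least 1 - delta, the sample mean does not
# exceed the true mean by more than this margin.
hoeffding_bound <- function(n, delta) {
  sqrt(log(1 / delta) / (2 * n))
}

# Hypothetical helper (an assumption, not this package's API):
# compares the mean of the first n_min observations (sum c_min) with the
# mean of all total_n observations (sum total_c) and reports whether the
# overall mean is higher by at least a Hoeffding-style margin.
mean_increased <- function(c_min, n_min, total_c, total_n, delta) {
  if (n_min == total_n) {
    return(FALSE)  # no observations after the reference point
  }
  m <- (total_n - n_min) / (n_min * total_n)
  epsilon <- sqrt(m / 2 * log(2 / delta))
  (total_c / total_n - c_min / n_min) >= epsilon
}

# Mean 0.3 over the first 100 samples, mean 0.5 over all 200 samples
increase_flag <- mean_increased(c_min = 30, n_min = 100,
                                total_c = 100, total_n = 200,
                                delta = 0.001)

# Same mean before and after the reference point
no_change_flag <- mean_increased(c_min = 30, n_min = 100,
                                 total_c = 60, total_n = 200,
                                 delta = 0.001)
```

A smaller `delta` widens the margin, so fewer false alarms are raised at the cost of slower detection.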
References
Frías-Blanco I, del Campo-Ávila J, Ramos-Jimenez G, et al. Online and non-parametric drift detection methods based on Hoeffding's bounds. IEEE Transactions on Knowledge and Data Engineering, 2014, 27(3): 810-823.
Bifet A, Holmes G, Kirkby R, Pfahringer B. MOA: Massive Online Analysis. Journal of Machine Learning Research, 2010, 11: 1601-1604.
Implementation: https://github.com/scikit-multiflow/scikit-multiflow/blob/a7e316d1cc79988a6df40da35312e00f6c4eabb2/src/skmultiflow/drift_detection/hddm_a.py
Public fields
drift_confidence: Confidence level for detecting a drift.
warning_confidence: Confidence level for warning detection.
two_side_option: Boolean flag for one-sided or two-sided mean monitoring.
total_n: Total number of samples seen.
total_c: Total cumulative sum of the samples.
n_max: Maximum window end for sample count.
c_max: Maximum window end for cumulative sum.
n_min: Minimum window start for sample count.
c_min: Minimum window start for cumulative sum.
n_estimation: Number of samples since the last detected change.
c_estimation: Cumulative sum since the last detected change.
change_detected: Boolean indicating if a change was detected.
warning_detected: Boolean indicating if a warning has been detected.
estimation: Current estimated mean of the stream.
delay: Current delay since the last update.
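Several of these fields come in pairs of a sample count and a cumulative sum. As a rough sketch of why such pairs are convenient, and assuming they are kept as plain running totals, the mean over any stretch of the stream can be recovered by subtraction and division without storing the samples. The variable names `n_cut` and `c_cut` below are hypothetical stand-ins for a stored reference point.

```r
# A small binary stream used only for illustration
x <- c(0, 1, 0, 0, 1, 1, 1, 1)

# Running totals over the whole stream
total_n <- length(x)   # number of samples seen
total_c <- sum(x)      # cumulative sum of the samples

# Hypothetical reference point stored after the first four samples
n_cut <- 4
c_cut <- sum(x[1:4])

# Mean over the whole stream
mean_all <- total_c / total_n

# Mean over the samples seen since the reference point
mean_since_cut <- (total_c - c_cut) / (total_n - n_cut)
```

Here `mean_all` is 0.625 and `mean_since_cut` is 1, so the difference between the two reflects the shift in the second half of the stream.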
Methods
Method new()
Initializes the HDDM_A detector with specific settings.
Usage
HDDM_A$new(
  drift_confidence = 0.001,
  warning_confidence = 0.005,
  two_side_option = TRUE
)
Examples
set.seed(123) # Setting a seed for reproducibility
data_part1 <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.7, 0.3))
# Introduce a change in data distribution
data_part2 <- sample(c(0, 1), size = 100, replace = TRUE, prob = c(0.3, 0.7))
# Combine the two parts
data_stream <- c(data_part1, data_part2)
# Initialize the hddm_a object
hddm_a_instance <- HDDM_A$new()
# Iterate through the data stream
for (i in seq_along(data_stream)) {
  hddm_a_instance$add_element(data_stream[i])
  if (hddm_a_instance$warning_detected) {
    message(paste("Warning detected at index:", i))
  }
  if (hddm_a_instance$change_detected) {
    message(paste("Concept drift detected at index:", i))
  }
}
#> Warning detected at index: 123
#> Warning detected at index: 124
#> Warning detected at index: 125
#> Warning detected at index: 126
#> Warning detected at index: 127
#> Concept drift detected at index: 128
