class kenchi.outlier_detection.clustering_based.MiniBatchKMeans(batch_size=100, contamination=0.1, init='k-means++', init_size=None, max_iter=100, max_no_improvement=10, n_clusters=8, n_init=3, random_state=None, reassignment_ratio=0.01, tol=0.0)
Bases: kenchi.outlier_detection.base.BaseOutlierDetector
Outlier detector using mini-batch K-means clustering.
Parameters:
- batch_size (int, default 100) – Size of the mini-batches.
- contamination (float, default 0.1) – Proportion of outliers in the data set. Used to define the threshold.
- init (str or array-like, default 'k-means++') – Method for initialization. Valid string options are 'k-means++' and 'random'.
- init_size (int, default None) – Number of samples to randomly sample for speeding up the initialization. If None, 3 * batch_size is used.
- max_iter (int, default 100) – Maximum number of iterations.
- max_no_improvement (int, default 10) – Controls early stopping based on the number of consecutive mini-batches that do not yield an improvement on the smoothed inertia. To disable convergence detection based on inertia, set max_no_improvement to None.
- n_clusters (int, default 8) – Number of clusters.
- n_init (int, default 3) – Number of initializations to perform.
- random_state (int or RandomState instance, default None) – Seed of the pseudo-random number generator.
- reassignment_ratio (float, default 0.01) – Controls the fraction of the maximum number of counts for a center to be reassigned. A higher value means that low-count centers are more easily reassigned.
- tol (float, default 0.0) – Tolerance to declare convergence.
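To make the roles of n_clusters, batch_size, max_iter and random_state more concrete, here is a minimal NumPy sketch of a textbook mini-batch K-means update loop. It is not kenchi's or scikit-learn's implementation: the function name mini_batch_kmeans is hypothetical, it uses only 'random' initialization, and it leaves out n_init, init_size, center reassignment (reassignment_ratio) and early stopping (max_no_improvement, tol).

```python
import numpy as np


def mini_batch_kmeans(X, n_clusters=8, batch_size=100, max_iter=100,
                      random_state=None):
    """Simplified mini-batch K-means (illustrative sketch only)."""
    rng = np.random.default_rng(random_state)

    # 'random' initialization: use distinct training samples as centers.
    start = rng.choice(len(X), size=n_clusters, replace=False)
    centers = X[start].astype(float)

    # Number of samples assigned to each center so far.
    counts = np.zeros(n_clusters)

    for _ in range(max_iter):
        # Draw one mini-batch of batch_size samples (with replacement).
        batch = X[rng.integers(0, len(X), size=batch_size)]

        # Assign every batch sample to its nearest center.
        sq_dist = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = sq_dist.argmin(axis=1)

        # Move each center toward its assigned samples with a per-center
        # learning rate of 1 / count, so each center is a running mean.
        for x, k in zip(batch, nearest):
            counts[k] += 1
            centers[k] += (x - centers[k]) / counts[k]

    return centers
```

Each iteration touches only batch_size samples, which is why a smaller batch_size makes iterations cheaper but noisier, and why max_iter bounds the total work.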
anomaly_score_
    array-like of shape (n_samples,) – Anomaly score for each training sample.
contamination_
    float – Actual proportion of outliers in the data set.
threshold_
    float – Threshold applied to the anomaly score.
Examples
>>> import numpy as np
>>> from kenchi.outlier_detection import MiniBatchKMeans
>>> X = np.array([
...     [0., 0.], [1., 1.], [2., 0.], [3., -1.], [4., 0.],
...     [5., 1.], [6., 0.], [7., -1.], [8., 0.], [1000., 1.]
... ])
>>> det = MiniBatchKMeans(n_clusters=1, random_state=0)
>>> det.fit_predict(X)
array([ 1,  1,  1,  1,  1,  1,  1,  1,  1, -1])
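The prediction in the example above can be reproduced by hand, which shows how contamination could translate into a threshold. The sketch below rests on two assumptions that this page does not confirm: that the anomaly score is the Euclidean distance to the nearest cluster center, and that the threshold is the (1 - contamination) percentile of the training scores. It also uses the exact mean as the single center, which a mini-batch fit only approximates.

```python
import numpy as np

X = np.array([
    [0., 0.], [1., 1.], [2., 0.], [3., -1.], [4., 0.],
    [5., 1.], [6., 0.], [7., -1.], [8., 0.], [1000., 1.]
])
contamination = 0.1

# With n_clusters=1, the exact K-means center is the mean of the data.
center = X.mean(axis=0)

# Assumed scoring rule: distance to the nearest (here, the only) center.
anomaly_score = np.linalg.norm(X - center, axis=1)

# Assumed threshold rule: (1 - contamination) percentile of the scores.
threshold = np.percentile(anomaly_score, 100. * (1. - contamination))

# Inliers are labeled 1 and outliers -1, matching the example output.
y_pred = np.where(anomaly_score > threshold, -1, 1)
```

Under these assumptions only the point [1000., 1.] scores above the threshold, so y_pred agrees with the fit_predict output shown in the example.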
cluster_centers_
    array-like of shape (n_clusters, n_features) – Coordinates of cluster centers.
inertia_
    float – Value of the inertia criterion associated with the chosen partition.
labels_
    array-like of shape (n_samples,) – Label of each point.
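As an illustration of how these three attributes relate, the following sketch derives labels and inertia from a fixed set of centers, using the usual K-means definitions: each point is labeled with the index of its nearest center, and the inertia is the sum of squared distances to those nearest centers. The data and centers are made up for the example, and the variable names only mirror the attribute names; this is not the library's code path.

```python
import numpy as np

# Toy data: two pairs of points, and one center per pair.
X = np.array([[0., 0.], [0., 2.], [10., 0.], [10., 2.]])
cluster_centers = np.array([[0., 1.], [10., 1.]])  # (n_clusters, n_features)

# Squared Euclidean distance from every sample to every center,
# shape (n_samples, n_clusters).
sq_dist = ((X[:, None, :] - cluster_centers[None, :, :]) ** 2).sum(axis=2)

# Label of each point: index of its nearest center.
labels = sq_dist.argmin(axis=1)

# Inertia: sum of squared distances to the nearest center.
inertia = sq_dist.min(axis=1).sum()
```

Here every point lies at distance 1 from its nearest center, so labels is [0, 0, 1, 1] and inertia is 4.0.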