kenchi.outlier_detection.clustering_based module
class kenchi.outlier_detection.clustering_based.MiniBatchKMeans(batch_size=100, contamination=0.1, init='k-means++', init_size=None, max_iter=100, max_no_improvement=10, n_clusters=8, n_init=3, random_state=None, reassignment_ratio=0.01, tol=0.0)
Bases: kenchi.outlier_detection.base.BaseOutlierDetector
Outlier detector using mini-batch K-means clustering.
Parameters:
- batch_size (int, default 100) – Size of the mini batches.
- contamination (float, default 0.1) – Proportion of outliers in the data set. Used to define the threshold.
- init (str or array-like, default 'k-means++') – Initialization method. Valid string options are 'k-means++' and 'random'.
- init_size (int, default None) – Number of samples drawn at random to speed up the initialization. If None, 3 * batch_size is used.
- max_iter (int, default 100) – Maximum number of iterations.
- max_no_improvement (int, default 10) – Controls early stopping based on the number of consecutive mini batches that do not yield an improvement in the smoothed inertia. To disable convergence detection based on inertia, set max_no_improvement to None.
- n_clusters (int, default 8) – Number of clusters.
- n_init (int, default 3) – Number of initializations to perform.
- random_state (int or RandomState instance, default None) – Seed of the pseudo-random number generator.
- reassignment_ratio (float, default 0.01) – Control the fraction of the maximum number of counts for a center to be reassigned.
- tol (float, default 0.0) – Tolerance to declare convergence.
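The early-stopping rule described for max_no_improvement can be illustrated with a short sketch. This is not the library's code: the function name batches_until_stop and the smoothing factor alpha are hypothetical, and the smoothing is assumed here to be a simple exponentially weighted average of the per-batch inertia.

```python
def batches_until_stop(batch_inertias, max_no_improvement=10, alpha=0.3):
    """Return the 1-based index of the batch at which training would stop,
    or None if the stopping rule never fires.

    Illustrative only: `alpha` is a hypothetical smoothing factor, not a
    parameter of MiniBatchKMeans.
    """
    smoothed = None      # exponentially weighted average of the inertia
    best = None          # lowest smoothed inertia seen so far
    no_improvement = 0   # consecutive batches without a new best

    for i, inertia in enumerate(batch_inertias, start=1):
        if smoothed is None:
            smoothed = inertia
        else:
            smoothed = (1 - alpha) * smoothed + alpha * inertia

        if best is None or smoothed < best:
            best = smoothed
            no_improvement = 0
        else:
            no_improvement += 1

        # Passing None disables this check, as the parameter text describes.
        if max_no_improvement is not None and no_improvement >= max_no_improvement:
            return i
    return None


stop_at = batches_until_stop([10.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0], max_no_improvement=3)
```

With alpha=1.0 the sketch compares raw batch inertias, which makes the counter easy to follow by hand; smaller values damp the batch-to-batch noise before the comparison.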
Attributes:
- anomaly_score_ (array-like of shape (n_samples,)) – Anomaly score of each training sample.
- threshold_ (float) – Threshold on the anomaly score, defined by contamination.
- cluster_centers_ (array-like of shape (n_clusters, n_features)) – Coordinates of cluster centers.
- inertia_ (float) – Value of the inertia criterion associated with the chosen partition.
- labels_ (array-like of shape (n_samples,)) – Label of each point.
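The relationship between contamination, the anomaly scores, and the threshold can be sketched in plain Python. This page does not say how the score or the threshold is computed, so two assumptions are made here: the score is the Euclidean distance to the nearest cluster center, and the threshold is the (1 - contamination) quantile of the training scores. The helper names nearest_center_distance and score_threshold are hypothetical and not part of kenchi.

```python
import math


def nearest_center_distance(point, centers):
    # Assumed scoring rule: distance to the closest cluster center.
    return min(math.dist(point, c) for c in centers)


def score_threshold(scores, contamination=0.1):
    # Assumed thresholding rule: the (1 - contamination) quantile of the
    # scores, with linear interpolation between order statistics.
    ordered = sorted(scores)
    pos = (1.0 - contamination) * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


# Toy data: four points near a center and one point far from both centers.
centers = [(0.0, 0.0), (10.0, 10.0)]
X = [(0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (9.0, 10.0), (5.0, 5.0)]

scores = [nearest_center_distance(x, centers) for x in X]
threshold = score_threshold(scores, contamination=0.2)
is_outlier = [s > threshold for s in scores]
```

Under these assumptions, a larger contamination lowers the threshold and flags more samples.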