kenchi.outlier_detection.clustering_based module

class kenchi.outlier_detection.clustering_based.MiniBatchKMeans(batch_size=100, contamination=0.1, init='k-means++', init_size=None, max_iter=100, max_no_improvement=10, n_clusters=8, n_init=3, random_state=None, reassignment_ratio=0.01, tol=0.0)

Bases: kenchi.outlier_detection.base.BaseOutlierDetector

Outlier detector using mini-batch K-means clustering.

Parameters:
  • batch_size (int, default 100) – Size of the mini batches.
  • contamination (float, default 0.1) – Proportion of outliers in the data set. Used to define the threshold.
  • init (str or array-like, default 'k-means++') – Method for initialization. Valid string options are 'k-means++' and 'random'.
  • init_size (int, default None) – Number of samples to randomly sample for speeding up the initialization. If None, 3 * batch_size is used.
  • max_iter (int, default 100) – Maximum number of iterations.
  • max_no_improvement (int, default 10) – Control early stopping based on the number of consecutive mini batches that do not yield an improvement on the smoothed inertia. To disable convergence detection based on inertia, set max_no_improvement to None.
  • n_clusters (int, default 8) – Number of clusters.
  • n_init (int, default 3) – Number of initializations to perform.
  • random_state (int or RandomState instance, default None) – Seed of the pseudo random number generator.
  • reassignment_ratio (float, default 0.01) – Control the fraction of the maximum number of counts for a center to be reassigned.
  • tol (float, default 0.0) – Tolerance to declare convergence.
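
To make the roles of batch_size, n_clusters, max_iter and random_state more concrete, here is a toy sketch of the mini-batch K-means update in plain Python. It is not the library's implementation: the function name mini_batch_kmeans is hypothetical, it uses only 'random' initialization, and it omits n_init, early stopping, tol and center reassignment.

```python
import random


def mini_batch_kmeans(X, n_clusters=2, batch_size=4, max_iter=20, random_state=0):
    """Toy mini-batch K-means on a list of equal-length tuples (illustrative only)."""
    rng = random.Random(random_state)
    # 'random' initialization: pick n_clusters distinct samples as starting centers.
    centers = [list(x) for x in rng.sample(X, n_clusters)]
    counts = [0] * n_clusters
    for _ in range(max_iter):
        # Each iteration only looks at a small random batch of the data.
        batch = rng.sample(X, min(batch_size, len(X)))
        for x in batch:
            # Assign the sample to its nearest center (squared Euclidean distance).
            j = min(
                range(n_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])),
            )
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center step size shrinks as the center sees more samples
            # Move the center toward the sample: a running mean of its assigned samples.
            centers[j] = [(1 - eta) * c + eta * a for c, a in zip(centers[j], x)]
    return centers


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
     (9.0, 9.0), (9.0, 10.0), (10.0, 9.0), (10.0, 10.0)]
centers = mini_batch_kmeans(X, n_clusters=2, batch_size=4, max_iter=20, random_state=0)
```

Fixing random_state makes the sampled batches, and therefore the resulting centers, reproducible across runs.
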
anomaly_score_

array-like of shape (n_samples,) – Anomaly score for each training sample.

threshold_

float – Threshold on the anomaly score, defined from contamination.

cluster_centers_

array-like of shape (n_clusters, n_features) – Coordinates of cluster centers.

inertia_

float – Value of the inertia criterion associated with the chosen partition.

labels_

array-like of shape (n_samples,) – Label of each point.
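
The relationship between anomaly_score_, threshold_ and contamination can be sketched as follows. This is a rough illustration under two assumptions that the text above does not confirm: that the anomaly score of a sample is its distance to the nearest cluster center, and that the threshold is chosen so that roughly a contamination fraction of the training scores exceeds it. The helper names nearest_center_distance and fit_threshold are hypothetical.

```python
import math


def nearest_center_distance(x, centers):
    # Assumed anomaly score: Euclidean distance to the closest cluster center.
    return min(math.dist(x, c) for c in centers)


def fit_threshold(scores, contamination=0.1):
    # Assumed rule: leave about `contamination * n_samples` scores above the threshold.
    ordered = sorted(scores)
    n_outliers = int(contamination * len(ordered))
    return ordered[len(ordered) - n_outliers - 1]


centers = [(0.0, 0.0), (10.0, 10.0)]
X = [(0.0, 1.0), (1.0, 0.0), (10.0, 9.0), (9.0, 10.0), (5.0, 5.0)]

scores = [nearest_center_distance(x, centers) for x in X]
threshold = fit_threshold(scores, contamination=0.2)
# A sample is flagged when its score is strictly greater than the threshold.
is_outlier = [s > threshold for s in scores]
```

Here only the sample (5.0, 5.0), which lies far from both centers, ends up above the threshold.
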

score(X, y=None)

Compute the negative of the K-means objective for the given data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data.
  • y (ignored) – Not used.
Returns:

score – Negative of the K-means objective evaluated on the given data.

Return type:

float
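
As a minimal sketch of what this value means, the K-means objective is taken here to be the sum of squared Euclidean distances from each sample to its nearest cluster center, so the score is that sum negated and higher (closer to zero) is better. The function kmeans_score below is a hypothetical stand-alone helper, not the library method.

```python
def kmeans_score(X, centers):
    """Negative K-means objective: minus the summed squared distance
    from each sample to its nearest center."""
    inertia = sum(
        min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
        for x in X
    )
    return -inertia


centers = [(0.0, 0.0), (10.0, 10.0)]
X = [(0.0, 1.0), (10.0, 9.0), (5.0, 5.0)]
value = kmeans_score(X, centers)  # squared distances 1 + 1 + 50, negated
```
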