kenchi.outlier_detection.clustering_based module

class kenchi.outlier_detection.clustering_based.MiniBatchKMeans(batch_size=100, contamination=0.1, init='k-means++', init_size=None, max_iter=100, max_no_improvement=10, n_clusters=8, n_init=3, random_state=None, reassignment_ratio=0.01, tol=0.0)

Bases: kenchi.outlier_detection.base.BaseOutlierDetector

Outlier detector using K-means clustering.

Parameters:

batch_size (int, default 100) – Size of the mini-batches.
contamination (float, default 0.1) – Proportion of outliers in the data set. Used to define the threshold.
init (str or array-like, default 'k-means++') – Method for initialization. Valid string options are 'k-means++' and 'random'.
init_size (int, default None) – Number of samples to randomly sample for speeding up the initialization. If None, 3 * batch_size is used.
max_iter (int, default 100) – Maximum number of iterations.
max_no_improvement (int, default 10) – Controls early stopping based on the number of consecutive mini-batches that do not yield an improvement in the smoothed inertia. To disable convergence detection based on inertia, set max_no_improvement to None.
n_clusters (int, default 8) – Number of clusters.
n_init (int, default 3) – Number of initializations to perform.
random_state (int or RandomState instance, default None) – Seed of the pseudo-random number generator.
reassignment_ratio (float, default 0.01) – Control the fraction of the maximum number of counts for a center to be reassigned.
tol (float, default 0.0) – Tolerance to declare convergence.
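
The contamination parameter is described above only as being "used to define the threshold". One common way to do this, shown here as a minimal sketch, is to take the (1 - contamination) quantile of the training anomaly scores. This is an assumption about the general approach, not a confirmed description of kenchi's implementation; the helper threshold_from_contamination and the sample scores are hypothetical, and the code uses only the standard library so that it runs without kenchi installed.

```python
import math


def threshold_from_contamination(scores, contamination=0.1):
    """Return a cut-off that roughly `contamination` of the scores exceed.

    Hypothetical helper (not part of kenchi): linear interpolation between
    order statistics, which is one common quantile definition.
    """
    if not 0.0 < contamination < 1.0:
        raise ValueError("contamination must be in (0, 1)")
    ordered = sorted(scores)
    # Fractional position of the (1 - contamination) quantile.
    pos = (1.0 - contamination) * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    # Interpolate linearly between the two neighbouring order statistics.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Made-up anomaly scores: nine small values and one large one.
scores = [0.2, 0.3, 0.1, 0.4, 0.25, 0.35, 0.15, 0.3, 0.2, 5.0]
threshold = threshold_from_contamination(scores, contamination=0.1)
# Samples whose score exceeds the threshold are flagged as outliers.
outliers = [s for s in scores if s > threshold]
```

With contamination=0.1 and ten scores, only the single largest score ends up above the threshold; raising contamination lowers the cut-off and flags more samples.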

anomaly_score_: array-like of shape (n_samples,) – Anomaly score of each training sample.

threshold_: float – Threshold on the anomaly score, determined by contamination.

cluster_centers_: array-like of shape (n_clusters, n_features) – Coordinates of cluster centers.

inertia_: float – Value of the inertia criterion associated with the chosen partition.

labels_: array-like of shape (n_samples,) – Cluster label of each training sample.
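
To make the shapes and meanings of these attributes concrete, the sketch below derives labels, per-sample scores, and inertia from a set of cluster centers. It assumes the anomaly score is the Euclidean distance from a sample to its nearest center; the source does not state the formula, so treat this as an illustration rather than kenchi's definition. The helper nearest_center, the centers, and the data are all hypothetical.

```python
import math  # math.dist requires Python 3.8+


def nearest_center(point, centers):
    """Return (label, distance) for the center closest to `point`."""
    distances = [math.dist(point, c) for c in centers]
    label = min(range(len(centers)), key=distances.__getitem__)
    return label, distances[label]


# Hypothetical fitted centers, shape (n_clusters, n_features) = (2, 2).
cluster_centers = [(0.0, 0.0), (10.0, 10.0)]
# Hypothetical training data, shape (n_samples, n_features) = (4, 2).
X = [(0.0, 1.0), (1.0, 0.0), (10.0, 9.0), (5.0, 4.0)]

pairs = [nearest_center(x, cluster_centers) for x in X]
labels = [label for label, _ in pairs]        # one cluster label per sample
anomaly_score = [dist for _, dist in pairs]   # assumed: distance to nearest center
# Inertia: sum of squared distances to the assigned centers.
inertia = sum(d ** 2 for d in anomaly_score)
```

The last sample lies between the two centers, so it receives the largest score and would be the first to be flagged as the threshold is lowered.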

score(X, y=None)

Compute the negative of the K-means objective for the given data.

Parameters:

X (array-like of shape (n_samples, n_features)) – Data.
y (ignored) – Not used.

Returns: score – Negative of the K-means objective evaluated on X.
Return type: float
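
The K-means objective is usually defined as the sum of squared distances from each sample to its nearest cluster center, so its negative is never positive, and values closer to zero indicate data that fit the centers better. The sketch below computes that quantity under this assumption; kmeans_score is a hypothetical stand-alone helper, not kenchi's API, and the centers and data are made up.

```python
def kmeans_score(X, centers):
    """Negative K-means objective (hypothetical helper, not kenchi's API).

    Returns minus the sum, over samples, of the squared Euclidean distance
    to the nearest center.
    """
    total = 0.0
    for x in X:
        total += min(
            sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers
        )
    return -total


centers = [(0.0, 0.0), (10.0, 10.0)]
close = [(0.0, 1.0), (10.0, 9.0)]   # samples near the centers
far = [(5.0, 5.0), (4.0, 6.0)]      # samples between the centers

score_close = kmeans_score(close, centers)
score_far = kmeans_score(far, centers)
```

Because the sign is flipped, a higher score means a better fit, which matches the scikit-learn convention that larger score values are better.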