kenchi.datasets.samples_generator.make_blobs(centers=5, center_box=(-10.0, 10.0), cluster_std=1.0, contamination=0.02, n_features=25, n_samples=500, random_state=None, shuffle=True)

Generate isotropic Gaussian blobs with outliers.

Parameters:
  • centers (int or array-like of shape (n_centers, n_features), default 5) – Number of centers to generate, or the fixed center locations.
  • center_box (pair of floats (min, max), default (-10.0, 10.0)) – Bounding box for each cluster center when centers are generated at random.
  • cluster_std (float or array-like of shape (n_centers,), default 1.0) – Standard deviation of the clusters.
  • contamination (float, default 0.02) – Proportion of outliers in the data set.
  • n_features (int, default 25) – Number of features for each sample.
  • n_samples (int, default 500) – Number of samples.
  • random_state (int or RandomState instance, default None) – Seed of the pseudo-random number generator.
  • shuffle (bool, default True) – If True, shuffle samples.
Returns:
  • X (array-like of shape (n_samples, n_features)) – Generated data.
  • y (array-like of shape (n_samples,)) – Labels: -1 for outliers and +1 for inliers.
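
The label convention above (-1 for outliers, +1 for inliers) lends itself to boolean masking. The snippet below is a small illustration that uses hand-written stand-in arrays rather than actual output of make_blobs; the values are made up for demonstration only.

```python
import numpy as np

# Stand-in data (hypothetical values, not produced by make_blobs).
X = np.array([[0.1, 0.2], [5.0, 5.0], [0.0, -0.1], [0.3, 0.1]])
y = np.array([1, -1, 1, 1])  # -1 marks an outlier, +1 an inlier

# Split the samples by label.
X_inliers = X[y == 1]
X_outliers = X[y == -1]

# Observed share of outliers, comparable to the contamination parameter.
observed_contamination = np.mean(y == -1)
```

Here observed_contamination is the fraction of samples labelled -1, which is the quantity the contamination parameter describes.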

References

[1] Kriegel, H.-P., Schubert, M., and Zimek, A., “Angle-based outlier detection in high-dimensional data,” In Proceedings of SIGKDD, pp. 444-452, 2008.
[2] Sugiyama, M., and Borgwardt, K., “Rapid distance-based outlier detection via sampling,” Advances in NIPS, pp. 467-475, 2013.

Examples

>>> from kenchi.datasets import make_blobs
>>> X, y = make_blobs(n_samples=10, n_features=2, contamination=0.1)
>>> X.shape
(10, 2)
>>> y.shape
(10,)
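
To make the roles of the parameters more concrete, the following is a rough NumPy sketch of how a generator with this interface could work. It is not kenchi's implementation. The function name sketch_blobs_with_outliers is hypothetical, and two points are assumptions the documentation above does not state: that the outlier count is the rounded product of contamination and n_samples, and that outliers are drawn uniformly inside center_box. For brevity, the sketch accepts only an integer centers and a scalar cluster_std.

```python
import numpy as np


def sketch_blobs_with_outliers(n_samples=500, n_features=25, centers=5,
                               center_box=(-10.0, 10.0), cluster_std=1.0,
                               contamination=0.02, shuffle=True,
                               random_state=None):
    """Hypothetical stand-in; NOT kenchi's implementation."""
    rng = np.random.default_rng(random_state)
    low, high = center_box

    # Assumption: the outlier count is the rounded share of n_samples.
    n_outliers = int(round(contamination * n_samples))
    n_inliers = n_samples - n_outliers

    # Draw the cluster centers uniformly inside the bounding box.
    center_locs = rng.uniform(low, high, size=(centers, n_features))

    # Assign each inlier to a random center, then add isotropic Gaussian noise.
    assignments = rng.integers(0, centers, size=n_inliers)
    noise = rng.normal(0.0, cluster_std, size=(n_inliers, n_features))
    X_in = center_locs[assignments] + noise

    # Assumption: outliers are scattered uniformly over the same box.
    X_out = rng.uniform(low, high, size=(n_outliers, n_features))

    X = np.vstack([X_in, X_out])
    y = np.concatenate([np.ones(n_inliers, dtype=int),
                        -np.ones(n_outliers, dtype=int)])

    if shuffle:
        # Apply the same permutation to samples and labels.
        perm = rng.permutation(n_samples)
        X, y = X[perm], y[perm]
    return X, y


X, y = sketch_blobs_with_outliers(n_samples=100, n_features=2,
                                  contamination=0.1, random_state=0)
```

Passing an integer random_state makes the sketch reproducible, and shuffle=False leaves the inliers first and the outliers last, which can be handy when inspecting the two groups separately.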