kenchi.datasets.base module

kenchi.datasets.base.load_wdbc(contamination=0.0272, random_state=None, shuffle=True)

Load and return the Breast Cancer Wisconsin dataset.

Parameters:
  • contamination (float, default 0.0272) – Proportion of outliers in the dataset.
  • random_state (int, RandomState instance or None, default None) – Seed of the pseudo-random number generator, or a RandomState instance to use as the generator.
  • shuffle (bool, default True) – If True, shuffle the samples.
Returns:
  • X (ndarray of shape (n_samples, n_features)) – Data.
  • y (ndarray of shape (n_samples,)) – Labels: -1 for outliers (malignant) and +1 for inliers (benign).
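The contamination parameter fixes the share of outliers in the returned data. The sketch below is a hypothetical illustration, not kenchi's source: it assumes the loader keeps every benign sample and draws just enough malignant samples at random that outliers / (inliers + outliers) is approximately contamination. Under that assumption, the 357 benign samples of the Wisconsin data give round(0.0272 × 357 / 0.9728) = 10 outliers, i.e. 367 samples in total. The helper names `n_outliers_for` and `subsample_outliers` are invented for this example.

```python
import numpy as np


def n_outliers_for(contamination, n_inliers):
    """Outlier count giving n_out / (n_inliers + n_out) approximately equal to contamination."""
    if not 0.0 <= contamination < 1.0:
        raise ValueError("contamination must be in [0, 1)")
    # Solve n_out / (n_inliers + n_out) = contamination for n_out, then round.
    return int(round(contamination * n_inliers / (1.0 - contamination)))


def subsample_outliers(X, is_outlier, contamination=0.0272, random_state=None, shuffle=True):
    """Hypothetical loader core (an assumption, not kenchi's implementation).

    Keeps every inlier, draws outliers without replacement, and labels
    outliers -1 and inliers +1, matching the documented return values.
    """
    # Accept either a seed (int or None) or a ready-made RandomState.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    is_outlier = np.asarray(is_outlier, dtype=bool)
    inlier_idx = np.flatnonzero(~is_outlier)
    outlier_idx = np.flatnonzero(is_outlier)
    # Never ask for more outliers than the source data contains.
    n_out = min(n_outliers_for(contamination, inlier_idx.size), outlier_idx.size)
    chosen = rng.choice(outlier_idx, size=n_out, replace=False)
    idx = np.concatenate([inlier_idx, chosen])
    if shuffle:
        rng.shuffle(idx)
    y = np.where(is_outlier[idx], -1, 1)
    return X[idx], y


print(n_outliers_for(0.0272, 357))  # → 10
```

Subsampling the outlier class, rather than the inliers, keeps the full normal class intact while still letting the caller dial in a small outlier fraction.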

References

[1] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” in Proceedings of SDM'11, pp. 13-24, 2011.

kenchi.datasets.base.load_pendigits(contamination=0.002, random_state=None, shuffle=True)

Load and return the pendigits dataset.

Parameters:
  • contamination (float, default 0.002) – Proportion of outliers in the dataset.
  • random_state (int, RandomState instance or None, default None) – Seed of the pseudo-random number generator, or a RandomState instance to use as the generator.
  • shuffle (bool, default True) – If True, shuffle the samples.
Returns:
  • X (ndarray of shape (n_samples, n_features)) – Data.
  • y (ndarray of shape (n_samples,)) – Labels: -1 for outliers (digit 4) and +1 for inliers (all other digits).
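The label convention above is a one-vs-rest split: digit 4 is the outlier class and every other digit is an inlier. The sketch below shows that mapping plus a quick check of the realized contamination. The helper names `digits_to_outlier_labels` and `observed_contamination` are hypothetical, and the toy labels stand in for the real pendigits targets.

```python
import numpy as np


def digits_to_outlier_labels(digits, outlier_digit=4):
    """Map digit classes to the documented convention: -1 for the outlier digit, +1 otherwise."""
    digits = np.asarray(digits)
    return np.where(digits == outlier_digit, -1, 1)


def observed_contamination(y):
    """Fraction of samples labeled as outliers (-1)."""
    y = np.asarray(y)
    return float(np.mean(y == -1))


# Toy labels: 998 samples of digits other than 4, plus two samples of digit 4.
digits = np.array([d for d in range(10) if d != 4] * 110 + [0] * 8 + [4, 4])
y = digits_to_outlier_labels(digits)
print(observed_contamination(y))  # → 0.002
```

With kenchi installed, passing the `y` returned by `kenchi.datasets.base.load_pendigits(random_state=0)` to `observed_contamination` should give a value close to the `contamination` argument, since that parameter is documented as the proportion of outliers.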

References

[2] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” in Proceedings of SDM'11, pp. 13-24, 2011.