kenchi.datasets.base module

kenchi.datasets.base.load_wdbc(contamination=0.0272, random_state=None, shuffle=True)

Load and return the Breast Cancer Wisconsin (Diagnostic) dataset.

contamination : float, default 0.0272: Proportion of outliers in the dataset.
random_state : int or RandomState instance, default None: Seed of the pseudorandom number generator.
shuffle : bool, default True: If True, shuffle samples.

Returns:
X (ndarray of shape (n_samples, n_features)) – Data.
y (ndarray of shape (n_samples,)) – Labels: -1 for outliers (malignant) and +1 for inliers (benign).

References

[1]	Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” in Proceedings of SDM'11, pp. 13-24, 2011.
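
To illustrate what the contamination parameter means, here is a minimal sketch of one way a labelled dataset could be downsampled so that outliers make up the requested proportion. It is not kenchi's implementation: subsample_outliers is a hypothetical helper, the data is a synthetic stand-in with the class sizes of WDBC (357 benign, 212 malignant, 30 features), and NumPy's default_rng is assumed for seeding instead of a RandomState instance.

```python
import numpy as np


def subsample_outliers(X, y, contamination, random_state=None, shuffle=True):
    """Hypothetical helper: keep every inlier (+1) and only as many
    outliers (-1) as needed to reach the requested contamination."""
    rng = np.random.default_rng(random_state)
    inlier_idx = np.flatnonzero(y == 1)
    outlier_idx = np.flatnonzero(y == -1)

    # Solve n_out / (n_in + n_out) = contamination for n_out.
    n_outliers = int(round(contamination * inlier_idx.size / (1.0 - contamination)))
    n_outliers = min(n_outliers, outlier_idx.size)

    kept = rng.choice(outlier_idx, size=n_outliers, replace=False)
    idx = np.concatenate([inlier_idx, kept])
    if shuffle:
        rng.shuffle(idx)  # random sample order
    else:
        idx.sort()        # preserve the original sample order
    return X[idx], y[idx]


# Synthetic stand-in: random features, labels with WDBC's class sizes.
data_rng = np.random.default_rng(0)
X_full = data_rng.normal(size=(569, 30))
y_full = np.concatenate([np.ones(357, dtype=int), -np.ones(212, dtype=int)])

X, y = subsample_outliers(X_full, y_full, contamination=0.0272, random_state=0)
print(X.shape, int((y == -1).sum()))
```

Under these assumptions, contamination=0.0272 keeps all 357 inliers plus 10 outliers, for 367 samples in total.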

kenchi.datasets.base.load_pendigits(contamination=0.002, random_state=None, shuffle=True)

Load and return the pendigits dataset.

contamination : float, default 0.002: Proportion of outliers in the dataset.
random_state : int or RandomState instance, default None: Seed of the pseudorandom number generator.
shuffle : bool, default True: If True, shuffle samples.

Returns:
X (ndarray of shape (n_samples, n_features)) – Data.
y (ndarray of shape (n_samples,)) – Labels: -1 for outliers (digit 4) and +1 for inliers (all other digits).

References

[2]	Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” in Proceedings of SDM'11, pp. 13-24, 2011.
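
The label convention described for y can be sketched as follows. This is only an illustration of the mapping, not kenchi's code: the digits array is a hypothetical handful of class labels, and the feature matrix is a made-up placeholder.

```python
import numpy as np

# Hypothetical digit classes (0-9) for six samples.
digits = np.array([0, 4, 7, 4, 9, 1])

# Digit 4 is treated as the outlier class (-1); every other digit is an inlier (+1).
y = np.where(digits == 4, -1, 1)

# Placeholder feature matrix with one row per sample, used to show boolean selection.
X = np.arange(12, dtype=float).reshape(6, 2)
X_outliers = X[y == -1]  # rows labelled as outliers
X_inliers = X[y == 1]    # rows labelled as inliers
print(y, X_outliers.shape, X_inliers.shape)
```

Boolean masks on y such as `y == -1` are a convenient way to split the returned data into outliers and inliers for evaluation.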