kenchi.datasets.base module
kenchi.datasets.base.load_wdbc(contamination=0.0272, random_state=None, shuffle=True)
Load and return the Breast Cancer Wisconsin (Diagnostic) dataset.
Parameters:
- contamination : float, default 0.0272
  Proportion of outliers in the dataset.
- random_state : int, RandomState instance, default None
  Seed of the pseudorandom number generator, or a RandomState instance to draw from.
- shuffle : bool, default True
  If True, shuffle the samples.
Returns:
- X : ndarray of shape (n_samples, n_features)
  Feature matrix.
- y : ndarray of shape (n_samples,)
  Labels: -1 (malignant) for outliers and +1 (benign) for inliers.
References
[1] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” in Proceedings of SDM'11, pp. 13-24, 2011.
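The contamination parameter fixes the fraction of outliers in the returned data. The sketch below shows one plausible way to realize that with NumPy: keep every inlier, keep just enough outliers that they make up roughly `contamination` of the result, relabel the samples as +1/-1, and optionally shuffle. The helper `subsample_outliers` and its rounding rule are assumptions for illustration, not kenchi's actual implementation, and the data are synthetic (only the class sizes, 357 benign and 212 malignant samples with 30 features, match WDBC).

```python
import numpy as np


def subsample_outliers(X, is_outlier, contamination, random_state=None, shuffle=True):
    """Hypothetical helper: keep all inliers plus enough outliers to reach `contamination`.

    Assumption: the outlier count n_out solves n_out / (n_in + n_out) = contamination,
    rounded to the nearest integer. Illustration only, not kenchi's code.
    """
    # Accept an int/None seed or an existing RandomState, as the docstring describes.
    if isinstance(random_state, np.random.RandomState):
        rnd = random_state
    else:
        rnd = np.random.RandomState(random_state)

    X = np.asarray(X)
    is_outlier = np.asarray(is_outlier, dtype=bool)
    X_in, X_out = X[~is_outlier], X[is_outlier]
    n_in = len(X_in)

    # Solve n_out / (n_in + n_out) = contamination, capped by the outliers available.
    n_out = int(round(contamination * n_in / (1.0 - contamination)))
    n_out = min(n_out, len(X_out))

    # Draw the retained outliers without replacement.
    idx = rnd.choice(len(X_out), size=n_out, replace=False)
    X_sub = np.concatenate([X_in, X_out[idx]])
    y_sub = np.concatenate([np.ones(n_in, dtype=int), -np.ones(n_out, dtype=int)])

    if shuffle:
        perm = rnd.permutation(len(X_sub))
        X_sub, y_sub = X_sub[perm], y_sub[perm]
    return X_sub, y_sub


# Synthetic stand-in with WDBC's class sizes: 357 inliers (benign), 212 outliers (malignant).
rng = np.random.RandomState(0)
X = rng.normal(size=(569, 30))
is_outlier = np.arange(569) >= 357
X_sub, y_sub = subsample_outliers(X, is_outlier, contamination=0.0272, random_state=0)
print(X_sub.shape, np.sum(y_sub == -1))  # (367, 30) 10
```

With the default contamination of 0.0272, this rule keeps 10 of the 212 malignant samples, since 10 / 367 ≈ 0.0272. kenchi itself may round or sample differently, so exact counts from the real loader can differ.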
kenchi.datasets.base.load_pendigits(contamination=0.002, random_state=None, shuffle=True)
Load and return the Pen-Based Recognition of Handwritten Digits (pendigits) dataset.
Parameters:
- contamination : float, default 0.002
  Proportion of outliers in the dataset.
- random_state : int, RandomState instance, default None
  Seed of the pseudorandom number generator, or a RandomState instance to draw from.
- shuffle : bool, default True
  If True, shuffle the samples.
Returns:
- X : ndarray of shape (n_samples, n_features)
  Feature matrix.
- y : ndarray of shape (n_samples,)
  Labels: -1 for outliers (digit 4) and +1 for inliers (all other digits).
References
[2] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” in Proceedings of SDM'11, pp. 13-24, 2011.
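For pendigits, one digit class plays the role of the outlier class. The NumPy sketch below illustrates that labeling convention (digit 4 becomes -1, every other digit +1) and checks the realized outlier fraction of a label vector. Both helper names are hypothetical, and the toy digit labels are made up for illustration; they are not drawn from the real dataset.

```python
import numpy as np


def digits_to_outlier_labels(digits, outlier_digit=4):
    """Hypothetical helper: map digit labels to -1 (outlier_digit) and +1 (any other digit)."""
    digits = np.asarray(digits)
    return np.where(digits == outlier_digit, -1, 1)


def realized_contamination(y):
    """Fraction of samples labeled -1 in a +1/-1 label vector."""
    y = np.asarray(y)
    return float(np.mean(y == -1))


# Made-up digit labels: 499 samples of digits other than 4, followed by a single 4.
digits = np.array([0, 1, 2, 3, 5, 6, 7, 8, 9] * 55 + [0, 1, 2, 3] + [4])
y = digits_to_outlier_labels(digits)
print(len(y), realized_contamination(y))  # 500 0.002
```

The default contamination of 0.002 thus corresponds to roughly one outlier per 500 samples.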