kenchi.datasets.base.load_pendigits(random_state=None, return_X_y=False, subset='kriegel11')

Load and return the pendigits dataset.

Kriegel’s structure (subset='kriegel11'):

anomalous class    class 4
n_samples          9868
n_outliers         20
n_features         16
contamination      0.002

Goldstein’s global structure (subset='goldstein12-global'):

anomalous class    classes 0, 1, 2, 3, 4, 5, 6, 7, 9
n_samples          809
n_outliers         90
n_features         16
contamination      0.111

Goldstein’s local structure (subset='goldstein12-local'):

anomalous class    class 4
n_samples          6724
n_outliers         10
n_features         16
contamination      0.001

Parameters:
- random_state (int, RandomState instance, default None) – Seed of the pseudo random number generator.
- return_X_y (bool, default False) – If True, return (data, target) instead of a Bunch object.
- subset (str, default 'kriegel11') – Specify the structure. Valid options are 'goldstein12-global', 'goldstein12-local', and 'kriegel11'.

Returns: data – Dictionary-like object.
Return type: Bunch

References

[1] Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.
[2] Goldstein, M., and Dengel, A., “Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm,” KI: Poster and Demo Track, pp. 59-63, 2012.
[3] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” In Proceedings of SDM, pp. 13-24, 2011.

Examples

>>> from kenchi.datasets import load_pendigits
>>> pendigits = load_pendigits(subset='kriegel11')
>>> pendigits.data.shape
(9868, 16)
>>> pendigits = load_pendigits(subset='goldstein12-global')
>>> pendigits.data.shape
(809, 16)
>>> pendigits = load_pendigits(subset='goldstein12-local')
>>> pendigits.data.shape
(6724, 16)
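
The contamination values listed for the three pendigits structures appear to be the ratio n_outliers / n_samples, reported to three decimals. The sketch below checks that reading against the documented figures. It does not call kenchi; PENDIGITS_STRUCTURES and contamination() are illustrative names introduced here, not part of the library.

```python
# Documented sizes of the three pendigits structures (copied from the tables).
# PENDIGITS_STRUCTURES and contamination() are illustrative names, not kenchi API.
PENDIGITS_STRUCTURES = {
    "kriegel11": {"n_samples": 9868, "n_outliers": 20, "documented": 0.002},
    "goldstein12-global": {"n_samples": 809, "n_outliers": 90, "documented": 0.111},
    "goldstein12-local": {"n_samples": 6724, "n_outliers": 10, "documented": 0.001},
}


def contamination(n_outliers, n_samples):
    """Return the proportion of outliers in a dataset."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return n_outliers / n_samples


# Print the exact ratio next to the documented, rounded value.
for name, info in PENDIGITS_STRUCTURES.items():
    ratio = contamination(info["n_outliers"], info["n_samples"])
    print(f"{name}: {ratio:.4f} (documented: {info['documented']})")
```

The same ratio is a natural starting value for the contamination parameter that many outlier detectors expose, provided the detector is evaluated on the same structure.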

kenchi.datasets.base.load_pima(return_X_y=False)

Load and return the Pima Indians diabetes dataset.

anomalous class    class 1
n_samples          768
n_outliers         268
n_features         8
contamination      0.349

Parameters:
- return_X_y (bool, default False) – If True, return (data, target) instead of a Bunch object.

Returns: data – Dictionary-like object.
Return type: Bunch

References

[4] Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.
[5] Goix, N., “How to evaluate the quality of unsupervised anomaly detection algorithms?” In ICML Anomaly Detection Workshop, 2016.
[6] Liu, F. T., Ting, K. M., and Zhou, Z.-H., “Isolation forest,” In Proceedings of ICDM, pp. 413-422, 2008.
[7] Sugiyama, M., and Borgwardt, K., “Rapid distance-based outlier detection via sampling,” Advances in NIPS, pp. 467-475, 2013.

Examples

>>> from kenchi.datasets import load_pima
>>> pima = load_pima()
>>> pima.data.shape
(768, 8)
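
The return_X_y switch selects between two return shapes: a dictionary-like Bunch, or a plain (data, target) tuple. The following self-contained sketch mimics that behaviour. The Bunch class and load_toy function are stand-ins written for illustration rather than kenchi's implementation, the rows are made-up toy values, and the 1 / -1 label encoding is an assumption.

```python
class Bunch(dict):
    """Minimal stand-in for a dictionary-like container with attribute access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


def load_toy(return_X_y=False):
    """Hypothetical loader mirroring the return_X_y switch (toy values only)."""
    data = [[0.1, 0.2], [0.3, 0.4], [5.0, 6.0]]
    target = [1, 1, -1]  # assumed label convention: 1 = inlier, -1 = outlier

    if return_X_y:
        return data, target

    return Bunch(data=data, target=target)


# Dictionary-like object: fields are reachable as keys or as attributes.
toy = load_toy()
print(toy.data is toy["data"])

# Tuple form: convenient for unpacking straight into X and y.
X, y = load_toy(return_X_y=True)
print(len(X), len(y))
```

The tuple form is handy when the data goes straight into an estimator, while the Bunch form keeps the arrays together with any extra fields the loader provides.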

kenchi.datasets.base.load_wdbc(random_state=None, return_X_y=False, subset='kriegel11')

Load and return the breast cancer Wisconsin dataset.

Goldstein’s structure (subset='goldstein12'):

anomalous class    malignant
n_samples          367
n_outliers         10
n_features         30
contamination      0.027

Kriegel’s structure (subset='kriegel11'):

anomalous class    malignant
n_samples          367
n_outliers         10
n_features         30
contamination      0.027

Sugiyama’s structure (subset='sugiyama13'):

anomalous class    malignant
n_samples          569
n_outliers         212
n_features         30
contamination      0.373

Parameters:
- random_state (int, RandomState instance, default None) – Seed of the pseudo random number generator.
- return_X_y (bool, default False) – If True, return (data, target) instead of a Bunch object.
- subset (str, default 'kriegel11') – Specify the structure. Valid options are 'goldstein12', 'kriegel11', and 'sugiyama13'.

Returns: data – Dictionary-like object.
Return type: Bunch

References

[8] Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.
[9] Goldstein, M., and Dengel, A., “Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm,” KI: Poster and Demo Track, pp. 59-63, 2012.
[10] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” In Proceedings of SDM, pp. 13-24, 2011.
[11] Sugiyama, M., and Borgwardt, K., “Rapid distance-based outlier detection via sampling,” Advances in NIPS, pp. 467-475, 2013.

Examples

>>> from kenchi.datasets import load_wdbc
>>> wdbc = load_wdbc(subset='goldstein12')
>>> wdbc.data.shape
(367, 30)
>>> wdbc = load_wdbc(subset='kriegel11')
>>> wdbc.data.shape
(367, 30)
>>> wdbc = load_wdbc(subset='sugiyama13')
>>> wdbc.data.shape
(569, 30)
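
The documented counts are consistent with the 'goldstein12' and 'kriegel11' structures keeping all 357 benign samples (569 - 212) and only 10 of the 212 malignant ones, which would explain why this loader accepts random_state. The sketch below illustrates seeded downsampling of that kind on index lists. It is an assumption about how such a subset could be drawn, not a description of kenchi's actual procedure, and downsample_outliers is a hypothetical helper.

```python
import random


def downsample_outliers(inlier_idx, outlier_idx, n_outliers, random_state=None):
    """Keep every inlier and a seeded random subset of the outliers.

    Hypothetical helper for illustration; not part of kenchi.
    """
    outlier_idx = list(outlier_idx)
    if n_outliers > len(outlier_idx):
        raise ValueError("n_outliers exceeds the number of available outliers")
    rng = random.Random(random_state)
    kept = rng.sample(outlier_idx, n_outliers)
    return sorted(list(inlier_idx) + kept)


# Counts taken from the tables: 569 samples, 212 of them malignant.
# The index layout (benign first, malignant last) is an assumption of this sketch.
benign = range(0, 357)
malignant = range(357, 569)

subset_a = downsample_outliers(benign, malignant, n_outliers=10, random_state=0)
subset_b = downsample_outliers(benign, malignant, n_outliers=10, random_state=0)

print(len(subset_a))         # 357 benign + 10 malignant = 367
print(subset_a == subset_b)  # the same seed selects the same subset
```

Fixing random_state in this way makes the selected outliers repeatable across runs, which matters when results on a downsampled structure are compared between detectors.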

kenchi.datasets.base.load_wilt(return_X_y=False)

Load and return the wilt dataset.

anomalous class    class 'w'
n_samples          4839
n_outliers         261
n_features         5
contamination      0.053

Parameters:
- return_X_y (bool, default False) – If True, return (data, target) instead of a Bunch object.

Returns: data – Dictionary-like object.
Return type: Bunch

References

[12] Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.
[13] Goix, N., “How to evaluate the quality of unsupervised anomaly detection algorithms?” In ICML Anomaly Detection Workshop, 2016.

Examples

>>> from kenchi.datasets import load_wilt
>>> wilt = load_wilt()
>>> wilt.data.shape
(4839, 5)
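
For the wilt data the anomalous class is the label 'w', so a loader of this kind has to turn raw class labels into an outlier indicator. The sketch below shows one way to do that. The -1 (outlier) / 1 (inlier) encoding, the 'n' label for the normal class, and the to_outlier_labels helper are assumptions made for illustration, and the label list is a toy example rather than real data.

```python
def to_outlier_labels(classes, anomalous):
    """Map raw class labels to 1 (inlier) or -1 (outlier).

    Hypothetical helper; the -1/1 encoding is an assumed convention.
    """
    anomalous = set(anomalous)
    return [-1 if label in anomalous else 1 for label in classes]


# Toy labels: 'w' is the anomalous class, 'n' is assumed to be the normal one.
raw = ["n", "n", "w", "n", "n", "n", "w", "n", "n", "n"]
target = to_outlier_labels(raw, anomalous={"w"})

# The contamination is then the share of samples encoded as outliers.
n_outliers = target.count(-1)
contamination = n_outliers / len(target)
print(target)
print(n_outliers, contamination)
```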