kenchi.datasets.base.load_pendigits(random_state=None, return_X_y=False, subset='kriegel11')

Load and return the pendigits dataset.

Kriegel’s structure (subset='kriegel11'):

anomalous class  class 4
n_samples        9868
n_outliers       20
n_features       16
contamination    0.002

Goldstein’s global structure (subset='goldstein12-global'):

anomalous class  classes 0, 1, 2, 3, 4, 5, 6, 7, 9
n_samples        809
n_outliers       90
n_features       16
contamination    0.111

Goldstein’s local structure (subset='goldstein12-local'):

anomalous class  class 4
n_samples        6724
n_outliers       10
n_features       16
contamination    0.001

Parameters:
  • random_state (int, RandomState instance, default None) – Seed of the pseudo-random number generator.
  • return_X_y (bool, default False) – If True, return (data, target) instead of a Bunch object.
  • subset (str, default 'kriegel11') – Specify the structure. Valid options are ['goldstein12-global'|'goldstein12-local'|'kriegel11'].
Returns:

data – Dictionary-like object.

Return type:

Bunch

References

[1] Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.
[2] Goldstein, M., and Dengel, A., “Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm,” KI: Poster and Demo Track, pp. 59-63, 2012.
[3] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” In Proceedings of SDM, pp. 13-24, 2011.

Examples

>>> from kenchi.datasets import load_pendigits
>>> pendigits = load_pendigits(subset='kriegel11')
>>> pendigits.data.shape
(9868, 16)
>>> pendigits = load_pendigits(subset='goldstein12-global')
>>> pendigits.data.shape
(809, 16)
>>> pendigits = load_pendigits(subset='goldstein12-local')
>>> pendigits.data.shape
(6724, 16)
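
The contamination values listed for each subset are consistent with n_outliers / n_samples rounded to three decimals. The following is a small sketch that checks this arithmetic using only the counts documented above; the names PENDIGITS_SUBSETS and contamination are illustrative and not part of kenchi.

```python
# Documented (n_outliers, n_samples) pairs for each pendigits subset.
PENDIGITS_SUBSETS = {
    "kriegel11": (20, 9868),
    "goldstein12-global": (90, 809),
    "goldstein12-local": (10, 6724),
}


def contamination(n_outliers, n_samples):
    """Fraction of samples that are outliers, rounded to three decimals."""
    return round(n_outliers / n_samples, 3)


# Hypothetical helper output: one contamination rate per subset name.
rates = {name: contamination(*counts) for name, counts in PENDIGITS_SUBSETS.items()}
print(rates)
# {'kriegel11': 0.002, 'goldstein12-global': 0.111, 'goldstein12-local': 0.001}
```

A rate computed this way is the kind of value typically passed to a detector's contamination setting, though how kenchi itself derives or uses it is not stated here.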

kenchi.datasets.base.load_pima(return_X_y=False)

Load and return the Pima Indians diabetes dataset.

anomalous class  class 1
n_samples        768
n_outliers       268
n_features       8
contamination    0.349

Parameters:
  • return_X_y (bool, default False) – If True, return (data, target) instead of a Bunch object.
Returns:

data – Dictionary-like object.

Return type:

Bunch

References

[4] Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.
[5] Goix, N., “How to evaluate the quality of unsupervised anomaly detection algorithms?” In ICML Anomaly Detection Workshop, 2016.
[6] Liu, F. T., Ting, K. M., and Zhou, Z.-H., “Isolation forest,” In Proceedings of ICDM, pp. 413-422, 2008.
[7] Sugiyama, M., and Borgwardt, K., “Rapid distance-based outlier detection via sampling,” Advances in NIPS, pp. 467-475, 2013.

Examples

>>> from kenchi.datasets import load_pima
>>> pima = load_pima()
>>> pima.data.shape
(768, 8)
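
The return_X_y flag switches between a dictionary-like Bunch and a plain (data, target) tuple. The sketch below illustrates that contract with a stand-in loader so it runs without kenchi installed; load_toy, the tiny dataset, the minimal Bunch class, and the label values are all hypothetical and only mimic the documented behaviour.

```python
class Bunch(dict):
    """Minimal dictionary-like container with attribute access (stand-in)."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


def load_toy(return_X_y=False):
    """Hypothetical loader that mimics the documented return_X_y contract."""
    data = [[0.1, 0.2], [0.3, 0.4], [9.0, 9.5]]
    target = [1, 1, -1]  # label encoding here is illustrative only
    if return_X_y:
        return data, target
    return Bunch(data=data, target=target)


bunch = load_toy()                # dictionary-like object
X, y = load_toy(return_X_y=True)  # plain (data, target) tuple
print(len(bunch.data), len(X), len(y))
```

With the real loaders, the same pattern would presumably read X, y = load_pima(return_X_y=True).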

kenchi.datasets.base.load_wdbc(random_state=None, return_X_y=False, subset='kriegel11')

Load and return the breast cancer Wisconsin dataset.

Goldstein’s structure (subset='goldstein12'):

anomalous class  malignant
n_samples        367
n_outliers       10
n_features       30
contamination    0.027

Kriegel’s structure (subset='kriegel11'):

anomalous class  malignant
n_samples        367
n_outliers       10
n_features       30
contamination    0.027

Sugiyama’s structure (subset='sugiyama13'):

anomalous class  malignant
n_samples        569
n_outliers       212
n_features       30
contamination    0.373

Parameters:
  • random_state (int, RandomState instance, default None) – Seed of the pseudo-random number generator.
  • return_X_y (bool, default False) – If True, return (data, target) instead of a Bunch object.
  • subset (str, default 'kriegel11') – Specify the structure. Valid options are ['goldstein12'|'kriegel11'|'sugiyama13'].
Returns:

data – Dictionary-like object.

Return type:

Bunch

References

[8] Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.
[9] Goldstein, M., and Dengel, A., “Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm,” KI: Poster and Demo Track, pp. 59-63, 2012.
[10] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” In Proceedings of SDM, pp. 13-24, 2011.
[11] Sugiyama, M., and Borgwardt, K., “Rapid distance-based outlier detection via sampling,” Advances in NIPS, pp. 467-475, 2013.

Examples

>>> from kenchi.datasets import load_wdbc
>>> wdbc = load_wdbc(subset='goldstein12')
>>> wdbc.data.shape
(367, 30)
>>> wdbc = load_wdbc(subset='kriegel11')
>>> wdbc.data.shape
(367, 30)
>>> wdbc = load_wdbc(subset='sugiyama13')
>>> wdbc.data.shape
(569, 30)
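
The counts above suggest why random_state matters: 569 - 212 = 357 benign samples plus 10 malignant ones gives the 367 samples of the 'goldstein12' and 'kriegel11' subsets. The sketch below assumes (the documentation does not say so explicitly) that the 10 outliers are drawn at random from the 212 malignant samples; downsample_outliers is a hypothetical helper, and unlike the real parameter it accepts only an int seed or None.

```python
import random


def downsample_outliers(inlier_ids, outlier_ids, n_outliers, random_state=None):
    """Keep all inliers and a seeded random sample of the outliers."""
    rng = random.Random(random_state)  # None seeds from system entropy
    kept = rng.sample(outlier_ids, n_outliers)
    return sorted(inlier_ids + kept)


inliers = list(range(357))        # benign samples: 569 - 212
outliers = list(range(357, 569))  # 212 malignant samples

subset_a = downsample_outliers(inliers, outliers, 10, random_state=0)
subset_b = downsample_outliers(inliers, outliers, 10, random_state=0)
print(len(subset_a), subset_a == subset_b)
# 367 True
```

Fixing the seed makes the selected outliers, and therefore any downstream evaluation, reproducible across runs.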

kenchi.datasets.base.load_wilt(return_X_y=False)

Load and return the wilt dataset.

anomalous class  class 'w'
n_samples        4839
n_outliers       261
n_features       5
contamination    0.054

Parameters:
  • return_X_y (bool, default False) – If True, return (data, target) instead of a Bunch object.
Returns:

data – Dictionary-like object.

Return type:

Bunch

References

[12] Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.
[13] Goix, N., “How to evaluate the quality of unsupervised anomaly detection algorithms?” In ICML Anomaly Detection Workshop, 2016.

Examples

>>> from kenchi.datasets import load_wilt
>>> wilt = load_wilt()
>>> wilt.data.shape
(4839, 5)