kenchi.datasets.base.load_pendigits(random_state=None, return_X_y=False, subset='kriegel11')

Load and return the pendigits dataset.

Kriegel’s structure (subset='kriegel11'):

anomalous class	class 4
n_samples	9868
n_outliers	20
n_features	16
contamination	0.002

Goldstein’s global structure (subset='goldstein12-global'):

anomalous class	classes 0, 1, 2, 3, 4, 5, 6, 7, 9
n_samples	809
n_outliers	90
n_features	16
contamination	0.111

Goldstein’s local structure (subset='goldstein12-local'):

anomalous class	class 4
n_samples	6724
n_outliers	10
n_features	16
contamination	0.001
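
The contamination figures in the three tables above appear to be n_outliers divided by n_samples, rounded to three decimals. The source does not state this formula, so the following is only a sanity check under that assumption; the `contamination` helper is illustrative and not part of kenchi.

```python
# Counts copied from the tables above: subset -> (n_samples, n_outliers).
PENDIGITS_SUBSETS = {
    'kriegel11': (9868, 20),
    'goldstein12-global': (809, 90),
    'goldstein12-local': (6724, 10),
}


def contamination(n_samples, n_outliers):
    """Return the fraction of outliers (assumed definition, not from kenchi)."""
    return n_outliers / n_samples


for subset, (n_samples, n_outliers) in PENDIGITS_SUBSETS.items():
    print(subset, round(contamination(n_samples, n_outliers), 3))
```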

Parameters:	random_state (int, RandomState instance, default None) – Seed of the pseudo-random number generator.
	return_X_y (bool, default False) – If True, return `(data, target)` instead of a Bunch object.
	subset (str, default 'kriegel11') – Specify the structure. Valid options are ['goldstein12-global'|'goldstein12-local'|'kriegel11'].
Returns:	data – Dictionary-like object.
Return type:	Bunch
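
Each subset keeps only a handful of samples from the anomalous class (20 of class 4 for 'kriegel11', for instance), which is presumably why the loader accepts `random_state`. The sketch below illustrates how a seed makes that kind of downsampling reproducible. `downsample_outliers` is a hypothetical helper written with the standard library; it is not kenchi's implementation, and the candidate indices are toy data.

```python
import random


def downsample_outliers(outlier_indices, n_outliers, random_state=None):
    """Pick n_outliers indices without replacement, reproducibly for a given seed."""
    rng = random.Random(random_state)  # random_state=None seeds from the OS
    return sorted(rng.sample(list(outlier_indices), n_outliers))


# Toy stand-in: pretend indices 0..99 belong to the anomalous class.
candidates = range(100)
first = downsample_outliers(candidates, 20, random_state=0)
second = downsample_outliers(candidates, 20, random_state=0)
print(first == second)  # the same seed selects the same samples
```

Passing an integer seed is what makes repeated calls comparable; with `random_state=None` each call may select different samples.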

References

[1]	Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.

[2]	Goldstein, M., and Dengel, A., “Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm,” KI: Poster and Demo Track, pp. 59-63, 2012.

[3]	Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” In Proceedings of SDM, pp. 13-24, 2011.

Examples

>>> from kenchi.datasets import load_pendigits
>>> pendigits = load_pendigits(subset='kriegel11')
>>> pendigits.data.shape
(9868, 16)
>>> pendigits = load_pendigits(subset='goldstein12-global')
>>> pendigits.data.shape
(809, 16)
>>> pendigits = load_pendigits(subset='goldstein12-local')
>>> pendigits.data.shape
(6724, 16)

kenchi.datasets.base.load_pima(return_X_y=False)

Load and return the Pima Indians diabetes dataset.

anomalous class	class 1
n_samples	768
n_outliers	268
n_features	8
contamination	0.349

Parameters:	return_X_y (bool, default False) – If True, return `(data, target)` instead of a Bunch object.
Returns:	data – Dictionary-like object.
Return type:	Bunch
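
With `return_X_y=True` the loader returns a `(data, target)` pair rather than a Bunch. The sketch below shows how the table entries could be recomputed from such a pair. The source does not say how outliers are encoded in `target`, so the label is passed explicitly; `summarize` and the toy arrays are hypothetical stand-ins, not kenchi code.

```python
def summarize(X, y, outlier_label):
    """Recompute the summary-table entries from a (data, target) pair."""
    n_samples = len(X)
    n_features = len(X[0]) if n_samples else 0
    n_outliers = sum(1 for label in y if label == outlier_label)
    return {
        'n_samples': n_samples,
        'n_outliers': n_outliers,
        'n_features': n_features,
        'contamination': n_outliers / n_samples if n_samples else 0.0,
    }


# Toy stand-in for `X, y = load_pima(return_X_y=True)`.
# The outlier encoding (here -1) is an assumption, not taken from the docs.
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [5.0, 6.0]]
y = [1, 1, 1, -1]
print(summarize(X, y, outlier_label=-1))
```

The helper relies only on `len` and iteration, so it should accept nested lists as well as two-dimensional arrays.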

References

[4]	Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.

[5]	Goix, N., “How to evaluate the quality of unsupervised anomaly detection algorithms?” In ICML Anomaly Detection Workshop, 2016.

[6]	Liu, F. T., Ting, K. M., and Zhou, Z.-H., “Isolation forest,” In Proceedings of ICDM, pp. 413-422, 2008.

[7]	Sugiyama, M., and Borgwardt, K., “Rapid distance-based outlier detection via sampling,” Advances in NIPS, pp. 467-475, 2013.

Examples

>>> from kenchi.datasets import load_pima
>>> pima = load_pima()
>>> pima.data.shape
(768, 8)

kenchi.datasets.base.load_wdbc(random_state=None, return_X_y=False, subset='kriegel11')

Load and return the breast cancer Wisconsin dataset.

Goldstein’s structure (subset='goldstein12'):

anomalous class	malignant
n_samples	367
n_outliers	10
n_features	30
contamination	0.027

Kriegel’s structure (subset='kriegel11'):

anomalous class	malignant
n_samples	367
n_outliers	10
n_features	30
contamination	0.027

Sugiyama’s structure (subset='sugiyama13'):

anomalous class	malignant
n_samples	569
n_outliers	212
n_features	30
contamination	0.373

Parameters:	random_state (int, RandomState instance, default None) – Seed of the pseudo-random number generator.
	return_X_y (bool, default False) – If True, return `(data, target)` instead of a Bunch object.
	subset (str, default 'kriegel11') – Specify the structure. Valid options are ['goldstein12'|'kriegel11'|'sugiyama13'].
Returns:	data – Dictionary-like object.
Return type:	Bunch
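
The `subset` argument only accepts the names listed in the parameter descriptions. A minimal sketch of checking a name before calling a loader follows. `VALID_SUBSETS` and `check_subset` are hypothetical helpers built from the option lists in this document; how kenchi itself reacts to an unknown name is not specified here.

```python
# Valid subset names per loader, copied from the parameter descriptions.
VALID_SUBSETS = {
    'load_pendigits': ('goldstein12-global', 'goldstein12-local', 'kriegel11'),
    'load_wdbc': ('goldstein12', 'kriegel11', 'sugiyama13'),
}


def check_subset(loader_name, subset):
    """Return subset unchanged, or raise ValueError if the loader does not list it."""
    valid = VALID_SUBSETS[loader_name]
    if subset not in valid:
        raise ValueError(
            f"subset={subset!r} is not valid for {loader_name}; choose one of {valid}"
        )
    return subset


print(check_subset('load_wdbc', 'sugiyama13'))
```

Note that the two loaders do not share option names: 'goldstein12' belongs to `load_wdbc`, while `load_pendigits` distinguishes 'goldstein12-global' from 'goldstein12-local'.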

References

[8]	Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.

[9]	Goldstein, M., and Dengel, A., “Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm,” KI: Poster and Demo Track, pp. 59-63, 2012.

[10]	Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” In Proceedings of SDM, pp. 13-24, 2011.

[11]	Sugiyama, M., and Borgwardt, K., “Rapid distance-based outlier detection via sampling,” Advances in NIPS, pp. 467-475, 2013.

Examples

>>> from kenchi.datasets import load_wdbc
>>> wdbc = load_wdbc(subset='goldstein12')
>>> wdbc.data.shape
(367, 30)
>>> wdbc = load_wdbc(subset='kriegel11')
>>> wdbc.data.shape
(367, 30)
>>> wdbc = load_wdbc(subset='sugiyama13')
>>> wdbc.data.shape
(569, 30)

kenchi.datasets.base.load_wilt(return_X_y=False)

Load and return the wilt dataset.

anomalous class	class 'w'
n_samples	4839
n_outliers	261
n_features	5
contamination	0.054

Parameters:	return_X_y (bool, default False) – If True, return `(data, target)` instead of a Bunch object.
Returns:	data – Dictionary-like object.
Return type:	Bunch

References

[12]	Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.

[13]	Goix, N., “How to evaluate the quality of unsupervised anomaly detection algorithms?” In ICML Anomaly Detection Workshop, 2016.

Examples

>>> from kenchi.datasets import load_wilt
>>> wilt = load_wilt()
>>> wilt.data.shape
(4839, 5)