class kenchi.outlier_detection.ensemble.IForest(bootstrap=False, contamination='auto', max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=1, random_state=None)
Bases: kenchi.outlier_detection.base.BaseOutlierDetector
Isolation forest (iForest).
Parameters:
- bootstrap (bool, default False) – If True, each tree is fit on a random subset of the training data sampled with replacement. If False, sampling is performed without replacement.
- contamination (float or 'auto', default 'auto') – Proportion of outliers in the data set. Used to define the threshold.
- max_features (int or float, default 1.0) – Number of features to draw from X to train each base estimator.
- max_samples (int, float or str, default 'auto') – Number of samples to draw from X to train each base estimator.
- n_estimators (int, default 100) – Number of base estimators in the ensemble.
- n_jobs (int, default 1) – Number of jobs to run in parallel. If -1, the number of jobs is set to the number of CPU cores.
- random_state (int or RandomState instance, default None) – Seed of the pseudo-random number generator.
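The sampling parameters above can be illustrated with a minimal standard-library sketch. It is not kenchi's implementation. It assumes the scikit-learn convention that max_samples='auto' means min(256, n_samples), that an int is an absolute count, and that a float is a fraction of n_samples. The helpers `resolve_max_samples` and `draw_sample_indices` are hypothetical names introduced only for this illustration.

```python
import random


def resolve_max_samples(max_samples, n_samples):
    """Turn a max_samples setting into an absolute sample count.

    Assumption: 'auto' follows the scikit-learn convention
    min(256, n_samples); this is not confirmed by the text above.
    """
    if max_samples == 'auto':
        return min(256, n_samples)
    if isinstance(max_samples, int):
        # An int is an absolute count, capped at the data set size.
        return min(max_samples, n_samples)
    # A float is interpreted as a fraction of the training set.
    return int(max_samples * n_samples)


def draw_sample_indices(n_samples, max_samples='auto', bootstrap=False, seed=None):
    """Draw the row indices used to fit one base estimator (hypothetical helper)."""
    rng = random.Random(seed)
    size = resolve_max_samples(max_samples, n_samples)
    if bootstrap:
        # Sampling with replacement: an index may appear more than once.
        return [rng.randrange(n_samples) for _ in range(size)]
    # Sampling without replacement: all indices are distinct.
    return rng.sample(range(n_samples), size)
```

With bootstrap=False every drawn index is unique, while bootstrap=True may repeat rows. Fixing the seed makes the draw reproducible, which is the role random_state plays in the signature.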
anomaly_score_
    array-like of shape (n_samples,) – Anomaly score for each training sample.
contamination_
    float – Actual proportion of outliers in the data set.
threshold_
    float – Threshold.
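The relationship between contamination, the threshold and the resulting labels can be sketched as follows. This is an illustration under stated assumptions, not kenchi's actual rule: it assumes that higher scores are more anomalous and that the threshold is the cut-off leaving roughly the requested proportion of samples at or above it. The function names and the toy scores are hypothetical.

```python
def threshold_from_contamination(scores, contamination):
    """Return a cut-off such that about `contamination` of the scores reach it.

    Assumption: higher score means more anomalous.
    """
    if not 0.0 < contamination < 1.0:
        raise ValueError("contamination must lie strictly between 0 and 1")
    ordered = sorted(scores, reverse=True)
    n_outliers = max(1, int(round(contamination * len(ordered))))
    # The threshold is the score of the last sample that is still flagged.
    return ordered[n_outliers - 1]


def predict_labels(scores, threshold):
    """Label a sample -1 (outlier) if its score reaches the threshold, else 1."""
    return [-1 if s >= threshold else 1 for s in scores]


# Toy scores, made up for illustration only.
scores = [0.10, 0.20, 0.15, 0.90, 0.12, 0.11, 0.13, 0.14, 0.16, 0.18]
threshold = threshold_from_contamination(scores, contamination=0.1)
labels = predict_labels(scores, threshold)
# Proportion of samples actually flagged, analogous to contamination_.
actual_contamination = labels.count(-1) / len(labels)
```

The proportion actually flagged can differ from the requested one when scores are tied at the threshold, which is one reason to report a separate realised value.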
References
[1] Liu, F. T., Ting, K. M., and Zhou, Z.-H., “Isolation forest,” In Proceedings of ICDM, pp. 413-422, 2008.
Examples
>>> import numpy as np
>>> from kenchi.outlier_detection import IForest
>>> X = np.array([
...     [0., 0.], [1., 1.], [2., 0.], [3., -1.], [4., 0.],
...     [5., 1.], [6., 0.], [7., -1.], [8., 0.], [1000., 1.]
... ])
>>> det = IForest(random_state=0)
>>> det.fit_predict(X)
array([ 1,  1,  1,  1,  1,  1,  1,  1,  1, -1])
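For intuition about why an isolated point such as [1000., 1.] is flagged, the scoring rule from reference [1] can be sketched with the standard library. The sketch follows the paper's formula s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length of a sample over the trees and c(n) normalises by the average path length for n samples. Whether kenchi's anomaly_score_ uses exactly this scale is not stated here, and the function names are hypothetical.

```python
import math

# Euler-Mascheroni constant, used to approximate harmonic numbers.
EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Normalising term c(n) from the isolation forest paper."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # H(i) is approximated by ln(i) + Euler's constant.
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Score in (0, 1]; values near 1 indicate samples isolated after few splits."""
    return 2.0 ** (-mean_path_length / average_path_length(n))
```

A sample whose mean path length equals c(n) scores exactly 0.5, and shorter paths push the score toward 1.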
estimators_
    list – Collection of fitted sub-estimators.
estimators_samples_
    list of arrays – Subset of drawn samples for each base estimator.
max_samples_
    int – Actual number of samples drawn to train each base estimator.