class kenchi.outlier_detection.ensemble.IForest(bootstrap=False, contamination='auto', max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=1, random_state=None)

Bases: kenchi.outlier_detection.base.BaseOutlierDetector

Isolation forest (iForest) [1].

Parameters:
  • bootstrap (bool, default False) – If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed.
  • contamination (float or str, default 'auto') – Proportion of outliers in the data set. Used to define the threshold.
  • max_features (int or float, default 1.0) – Number of features to draw from X to train each base estimator.
  • max_samples (int, float or str, default 'auto') – Number of samples to draw from X to train each base estimator.
  • n_estimators (int, default 100) – Number of base estimators in the ensemble.
  • n_jobs (int, default 1) – Number of jobs to run in parallel. If -1, the number of jobs is set to the number of CPU cores.
  • random_state (int or RandomState instance, default None) – Seed of the pseudo random number generator.
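The sketch below illustrates how max_samples and max_features could be turned into concrete per-tree counts. It assumes the scikit-learn IsolationForest conventions ('auto' means min(256, n_samples), a float is a fraction, an int is an absolute count); kenchi may resolve these values differently, and resolve_max_samples and resolve_max_features are hypothetical helpers, not part of kenchi.

```python
from numbers import Integral, Real


def resolve_max_samples(max_samples, n_samples):
    """Rows drawn per tree (hypothetical helper, scikit-learn convention)."""
    if isinstance(max_samples, str):
        if max_samples != "auto":
            raise ValueError(f"unsupported max_samples: {max_samples!r}")
        # 'auto' caps the subsample at 256 rows
        return min(256, n_samples)
    if isinstance(max_samples, Integral):
        # an int is an absolute count, capped at the training-set size
        return min(max_samples, n_samples)
    if isinstance(max_samples, Real):
        # a float in (0, 1] is a fraction of the training set
        return int(max_samples * n_samples)
    raise TypeError(f"unsupported max_samples: {max_samples!r}")


def resolve_max_features(max_features, n_features):
    """Features drawn per tree (hypothetical helper, scikit-learn convention)."""
    if isinstance(max_features, Integral):
        n = max_features
    else:
        # a float is a fraction of the columns
        n = int(max_features * n_features)
    # at least one feature is always kept
    return max(1, n)


print(resolve_max_samples("auto", 10))    # 10
print(resolve_max_samples("auto", 5000))  # 256
print(resolve_max_features(1.0, 2))       # 2
```

With the defaults and the ten-row, two-column data set from the example, every tree would then see all 10 rows and both features.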
Attributes

anomaly_score_

array-like of shape (n_samples,) – Anomaly score for each training sample.
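For reference, the score defined in [1] maps the average path length E[h(x)] of a point across the trees to s(x, n) = 2^(-E[h(x)] / c(n)), where c(n) is the average path length of an unsuccessful binary search tree search over n points. This sketch assumes that anomaly_score_ follows that definition (higher means more anomalous); kenchi may rescale or shift it. The c(2) = 1 special case follows scikit-learn, and both function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """c(n) from [1]; the n == 2 case follows scikit-learn."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # H(n - 1) is approximated by ln(n - 1) + Euler's constant
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)) for trees grown on n samples."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


c256 = average_path_length(256)
print(round(c256, 3))                 # 10.245
print(anomaly_score(c256, 256))       # 0.5: an average point
print(anomaly_score(1.0, 256) > 0.9)  # True: isolated after about one split
```

A point whose average path length equals c(n) scores exactly 0.5, while points isolated after very few splits approach 1.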

contamination_

float – Actual proportion of outliers in the data set.

threshold_

float – Threshold on the anomaly score; samples whose anomaly score exceeds it are labeled as outliers (-1).
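Assuming the usual percentile rule, which this page does not spell out, threshold_ can be pictured as the (1 - contamination) quantile of the training scores, so that roughly a contamination fraction of training samples lies above it; contamination_ is then the fraction actually labeled -1. The helpers below are hypothetical, the 'auto' setting is not covered, and the percentile uses linear interpolation like NumPy's default.

```python
import math


def percentile(values, q):
    """q-th percentile with linear interpolation (NumPy's default rule)."""
    v = sorted(values)
    pos = (len(v) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)


def threshold_from_contamination(scores, contamination):
    # hypothetical helper: about a `contamination` fraction of scores ends up above the cut
    return percentile(scores, 100.0 * (1.0 - contamination))


def predict(scores, threshold):
    # -1 marks an outlier and 1 an inlier, the labels fit_predict returns
    return [-1 if s > threshold else 1 for s in scores]


scores = [0.40, 0.42, 0.45, 0.41, 0.43, 0.44, 0.46, 0.42, 0.41, 0.80]
threshold = threshold_from_contamination(scores, contamination=0.1)
labels = predict(scores, threshold)
actual_contamination = labels.count(-1) / len(labels)
print(round(threshold, 3), labels[-1], actual_contamination)  # 0.494 -1 0.1
```

Because the quantile is interpolated and ties are possible, the actual proportion (contamination_) need not match the requested one exactly on other data.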

References

[1] Liu, F. T., Ting, K. M., and Zhou, Z.-H., “Isolation forest,” In Proceedings of ICDM, pp. 413-422, 2008.

Examples

>>> import numpy as np
>>> from kenchi.outlier_detection import IForest
>>> X = np.array([
...     [0., 0.], [1., 1.], [2., 0.], [3., -1.], [4., 0.],
...     [5., 1.], [6., 0.], [7., -1.], [8., 0.], [1000., 1.]
... ])
>>> det = IForest(random_state=0)
>>> det.fit_predict(X)
array([ 1,  1,  1,  1,  1,  1,  1,  1,  1, -1])

estimators_

list – Collection of fitted sub-estimators.
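Each entry of estimators_ is an isolation tree. The following pure-Python sketch of the tree-growing and path-length procedure from [1] shows what such a sub-estimator computes: pick a random feature, split uniformly between that feature's minimum and maximum, recurse until a point is isolated or the height limit ceil(log2(n)) is reached, and add a c(size) correction at leaves that still hold several rows. It is not kenchi's implementation; for brevity every tree is grown on the full toy data set from the example rather than on a subsample, and a node simply stops when the drawn feature is constant.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful BST search (n == 2 as in scikit-learn)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def grow_tree(rows, depth, height_limit, rng):
    """Recursively split rows until each is isolated or the height limit is hit."""
    if depth >= height_limit or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        # simplification: stop when the drawn feature is constant here
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            grow_tree(left, depth + 1, height_limit, rng),
            grow_tree(right, depth + 1, height_limit, rng))


def path_length(x, node, depth=0):
    """Edges from the root to x's leaf, plus c(size) for a leaf holding several rows."""
    if node[0] == "leaf":
        return depth + c(node[1])
    _, feature, split, left, right = node
    child = left if x[feature] < split else right
    return path_length(x, child, depth + 1)


X = [[0., 0.], [1., 1.], [2., 0.], [3., -1.], [4., 0.],
     [5., 1.], [6., 0.], [7., -1.], [8., 0.], [1000., 1.]]
rng = random.Random(0)
height_limit = math.ceil(math.log2(len(X)))  # 4 for 10 rows
trees = [grow_tree(X, 0, height_limit, rng) for _ in range(100)]
mean_h = [sum(path_length(x, t) for t in trees) / len(trees) for x in X]
scores = [2.0 ** (-h / c(len(X))) for h in mean_h]
print(max(range(len(X)), key=scores.__getitem__))  # index of the most anomalous row
```

The row [1000., 1.] is usually separated by the first split on feature 0, so its average path is much shorter than that of the other rows, which is the intuition behind the -1 label in the example.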

estimators_samples_

list of arrays – Indices of the samples drawn to train each base estimator.
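The bootstrap parameter decides how those subsets are drawn. Below is a minimal sketch using Python's random module, assuming that estimators_samples_ holds one index collection per base estimator as in scikit-learn's bagging estimators; draw_samples is a hypothetical helper.

```python
import random


def draw_samples(n_samples, max_samples, n_estimators, bootstrap, seed=None):
    """One list of row indices per base estimator (hypothetical helper)."""
    rng = random.Random(seed)
    population = range(n_samples)
    subsets = []
    for _ in range(n_estimators):
        if bootstrap:
            # with replacement: an index may appear more than once in a subset
            subsets.append(rng.choices(population, k=max_samples))
        else:
            # without replacement: the indices in a subset are all distinct
            subsets.append(rng.sample(population, max_samples))
    return subsets


without = draw_samples(1000, 256, n_estimators=3, bootstrap=False, seed=0)
with_repl = draw_samples(10, 10, n_estimators=5, bootstrap=True, seed=0)
print([len(set(s)) for s in without])  # [256, 256, 256]
```

With bootstrap=True some indices repeat and others are left out of a given subset, so each tree sees a slightly different view of the data even when max_samples equals n_samples.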

max_samples_

int – Actual number of samples drawn to train each base estimator.