kenchi.outlier_detection.density_based module

class kenchi.outlier_detection.density_based.LOF(algorithm='auto', contamination=0.1, leaf_size=30, metric='minkowski', novelty=False, n_jobs=1, n_neighbors=20, p=2, metric_params=None)

Bases: kenchi.outlier_detection.base.BaseOutlierDetector

Local Outlier Factor.

Parameters:
  • algorithm (str, default 'auto') – Tree algorithm to use. Valid algorithms are ['kd_tree'|'ball_tree'|'auto'].
  • contamination (float, default 0.1) – Proportion of outliers in the data set. Used to define the threshold.
  • leaf_size (int, default 30) – Leaf size of the underlying tree.
  • metric (str or callable, default 'minkowski') – Distance metric to use.
  • novelty (bool, default False) – If True, predict, decision_function and anomaly_score can be used on new, unseen data but not on the training data.
  • n_jobs (int, default 1) – Number of jobs to run in parallel. If -1, then the number of jobs is set to the number of CPU cores.
  • n_neighbors (int, default 20) – Number of neighbors.
  • p (int, default 2) – Power parameter for the Minkowski metric.
  • metric_params (dict, default None) – Additional parameters passed to the requested metric.
anomaly_score_

array-like of shape (n_samples,) – Anomaly score for each training sample.

threshold_

float – Threshold on the anomaly score, defined from contamination.

negative_outlier_factor_

array-like of shape (n_samples,) – Negated LOF of the training samples.

n_neighbors_

int – Actual number of neighbors used for kneighbors queries.

X_

array-like of shape (n_samples, n_features) – Training data.
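
Example

The following is a minimal pure-Python sketch of the quantities described above. It is not kenchi's implementation and does not import kenchi. It computes the local outlier factor from a fixed number of nearest neighbors under the Minkowski metric, then derives a threshold from contamination. The function names lof_scores and percentile_threshold are hypothetical. The sketch also assumes that the threshold is the 100 * (1 - contamination) percentile of the scores and that outliers are labeled -1 and inliers 1; check both against the library before relying on them.

```python
import math


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def lof_scores(X, n_neighbors=2, p=2):
    """Return the LOF of every point in X (larger means more anomalous)."""
    n = len(X)
    dist = [[minkowski(X[i], X[j], p) for j in range(n)] for i in range(n)]

    # Exactly n_neighbors nearest neighbors per point, excluding the point
    # itself (ties at the k-distance are not expanded in this sketch).
    knn = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:n_neighbors]
        for i in range(n)
    ]

    # k-distance: distance from each point to its farthest kept neighbor.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance.
    # The small constant avoids division by zero for duplicated points.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        lrd.append(len(reach) / (sum(reach) + 1e-10))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (len(knn[i]) * lrd[i]) for i in range(n)]


def percentile_threshold(scores, contamination=0.1):
    """Linearly interpolated percentile at 100 * (1 - contamination).

    Assumption: this mirrors how contamination is turned into a threshold.
    """
    s = sorted(scores)
    pos = (len(s) - 1) * (1.0 - contamination)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


# Four points on a unit square plus one distant point.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5]]
scores = lof_scores(X, n_neighbors=2, p=2)
threshold = percentile_threshold(scores, contamination=0.2)

# Assumed label convention: -1 for outliers, 1 for inliers.
labels = [-1 if s > threshold else 1 for s in scores]
print(labels)  # [1, 1, 1, 1, -1]
```

Points on the square have the same local density as their neighbors, so their LOF is close to 1, while the distant point sits in a much sparser region than its neighbors and receives a far larger score.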

References

[1] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J., “LOF: identifying density-based local outliers,” In ACM SIGMOD Record, pp. 93-104, 2000.
[2] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” In Proceedings of SDM'11, pp. 13-24, 2011.