class kenchi.outlier_detection.base.BaseOutlierDetector

Bases: sklearn.base.BaseEstimator, abc.ABC

Base class for all outlier detectors in kenchi.

References

[1] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” In Proceedings of SDM, pp. 13-24, 2011.

anomaly_score(X=None, normalize=False)

Compute the anomaly score for each sample.

Parameters:
  • X (array-like of shape (n_samples, n_features), default None) – Data. If None, compute the anomaly score for each training sample.
  • normalize (bool, default False) – If True, return the normalized anomaly score.
Returns:

anomaly_score – Anomaly score for each sample.

Return type:

array-like of shape (n_samples,)

decision_function(X=None, threshold=None)

Compute the decision function of the given samples.

Parameters:
  • X (array-like of shape (n_samples, n_features), default None) – Data. If None, compute the decision function of the given training samples.
  • threshold (float, default None) – User-provided threshold.
Returns:

shifted_score_samples – Shifted opposite of the anomaly score for each sample. Negative scores represent outliers and positive scores represent inliers.

Return type:

array-like of shape (n_samples,)
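
The sign convention can be illustrated with a minimal sketch. It assumes the shift has the form `threshold - anomaly_score`, which is consistent with the description above but is not confirmed by this page. `shifted_opposite` is a hypothetical helper, not part of kenchi, and plain Python lists stand in for arrays.

```python
def shifted_opposite(anomaly_score, threshold):
    """Return threshold - score for each sample (assumed form of the shift)."""
    return [threshold - s for s in anomaly_score]


# Hypothetical anomaly scores; the third sample is far from the rest.
scores = [0.2, 0.5, 3.1, 0.4]
shifted = shifted_opposite(scores, threshold=1.0)

# Negative values mark outliers, positive values mark inliers.
is_outlier = [value < 0 for value in shifted]
```

Under this assumption, a sample is flagged exactly when its anomaly score exceeds the threshold.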

fit(X, y=None)

Fit the model according to the given training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.
  • y (ignored) – Not used; present for API consistency.
Returns:

self – Return self.

Return type:

object

fit_predict(X, y=None)

Fit the model according to the given training data and predict if a particular training sample is an outlier or not.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.
  • y (ignored) – Not used; present for API consistency.
Returns:

y_pred – Return -1 for outliers and +1 for inliers.

Return type:

array-like of shape (n_samples,)
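
How fit, predict and fit_predict relate can be sketched with a toy detector. `ToyZScoreDetector` is hypothetical and is not a kenchi class: it works on a one-dimensional list rather than an (n_samples, n_features) array, and its z-score rule and default threshold are assumptions chosen only to keep the example short. It also assumes the training data are not all identical, since the standard deviation would then be zero.

```python
from statistics import mean, pstdev


class ToyZScoreDetector:
    """Hypothetical 1-D detector used for illustration; not part of kenchi."""

    def __init__(self, threshold=2.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        # y is ignored, as in the interface documented above.
        self.X_ = list(X)
        self.mean_ = mean(self.X_)
        self.std_ = pstdev(self.X_)
        return self

    def anomaly_score(self, X=None):
        # With X=None, score the training samples.
        data = self.X_ if X is None else X
        return [abs(x - self.mean_) / self.std_ for x in data]

    def predict(self, X=None, threshold=None):
        # A user-provided threshold overrides the one set at construction.
        limit = self.threshold if threshold is None else threshold
        return [-1 if s > limit else 1 for s in self.anomaly_score(X)]

    def fit_predict(self, X, y=None):
        # Fit on X, then label the same training samples.
        return self.fit(X).predict()


data = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 9.0]
y_pred = ToyZScoreDetector().fit_predict(data)
```

Here only the last sample is labelled -1. Calling `predict(threshold=3.0)` on the fitted toy detector labels every training sample +1, which shows the effect of overriding the threshold.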

plot_anomaly_score(X=None, normalize=False, **kwargs)

Plot the anomaly score for each sample.

Parameters:
  • X (array-like of shape (n_samples, n_features), default None) – Data. If None, plot the anomaly score for each training sample.
  • normalize (bool, default False) – If True, plot the normalized anomaly score.
  • ax (matplotlib Axes, default None) – Target axes instance.
  • bins (int, str or array-like, default 'auto') – Number of histogram bins.
  • figsize (tuple, default None) – Tuple denoting figure size of the plot.
  • filename (str, default None) – If provided, save the current figure.
  • hist (bool, default True) – If True, plot a histogram of anomaly scores.
  • kde (bool, default True) – If True, plot a Gaussian kernel density estimate.
  • title (string, default None) – Axes title. To disable, pass None.
  • xlabel (string, default 'Samples') – X axis title label. To disable, pass None.
  • xlim (tuple, default None) – Tuple passed to ax.set_xlim.
  • ylabel (string, default 'Anomaly score') – Y axis title label. To disable, pass None.
  • ylim (tuple, default None) – Tuple passed to ax.set_ylim.
  • **kwargs (dict) – Other keywords passed to ax.plot.
Returns:

ax – Axes on which the plot was drawn.

Return type:

matplotlib Axes

plot_roc_curve(X, y, **kwargs)

Plot the Receiver Operating Characteristic (ROC) curve.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data.
  • y (array-like of shape (n_samples,)) – Labels.
  • ax (matplotlib Axes, default None) – Target axes instance.
  • figsize (tuple, default None) – Tuple denoting figure size of the plot.
  • filename (str, default None) – If provided, save the current figure.
  • title (string, default 'ROC curve') – Axes title. To disable, pass None.
  • xlabel (string, default 'FPR') – X axis title label. To disable, pass None.
  • ylabel (string, default 'TPR') – Y axis title label. To disable, pass None.
  • **kwargs (dict) – Other keywords passed to ax.plot.
Returns:

ax – Axes on which the plot was drawn.

Return type:

matplotlib Axes
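
The points that make up a ROC curve can be computed by hand, as the following sketch shows. It is not kenchi's implementation. It assumes that outliers are labelled -1 and treated as the positive class, that a higher anomaly score means a more anomalous sample, and that all scores are distinct. `roc_points` is a hypothetical helper.

```python
def roc_points(y_true, anomaly_score, positive=-1):
    """Return (FPR, TPR) pairs, sweeping the threshold from high to low."""
    n_pos = sum(1 for y in y_true if y == positive)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit samples from most to least anomalous.
    order = sorted(range(len(y_true)), key=lambda i: anomaly_score[i], reverse=True)
    for i in order:
        if y_true[i] == positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical labels (-1 = outlier) and anomaly scores.
y_true = [1, 1, -1, 1, -1]
scores = [0.1, 0.4, 0.9, 0.3, 0.35]
points = roc_points(y_true, scores)
```

Each step lowers the threshold past one more sample, so the curve always starts at (0, 0) and ends at (1, 1).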

predict(X=None, threshold=None)

Predict if a particular sample is an outlier or not.

Parameters:
  • X (array-like of shape (n_samples, n_features), default None) – Data. If None, predict if a particular training sample is an outlier or not.
  • threshold (float, default None) – User-provided threshold.
Returns:

y_pred – Return -1 for outliers and +1 for inliers.

Return type:

array-like of shape (n_samples,)

predict_proba(X=None)

Predict class probabilities for each sample.

Parameters:
  • X (array-like of shape (n_samples, n_features), default None) – Data. If None, predict class probabilities for each training sample.
Returns:

y_score – Class probabilities.

Return type:

array-like of shape (n_samples, n_classes)
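
One way to picture the (n_samples, n_classes) output is the sketch below. It assumes two classes, that a normalized anomaly score in [0, 1] is read as the probability of being an outlier, and that the columns are ordered inlier then outlier. None of these details are stated on this page, and `two_class_proba` is a hypothetical helper.

```python
def two_class_proba(normalized_score):
    """Return [P(inlier), P(outlier)] rows; the column order is an assumption."""
    return [[1.0 - p, p] for p in normalized_score]


# Hypothetical normalized anomaly scores in [0, 1].
proba = two_class_proba([0.05, 0.9, 0.5])
```

Each row has two entries that sum to one.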
score_samples(X=None)

Compute the opposite of the anomaly score for each sample.

Parameters:
  • X (array-like of shape (n_samples, n_features), default None) – Data. If None, compute the opposite of the anomaly score for each training sample.
Returns:

score_samples – Opposite of the anomaly score for each sample.

Return type:

array-like of shape (n_samples,)

to_pickle(filename, **kwargs)

Persist an outlier detector object.

Parameters:
  • filename (str or pathlib.Path) – Path of the file in which the detector is to be stored.
  • **kwargs (dict) – Other keywords passed to sklearn.externals.joblib.dump.
Returns:

filenames – List of file names in which the data is stored.

Return type:

list
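
The persist-and-restore round trip can be sketched with the standard library. The method above forwards to joblib; this example uses `pickle` instead so that it has no dependencies, and a plain dictionary stands in for a fitted detector. The file name and the stored values are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical fitted state standing in for a detector object.
detector_state = {"threshold": 1.0, "mean": 2.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / "detector.pkl"
    # Write the object to disk.
    with open(filename, "wb") as f:
        pickle.dump(detector_state, f)
    # Read it back into a new object.
    with open(filename, "rb") as f:
        restored = pickle.load(f)
```

The restored object compares equal to the original, which is the property a persisted detector is expected to have.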