-
class kenchi.outlier_detection.base.BaseOutlierDetector
Bases: sklearn.base.BaseEstimator, abc.ABC
Base class for all outlier detectors in kenchi.
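The abstract-base pattern behind this class can be sketched in plain Python: the base fixes the public API (`fit`, `anomaly_score`, `score_samples`) and each concrete detector fills in only the abstract hooks. This is a hypothetical sketch, not kenchi's internals. `ToyBaseDetector`, `_fit`, `_anomaly_score` and `MeanDistanceDetector` are illustrative names, and the `sklearn.base.BaseEstimator` parent is omitted so the sketch runs with the standard library alone.

```python
from abc import ABC, abstractmethod


class ToyBaseDetector(ABC):
    """Hypothetical base class: shared public API, abstract scoring hooks."""

    def fit(self, X, y=None):
        # y is ignored, as in the documented fit(X, y=None).
        self.X_ = [list(row) for row in X]
        self._fit(self.X_)
        return self  # returning self allows det.fit(X).anomaly_score()

    def anomaly_score(self, X=None):
        # X=None means "score the training samples", mirroring the docs.
        X = self.X_ if X is None else X
        return [self._anomaly_score(row) for row in X]

    def score_samples(self, X=None):
        # Opposite of the anomaly score: larger means more normal.
        return [-s for s in self.anomaly_score(X)]

    @abstractmethod
    def _fit(self, X):
        """Learn whatever the detector needs from the training data."""

    @abstractmethod
    def _anomaly_score(self, row):
        """Return the anomaly score of a single sample."""


class MeanDistanceDetector(ToyBaseDetector):
    """Hypothetical detector: squared Euclidean distance to the training mean."""

    def _fit(self, X):
        n = len(X)
        self.mean_ = [sum(col) / n for col in zip(*X)]

    def _anomaly_score(self, row):
        return sum((a - m) ** 2 for a, m in zip(row, self.mean_))


det = MeanDistanceDetector().fit([[0, 0], [2, 0]])
print(det.anomaly_score([[1, 0], [1, 3]]))  # [0.0, 9.0]
```

Because the hooks are abstract, instantiating `ToyBaseDetector()` directly raises `TypeError`, which is the same guarantee `abc.ABC` gives a real base class.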
References
[1] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” in Proceedings of SDM, pp. 13-24, 2011.
-
anomaly_score(X=None, normalize=False)
Compute the anomaly score for each sample.
Parameters: - X (array-like of shape (n_samples, n_features), default None) – Data. If None, compute the anomaly score for each training sample.
- normalize (bool, default False) – If True, return the normalized anomaly score.
Returns: anomaly_score – Anomaly score for each sample.
Return type: array-like of shape (n_samples,)
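Reference [1] (Kriegel et al.) proposes several ways to map raw outlier scores onto a comparable [0, 1] range. One of them, Gaussian scaling, maps a score s to max(0, erf((s - μ) / (σ√2))), where μ and σ are the mean and standard deviation of the training scores. Whether `normalize=True` uses exactly this formula is an assumption; the sketch below only illustrates the idea, and `gaussian_scaling` is a hypothetical helper name.

```python
import math
import statistics


def gaussian_scaling(train_scores, scores):
    """Hypothetical helper: Gaussian scaling of anomaly scores (Kriegel et al., 2011).

    mu and sigma are estimated from the training scores. Scores at or below
    the training mean map to 0; very large scores approach 1.
    """
    mu = statistics.mean(train_scores)
    sigma = statistics.pstdev(train_scores)  # population std; an assumption
    if sigma == 0:
        raise ValueError("training scores have zero spread; scaling is undefined")
    return [max(0.0, math.erf((s - mu) / (sigma * math.sqrt(2.0)))) for s in scores]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
print(gaussian_scaling(train, [2.0, 3.0, 10.0]))
```

Clipping at 0 means every sample scoring no worse than the average training sample gets the same normalized value, so only the upper tail is spread out.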
-
decision_function(X=None, threshold=None)
Compute the decision function of the given samples.
Parameters: - X (array-like of shape (n_samples, n_features), default None) – Data. If None, compute the decision function of the training samples.
- threshold (float, default None) – User-provided threshold.
Returns: shifted_score_samples – Shifted opposite of the anomaly score for each sample. Negative scores represent outliers and positive scores represent inliers.
Return type: array-like of shape (n_samples,)
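Read literally, "shifted opposite of the anomaly score" means the decision value is the threshold minus the anomaly score, so any sample whose anomaly score exceeds the threshold gets a negative value. A minimal sketch under that assumption; `decision_from_scores` is a hypothetical helper, and the threshold is passed explicitly rather than taken from a fitted model.

```python
def decision_from_scores(anomaly_scores, threshold):
    """Hypothetical helper: shifted opposite of the anomaly score.

    Negative values (score above the threshold) flag outliers;
    positive values (score below the threshold) flag inliers.
    """
    return [threshold - s for s in anomaly_scores]


print(decision_from_scores([0.5, 1.0, 4.0], threshold=2.0))  # [1.5, 1.0, -2.0]
```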
-
fit(X, y=None)
Fit the model according to the given training data.
Parameters: - X (array-like of shape (n_samples, n_features)) – Training data.
- y (ignored) –
Returns: self – Return self.
Return type: object
-
fit_predict(X, y=None)
Fit the model according to the given training data and predict whether each training sample is an outlier or not.
Parameters: - X (array-like of shape (n_samples, n_features)) – Training data.
- y (ignored) –
Returns: y_pred – Return -1 for outliers and +1 for inliers.
Return type: array-like of shape (n_samples,)
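Because `fit` returns `self` and `predict` falls back to the training samples when `X` is None, `fit_predict(X)` can be read as `fit(X).predict()`. The sketch below is a hypothetical stand-in, not kenchi code: `ThresholdDetector` scores one-dimensional samples by their absolute distance to the training median and compares them with a fixed, user-supplied `threshold` constructor argument, which is an assumption made for illustration.

```python
import statistics


class ThresholdDetector:
    """Hypothetical detector for 1-D data: |x - median| compared to a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def fit(self, X, y=None):
        # y is ignored; fit returns self so calls can be chained.
        self.X_ = list(X)
        self.median_ = statistics.median(self.X_)
        return self

    def anomaly_score(self, X=None):
        X = self.X_ if X is None else X
        return [abs(x - self.median_) for x in X]

    def predict(self, X=None):
        # -1 for outliers (score above threshold), +1 for inliers.
        return [-1 if s > self.threshold else 1 for s in self.anomaly_score(X)]

    def fit_predict(self, X, y=None):
        # Fit, then predict on the training samples (X=None).
        return self.fit(X).predict()


X = [1.0, 1.2, 0.9, 1.1, 8.0]
print(ThresholdDetector(threshold=1.0).fit_predict(X))  # [1, 1, 1, 1, -1]
```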
-
plot_anomaly_score(X=None, normalize=False, **kwargs)
Plot the anomaly score for each sample.
Parameters: - X (array-like of shape (n_samples, n_features), default None) – Data. If None, plot the anomaly score for each training sample.
- normalize (bool, default False) – If True, plot the normalized anomaly score.
- ax (matplotlib Axes, default None) – Target axes instance.
- bins (int, str or array-like, default 'auto') – Number of histogram bins, a binning strategy name, or bin edges.
- figsize (tuple, default None) – Tuple denoting figure size of the plot.
- filename (str, default None) – If provided, save the current figure.
- hist (bool, default True) – If True, plot a histogram of anomaly scores.
- kde (bool, default True) – If True, plot a Gaussian kernel density estimate.
- title (string, default None) – Axes title. To disable, pass None.
- xlabel (string, default 'Samples') – X axis title label. To disable, pass None.
- xlim (tuple, default None) – Tuple passed to ax.set_xlim.
- ylabel (string, default 'Anomaly score') – Y axis title label. To disable, pass None.
- ylim (tuple, default None) – Tuple passed to ax.set_ylim.
- **kwargs (dict) – Other keywords passed to ax.plot.
Returns: ax – Axes on which the plot was drawn.
Return type: matplotlib Axes
-
plot_roc_curve(X, y, **kwargs)
Plot the Receiver Operating Characteristic (ROC) curve.
Parameters: - X (array-like of shape (n_samples, n_features)) – Data.
- y (array-like of shape (n_samples,)) – Labels.
- ax (matplotlib Axes, default None) – Target axes instance.
- figsize (tuple, default None) – Tuple denoting figure size of the plot.
- filename (str, default None) – If provided, save the current figure.
- title (string, default 'ROC curve') – Axes title. To disable, pass None.
- xlabel (string, default 'FPR') – X axis title label. To disable, pass None.
- ylabel (string, default 'TPR') – Y axis title label. To disable, pass None.
- **kwargs (dict) – Other keywords passed to ax.plot.
Returns: ax – Axes on which the plot was drawn.
Return type: matplotlib Axes
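An ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold sweeps over the scores. The sketch below computes those points in plain Python. Ranking by anomaly score, treating outliers as the positive class, and encoding them as -1 (matching the -1/+1 convention of `predict`) are assumptions, as is the helper name `roc_points`.

```python
def roc_points(anomaly_scores, labels, positive=-1):
    """Hypothetical helper: (FPR, TPR) pairs for an ROC curve.

    Higher anomaly scores are ranked as more outlying; samples whose label
    equals `positive` (assumed -1, i.e. outliers) form the positive class.
    """
    pairs = sorted(zip(anomaly_scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y == positive)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, y) in enumerate(pairs):
        if y == positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after every sample tied at this score is counted.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


print(roc_points([0.9, 0.8, 0.3, 0.2], [-1, 1, -1, 1]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Grouping tied scores keeps the curve from depending on the arbitrary order of samples that share a score.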
-
predict(X=None, threshold=None)
Predict whether each sample is an outlier or not.
Parameters: - X (array-like of shape (n_samples, n_features), default None) – Data. If None, predict whether each training sample is an outlier or not.
- threshold (float, default None) – User-provided threshold.
Returns: y_pred – Return -1 for outliers and +1 for inliers.
Return type: array-like of shape (n_samples,)
-
predict_proba(X=None)
Predict class probabilities for each sample.
Parameters: X (array-like of shape (n_samples, n_features), default None) – Data. If None, predict class probabilities for each training sample.
Returns: y_score – Class probabilities.
Return type: array-like of shape (n_samples, n_classes)
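This page does not say how the class probabilities are derived. One plausible construction, shown purely as an assumption, treats a normalized anomaly score in [0, 1] as the outlier probability and its complement as the inlier probability, which yields an array of shape (n_samples, 2). The helper name `proba_from_normalized` and the column order (inlier first, outlier second) are hypothetical.

```python
def proba_from_normalized(normalized_scores):
    """Hypothetical helper: two-column class probabilities from scores in [0, 1].

    Column 0 is the assumed inlier probability, column 1 the outlier probability.
    """
    rows = []
    for p in normalized_scores:
        if not 0.0 <= p <= 1.0:
            raise ValueError("normalized scores must lie in [0, 1]")
        rows.append([1.0 - p, p])
    return rows


print(proba_from_normalized([0.0, 0.25, 1.0]))  # [[1.0, 0.0], [0.75, 0.25], [0.0, 1.0]]
```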
-
score_samples(X=None)
Compute the opposite of the anomaly score for each sample.
Parameters: X (array-like of shape (n_samples, n_features), default None) – Data. If None, compute the opposite of the anomaly score for each training sample.
Returns: score_samples – Opposite of the anomaly score for each sample.
Return type: array-like of shape (n_samples,)
-
to_pickle(filename, **kwargs)
Persist an outlier detector object.
Parameters: - filename (str or pathlib.Path) – Path of the file in which it is to be stored.
- **kwargs (dict) – Other keywords passed to sklearn.externals.joblib.dump.
Returns: filenames – List of file names in which the data is stored.
Return type: list
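`to_pickle` delegates to `sklearn.externals.joblib.dump`, which returns the list of file names it wrote. That alias has since been removed from scikit-learn; the standalone `joblib` package provides `joblib.dump` and `joblib.load`. The sketch below swaps in the standard-library `pickle` module so it runs without extra dependencies. `save_detector` and `load_detector` are hypothetical helpers that mimic the documented list-of-file-names return value, and the dict stands in for a fitted detector.

```python
import os
import pickle
import tempfile
from pathlib import Path


def save_detector(detector, filename):
    """Hypothetical helper: persist any picklable object and return [filename]."""
    path = Path(filename)
    with path.open("wb") as f:
        pickle.dump(detector, f)
    return [str(path)]


def load_detector(filename):
    """Hypothetical helper: load an object written by save_detector.

    Only unpickle files from trusted sources; pickle can execute arbitrary code.
    """
    with Path(filename).open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "detector.pkl")
    written = save_detector({"threshold_": 2.0}, target)  # placeholder object
    restored = load_detector(target)
    print(written, restored)
```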
-