kenchi.outlier_detection.reconstruction_based module

class kenchi.outlier_detection.reconstruction_based.PCA(contamination=0.1, iterated_power='auto', n_components=None, random_state=None, svd_solver='auto', tol=0.0, whiten=False)

Bases: kenchi.outlier_detection.base.BaseOutlierDetector

Outlier detector using Principal Component Analysis (PCA).

Parameters:
  • contamination (float, default 0.1) – Proportion of outliers in the data set. Used to define the threshold.
  • iterated_power (int or 'auto', default 'auto') – Number of iterations for the power method. Used only when svd_solver == 'randomized'.
  • n_components (int, float, or string, default None) – Number of components to keep.
  • random_state (int or RandomState instance, default None) – Seed of the pseudo random number generator.
  • svd_solver (string, default 'auto') – SVD solver to use. Valid solvers are ['auto'|'full'|'arpack'|'randomized'].
  • tol (float, default 0.0) – Convergence tolerance for the singular values. Used only when svd_solver == 'arpack'.
  • whiten (bool, default False) – When True, the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
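
The effect of whiten can be illustrated with plain NumPy. The following is a minimal sketch of the scaling described above, not kenchi's own code; the toy data and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
# Correlated toy data: 200 samples, 3 features (hypothetical example data).
mixing = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.1, 0.2, 0.3]])
X = rng.randn(200, 3) @ mixing

n_samples = X.shape[0]
X_centered = X - X.mean(axis=0)

# Thin SVD of the centered data: the rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Plain projection onto the principal axes.
Z = X_centered @ Vt.T

# Whitened projection: multiply by sqrt(n_samples), divide by the singular values.
Z_white = Z * np.sqrt(n_samples) / S

# Second-moment matrix of the whitened outputs (close to the identity).
cov_white = Z_white.T @ Z_white / n_samples
```

With this scaling the whitened outputs are uncorrelated and each component has unit variance, at the cost of discarding the relative scale of the components.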
anomaly_score_

array-like of shape (n_samples,) – Anomaly score of each training sample.

threshold_

float – Threshold on the anomaly score, defined from contamination.
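
As a rough illustration of how a reconstruction-based score and a contamination-derived threshold can fit together, here is a minimal NumPy sketch. It assumes the anomaly score is the squared reconstruction error and the threshold is the (1 - contamination) percentile of the training scores; both are assumptions for illustration, and the helper pca_reconstruction_scores is hypothetical rather than part of kenchi.

```python
import numpy as np


def pca_reconstruction_scores(X, n_components):
    """Squared reconstruction error of each row of X after projecting
    onto the first n_components principal axes (illustrative helper)."""
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Project onto the retained axes, then map back to feature space.
    X_rec = X_centered @ components.T @ components
    return np.sum((X_centered - X_rec) ** 2, axis=1)


rng = np.random.RandomState(0)
# 95 inliers close to a line in 2-D, plus 5 scattered points.
t = rng.randn(95)
inliers = np.column_stack([t, 2.0 * t + 0.05 * rng.randn(95)])
outliers = rng.uniform(-4.0, 4.0, size=(5, 2))
X = np.vstack([inliers, outliers])

contamination = 0.1
scores = pca_reconstruction_scores(X, n_components=1)

# Assumed rule: flag the top `contamination` fraction of training scores.
threshold = np.percentile(scores, 100.0 * (1.0 - contamination))
is_outlier = scores > threshold
```

Under this rule the number of flagged training samples is fixed by contamination, so the parameter should reflect a prior estimate of how many outliers the data contains.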

components_

array-like of shape (n_components, n_features) – Principal axes in feature space, representing the directions of maximum variance in the data.

explained_variance_

array-like of shape (n_components,) – Amount of variance explained by each of the selected components.

explained_variance_ratio_

array-like of shape (n_components,) – Fraction of the total variance explained by each of the selected components.

mean_

array-like of shape (n_features,) – Per-feature empirical mean, estimated from the training set.

noise_variance_

float – Estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999.

n_components_

int – Estimated number of components.

singular_values_

array-like of shape (n_components,) – Singular values corresponding to each of the selected components.
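
The fitted attributes above are closely related. The sketch below derives them from a single SVD in NumPy, assuming the usual SVD-based conventions (the ones scikit-learn's PCA uses); it is an illustration under that assumption, not kenchi's implementation, and the local variable names only mirror the attribute names.

```python
import numpy as np

rng = np.random.RandomState(0)
# Toy data with decreasing per-feature scale (hypothetical example data).
X = rng.randn(50, 4) * np.array([3.0, 2.0, 1.0, 0.5])

n_samples = X.shape[0]
n_components = 2

mean = X.mean(axis=0)                          # cf. mean_
_, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

components = Vt[:n_components]                 # cf. components_
singular_values = S[:n_components]             # cf. singular_values_

# Variance along each retained axis (sample variance, n_samples - 1).
explained_variance = singular_values ** 2 / (n_samples - 1)

# Share of the total variance captured by each retained axis.
total_variance = np.sum(S ** 2) / (n_samples - 1)
explained_variance_ratio = explained_variance / total_variance

# Average variance of the discarded axes, as in probabilistic PCA.
noise_variance = np.mean(S[n_components:] ** 2 / (n_samples - 1))
```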

score(X, y=None)

Compute the mean log-likelihood of the given data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Data.
  • y (ignored) –
Returns:

score – Mean log-likelihood of the given data.

Return type:

float
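
To make the returned quantity concrete, the following sketch computes a mean log-likelihood under a probabilistic PCA model with NumPy. It assumes a Gaussian model whose covariance is the retained principal subspace plus isotropic noise; the function ppca_mean_log_likelihood is a hypothetical helper written for this illustration and is not kenchi's code.

```python
import numpy as np


def ppca_mean_log_likelihood(X, mean, components, explained_variance,
                             noise_variance):
    """Mean Gaussian log-likelihood of the rows of X under a
    probabilistic PCA covariance model (illustrative helper)."""
    n_features = X.shape[1]
    # Model covariance: signal along the principal axes plus isotropic noise.
    signal = np.maximum(explained_variance - noise_variance, 0.0)
    cov = (components.T * signal) @ components
    cov += noise_variance * np.eye(n_features)

    X_centered = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each sample to the mean.
    maha = np.sum(X_centered * np.linalg.solve(cov, X_centered.T).T, axis=1)
    log_like = -0.5 * (n_features * np.log(2.0 * np.pi) + logdet + maha)
    return float(np.mean(log_like))


rng = np.random.RandomState(0)
X = rng.randn(200, 3) * np.array([3.0, 1.0, 0.2])
n_samples, n_components = X.shape[0], 2

mean = X.mean(axis=0)
_, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
variances = S ** 2 / (n_samples - 1)
components = Vt[:n_components]
explained_variance = variances[:n_components]
noise_variance = variances[n_components:].mean()

score_train = ppca_mean_log_likelihood(
    X, mean, components, explained_variance, noise_variance)
# Data shifted far from the training mean gets a lower mean log-likelihood.
score_shifted = ppca_mean_log_likelihood(
    X + 10.0, mean, components, explained_variance, noise_variance)
```

A higher value indicates data that the fitted model explains better, which is why shifted or otherwise atypical data yields a lower score.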
