class kenchi.outlier_detection.reconstruction_based.PCA(contamination=0.1, iterated_power='auto', n_components=None, random_state=None, svd_solver='auto', tol=0.0, whiten=False)[source]

Bases: kenchi.outlier_detection.base.BaseOutlierDetector

Outlier detector using Principal Component Analysis (PCA).

Parameters:
  • contamination (float, default 0.1) – Proportion of outliers in the data set. Used to define the threshold.
  • iterated_power (int or 'auto', default 'auto') – Number of iterations for the power method, used when svd_solver == 'randomized'.
  • n_components (int, float, or string, default None) – Number of components to keep.
  • random_state (int or RandomState instance, default None) – Seed of the pseudo random number generator.
  • svd_solver (string, default 'auto') – SVD solver to use. Valid solvers are 'auto', 'full', 'arpack', and 'randomized'.
  • tol (float, default 0.0) – Convergence tolerance for the singular values, used when svd_solver == 'arpack'.
  • whiten (bool, default False) – If True, the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
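
The effect of whiten can be checked with plain NumPy. The following is a minimal sketch, not the library's implementation: the toy data and variable names are assumptions, and the scaling uses n_samples - 1 so that the sample variance (ddof=1) comes out as exactly one. Scaling by the square root of n_samples, as the description above words it, would normalize the population variance (ddof=0) instead.

```python
import numpy as np

rng = np.random.RandomState(0)
# Toy data: two correlated features (illustrative only)
X = rng.randn(200, 2) @ np.array([[3.0, 1.0], [0.0, 0.5]])

n_samples = X.shape[0]
X_centered = X - X.mean(axis=0)

# Thin SVD of the centered data: X_centered = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Plain projection onto the principal axes
Z = X_centered @ Vt.T

# Whitened projection: rescale each axis by sqrt(n_samples - 1) / singular value
Z_white = Z * np.sqrt(n_samples - 1) / S

# Each whitened component has unit sample variance and the components are uncorrelated
var_white = Z_white.var(axis=0, ddof=1)
cov_white = np.cov(Z_white, rowvar=False)
```

Whitening discards the relative scale of the components, so it is usually only worth enabling when a downstream step expects isotropic inputs.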
anomaly_score_

array-like of shape (n_samples,) – Anomaly score for each training sample.

contamination_

float – Actual proportion of outliers in the data set.

threshold_

float – Threshold on the anomaly score, defined using contamination.
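
This page does not spell out how the threshold is derived from contamination. One common rule, shown here purely as an assumption, takes the (1 - contamination) quantile of the training scores and labels everything above it as an outlier (-1). The helper threshold_from_contamination and the score values are hypothetical.

```python
import numpy as np

def threshold_from_contamination(scores, contamination=0.1):
    """Return the score at the (1 - contamination) quantile (assumed rule)."""
    return np.percentile(scores, 100.0 * (1.0 - contamination))

# Hypothetical anomaly scores for ten training samples
scores = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 0.3, 9.0])

threshold = threshold_from_contamination(scores, contamination=0.1)

# Inliers are labeled 1 and outliers -1, matching the fit_predict convention
labels = np.where(scores > threshold, -1, 1)

# Fraction of samples actually flagged, analogous to contamination_
actual_contamination = np.mean(labels == -1)
```

With tied scores or small data sets the flagged fraction can differ from the requested contamination, which is presumably why the actual proportion is reported as a separate attribute.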

Examples

>>> import numpy as np
>>> from kenchi.outlier_detection import PCA
>>> X = np.array([
...     [0., 0.], [1., 1.], [2., 0.], [3., -1.], [4., 0.],
...     [5., 1.], [6., 0.], [7., -1.], [8., 0.], [1000., 1.]
... ])
>>> det = PCA()
>>> det.fit_predict(X)
array([ 1,  1,  1,  1,  1,  1,  1,  1,  1, -1])
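
The module name suggests that the detector scores samples by how poorly a low-dimensional PCA model reconstructs them, although the exact formula is not given on this page. The sketch below illustrates that idea with NumPy only. The helper reconstruction_errors and the toy data (different from the example above) are assumptions, not part of kenchi.

```python
import numpy as np

def reconstruction_errors(X, n_components):
    """Squared reconstruction error per sample after keeping n_components axes."""
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:n_components]                  # shape (n_components, n_features)
    # Project onto the kept axes, then map back to the original space
    X_rec = X_centered @ W.T @ W + mean
    return np.sum((X - X_rec) ** 2, axis=1)

# Nine points on the line y = x plus one point well off the line
X_toy = np.array([[float(k), float(k)] for k in range(9)] + [[4.0, -4.0]])

errors_1d = reconstruction_errors(X_toy, n_components=1)
most_anomalous = int(np.argmax(errors_1d))

# Keeping every component reconstructs the data exactly (up to rounding)
errors_full = reconstruction_errors(X_toy, n_components=2)
```

Under this scoring rule the error is only informative when fewer components are kept than there are features. A point far out along a principal axis can also pull that axis toward itself and end up with a small error.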
components_

array-like of shape (n_components, n_features) – Principal axes in feature space, representing the directions of maximum variance in the data.

explained_variance_

array-like of shape (n_components,) – Amount of variance explained by each of the selected components.

explained_variance_ratio_

array-like of shape (n_components,) – Fraction of the total variance explained by each of the selected components.
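
The relationship between the two variance attributes can be sketched from the eigenvalues of the sample covariance matrix. This is an illustration under the usual PCA definitions, with made-up data and variable names, not a reproduction of the library's code.

```python
import numpy as np

rng = np.random.RandomState(0)
# Toy data with three features of very different spread
X = rng.randn(100, 3) * np.array([5.0, 2.0, 0.5])

# Eigenvalues of the sample covariance matrix, largest first
cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

n_components = 2
# Variance captured by each kept component
explained_variance = eigvals[:n_components]
# The same quantity as a fraction of the total variance
explained_variance_ratio = explained_variance / eigvals.sum()
```

The ratios sum to one only when every component is kept. Otherwise the shortfall is the share of variance left in the discarded components.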

mean_

array-like of shape (n_features,) – Per-feature empirical mean, estimated from the training set.

n_components_

int – Estimated number of components.

noise_variance_

float – Estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999.
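
In the probabilistic PCA model, the maximum-likelihood noise variance is the average of the covariance eigenvalues that the model discards. scikit-learn's PCA documents its noise_variance_ this way, and the sketch below assumes the attribute here follows the same definition. Data and names are illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)
# Two strong directions and two weak, noise-like directions
X = rng.randn(200, 4) * np.array([4.0, 3.0, 0.3, 0.2])

# Eigenvalues of the sample covariance matrix, largest first
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

n_components = 2
if n_components < eigvals.size:
    # Average variance of the discarded components
    noise_variance = eigvals[n_components:].mean()
else:
    # Nothing is discarded, so no residual noise is estimated
    noise_variance = 0.0
```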

singular_values_

array-like of shape (n_components,) – Singular values corresponding to each of the selected components.
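
The singular values relate to the other attributes in two ways that are easy to verify numerically: each one equals the 2-norm of the corresponding projected variable, and its square divided by n_samples - 1 gives the explained variance. The sketch below is a NumPy check on random data and makes no claim about the library's internals.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 3)

n_samples = X.shape[0]
X_centered = X - X.mean(axis=0)

# Thin SVD of the centered data; S holds the singular values, largest first
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Explained variance recovered from the singular values
explained_variance = S ** 2 / (n_samples - 1)

# Projected data and the 2-norm of each projected variable
Z = X_centered @ Vt.T
col_norms = np.linalg.norm(Z, axis=0)
```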