class kenchi.outlier_detection.reconstruction_based.PCA(contamination=0.1, iterated_power='auto', n_components=None, random_state=None, svd_solver='auto', tol=0.0, whiten=False)
Bases: kenchi.outlier_detection.base.BaseOutlierDetector
Outlier detector using Principal Component Analysis (PCA).
Parameters:
- contamination (float, default 0.1) – Proportion of outliers in the data set. Used to define the threshold.
- iterated_power (int or 'auto', default 'auto') – Number of iterations for the power method, used when svd_solver == 'randomized'.
- n_components (int, float, or string, default None) – Number of components to keep.
- random_state (int or RandomState instance, default None) – Seed of the pseudo random number generator.
- svd_solver (string, default 'auto') – SVD solver to use. Valid solvers are 'auto', 'full', 'arpack', and 'randomized'.
- tol (float, default 0.0) – Convergence tolerance for the singular values, used when svd_solver == 'arpack'.
- whiten (bool, default False) – If True, the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
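The whitening step described for whiten can be illustrated with plain NumPy. This is a minimal sketch that takes the description literally (scale by the square root of n_samples, divide by the singular values) and measures the population variance (ddof=0). The data and variable names are made up for illustration, and the library's internal scaling may differ in detail.

```python
import numpy as np

rng = np.random.RandomState(0)
# Correlated 2-D data: 200 samples, 2 features (made-up illustration data)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])

n_samples = X.shape[0]
X_centered = X - X.mean(axis=0)

# Thin SVD: the rows of Vt play the role of components_
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Whitening as described: scale each component by sqrt(n_samples) / singular value
whitened_components = Vt * (np.sqrt(n_samples) / S)[:, np.newaxis]
Z = X_centered @ whitened_components.T

variances = Z.var(axis=0)            # population variance (ddof=0)
covariance = (Z.T @ Z) / n_samples   # should be close to the identity matrix
```

Under these assumptions the projected outputs are uncorrelated and each has unit variance.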
- anomaly_score_ (array-like of shape (n_samples,)) – Anomaly score for each training sample.
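Because this class lives in the reconstruction_based module, one plausible reading is that the anomaly score is a reconstruction error: each sample is projected onto the leading principal axes, mapped back, and compared with the original. The sketch below assumes a squared-error score, which the page does not confirm. The helper reconstruction_error and the data are hypothetical.

```python
import numpy as np

def reconstruction_error(X, n_components):
    """Squared reconstruction error of each row of X after projecting onto
    the top `n_components` principal axes (hypothetical helper)."""
    mean = X.mean(axis=0)
    X_centered = X - mean
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]            # (n_components, n_features)
    X_projected = X_centered @ components.T   # low-dimensional coordinates
    X_reconstructed = X_projected @ components + mean
    return np.sum((X - X_reconstructed) ** 2, axis=1)

# Ten points on the line y = x, plus one point far off the line (index 10)
line = np.column_stack([np.arange(10.0), np.arange(10.0)])
X = np.vstack([line, [[4.5, -4.5]]])

scores = reconstruction_error(X, n_components=1)
```

With a score of this kind, keeping every component reconstructs the data exactly and all scores collapse to roughly zero, so n_components must be smaller than the number of features for the scores to be informative.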
- contamination_ (float) – Actual proportion of outliers in the data set.
- threshold_ (float) – Threshold used to separate inliers from outliers, defined from contamination.
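How contamination, threshold_, and contamination_ relate can be sketched as follows. The quantile rule here is an assumption, not something the page states: the threshold is taken as the (1 - contamination) quantile of the training anomaly scores, and samples above it are labelled -1. The scores are made up.

```python
import numpy as np

# Made-up anomaly scores for ten training samples
anomaly_score = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.28, 9.0])
contamination = 0.1

# Assumed rule: threshold is the (1 - contamination) quantile of the scores
threshold = np.percentile(anomaly_score, 100.0 * (1.0 - contamination))

# +1 for inliers, -1 for outliers, matching the fit_predict label convention
labels = np.where(anomaly_score > threshold, -1, 1)

# Proportion of samples actually flagged
actual_contamination = np.mean(labels == -1)
```

In this toy case exactly one of ten samples exceeds the threshold, so the actual proportion matches the requested 0.1. With tied scores the two can differ, which may be why the fitted proportion is reported separately.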
Examples
>>> import numpy as np
>>> from kenchi.outlier_detection import PCA
>>> X = np.array([
...     [0., 0.], [1., 1.], [2., 0.], [3., -1.], [4., 0.],
...     [5., 1.], [6., 0.], [7., -1.], [8., 0.], [1000., 1.]
... ])
>>> det = PCA()
>>> det.fit_predict(X)
array([ 1,  1,  1,  1,  1,  1,  1,  1,  1, -1])
- components_ (array-like of shape (n_components, n_features)) – Principal axes in feature space, representing the directions of maximum variance in the data.
- explained_variance_ (array-like of shape (n_components,)) – Amount of variance explained by each of the selected components.
- explained_variance_ratio_ (array-like of shape (n_components,)) – Fraction of the total variance explained by each of the selected components.
- mean_ (array-like of shape (n_features,)) – Per-feature empirical mean, estimated from the training set.
- n_components_ (int) – Estimated number of components.
- noise_variance_ (float) – Estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999.
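In the probabilistic PCA model, the maximum-likelihood noise variance is the average of the covariance eigenvalues in the discarded directions. The sketch below computes that quantity with NumPy. The data, the choice n_components = 2, and the (n_samples - 1) normalisation are illustrative assumptions. Whether this class computes it the same way is not stated on this page.

```python
import numpy as np

rng = np.random.RandomState(0)
# Made-up data with five features of decreasing scale
X = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2])

n_samples, n_features = X.shape
n_components = 2  # number of components kept (illustrative choice)

X_centered = X - X.mean(axis=0)
_, S, _ = np.linalg.svd(X_centered, full_matrices=False)

# Eigenvalues of the sample covariance matrix, in decreasing order
eigenvalues = S ** 2 / (n_samples - 1)

# Average variance in the discarded directions
noise_variance = eigenvalues[n_components:].mean()
```

When every component is kept there are no discarded directions, so this estimate is only meaningful for n_components smaller than min(n_samples, n_features).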
- singular_values_ (array-like of shape (n_components,)) – Singular values corresponding to each of the selected components.
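The fitted attributes above are tied together by the singular value decomposition of the centered data. The following sketch reproduces analogues of them with NumPy, assuming the common convention that the explained variance is the squared singular value divided by n_samples - 1. The data and local variable names are illustrative, and all components are kept.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3)) * np.array([4.0, 2.0, 1.0])
n_samples = X.shape[0]

mean = X.mean(axis=0)                       # analogue of mean_
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

components = Vt                             # analogue of components_
singular_values = S                         # analogue of singular_values_
explained_variance = S ** 2 / (n_samples - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

# Variance of the data projected onto each principal axis
projected = (X - mean) @ components.T
```

Here the sample variance of each projected column equals the corresponding explained variance, and the ratios sum to 1 because no component is dropped. With fewer components kept, the ratios are still taken relative to the total variance and sum to less than 1.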