-
kenchi.pipeline.make_pipeline(*steps)
Construct a Pipeline from the given estimators. This is a shorthand for the Pipeline constructor; it neither requires nor permits naming the estimators. Instead, their names are set automatically to the lowercase of their type names.
Parameters: *steps (list) – List of estimators.
Returns: p
Return type: Pipeline
Examples
>>> from kenchi.outlier_detection import MiniBatchKMeans
>>> from kenchi.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler()
>>> det = MiniBatchKMeans()
>>> pipeline = make_pipeline(scaler, det)
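The automatic naming rule can be checked directly. The sketch below uses scikit-learn's own make_pipeline and sklearn.cluster.MiniBatchKMeans as stand-ins, an assumption made so that it runs without kenchi installed; the text above describes the same lowercase-type-name rule for kenchi.pipeline.make_pipeline.

```python
# Stand-in sketch: scikit-learn's make_pipeline, not kenchi's.
from sklearn.cluster import MiniBatchKMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipeline = make_pipeline(StandardScaler(), MiniBatchKMeans(n_clusters=1))

# Each step is named after the lowercased class name of its estimator.
step_names = [name for name, _ in pipeline.steps]
print(step_names)  # ['standardscaler', 'minibatchkmeans']
```

Because the names are derived from the types, two steps of the same type cannot be told apart by a name of your choosing; use the Pipeline constructor when explicit names matter.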
-
class kenchi.pipeline.Pipeline(steps, memory=None)
Bases: sklearn.pipeline.Pipeline
Pipeline of transforms with a final estimator.
Parameters: - steps (list) – List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.
- memory (instance of joblib.Memory or string, default None) – Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting, so the transformer instances given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time-consuming.
-
named_steps
dict – Read-only attribute to access any step by its user-given name. Keys are step names and values are the corresponding steps (estimators).
Examples
>>> import numpy as np
>>> from kenchi.outlier_detection import MiniBatchKMeans
>>> from kenchi.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> X = np.array([
...     [0., 0.], [1., 1.], [2., 0.], [3., -1.], [4., 0.],
...     [5., 1.], [6., 0.], [7., -1.], [8., 0.], [1000., 1.]
... ])
>>> det = MiniBatchKMeans(n_clusters=1, random_state=0)
>>> scaler = StandardScaler()
>>> pipeline = Pipeline([('scaler', scaler), ('det', det)])
>>> pipeline.fit_predict(X)
array([ 1,  1,  1,  1,  1,  1,  1,  1,  1, -1])
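The effect of the memory parameter on inspection can be sketched as follows. Since this class derives from sklearn.pipeline.Pipeline, the sketch uses the scikit-learn base class with sklearn.cluster.KMeans as a stand-in final step (both are assumptions chosen so that it runs without kenchi). With caching enabled, the transformer passed in is cloned, so only the copy reachable through named_steps is fitted.

```python
import tempfile

import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[0., 0.], [1., 1.], [2., 0.], [3., -1.], [1000., 1.]])

scaler = StandardScaler()
with tempfile.TemporaryDirectory() as cache_dir:
    # A string value for `memory` is interpreted as the caching directory.
    pipeline = Pipeline(
        [('scaler', scaler),
         ('det', KMeans(n_clusters=1, n_init=10, random_state=0))],
        memory=cache_dir,
    )
    pipeline.fit(X)

# The instance we passed in was cloned before fitting, so it stays unfitted.
original_is_fitted = hasattr(scaler, 'mean_')
# The fitted clone is available through named_steps.
step_is_fitted = hasattr(pipeline.named_steps['scaler'], 'mean_')
print(original_is_fitted, step_is_fitted)  # False True
```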
-
anomaly_score(X=None, **kwargs)
Apply transforms, and compute the anomaly score for each sample with the final estimator.
Parameters: - X (array-like of shape (n_samples, n_features), default None) – Data. If None, compute the anomaly score for each training sample.
- normalize (bool, default False) – If True, return the normalized anomaly score.
Returns: anomaly_score – Anomaly score for each sample.
Return type: array-like of shape (n_samples,)
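This page does not state how the final estimator turns the transformed data into a score. As an illustration only, the sketch below assumes a single-cluster detector whose score is the Euclidean distance from the cluster centre, computed after standardization. The helper sketch_anomaly_score is hypothetical and is not part of kenchi; normalization is left out because its formula is not given here.

```python
import numpy as np


def sketch_anomaly_score(X_train, X=None):
    """Hypothetical helper: standardize, then score by distance to the centre."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # X=None means "score the training samples", mirroring the documented default.
    X = X_train if X is None else np.asarray(X, dtype=float)
    Z = (X - mean) / std  # the "apply transforms" step
    # Standardized training data has zero mean, so a single cluster centre
    # fitted on it sits at the origin (assumption: one cluster).
    return np.linalg.norm(Z, axis=1)  # one score per sample


X_train = np.array([
    [0., 0.], [1., 1.], [2., 0.], [3., -1.], [4., 0.],
    [5., 1.], [6., 0.], [7., -1.], [8., 0.], [1000., 1.]
])
scores = sketch_anomaly_score(X_train)
print(scores.shape, int(np.argmax(scores)))  # (10,) 9
```

The last training sample, [1000., 1.], receives the largest score, which matches the -1 label it gets in the fit_predict example.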
-
featurewise_anomaly_score(X)
Apply transforms, and compute the feature-wise anomaly scores for each sample with the final estimator.
Parameters: X (array-like of shape (n_samples, n_features)) – Data.
Returns: anomaly_score – Feature-wise anomaly scores for each sample.
Return type: array-like of shape (n_samples, n_features)
-
plot_anomaly_score(X=None, **kwargs)
Apply transforms, and plot the anomaly score for each sample with the final estimator.
Parameters: - X (array-like of shape (n_samples, n_features), default None) – Data. If None, plot the anomaly score for each training sample.
- normalize (bool, default False) – If True, plot the normalized anomaly score.
- ax (matplotlib Axes, default None) – Target axes instance.
- bins (int, str or array-like, default 'auto') – Number of hist bins.
- figsize (tuple, default None) – Tuple denoting figure size of the plot.
- filename (str, default None) – If provided, save the current figure.
- hist (bool, default True) – If True, plot a histogram of anomaly scores.
- kde (bool, default True) – If True, plot a gaussian kernel density estimate.
- title (string, default None) – Axes title. To disable, pass None.
- xlabel (string, default 'Samples') – X axis title label. To disable, pass None.
- xlim (tuple, default None) – Tuple passed to ax.xlim.
- ylabel (string, default 'Anomaly score') – Y axis title label. To disable, pass None.
- ylim (tuple, default None) – Tuple passed to ax.ylim.
- **kwargs (dict) – Other keywords passed to ax.plot.
Returns: ax – Axes on which the plot was drawn.
Return type: matplotlib Axes
-
plot_graphical_model
Apply transforms, and plot the Gaussian Graphical Model (GGM) with the final estimator.
Parameters: - ax (matplotlib Axes, default None) – Target axes instance.
- figsize (tuple, default None) – Tuple denoting figure size of the plot.
- filename (str, default None) – If provided, save the current figure.
- random_state (int, RandomState instance, default None) – Seed of the pseudo random number generator.
- title (string, default 'GGM (n_clusters, n_features, n_isolates)') – Axes title. To disable, pass None.
- **kwargs (dict) – Other keywords passed to nx.draw_networkx.
Returns: ax – Axes on which the plot was drawn.
Return type: matplotlib Axes
-
plot_partial_corrcoef
Apply transforms, and plot the partial correlation coefficient matrix with the final estimator.
Parameters: - ax (matplotlib Axes, default None) – Target axes instance.
- cbar (bool, default True) – If True, draw a colorbar.
- figsize (tuple, default None) – Tuple denoting figure size of the plot.
- filename (str, default None) – If provided, save the current figure.
- title (string, default 'Partial correlation') – Axes title. To disable, pass None.
- **kwargs (dict) – Other keywords passed to ax.pcolormesh.
Returns: ax – Axes on which the plot was drawn.
Return type: matplotlib Axes
-
plot_roc_curve(X, y, **kwargs)
Apply transforms, and plot the Receiver Operating Characteristic (ROC) curve with the final estimator.
Parameters: - X (array-like of shape (n_samples, n_features)) – Data.
- y (array-like of shape (n_samples,)) – Labels.
- ax (matplotlib Axes, default None) – Target axes instance.
- figsize (tuple, default None) – Tuple denoting figure size of the plot.
- filename (str, default None) – If provided, save the current figure.
- title (string, default 'ROC curve') – Axes title. To disable, pass None.
- xlabel (string, default 'FPR') – X axis title label. To disable, pass None.
- ylabel (string, default 'TPR') – Y axis title label. To disable, pass None.
- **kwargs (dict) – Other keywords passed to ax.plot.
Returns: ax – Axes on which the plot was drawn.
Return type: matplotlib Axes
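The quantities behind such a plot (FPR on the x axis, TPR on the y axis) can be sketched with scikit-learn's roc_curve. The scores below are made-up illustrative values, and treating the outlier label -1 as the positive class is an assumption; this page does not say which convention kenchi uses internally.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Labels follow the fit_predict convention shown earlier: 1 = inlier, -1 = outlier.
y = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, -1])
# Made-up anomaly scores for illustration; higher means more anomalous.
anomaly_scores = np.array([0.4, 0.9, 0.3, 1.1, 0.2, 1.0, 0.3, 1.2, 0.5, 3.3])

# Assumption: outliers (-1) are the positive class.
fpr, tpr, _ = roc_curve(y, anomaly_scores, pos_label=-1)
roc_auc = auc(fpr, tpr)
print(round(roc_auc, 3))  # 1.0, since the only outlier has the highest score
```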
-
score_samples(X=None)
Apply transforms, and compute the opposite of the anomaly score for each sample with the final estimator.
Parameters: X (array-like of shape (n_samples, n_features), default None) – Data. If None, compute the opposite of the anomaly score for each training sample.
Returns: score_samples – Opposite of the anomaly score for each sample.
Return type: array-like of shape (n_samples,)
-
to_pickle(filename, **kwargs)
Persist a pipeline object.
Parameters: - filename (str or pathlib.Path) – Path of the file in which it is to be stored.
- **kwargs (dict) – Other keywords passed to sklearn.externals.joblib.dump.
Returns: filenames – List of file names in which the data is stored.
Return type: list
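The list return value comes from joblib's dump function. A minimal round-trip sketch follows, assuming the standalone joblib package (rather than sklearn.externals.joblib, which newer scikit-learn releases no longer ship) and a plain scikit-learn pipeline as a stand-in for a kenchi one.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[0., 0.], [1., 1.], [2., 0.], [3., -1.], [1000., 1.]])
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('det', KMeans(n_clusters=1, n_init=10, random_state=0)),
]).fit(X)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'pipeline.pkl')
    # joblib.dump returns the list of file names it wrote.
    filenames = joblib.dump(pipeline, path)
    restored = joblib.load(path)

# The restored pipeline behaves like the original one.
same_predictions = np.array_equal(restored.predict(X), pipeline.predict(X))
print(len(filenames), same_predictions)  # 1 True
```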