kenchi.pipeline.make_pipeline(*steps)

Construct a Pipeline from the given estimators. This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.

Parameters:	*steps (list) – List of estimators.
Returns:	p
Return type:	Pipeline

Examples

>>> from kenchi.outlier_detection import MiniBatchKMeans
>>> from kenchi.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler()
>>> det = MiniBatchKMeans()
>>> pipeline = make_pipeline(scaler, det)
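
The automatic naming rule can be sketched without any library. The helper `name_steps` below is hypothetical and is not part of kenchi, and the two classes are empty stand-ins for the real estimators. It only illustrates how a lowercase type name becomes a step name; how repeated types are disambiguated is not covered here.

```python
def name_steps(*steps):
    """Pair each estimator with the lowercase name of its type.

    Hypothetical helper for illustration only, not kenchi's implementation.
    """
    return [(type(step).__name__.lower(), step) for step in steps]


# Empty stand-ins for the real estimator classes (assumption for this sketch).
class StandardScaler:
    pass


class MiniBatchKMeans:
    pass


named = name_steps(StandardScaler(), MiniBatchKMeans())
names = [name for name, _ in named]
print(names)  # ['standardscaler', 'minibatchkmeans']
```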

class kenchi.pipeline.Pipeline(steps, memory=None)

Bases: sklearn.pipeline.Pipeline

Pipeline of transforms with a final estimator.

Parameters:

steps (list) – List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.
memory (instance of joblib.Memory or string, default None) – Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.

named_steps: dict – Read-only attribute to access any step by its user-given name. Keys are step names and values are the corresponding step objects.

Examples

>>> import numpy as np
>>> from kenchi.outlier_detection import MiniBatchKMeans
>>> from kenchi.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> X = np.array([
...     [0., 0.], [1., 1.], [2., 0.], [3., -1.], [4., 0.],
...     [5., 1.], [6., 0.], [7., -1.], [8., 0.], [1000., 1.]
... ])
>>> det = MiniBatchKMeans(n_clusters=1, random_state=0)
>>> scaler = StandardScaler()
>>> pipeline = Pipeline([('scaler', scaler), ('det', det)])
>>> pipeline.fit_predict(X)
array([ 1,  1,  1,  1,  1,  1,  1,  1,  1, -1])
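
The effect of `memory` on inspection can be sketched with the base class `sklearn.pipeline.Pipeline`, from which this class inherits. The example assumes kenchi's `Pipeline` keeps the inherited caching behavior, and it uses `sklearn.cluster.KMeans` as a stand-in final step instead of a kenchi detector. With caching enabled, the scaler passed in stays unfitted, and the fitted clone is reached through `named_steps`.

```python
import tempfile

import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[0., 0.], [1., 1.], [2., 0.], [3., -1.]])
scaler = StandardScaler()
# KMeans is a stand-in final estimator (assumption for this sketch).
det = KMeans(n_clusters=1, n_init=10, random_state=0)

# A string passed as `memory` is treated as the path to the caching directory.
with tempfile.TemporaryDirectory() as cachedir:
    pipeline = Pipeline([('scaler', scaler), ('det', det)], memory=cachedir)
    pipeline.fit(X)

# The instance handed to the pipeline was cloned before fitting,
# so it carries no fitted attributes.
original_is_fitted = hasattr(scaler, 'mean_')

# The fitted clone is available through named_steps.
fitted_scaler = pipeline.named_steps['scaler']
fitted_mean = fitted_scaler.mean_
print(original_is_fitted, fitted_mean)
```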

anomaly_score(X=None, **kwargs)

Apply transforms, and compute the anomaly score for each sample with the final estimator.

Parameters:

X (array-like of shape (n_samples, n_features), default None) – Data. If None, compute the anomaly score for each training sample.
normalize (bool, default False) – If True, return the normalized anomaly score.

Returns:	anomaly_score – Anomaly score for each sample.
Return type:	array-like of shape (n_samples,)

featurewise_anomaly_score(X)

Apply transforms, and compute the feature-wise anomaly scores for each sample with the final estimator.

Parameters:	X (array-like of shape (n_samples, n_features)) – Data.
Returns:	anomaly_score – Feature-wise anomaly scores for each sample.
Return type:	array-like of shape (n_samples, n_features)

plot_anomaly_score(X=None, **kwargs)

Apply transforms, and plot the anomaly score for each sample with the final estimator.

Parameters:

X (array-like of shape (n_samples, n_features), default None) – Data. If None, plot the anomaly score for each training sample.
normalize (bool, default False) – If True, plot the normalized anomaly score.
ax (matplotlib Axes, default None) – Target axes instance.
bins (int, str or array-like, default 'auto') – Number of histogram bins.
figsize (tuple, default None) – Tuple denoting figure size of the plot.
filename (str, default None) – If provided, save the current figure.
hist (bool, default True) – If True, plot a histogram of anomaly scores.
kde (bool, default True) – If True, plot a Gaussian kernel density estimate.
title (string, default None) – Axes title. To disable, pass None.
xlabel (string, default 'Samples') – X axis title label. To disable, pass None.
xlim (tuple, default None) – Tuple passed to `ax.set_xlim`.
ylabel (string, default 'Anomaly score') – Y axis title label. To disable, pass None.
ylim (tuple, default None) – Tuple passed to `ax.set_ylim`.
**kwargs (dict) – Other keywords passed to `ax.plot`.

Returns:	ax – Axes on which the plot was drawn.
Return type:	matplotlib Axes

plot_graphical_model

Apply transforms, and plot the Gaussian Graphical Model (GGM) with the final estimator.

Parameters:

ax (matplotlib Axes, default None) – Target axes instance.
figsize (tuple, default None) – Tuple denoting figure size of the plot.
filename (str, default None) – If provided, save the current figure.
random_state (int, RandomState instance, default None) – Seed of the pseudo random number generator.
title (string, default 'GGM (n_clusters, n_features, n_isolates)') – Axes title. To disable, pass None.
**kwargs (dict) – Other keywords passed to `nx.draw_networkx`.

Returns:	ax – Axes on which the plot was drawn.
Return type:	matplotlib Axes

plot_partial_corrcoef

Apply transforms, and plot the partial correlation coefficient matrix with the final estimator.

Parameters:

ax (matplotlib Axes, default None) – Target axes instance.
cbar (bool, default True) – If True, draw a colorbar.
figsize (tuple, default None) – Tuple denoting figure size of the plot.
filename (str, default None) – If provided, save the current figure.
title (string, default 'Partial correlation') – Axes title. To disable, pass None.
**kwargs (dict) – Other keywords passed to `ax.pcolormesh`.

Returns:	ax – Axes on which the plot was drawn.
Return type:	matplotlib Axes

plot_roc_curve(X, y, **kwargs)

Apply transforms, and plot the Receiver Operating Characteristic (ROC) curve with the final estimator.

Parameters:

X (array-like of shape (n_samples, n_features)) – Data.
y (array-like of shape (n_samples,)) – Labels.
ax (matplotlib Axes, default None) – Target axes instance.
figsize (tuple, default None) – Tuple denoting figure size of the plot.
filename (str, default None) – If provided, save the current figure.
title (string, default 'ROC curve') – Axes title. To disable, pass None.
xlabel (string, default 'FPR') – X axis title label. To disable, pass None.
ylabel (string, default 'TPR') – Y axis title label. To disable, pass None.
**kwargs (dict) – Other keywords passed to `ax.plot`.

Returns:	ax – Axes on which the plot was drawn.
Return type:	matplotlib Axes

score_samples(X=None)

Apply transforms, and compute the opposite of the anomaly score for each sample with the final estimator.

Parameters:	X (array-like of shape (n_samples, n_features), default None) – Data. If None, compute the opposite of the anomaly score for each training sample.
Returns:	score_samples – Opposite of the anomaly score for each sample.
Return type:	array-like of shape (n_samples,)
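
Reading "opposite" as arithmetic negation, the relation between the two scores can be sketched as follows. The numbers are made up for illustration and do not come from a fitted kenchi detector. Under this reading, a larger `score_samples` value means a more normal sample, so the most anomalous sample has the smallest value.

```python
# Illustrative anomaly scores (made-up values, not kenchi output).
anomaly_score = [0.1, 0.2, 3.5]

# Assumption: score_samples is the negated anomaly score.
score_samples = [-s for s in anomaly_score]

# The most anomalous sample has the lowest score_samples value.
most_anomalous = min(range(len(score_samples)), key=score_samples.__getitem__)
print(score_samples, most_anomalous)
```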

to_pickle(filename, **kwargs)

Persist a pipeline object.

Parameters:

filename (str or pathlib.Path) – Path of the file in which it is to be stored.
**kwargs (dict) – Other keywords passed to `sklearn.externals.joblib.dump`.

Returns:	filenames – List of file names in which the data is stored.
Return type:	list
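
The persistence round trip that `to_pickle` delegates to can be sketched with `joblib.dump` and `joblib.load` directly. This sketch makes two assumptions: it imports the standalone `joblib` package, because recent scikit-learn releases no longer ship `sklearn.externals.joblib`, and it uses a plain `sklearn.pipeline.Pipeline` with `KMeans` as a stand-in for a kenchi pipeline.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[0., 0.], [1., 1.], [2., 0.], [3., -1.]])
# Stand-in pipeline (assumption): a scaler followed by KMeans.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('det', KMeans(n_clusters=1, n_init=10, random_state=0)),
]).fit(X)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'pipeline.pkl')
    # joblib.dump returns the list of file names the data was written to.
    filenames = joblib.dump(pipeline, path)
    restored = joblib.load(path)

# The restored pipeline should predict exactly like the original.
same_predictions = np.array_equal(pipeline.predict(X), restored.predict(X))
print(len(filenames), same_predictions)
```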