dvb.datascience package

Submodules

dvb.datascience.classification_pipe_base module

class dvb.datascience.classification_pipe_base.ClassificationPipeBase

Bases: dvb.datascience.pipe_base.PipeBase

Base class for classification pipes, so that classification-related attributes and methods are reusable for different kinds of classification-based pipes.

X = None
X_labels = None
classes = None
fit_attributes = [('classes', None, None), ('n_classes', None, None), ('y_true_label', None, None), ('y_pred_label', None, None), ('y_pred_proba_labels', None, None), ('X_labels', None, None)]
n_classes = 0
threshold = None
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the given params and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset will be the input for the next pipe. The output will be collected and shown to the user.

y_pred = None
y_pred_label = ''
y_pred_proba = None
y_pred_proba_labels = None
y_true = None
y_true_label = ''

dvb.datascience.pipe_base module

class dvb.datascience.pipe_base.PipeBase

Bases: object

Common base class for all pipes

figs = None
fit(data: Dict[str, Any], params: Dict[str, Any])

Train on a dataset df and store the learnings, so transform can be called later on to transform based on those learnings.

fit_attributes = ()
fit_transform(data: Dict[str, Any], transform_params: Dict[str, Any], fit_params: Dict[str, Any]) → Dict[str, Any]
get_fig(idx: Any)

Set the figure in plt to the one to be used.

When idx has already been used, the same Figure will be set so data can be added to that plot. Otherwise, a new Figure will be set.
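A minimal sketch of how a custom pipe might use get_fig inside its transform; the pipe and the column names are hypothetical:

import matplotlib.pyplot as plt
from dvb.datascience.pipe_base import PipeBase

class ScatterPipe(PipeBase):
    """Hypothetical pipe which plots two columns of the incoming df."""

    def transform(self, data, params):
        self.get_fig('scatter')  # a repeated call with the same idx adds to the same Figure
        plt.scatter(data['df']['x'], data['df']['y'])
        return {'df': data['df']}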

get_transform_data_by_key(key: str) → List[Any]

Get all values for a certain key for all transforms

input_keys = ('df',)
load(state: Dict[str, Any])

Load all fitted attributes of this Pipe from state.

Note: All PipeBase subclasses can define a fit_attributes attribute which contains a tuple for every attribute that is set during the fit phase. Those are the attributes which need to be saved in order to be loaded in a new process without having to train (fit) the pipeline. This is useful e.g. for model inference. The tuple for every attribute consists of (name, serializer, deserializer).

The (de)serializers are needed to convert to/from a JSON-serializable format and can be:

- None: no conversion needed, e.g. for str, int, float, list, bool
- 'pickle': the attribute will be pickled and stored as base64, so it can be part of a JSON document
- callable: a function which will get the object to be (de)serialized and must return the (de)serialized version
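As an illustration, a sketch of how a subclass could declare fit_attributes; the pipe and its attributes are hypothetical:

import numpy as np
from dvb.datascience.pipe_base import PipeBase

class ScalePipe(PipeBase):
    """Hypothetical pipe with three fitted attributes."""

    fit_attributes = (
        ('labels', None, None),                     # plain list: no conversion needed
        ('model', 'pickle', 'pickle'),              # pickled and stored as base64
        ('means', lambda a: a.tolist(), np.array),  # callables to/from a JSON-serializable form
    )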

name = None
output_keys = ('df',)
save() → Dict[str, Any]

Return all fitted attributes of this Pipe in a Dict which is JSON serializable.

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the given params and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset will be the input for the next pipe. The output will be collected and shown to the user.

dvb.datascience.pipeline module

class dvb.datascience.pipeline.Pipeline

Bases: object

A connector specifies which output of which Pipe (identified by the Pipe's name and the key of the output) will be the input of which Pipe (identified by the Pipe's name and the key of the input).

Example

>>> pipeline = Pipeline()
>>> pipeline.addPipe('read', ReadCSV())
>>> pipeline.addPipe('impute', Impute(), [("read", "df", "df")])
>>> pipeline.fit()
>>> pipeline.transform()
addPipe(name: str, pipe: dvb.datascience.pipe_base.PipeBase, inputs: List[Tuple[Union[str, dvb.datascience.pipe_base.PipeBase], str, str]] = None, comment: str = None) → dvb.datascience.pipeline.Pipeline

Add the pipe pipe to the pipeline under the given name. Optionally add the input connectors by listing them in inputs: a list with, for each input, a tuple (output_pipe, output_key, input_key).
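For instance, a pipe with two inputs could be connected as follows; ComparePipe and the input key names are hypothetical:

>>> pipeline.addPipe('train', ReadCSV())
>>> pipeline.addPipe('test', ReadCSV())
>>> pipeline.addPipe('compare', ComparePipe(), [
...     ('train', 'df', 'df_train'),
...     ('test', 'df', 'df_test'),
... ])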

current_transform_nr = -1
draw_design()

Returns an image with all pipes and connectors.

end()

When all fits and transforms are finished, end the pipeline so some clean-up can be done. At this moment, that is mainly needed to close plots, so they won't be shown twice in the notebook.

fit_transform(data: Optional[Dict[str, Any]] = None, transform_params: Optional[Dict[str, Any]] = None, fit_params: Optional[Dict[str, Any]] = None, name: str = 'fit', close_plt: bool = False) → None

Train all pipes in the pipeline and run the transform for the first time.

fit_transform_try(*args, **kwargs)
static get_params(params: Dict, key: str, metadata: Dict = None) → Dict

Get a dict with the contents of params relevant only to the pipe with the given key as its name. In addition, params['default'] and metadata will be added.

get_pipe(name) → Optional[dvb.datascience.pipe_base.PipeBase]
get_pipe_input(name) → Optional[Dict]

Get the input for the pipe named name from the transformed outputs. Returns a dict with all data when all data for the pipe is collectable. Returns None when not all data is present yet.

get_pipe_output(name: str, transform_nr: int = None) → Dict

Get the output of the pipe named name for the given transform_nr (which defaults to None, selecting the last one). When no output is present, an empty dict is returned.

get_processable_pipes() → List[dvb.datascience.pipe_base.PipeBase]

Get the pipes which are processable given the status of the pipeline.

input_connectors = None
static is_valid_name(name)
load(file_path: str) → None

Load the fitted parameters from the file at file_path and load them into all Pipes.

output_connectors = None
pipes = None
reset_fit()
save(file_path: str) → None

Save the fitted parameters of all Pipes to the file at file_path.
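A sketch of the intended save/load round trip for model inference; the file name is illustrative:

>>> pipeline.fit_transform()      # train all pipes
>>> pipeline.save('fitted.json')  # persist the fitted attributes of all pipes
>>> # in a new process: construct the same pipeline again, then
>>> pipeline.load('fitted.json')
>>> pipeline.transform()          # inference without refitting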

transform(data: Optional[Dict[str, Any]] = None, transform_params: Optional[Dict[str, Any]] = None, fit_params: Optional[Dict[str, Any]] = None, fit: bool = False, name: Optional[str] = None, close_plt: bool = False)

When transform_params or fit_params contains the key 'default', those params will be given to all pipes, unless overridden by a specific value for that pipe in transform_params or fit_params. The default can be useful for params which are needed by a lot of pipes.
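For example, a sketch passing a default param to every pipe while overriding it for one; the 'verbose' param is hypothetical:

>>> pipeline.transform(transform_params={
...     'default': {'verbose': True},   # given to all pipes...
...     'impute': {'verbose': False},   # ...unless overridden per pipe
... })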

transform_outputs = None
transform_status = None
transform_try(*args, **kwargs)
class dvb.datascience.pipeline.Status

Bases: enum.Enum

An enumeration of the processing states of a pipe during a pipeline run.

FINISHED = 3
NOT_STARTED = 1
PROCESSING = 2

dvb.datascience.score module

class dvb.datascience.score.ClassificationScore(score_methods: List[str] = None)

Bases: dvb.datascience.classification_pipe_base.ClassificationPipeBase

Some scores for classification problems

accuracy() → float
auc() → Optional[float]
classification_report()
confusion_matrix()
fit(data: Dict[str, Any], params: Dict[str, Any])

Train on a dataset df and store the learnings, so transform can be called later on to transform based on those learnings.

input_keys = ('predict', 'predict_metadata')
log_loss()
mcc(threshold: float = None) → float
output_keys = ('scores',)
params = None
plot_auc()
plot_confusion_matrix()
plot_model_performance()
possible_predict_methods = ['plot_model_performance']
possible_score_methods = ['auc', 'plot_auc', 'accuracy', 'mcc', 'confusion_matrix', 'plot_confusion_matrix', 'precision_recall_curve', 'log_loss', 'classification_report', 'plot_model_performance']
precision_recall_curve()
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the given params and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset will be the input for the next pipe. The output will be collected and shown to the user.
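A sketch of adding this pipe to a pipeline, assuming an upstream pipe named 'predict' whose output keys match the input_keys above; the chosen score_methods are illustrative:

>>> from dvb.datascience.score import ClassificationScore
>>> pipeline.addPipe('score', ClassificationScore(score_methods=['auc', 'accuracy']), [
...     ('predict', 'predict', 'predict'),
...     ('predict', 'predict_metadata', 'predict_metadata'),
... ])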

dvb.datascience.sub_pipe_base module

class dvb.datascience.sub_pipe_base.PassData(subpipeline, output_keys)

Bases: dvb.datascience.pipe_base.PipeBase

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the given params and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset will be the input for the next pipe. The output will be collected and shown to the user.

class dvb.datascience.sub_pipe_base.SubPipelineBase(output_pipe_name: str)

Bases: dvb.datascience.pipe_base.PipeBase

fit_transform(data: Dict[str, Any], transform_params: Dict[str, Any], fit_params: Dict[str, Any]) → Dict[str, Any]
load(state: Dict[str, Any]) → None

Load all fitted attributes of this Pipe from state.

save() → Dict[str, Any]

Return all fitted attributes of this Pipe in a Dict which is JSON serializable.

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the given params and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset will be the input for the next pipe. The output will be collected and shown to the user.

Module contents

dvb.datascience.load_module(name: str, disable_warnings: bool = True, random_seed: Optional[int] = 1122) → Any

Convenience function for running an experiment. This function reloads the experiment when it is already loaded, so any changes in the code of that experiment will be used. Usage:

import dvb.datascience as ds
p = ds.load_module('experiment').run()

p can be used to access the contents of the pipeline, like:

p.get_pipe_output('predict')

in case you define a run() method in experiment.py returning the pipeline object.

dvb.datascience.run_module(name: str, disable_warnings: bool = True) → Any