dvb.datascience.eda package

Submodules

dvb.datascience.eda.andrews module

class dvb.datascience.eda.andrews.AndrewsPlot(column: str)

Bases: dvb.datascience.pipe_base.PipeBase

Create an Andrews curves plot of the data in the dataframe.

Args:
data: Dataframe with the data to plot.
column: Target column to be used in the Andrews curves plot.
Returns:
The plot.
input_keys = ('df',)
output_keys = ('figs',)
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
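
For orientation, an Andrews curve maps one row x = (x1, x2, x3, ...) to the function f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ... for t between -pi and pi. The sketch below evaluates that function in plain Python. The helper name andrews_curve is hypothetical and is not part of dvb.datascience; how AndrewsPlot draws its figures internally is not shown here.

```python
import math
from typing import Sequence


def andrews_curve(row: Sequence[float], t: float) -> float:
    """Evaluate the Andrews curve of one observation at angle t (hypothetical helper)."""
    value = row[0] / math.sqrt(2)
    for i, x in enumerate(row[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += x * math.sin(harmonic * t)  # x2, x4, ... pair with sine
        else:
            value += x * math.cos(harmonic * t)  # x3, x5, ... pair with cosine
    return value


# One curve sampled at a few angles between -pi and pi.
sample_row = [1.0, 2.0, 3.0]
curve = [andrews_curve(sample_row, t) for t in (-math.pi, 0.0, math.pi)]
```

Rows that belong to the same class tend to produce curves of similar shape, which is why a target column is needed to colour them.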

dvb.datascience.eda.base module

class dvb.datascience.eda.base.AnalyticsBase

Bases: dvb.datascience.pipe_base.PipeBase

fit(data: Dict[str, Any], params: Dict[str, Any])

Train on a dataset df and store the learnings so transform can be called later on to transform based on the learnings.

get_number_of_dfs()
set_fig(idx: Any)

Set the current figure in plt to the one to be used.

If ‘idx’ has already been used, the data will be added to the plot used with this idx. If not, a new figure will be created.

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
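
The fit/transform contract described above can be illustrated with a small stand-in class. This is only a sketch under stated assumptions: MeanCenter is a hypothetical name, it does not subclass the real PipeBase, and a plain list stands in for the dataframe stored under the 'df' key.

```python
from typing import Any, Dict, List


class MeanCenter:
    """Hypothetical stand-in for a pipe; not a real dvb.datascience class."""

    input_keys = ("df",)
    output_keys = ("df",)

    def fit(self, data: Dict[str, Any], params: Dict[str, Any]) -> None:
        # Store the learnings (here: the training mean) for later transforms.
        values: List[float] = data["df"]
        self.mean_ = sum(values) / len(values)

    def transform(self, data: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
        # Apply the stored learnings and return one entry per output key.
        return {"df": [v - self.mean_ for v in data["df"]]}


pipe = MeanCenter()
pipe.fit({"df": [1.0, 2.0, 3.0]}, params={})
result = pipe.transform({"df": [4.0, 5.0]}, params={})
```

Here the mean learned during fit (2.0) is reused when transform is later called on new data.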

dvb.datascience.eda.boxplot module

class dvb.datascience.eda.boxplot.BoxPlot

Bases: dvb.datascience.pipe_base.PipeBase

Create boxplots of every feature in the dataframe.

Args:
data: Dataframe to be used in the plotting. Note that only dataframes consisting entirely of integers or floats can be used, as strings cannot be boxplotted.
Returns:
Displays the boxplots.
input_keys = ('df',)
output_keys = ('figs',)
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
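
A box plot is built around the five-number summary of a numeric feature, which is also why string columns cannot be plotted. The sketch below computes that summary with the standard library. The function name five_number_summary is hypothetical, and the quartile method used here is an assumption; the plotting backend behind BoxPlot may use a different convention for quartiles and whiskers.

```python
import statistics
from typing import List, Tuple


def five_number_summary(values: List[float]) -> Tuple[float, float, float, float, float]:
    """Return (min, Q1, median, Q3, max) for one numeric feature (hypothetical helper)."""
    # "inclusive" interpolates linearly between the observed data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


summary = five_number_summary([1, 2, 3, 4, 5])
```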

dvb.datascience.eda.corrmatrix module

class dvb.datascience.eda.corrmatrix.CorrMatrixPlot

Bases: dvb.datascience.pipe_base.PipeBase

Make a plot of the correlation matrix using all the features in the dataframe

Args:
data: Dataframe to be used
Returns:
Plot of a correlation matrix
input_keys = ('df',)
output_keys = ('fig', 'corr')
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
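
As a rough illustration of what the 'corr' output could contain, the sketch below computes a Pearson correlation matrix for a few feature columns in plain Python. That CorrMatrixPlot uses the Pearson coefficient is an assumption, and the names pearson and corr_matrix are hypothetical helpers rather than library functions.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def corr_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Correlate every feature with every other feature, including itself."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


features = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with x
    "z": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with x
}
corr = corr_matrix(features)
```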

dvb.datascience.eda.describe module

class dvb.datascience.eda.describe.Describe

Bases: dvb.datascience.eda.base.AnalyticsBase

Describes the data.

Args:
data: Dataframe to be used.
Returns:
A description of the data.
input_keys = ('df',)
output_keys = ('output',)
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.

dvb.datascience.eda.dimension_reduction module

class dvb.datascience.eda.dimension_reduction.DimensionReductionPlots(y_label)

Bases: dvb.datascience.eda.base.AnalyticsBase

Plot dimension reduction graphs of the data.

Args:
data: Dataframe to be used.
Returns:
Dimension reduction (PCA, ISOMAP, MDS, Spectral Embedding, tSNE) plots of the data
input_keys = ('df',)
output_keys = ('figs',)
scatterPlot(X, y, title)
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
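
Of the listed techniques, PCA is the simplest to sketch by hand. The example below projects a small feature matrix onto its first two principal components using a singular value decomposition in NumPy. It is only meant to show what a two-dimensional embedding looks like before it is scatter-plotted; pca_2d is a hypothetical helper, and which implementation DimensionReductionPlots relies on is not stated here.

```python
import numpy as np


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project the rows of X onto their first two principal components."""
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


X = np.array(
    [
        [2.0, 0.0, 1.0],
        [0.0, 1.0, 3.0],
        [1.0, 3.0, 0.0],
        [4.0, 2.0, 2.0],
        [3.0, 4.0, 4.0],
    ]
)
embedding = pca_2d(X)  # one 2-D point per input row
```

Each row of embedding could then be drawn as one point, coloured by the label column.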

dvb.datascience.eda.dump module

class dvb.datascience.eda.dump.Dump

Bases: dvb.datascience.eda.base.AnalyticsBase

Dump the read data

Args:
data: Dataframe to be used
Returns:
Empty dataframe.
input_keys = ('df',)
output_keys = ('output',)
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.

dvb.datascience.eda.ecdf module

class dvb.datascience.eda.ecdf.ECDFPlots

Bases: dvb.datascience.pipe_base.PipeBase

Creates an empirical cumulative distribution function (ECDF) plot of every feature in the dataframe.

Args:
data: Dataframe to be used in the plotting.
Returns:
Plots of an ECDF for every feature.
static ecdf(data)

Compute ECDF for a one-dimensional array of measurements.

input_keys = ('df',)
output_keys = ('figs',)
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
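
A common way to compute an ECDF for a one-dimensional array is to sort the values and assign the i-th smallest value the cumulative fraction i/n. The sketch below follows that definition in plain Python. It is an independent illustration that happens to share the name ecdf; it is not copied from the static method documented above, whose exact return type is not shown here.

```python
from typing import List, Sequence, Tuple


def ecdf(data: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return sorted values x and cumulative fractions y for a 1-D sample."""
    n = len(data)
    x = sorted(data)
    y = [(i + 1) / n for i in range(n)]  # fraction of points <= x[i]
    return x, y


x, y = ecdf([3.0, 1.0, 2.0, 4.0])
```

Plotting y against x as points or steps gives the ECDF of that feature.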

dvb.datascience.eda.hist module

class dvb.datascience.eda.hist.Hist(show_count_labels=True, title='Histogram', groupBy: str = None)

Bases: dvb.datascience.pipe_base.PipeBase

Create histograms of every feature.

Args:
data: Dataframe to be used in creating the histograms.
show_count_labels (Boolean): determines whether the count is displayed above every bin (default = True).
title (str): the title to display above every histogram (default = “Histogram”).
groupBy (str): name of a column to group by; when set, every bin shows one bar per group (default = None).
Returns:
Plots of all the histograms.

input_keys = ('df',)
output_keys = ('figs',)
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
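
To make the effect of groupBy concrete, the sketch below counts how many rows fall into each bin per group, which is the data behind multiple bars in every bin. It assumes a categorical feature where each distinct value is its own bin; grouped_counts and the sample column names are hypothetical, and how Hist bins numeric features is not covered.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def grouped_counts(rows: List[Dict[str, str]], feature: str, group_by: str) -> Dict[str, Counter]:
    """Count rows per feature value, split by the value of the group_by column."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        counts[row[feature]][row[group_by]] += 1
    return dict(counts)


rows = [
    {"color": "red", "label": "a"},
    {"color": "red", "label": "b"},
    {"color": "blue", "label": "a"},
    {"color": "red", "label": "a"},
]
counts = grouped_counts(rows, feature="color", group_by="label")
```

With these rows, the "red" bin would get two bars (one per label) and the "blue" bin one bar; the numbers in counts correspond to the labels shown when show_count_labels is enabled.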

dvb.datascience.eda.logit_summary module

class dvb.datascience.eda.logit_summary.LogitSummary(use_regularized: bool = True, **kwargs)

Bases: dvb.datascience.classification_pipe_base.ClassificationPipeBase

Run a statsmodels logit for coefficient interpretation on the training set

Args:
use_regularized (Boolean): determines whether the model is fitted with regularization (default = True).
data: Dataframe to be used for this function.
Returns:
A summary of the logit.
input_keys = ('df', 'df_metadata')
output_keys = ('summary',)
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
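
When reading a logit summary, each coefficient is a change in log-odds per unit increase of its feature, so exponentiating it gives an odds ratio. The sketch below shows that conversion with the standard library only. The helper names and the example coefficient are hypothetical and do not come from LogitSummary output.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Convert a logit coefficient (log-odds) into an odds ratio."""
    return math.exp(coefficient)


def probability(log_odds: float) -> float:
    """Convert log-odds into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficient: a one-unit increase doubles the odds.
ratio = odds_ratio(math.log(2))
```

A coefficient of zero corresponds to an odds ratio of 1, meaning the feature leaves the odds unchanged.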

dvb.datascience.eda.scatter module

class dvb.datascience.eda.scatter.ScatterPlots

Bases: dvb.datascience.pipe_base.PipeBase

Create scatterplots of all the features in a dataframe. This function generates a scatterplot for every unique combination of features. The running time grows with the number of features, so this can take a long time.

Args:
data: Dataframe whose features will be used to create the scatterplots.
Returns:
Plots of all the scatterplots.
input_keys = ('df',)
output_keys = ('figs',)
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
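
The growth in running time follows from the number of unique feature pairs, which is n(n-1)/2 for n features. The sketch below enumerates those pairs with itertools; feature_pairs is a hypothetical helper, and whether ScatterPlots enumerates pairs this way internally is an assumption.

```python
from itertools import combinations
from typing import List, Tuple


def feature_pairs(features: List[str]) -> List[Tuple[str, str]]:
    """List every unordered pair of distinct features once."""
    return list(combinations(features, 2))


pairs = feature_pairs(["a", "b", "c", "d"])       # 4 features, 6 pairs
many = feature_pairs([f"f{i}" for i in range(20)])  # 20 features, 190 pairs
```

Going from 4 to 20 features raises the number of plots from 6 to 190, so selecting a subset of columns beforehand keeps the output manageable.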

dvb.datascience.eda.swarm module

class dvb.datascience.eda.swarm.SwarmPlots

Bases: dvb.datascience.pipe_base.PipeBase

Create swarmplots of all the features in a dataframe. This function generates swarmplots on every unique combination of features. As the number of features grows, so does the loading time of this function.

Args:
data: Dataframe whose features will be used to create the swarmplots.
Returns:
Plots of all the swarmplots.
input_keys = ('df',)
output_keys = ('figs',)
transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]

Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.

Module contents