dvb.datascience.eda package¶

Submodules¶

dvb.datascience.eda.andrews module¶

class dvb.datascience.eda.andrews.AndrewsPlot(column: str)¶

Bases: dvb.datascience.pipe_base.PipeBase

Create an Andrews curves plot of the data in the dataframe.

Args:
    data: Dataframe with the data to be used.
    column: Target column to be used in the Andrews curves plot.
Returns:
    The plot.

input_keys = ('df',)¶

output_keys = ('figs',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
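For readers unfamiliar with Andrews curves, the following is a minimal sketch of the standard curve definition, in which each row of features is mapped to a finite Fourier series. It uses plain Python and is not taken from the `AndrewsPlot` implementation; the function name `andrews_curve` is hypothetical.

```python
import math
from typing import Sequence


def andrews_curve(row: Sequence[float], t: float) -> float:
    """Evaluate the Andrews curve of one observation at angle t.

    f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    (hypothetical helper, not part of dvb.datascience)
    """
    value = row[0] / math.sqrt(2)
    for i, x in enumerate(row[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += x * math.sin(harmonic * t)
        else:
            value += x * math.cos(harmonic * t)
    return value
```

Plotting this function over t in [-pi, pi] for every row, coloured by the target column, gives the usual Andrews curves picture.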

dvb.datascience.eda.base module¶

class dvb.datascience.eda.base.AnalyticsBase¶

Bases: dvb.datascience.pipe_base.PipeBase

fit(data: Dict[str, Any], params: Dict[str, Any])¶: Train on a dataset df and store the learnings so transform can be called later on to transform based on the learnings.

get_number_of_dfs()¶

set_fig(idx: Any)¶

Set the active figure in plt to the one to be used.

If ‘idx’ has already been used, the data will be added to the plot used with this idx. If not, a new figure will be created.

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
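The fit/transform split described above can be illustrated with a small stand-alone sketch. The class below is hypothetical and does not inherit from `PipeBase`; it only mirrors the documented shape (dictionaries keyed by `input_keys` and `output_keys`), and it uses a plain list in place of a dataframe.

```python
from typing import Any, Dict


class MeanCenterSketch:
    """Hypothetical stand-in for a pipe; not part of dvb.datascience."""

    input_keys = ("df",)
    output_keys = ("df",)

    def fit(self, data: Dict[str, Any], params: Dict[str, Any]) -> None:
        # Store the "learnings" from the training data.
        values = data["df"]
        self.mean_ = sum(values) / len(values)

    def transform(self, data: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
        # Apply the stored learnings to new data.
        return {"df": [v - self.mean_ for v in data["df"]]}
```

A caller would run `fit` once on the training data and then call `transform` on any later dataset, which reuses the stored mean.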

dvb.datascience.eda.boxplot module¶

class dvb.datascience.eda.boxplot.BoxPlot¶

Bases: dvb.datascience.pipe_base.PipeBase

Create boxplots of every feature in the dataframe.

Args:
    data: Dataframe to be used in the plotting. Note that only dataframes consisting entirely of integers or floats can be used, as strings cannot be boxplotted.
Returns:
    Displays the boxplots.

input_keys = ('df',)¶

output_keys = ('figs',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
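As a rough illustration of what a boxplot summarises, and of why non-numeric columns cannot be boxplotted, here is a sketch that computes the five-number summary of one feature with the standard library. The helper name `box_stats` and the choice of the "inclusive" quantile method are assumptions for this example, not details of `BoxPlot`.

```python
import statistics
from typing import Dict, Sequence


def box_stats(values: Sequence[float]) -> Dict[str, float]:
    """Five-number summary of one feature (hypothetical helper)."""
    for v in values:
        # Strings (and other non-numeric values) have no quartiles.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"cannot boxplot non-numeric value: {v!r}")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}
```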

dvb.datascience.eda.corrmatrix module¶

class dvb.datascience.eda.corrmatrix.CorrMatrixPlot¶

Bases: dvb.datascience.pipe_base.PipeBase

Make a plot of the correlation matrix using all the features in the dataframe

Args:
    data: Dataframe to be used.
Returns:
    Plot of a correlation matrix.

input_keys = ('df',)¶

output_keys = ('fig', 'corr')¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
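The 'corr' output suggests that the correlation matrix itself is returned next to the figure. The sketch below shows how such a matrix could be computed, assuming Pearson correlation (the source does not state which coefficient is used) and using plain Python lists instead of a dataframe. The names `pearson` and `corr_matrix` are hypothetical.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def corr_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Correlation of every feature with every other feature (hypothetical helper)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```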

dvb.datascience.eda.describe module¶

class dvb.datascience.eda.describe.Describe¶

Bases: dvb.datascience.eda.base.AnalyticsBase

Describes the data.

Args:
    data: Dataframe to be used.
Returns:
    A description of the data.

input_keys = ('df',)¶

output_keys = ('output',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
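The source does not say which statistics the description contains. As an assumption, the sketch below computes the kind of per-column summary commonly reported for numeric data (count, mean, sample standard deviation, minimum, maximum). The helper `describe_column` is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def describe_column(values: Sequence[float]) -> Dict[str, float]:
    """Basic summary statistics of one numeric column (hypothetical helper)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }
```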

dvb.datascience.eda.dimension_reduction module¶

class dvb.datascience.eda.dimension_reduction.DimensionReductionPlots(y_label)¶

Bases: dvb.datascience.eda.base.AnalyticsBase

Plot dimension reduction graphs of the data.

Args:
    data: Dataframe to be used.
Returns:
    Dimension reduction (PCA, ISOMAP, MDS, Spectral Embedding, tSNE) plots of the data.

input_keys = ('df',)¶

output_keys = ('figs',)¶

scatterPlot(X, y, title)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.

dvb.datascience.eda.dump module¶

class dvb.datascience.eda.dump.Dump¶

Bases: dvb.datascience.eda.base.AnalyticsBase

Dump the read data

Args:
    data: Dataframe to be used.
Returns:
    Empty dataframe.

input_keys = ('df',)¶

output_keys = ('output',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.

dvb.datascience.eda.ecdf module¶

class dvb.datascience.eda.ecdf.ECDFPlots¶

Bases: dvb.datascience.pipe_base.PipeBase

Creates an empirical cumulative distribution function (ECDF) plot of every feature in the dataframe.

Args:
    data: Dataframe to be used in the plotting.
Returns:
    Plots of an ECDF for every feature.

static ecdf(data)¶: Compute ECDF for a one-dimensional array of measurements.

input_keys = ('df',)¶

output_keys = ('figs',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
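The ECDF of a one-dimensional sample assigns to each sorted value the fraction of observations less than or equal to it. The following plain-Python sketch shows that computation; it mirrors the documented purpose of the static `ecdf` method but is not its actual implementation, and the return format (two lists) is an assumption.

```python
from typing import List, Sequence, Tuple


def ecdf(data: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return sorted values x and cumulative fractions y for plotting (sketch)."""
    x = sorted(data)
    n = len(x)
    # The i-th smallest value (1-based) has i/n of the sample at or below it.
    y = [(i + 1) / n for i in range(n)]
    return x, y
```

Plotting y against x as a step or marker plot yields the ECDF of the feature.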

dvb.datascience.eda.hist module¶

class dvb.datascience.eda.hist.Hist(show_count_labels=True, title='Histogram', groupBy: str = None)¶

Bases: dvb.datascience.pipe_base.PipeBase

Create histograms of every feature.

Args:
    data: Dataframe to be used in creating the histograms.
    show_count_labels (Boolean): determines whether the count is displayed above every bin (default = True).
    title (str): the title to display above every histogram (default = “Histogram”).
    groupBy (str): enables multiple bars in every bin, based on the groupBy column (default = None).
Returns:
    Plots of all the histograms.

input_keys = ('df',)¶

output_keys = ('figs',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
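To make the effect of groupBy concrete, the sketch below counts, for one feature, how many rows fall into each bin, split by the value of a grouping column. It works on a list of dictionaries rather than a dataframe and treats every distinct feature value as its own bin; both are simplifying assumptions, and `grouped_counts` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Optional


def grouped_counts(
    rows: List[Dict[str, Any]], feature: str, group_by: Optional[str] = None
) -> Dict[Any, Dict[Any, int]]:
    """Count rows per feature value, with one count per group (hypothetical helper)."""
    counts: Dict[Any, Counter] = defaultdict(Counter)
    for row in rows:
        # Without a grouping column every row lands in a single group.
        group = row[group_by] if group_by is not None else "all"
        counts[row[feature]][group] += 1
    return {value: dict(per_group) for value, per_group in counts.items()}
```

Each inner dictionary corresponds to the bars drawn side by side in one bin, and its values are the numbers that count labels would show.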

dvb.datascience.eda.logit_summary module¶

class dvb.datascience.eda.logit_summary.LogitSummary(use_regularized: bool = True, **kwargs)¶

Bases: dvb.datascience.classification_pipe_base.ClassificationPipeBase

Run a statsmodels logit for coefficient interpretation on the training set

Args:
    use_regularized (Boolean): determines whether the data is fitted regularized or not (default = True).
    data: Dataframe to be used for this function.
Returns:
    A summary of the logit.

input_keys = ('df', 'df_metadata')¶

output_keys = ('summary',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.

dvb.datascience.eda.scatter module¶

class dvb.datascience.eda.scatter.ScatterPlots¶

Bases: dvb.datascience.pipe_base.PipeBase

Create scatterplots of all the features in a dataframe. This function generates scatterplots on every unique combination of features. As the number of features grows, so does the loading time of this function, so this can take a long time.

Args:
    data: Dataframe whose features will be used to create scatter plots.
Returns:
    Plots of all the scatterplots.

input_keys = ('df',)¶

output_keys = ('figs',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.
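The growth in running time follows from the number of feature pairs. Assuming "every unique combination" means unordered pairs of distinct features, n features give n(n-1)/2 plots, as the sketch below shows. The helper `feature_pairs` is hypothetical and not part of the library.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def feature_pairs(features: Sequence[str]) -> List[Tuple[str, str]]:
    """All unordered pairs of distinct features, one per plot (hypothetical helper)."""
    return list(combinations(features, 2))
```

With 4 features this yields 6 pairs, while 20 features already yield 190, so the number of plots grows quadratically with the number of columns.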

dvb.datascience.eda.swarm module¶

class dvb.datascience.eda.swarm.SwarmPlots¶

Bases: dvb.datascience.pipe_base.PipeBase

Create swarmplots of all the features in a dataframe. This function generates swarmplots on every unique combination of features. As the number of features grows, so does the loading time of this function.

Args:
    data: Dataframe whose features will be used to create swarm plots.
Returns:
    Plots of all the swarmplots.

input_keys = ('df',)¶

output_keys = ('figs',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶: Perform an operation on df using the kwargs and the learnings from training. Transform will return a tuple with the transformed dataset and some output. The transformed dataset will be the input for the next plumber. The output will be collected and shown to the user.

Module contents¶
