dvb.datascience.data package¶
Submodules¶
dvb.datascience.data.arff module¶
class dvb.datascience.data.arff.ARFFDataExportPipe¶
Bases: dvb.datascience.pipe_base.PipeBase

Exports a dataframe to an ARFF file and writes it to disk.

Args:
    file_path (str): path of the file to export to
    wekaname (str): the wekaname to be used

Returns:
    A file at file_path.

file_path = None¶

fit(data: Dict[str, Any], params: Dict[str, Any])¶
Train on the dataset df and store the learnings so that transform can be called later on to transform based on those learnings.

fit_attributes = [('file_path', None, None), ('wekaname', None, None)]¶

input_keys = ('df',)¶

output_keys = ()¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶
Perform an operation on df using the kwargs and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset is the input for the next pipe; the output is collected and shown to the user.

wekaname = None¶
class dvb.datascience.data.arff.ARFFDataImportPipe¶
Bases: dvb.datascience.pipe_base.PipeBase

Imports an ARFF file and returns a dataframe.

Args:
    file_path (str): path of the file to import

Returns:
    A dataframe.

file_path = None¶

fit(data: Dict[str, Any], params: Dict[str, Any])¶
Train on the dataset df and store the learnings so that transform can be called later on to transform based on those learnings.

fit_attributes = [('file_path', None, None)]¶

input_keys = ()¶

output_keys = ('df',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶
Perform an operation on df using the kwargs and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset is the input for the next pipe; the output is collected and shown to the user.
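For reference, the ARFF layout these pipes read and write can be sketched without the library. The helper below is illustrative, not part of dvb.datascience; mapping wekaname onto the @RELATION header is an assumption based on the Args above.

```python
# Minimal sketch of the ARFF document ARFFDataExportPipe presumably writes.
# `to_arff` and its arguments are illustrative, not part of dvb.datascience.

def to_arff(wekaname, columns, rows):
    """Render rows (lists of numeric values) as an ARFF document string."""
    lines = ["@RELATION " + wekaname, ""]
    for col in columns:
        lines.append("@ATTRIBUTE %s NUMERIC" % col)
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines)

doc = to_arff("iris", ["sepal_length", "sepal_width"], [[5.1, 3.5], [4.9, 3.0]])
print(doc.splitlines()[0])  # @RELATION iris
```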
dvb.datascience.data.csv module¶
class dvb.datascience.data.csv.CSVDataExportPipe(file_path: str = None, sep: str = None, **kwargs)¶
Bases: dvb.datascience.pipe_base.PipeBase

Exports a dataframe to CSV.

Args:
    file_path (str): path to write the file to
    sep (str): separator character to use

Returns:
    A CSV file at the specified location.

input_keys = ('df',)¶

output_keys = ()¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶
Perform an operation on df using the kwargs and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset is the input for the next pipe; the output is collected and shown to the user.
class dvb.datascience.data.csv.CSVDataImportPipe(file_path: str = None, content: str = None, sep: bool = None, engine: str = 'python', index_col: str = None)¶
Bases: dvb.datascience.pipe_base.PipeBase

Imports data from CSV and creates a dataframe using pd.read_csv().

Args:
    file_path (str): path of the file to read
    content (str): raw data to import
    sep (str): separator character to use
    engine (str): parser engine to be used, default is 'python'
    index_col (str): column to use as index

Returns:
    A dataframe with index_col as index column.

input_keys = ()¶

output_keys = ('df',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶
Perform an operation on df using the kwargs and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset is the input for the next pipe; the output is collected and shown to the user.
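The two CSV pipes are thin wrappers around pandas I/O. A hedged sketch of the underlying round trip in plain pandas (not the pipes themselves); using io.StringIO mirrors how the content argument can carry raw CSV data instead of a file path.

```python
import io
import pandas as pd

# Import side: what CSVDataImportPipe presumably boils down to.
content = "id;x;y\n1;0.5;a\n2;1.5;b"
df = pd.read_csv(io.StringIO(content), sep=";", engine="python", index_col="id")

# Export side: what CSVDataExportPipe presumably does with its file_path and sep.
buf = io.StringIO()
df.to_csv(buf, sep=";")
print(buf.getvalue().splitlines()[0])  # id;x;y
```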
dvb.datascience.data.excel module¶
class dvb.datascience.data.excel.ExcelDataImportPipe(file_path: str = None, sheet_name=0, index_col: str = None)¶
Bases: dvb.datascience.pipe_base.PipeBase

Imports data from Excel and creates a dataframe using pd.read_excel().

Args:
    file_path (str): path of the file to read
    sheet_name (int): sheet number to be used (default 0)
    index_col (str): column to use as index

Returns:
    A dataframe with index_col as index column.

input_keys = ()¶

output_keys = ('df',)¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶
Perform an operation on df using the kwargs and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset is the input for the next pipe; the output is collected and shown to the user.
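ExcelDataImportPipe mirrors the CSV import pipe but delegates to pd.read_excel(). A hedged sketch of how the constructor arguments presumably map onto that pandas call; the helper below is illustrative, not the pipe's real implementation.

```python
def excel_import_kwargs(sheet_name=0, index_col=None):
    """Map the pipe's constructor arguments to pd.read_excel keyword arguments."""
    return {"sheet_name": sheet_name, "index_col": index_col}

# Presumed call inside transform: pd.read_excel(file_path, **kwargs)
kwargs = excel_import_kwargs(sheet_name=2, index_col="id")
print(kwargs)  # {'sheet_name': 2, 'index_col': 'id'}
```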
dvb.datascience.data.teradata module¶
class dvb.datascience.data.teradata.TeraDataImportPipe¶
Bases: dvb.datascience.pipe_base.PipeBase

Reads data from Teradata and returns a dataframe.

Args:
    file_path (str): path of a file containing the SQL query to run
    sql (str): raw SQL query to be used

Returns:
    A dataframe read via pd.read_sql_query(), with the index sorted alphabetically.

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶
Perform an operation on df using the kwargs and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset is the input for the next pipe; the output is collected and shown to the user.

class dvb.datascience.data.teradata.customDataTypeConverter¶
Bases: teradata.datatypes.DefaultDataTypeConverter

Transforms data types from Teradata to the data types used by Python: replaces the decimal comma with a decimal point and converts BYTEINT, BIGINT, SMALLINT and INTEGER to the Python type int.

convertValue(dbType, dataType, typeCode, value)¶
Converts the value returned by the database into the desired Python object.
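The conversion rules described above can be sketched in isolation. The function below is an illustrative stand-in for convertValue's presumed behavior on Teradata's integer and decimal types; it is not the class's actual implementation, which handles the full teradata type set.

```python
# Teradata integer types that customDataTypeConverter maps to Python int.
INT_TYPES = {"BYTEINT", "SMALLINT", "INTEGER", "BIGINT"}

def convert_value_sketch(data_type, value):
    """Coerce a raw Teradata string value to the matching Python type (sketch)."""
    if value is None:
        return None
    if data_type in INT_TYPES:
        return int(value)
    if data_type == "DECIMAL":
        # Sessions configured with a comma as decimal separator need normalizing.
        return float(str(value).replace(",", "."))
    return value

print(convert_value_sketch("DECIMAL", "3,14"))  # 3.14
```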
Module contents¶
class dvb.datascience.data.DataPipe(key: str = 'data', data=None)¶
Bases: dvb.datascience.pipe_base.PipeBase

Adds some data to the pipeline via the constructor, the fit params or the transform params. The data can be supplied at three different moments:

>>> pipe = DataPipe(data=[1,2,3])
>>> pipeline.fit_transform(fit_params={"data": [4,5,6]})
>>> pipeline.transform(transform_params={"data": [7,8,9]})

The most recently supplied data is used.

fit(data: Dict[str, Any], params: Dict[str, Any])¶
Train on the dataset df and store the learnings so that transform can be called later on to transform based on those learnings.

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶
Perform an operation on df using the kwargs and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset is the input for the next pipe; the output is collected and shown to the user.
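The precedence between those three moments can be sketched without the library. resolve_data below is an illustrative helper, not DataPipe's real code; it only shows the "last data wins" rule, with transform params overriding fit params overriding the constructor.

```python
def resolve_data(constructor_data=None, fit_data=None, transform_data=None):
    """Return the most recently supplied data (sketch of DataPipe's rule)."""
    for candidate in (transform_data, fit_data, constructor_data):
        if candidate is not None:
            return candidate
    return None

print(resolve_data([1, 2, 3], fit_data=[4, 5, 6]))  # [4, 5, 6]
```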
class dvb.datascience.data.GeneratedSampleClassification(n_classes: int = 10, n_features: int = 20, n_samples: int = 100, random_state: int = None)¶
Bases: dvb.datascience.pipe_base.PipeBase

input_keys = ()¶

output_keys = ('df', 'df_metadata')¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶
Perform an operation on df using the kwargs and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset is the input for the next pipe; the output is collected and shown to the user.
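Given its name and parameters, this pipe presumably wraps sklearn.datasets.make_classification to generate a labeled sample set. A hedged sketch of that underlying call (the dataframe handling is illustrative; note that n_informative must be raised above its default of 2 to allow 10 classes):

```python
import pandas as pd
from sklearn.datasets import make_classification

# Presumed underlying call with the pipe's default parameters.
X, y = make_classification(
    n_samples=100, n_features=20, n_classes=10,
    n_informative=8,  # the default (2) is too small for 10 classes
    random_state=42,
)
df = pd.DataFrame(X)
df["y"] = y
print(df.shape)  # (100, 21)
```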
class dvb.datascience.data.SampleData(dataset_name: str = 'iris')¶
Bases: dvb.datascience.pipe_base.PipeBase

input_keys = ()¶

output_keys = ('df', 'df_metadata')¶

possible_dataset_names = ['iris', 'diabetes', 'wine', 'boston', 'breast_cancer', 'digits', 'linnerud']¶

transform(data: Dict[str, Any], params: Dict[str, Any]) → Dict[str, Any]¶
Perform an operation on df using the kwargs and the learnings from training. Transform returns a dict with the transformed dataset and some output. The transformed dataset is the input for the next pipe; the output is collected and shown to the user.
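The possible_dataset_names match the built-in loaders of sklearn.datasets, so SampleData presumably delegates to the corresponding load_&lt;name&gt;() function. A hedged sketch with the iris set (note that load_boston was removed from recent scikit-learn releases, so the 'boston' name may fail there):

```python
from sklearn import datasets

name = "iris"  # any entry from possible_dataset_names
bunch = getattr(datasets, "load_" + name)()
print(bunch.data.shape)  # (150, 4)
```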