castable

class pdcleaner.detection.types.castable(obj, detector=None, target=None, **kwargs)

Bases: _SeriesDetector

Detect elements that cannot be cast to the target type.

Intended to be used through the detect method with the keyword 'castable':

>>> series.cleaner.detect.castable(...)
>>> series.cleaner.detect('castable',...)

This detection method flags elements of an object Series as errors when they cannot be converted (cast) to a given type.

For example,

  • 1.2 is not castable as an integer

  • ABC is not castable as a number/float

  • 2022-02-28 is castable as a date, but 2022-02-31 is not
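The three cases above can be reproduced with a minimal plain-Python sketch. It is not pdcleaner's implementation: `is_castable`, `_parses_as_date`, and the `DATE_FORMATS` tuple are hypothetical names, and the two accepted date formats are an assumption made only for illustration.

```python
from datetime import datetime

# Assumed date formats, for illustration only; the real detector may accept others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def _parses_as_date(value):
    """Return True if `value` matches one of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def is_castable(value, target):
    """Hypothetical helper: can the string `value` be cast to `target`?"""
    if target == "date":
        return _parses_as_date(value)
    if target not in ("int", "number"):
        raise ValueError(f"unsupported target: {target!r}")
    try:
        number = float(value)
    except ValueError:
        return False
    # An integer target accepts "2.0" but rejects "1.2".
    return number.is_integer() if target == "int" else True


print(is_castable("1.2", "int"))          # False
print(is_castable("ABC", "number"))       # False
print(is_castable("2022-02-28", "date"))  # True
print(is_castable("2022-02-31", "date"))  # False
```
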

The expected thousands and decimal separators can be customized so that, for example, 100,000.0 is read as 100000.
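One way to picture the separator handling is as a text normalization applied before the cast. The sketch below uses a hypothetical `normalize_number` helper; it only illustrates the idea and is not the library's `check_separators` method.

```python
def normalize_number(text, decimal=".", thousands=","):
    """Hypothetical helper: rewrite `text` so that float() can parse it."""
    # Drop the thousands separator first, then map the decimal separator to ".".
    cleaned = text.replace(thousands, "") if thousands else text
    if decimal != ".":
        cleaned = cleaned.replace(decimal, ".")
    return cleaned


print(float(normalize_number("100,000.0")))                              # 100000.0
print(float(normalize_number("100.000,5", decimal=",", thousands=".")))  # 100000.5
```

Removing the thousands separator before replacing the decimal separator matters when the two characters are swapped, as in the second call.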

Note

NA values are not treated as errors.

Parameters:
  • target ({"int", "number", "date", "boolean"}, required) – The target type that values are cast to

  • decimal (str, optional) – Decimal separator to expect when the target is a number type

  • thousands (str, optional) – Thousands separator to expect when the target is a number type

  • bool_values (dict, optional) – Mapping of the values that represent True and False when the target is boolean
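As a rough illustration of how a bool_values mapping and the NA rule could interact, the sketch below flags every value that is neither missing nor a key of the mapping. `boolean_errors` is a hypothetical helper written in plain Python, not part of pdcleaner.

```python
import math


def boolean_errors(values, bool_values):
    """Hypothetical helper: flag values that are neither missing nor keys of `bool_values`."""

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    # Missing values are never flagged; everything else must be a known key.
    return [not is_missing(v) and v not in bool_values for v in values]


flags = boolean_errors(["Yes", "No", "Ok", None], {"Yes": True, "No": False})
print(flags)  # [False, False, True, False]
```
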

Raises:
  • ValueError – When target is not defined or is not one of the allowed values.

  • ValueError – When the detector is applied to a float or int Series.

Examples

>>> import pandas as pd
>>> import pdcleaner
>>> series = pd.Series(['1','2.0','1.2','A','100', '22/05/1975'], dtype='object')
>>> detector = series.cleaner.detect('castable', target='int')
>>> print(detector.is_error())
0    False
1    False
2     True
3     True
4    False
5     True
dtype: bool
>>> detector = series.cleaner.detect('castable', target='date')
>>> print(detector.is_error())
0     True
1     True
2     True
3     True
4     True
5    False
dtype: bool
>>> detector = series.cleaner.detect('castable', target='number')
>>> detector.is_error()
0    False
1    False
2    False
3     True
4    False
5     True
dtype: bool
>>> series = pd.Series(['Yes','No','No','Yes','Ok', 'Nok'], dtype='object')
>>> detector = series.cleaner.detect('castable', target='boolean',
...                                  bool_values={"Yes": True, "No": False})
>>> detector.is_error()
0    False
1    False
2    False
3    False
4     True
5     True
dtype: bool

Missing values are not treated as errors.

>>> import numpy as np
>>> series = pd.Series(['1','2.0','1.2','A','100', np.nan], dtype='object')
>>> detector = series.cleaner.detect('castable', target='number')
>>> print(detector.is_error())
0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool

Attributes Summary

bool_values

Mapping of the values that represent True and False

decimal

Specific decimal separator

index

Indices of the rows detected as errors

n_errors

Number of rows detected as errors

name

obj

The object (Series or DataFrame) containing the data to which the detection is applied

target

Target type against which the values are checked

thousands

Specific thousand separator

Methods Summary

check_separators(series)

Replace the custom separators before applying the detector

detected()

Series or DataFrame containing only the detected errors

has_errors()

Returns True if any error has been detected, False otherwise

is_error()

Return a boolean same-sized object indicating if the values are flagged as errors

not_error()

Return a boolean same-sized object indicating if the values are NOT flagged as errors

report()

Print a detection report

valid()

Series or DataFrame containing only the valid values
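The attributes and methods summarized above all derive from the same boolean mask. The plain-Python sketch below shows one plausible reading of how they relate, using the mask from the target='int' example; it works on lists rather than Series and is an assumption about the semantics, not the library's code.

```python
values = ["1", "2.0", "1.2", "A", "100", "22/05/1975"]
is_error = [False, False, True, True, False, True]  # mask from the target='int' example

not_error = [not flag for flag in is_error]                   # complement of is_error()
index = [i for i, flag in enumerate(is_error) if flag]        # rows flagged as errors
n_errors = len(index)                                         # number of flagged rows
has_errors = n_errors > 0                                     # True if anything was flagged
detected = [v for v, flag in zip(values, is_error) if flag]   # only the flagged values
valid = [v for v, flag in zip(values, is_error) if not flag]  # only the unflagged values

print(index)     # [2, 3, 5]
print(n_errors)  # 3
print(detected)  # ['1.2', 'A', '22/05/1975']
print(valid)     # ['1', '2.0', '100']
```
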

Attributes Documentation

bool_values

Mapping of the values that represent True and False

decimal

Specific decimal separator

index

Indices of the rows detected as errors

n_errors

Number of rows detected as errors

name = 'castable'

obj

The object (Series or DataFrame) containing the data to which the detection is applied

target

Target type against which the values are checked

thousands

Specific thousand separator

Methods Documentation

check_separators(series: Series) → Series

Replace the custom separators before applying the detector

detected()

Series or DataFrame containing only the detected errors

has_errors() → bool

Returns True if any error has been detected, False otherwise

is_error() → Series

Return a boolean same-sized object indicating if the values are flagged as errors

not_error() → Series

Return a boolean same-sized object indicating if the values are NOT flagged as errors

report()

Print a detection report

valid()

Series or DataFrame containing only the valid values