castable
- class pdcleaner.detection.types.castable(obj, detector=None, target=None, **kwargs)
Bases: _SeriesDetector
Detect elements that cannot be cast into the target type.
Intended to be used by the detect method with the keyword ‘castable’:
>>> series.cleaner.detect.castable(...)
>>> series.cleaner.detect('castable', ...)
This detection method flags elements of an object Series as errors when they cannot be converted (cast) into a given type.
For example:
- 1.2 is not castable as an integer
- ABC is not castable as a number/float
- 2022-02-28 is castable as a date, but 2022-02-31 is not
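The rules above can be illustrated with a small standard-library sketch. This is not pdcleaner's implementation: the helper name `is_castable` is hypothetical, and the date branch assumes ISO `YYYY-MM-DD` strings only, which is narrower than what the detector accepts (the examples further down also treat `22/05/1975` as a date).

```python
from datetime import date


def is_castable(value: str, target: str) -> bool:
    """Hypothetical helper: True if `value` can be converted to `target`."""
    try:
        if target == "int":
            # A whole-valued float such as '2.0' counts as an integer; '1.2' does not.
            return float(value).is_integer()
        if target == "number":
            float(value)
            return True
        if target == "date":
            # Assumption: ISO format only. Impossible dates raise ValueError.
            date.fromisoformat(value)
            return True
    except ValueError:
        return False
    raise ValueError(f"unsupported target: {target!r}")


print(is_castable("1.2", "int"))          # False
print(is_castable("ABC", "number"))       # False
print(is_castable("2022-02-28", "date"))  # True
print(is_castable("2022-02-31", "date"))  # False
```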
The expected thousands and decimal separators can be customized, so that, for example:
- 100,000.0 can be seen as 100000.
Note
NA values are not treated as errors.
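One way to picture the separator handling is as a normalization step applied before the cast. The sketch below is an assumption about the general idea, not the code behind `check_separators`; the function names `normalize_number` and `is_number` are hypothetical.

```python
def normalize_number(text: str, decimal: str, thousands: str) -> str:
    """Hypothetical helper: rewrite `text` with '.' as decimal mark and no grouping."""
    # Remove the thousands separator first, then map the decimal separator to '.'.
    return text.replace(thousands, "").replace(decimal, ".")


def is_number(text: str, decimal: str, thousands: str) -> bool:
    """True if `text` parses as a float once separators are normalized."""
    try:
        float(normalize_number(text, decimal, thousands))
    except ValueError:
        return False
    return True


print(normalize_number("100,000.0", decimal=".", thousands=","))  # 100000.0
print(normalize_number("100.000,5", decimal=",", thousands="."))  # 100000.5
print(is_number("ABC", decimal=".", thousands=","))               # False
```

Removing the thousands separator before touching the decimal separator matters when the two are swapped, as in the second call.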
- Parameters:
target ({"int", "number", "date", "boolean"}, required) – The target type that the values will be cast into
decimal (str, optional) – Decimal separator to expect when the target is a number type
thousands (str, optional) – Thousands separator to expect when the target is a number type
bool_values (dict, optional) – Mapping of values to True and False when the target is the boolean type
- Raises:
ValueError – When target is not among the allowed values, or target is not defined.
ValueError – When the detector is applied to a float or int series.
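The first error condition amounts to a membership check on the target keyword. A minimal sketch, assuming nothing about pdcleaner's internals (the name `validate_target` and the error messages are hypothetical):

```python
ALLOWED_TARGETS = {"int", "number", "date", "boolean"}


def validate_target(target):
    """Hypothetical check: reject a missing or unknown target with ValueError."""
    if target is None:
        raise ValueError("target must be defined")
    if target not in ALLOWED_TARGETS:
        raise ValueError(
            f"target must be one of {sorted(ALLOWED_TARGETS)}, got {target!r}"
        )
    return target


print(validate_target("date"))  # date
try:
    validate_target("string")
except ValueError as exc:
    print(f"rejected: {exc}")
```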
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import pdcleaner
>>> series = pd.Series(['1','2.0','1.2','A','100', '22/05/1975'], dtype='object')
>>> detector = series.cleaner.detect('castable', target='int')
>>> print(detector.is_error())
0    False
1    False
2     True
3     True
4    False
5     True
dtype: bool
>>> detector = series.cleaner.detect('castable', target='date')
>>> print(detector.is_error())
0     True
1     True
2     True
3     True
4     True
5    False
dtype: bool
>>> detector = series.cleaner.detect('castable', target='number')
>>> detector.is_error()
0    False
1    False
2    False
3    False
4    False
5     True
dtype: bool
>>> series = pd.Series(['Yes','No','No','Yes','Ok', 'Nok'], dtype='object')
>>> detector = series.cleaner.detect('castable', target='boolean', bool_values={"Yes":True, "No":False})
>>> detector.is_error()
0    False
1    False
2    False
3    False
4     True
5     True
dtype: bool
Missing values are not treated as errors.
>>> series = pd.Series(['1','2.0','1.2','A','100', np.nan], dtype='object')
>>> detector = series.cleaner.detect('castable', target='number')
>>> print(detector.is_error())
0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool
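The boolean and missing-value examples above combine two rules: a value is an error only if it is present and is not a key of bool_values. The following standard-library sketch mirrors that logic on plain lists. It is an illustration, not the detector's code, and the names `is_missing` and `flag_boolean_errors` are hypothetical.

```python
import math


def is_missing(value) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def flag_boolean_errors(values, bool_values):
    """Flag values that are present but have no entry in the bool_values mapping."""
    return [not is_missing(v) and v not in bool_values for v in values]


mapping = {"Yes": True, "No": False}
print(flag_boolean_errors(["Yes", "No", "No", "Yes", "Ok", "Nok"], mapping))
# [False, False, False, False, True, True]
print(flag_boolean_errors(["Yes", None, float("nan")], mapping))
# [False, False, False]
```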
Attributes Summary
bool_values – Specific value for True and False
decimal – Specific decimal separator
index – Indices of the rows detected as errors
n_errors – Number of rows detected as errors
obj – The object (Series or DataFrame) containing the data to which the detection is applied
target – Target type that value will be checked
thousands – Specific thousand separator
Methods Summary
check_separators(series) – Method to replace specific separator before applying detector
detected() – Series or DataFrame containing only the detected errors
has_errors() – Returns True if any error has been detected, False otherwise
is_error() – Return a boolean same-sized object indicating if the values are flagged as errors
not_error() – Return a boolean same-sized object indicating if the values are NOT flagged as errors
report() – Prints a detection report
valid() – Series or DataFrame containing only the valid values
Attributes Documentation
- bool_values
Values to interpret as True and False
- decimal
Specific decimal separator
- index
Indices of the rows detected as errors
- n_errors
Number of rows detected as errors
- name = 'castable'
- obj
The object (Series or DataFrame) containing the data to which the detection is applied
- target
Target type that the values are checked against
- thousands
Specific thousands separator
Methods Documentation
- check_separators(series: Series) → Series
Replace the specific separators before applying the detector
- detected()
Series or DataFrame containing only the detected errors
- has_errors() → bool
Returns True if any error has been detected, False otherwise
- is_error() → Series
Return a boolean same-sized object indicating if the values are flagged as errors
- not_error() → Series
Return a boolean same-sized object indicating if the values are NOT flagged as errors
- report()
Prints a detection report
- valid()
Series or DataFrame containing only the valid values