length#
- class pdcleaner.detection.basic.length(obj, detector=None, lower=-inf, upper=inf, value=inf, inclusive='both')[source]#
Bases: _SeriesDetector
Detect elements with length outside of given bounds.
Intended to be used by the detect method with the keyword ‘length’:
>>> series.cleaner.detect.length(...)
>>> series.cleaner.detect('length', ...)
This detection method flags an element as a potential error when its length falls outside the range between lower and upper. Alternatively, it can be used with a single fixed length given by value.
Note
NA values are not treated as errors.
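The flagging rule described above can be sketched with plain pandas. This is a minimal illustration, not pdcleaner's implementation: the helper name flag_length_errors is hypothetical, and it assumes that an element's length is the length of its string representation.

```python
import numpy as np
import pandas as pd


def flag_length_errors(series, lower=-np.inf, upper=np.inf, value=None,
                       inclusive="both"):
    """Hypothetical helper: True where an element's length is out of bounds."""
    if value is not None:
        # A fixed length is treated as a range reduced to a single value.
        lower, upper, inclusive = value, value, "both"
    # Assumption: length means the length of the string representation.
    lengths = pd.Series(
        [np.nan if pd.isna(x) else len(str(x)) for x in series],
        index=series.index,
        dtype="float64",
    )
    in_range = lengths.between(lower, upper, inclusive=inclusive)
    # Missing values are never flagged as errors.
    return ~in_range & lengths.notna()


codes = pd.Series(["1", "001", "01460", "0011448"])
print(flag_length_errors(codes, lower=2, upper=6))
```

With lower=2 and upper=6, only the elements of length 1 and 7 are flagged in this sketch; a missing element would come out as False.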
- Parameters:
lower (float or None (Default)) – Lower bound
upper (float or None (Default)) – Upper bound
value (float or None (Default)) – Specific length of the element
- Raises:
TypeError – when at least one of lower, upper or value is not a number
ValueError – when lower or upper is specified at the same time as value or when none of the three is given
ValueError – when lower >= upper
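The three rules listed under Raises can be summarised in a short validation sketch. The function check_length_args is hypothetical and is not part of pdcleaner; it assumes that None means "not given" and only mirrors the documented conditions.

```python
from numbers import Real


def check_length_args(lower=None, upper=None, value=None):
    """Hypothetical validator mirroring the documented Raises rules."""
    given = {
        name: arg
        for name, arg in (("lower", lower), ("upper", upper), ("value", value))
        if arg is not None  # assumption: None means "not given"
    }
    for name, arg in given.items():
        if not isinstance(arg, Real):
            raise TypeError(f"{name} must be a number, got {type(arg).__name__}")
    if not given:
        raise ValueError("one of lower, upper or value must be given")
    if "value" in given and len(given) > 1:
        raise ValueError("value cannot be combined with lower or upper")
    if "lower" in given and "upper" in given and lower >= upper:
        raise ValueError("lower must be strictly less than upper")


check_length_args(lower=2, upper=6)  # passes silently
check_length_args(value=5)           # passes silently
```

Under these rules, a range reduced to a single length (lower equal to upper) is rejected, so a fixed length has to be requested through value.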
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import pdcleaner
>>> series = pd.Series(['75013', '78000', '931204', '952684'], dtype='string')
>>> detector = series.cleaner.detect.length(value=5)
>>> print(detector.is_error())
0    False
1    False
2     True
3     True
dtype: bool
With two bounds specified:
>>> series = pd.Series(['1', '001', '01460', '0011448'], dtype='string')
>>> detector = series.cleaner.detect.length(lower=2, upper=6)
>>> print(detector.is_error())
0     True
1    False
2    False
3     True
dtype: bool
Can be used with integers:
>>> series = pd.Series([1, 1234567, 1460, np.nan])
>>> detector = series.cleaner.detect.length(upper=6)
>>> detector.is_error()
0    False
1     True
2    False
3    False
dtype: bool
And with floats:
>>> series = pd.Series([1.007, 1.234567, 1.460], dtype='float64')
>>> detector = series.cleaner.detect.length(upper=6)
>>> detector.is_error()
0    False
1     True
2    False
dtype: bool
Missing values are not treated as errors:
>>> series = pd.Series(['1', '001', '01460', np.nan], dtype='string')
>>> detector = series.cleaner.detect.length(upper=6)
>>> print(detector.is_error())
0    False
1    False
2    False
3    False
dtype: bool
Attributes Summary
inclusive – Keyword to indicate if boundaries are included {“both”, “neither”, “left”, “right”}
index – Indices of the rows detected as errors
lower – Lower bound
mode – Checking mode
n_errors – Number of rows detected as errors
obj – The object (Series or DataFrame) containing the data to which the detection is applied
upper – Upper bound
value – Fixed length value
Methods Summary
detected() – Series or DataFrame containing only the detected errors
has_errors() – Returns True if any error has been detected, False otherwise
is_error() – Return a boolean same-sized object indicating if the values are flagged as errors
not_error() – Return a boolean same-sized object indicating if the values are NOT flagged as errors
report() – Prints a detection report
valid() – Series or DataFrame containing only the valid values
Attributes Documentation
- inclusive#
Keyword to indicate if boundaries are included {“both”, “neither”, “left”, “right”}
- index#
Indices of the rows detected as errors
- lower#
Lower bound
- mode#
Checking mode
- n_errors#
Number of rows detected as errors
- name = 'length'#
- obj#
The object (Series or DataFrame) containing the data to which the detection is applied
- upper#
Upper bound
- value#
Fixed length value
Methods Documentation
- detected()#
Series or DataFrame containing only the detected errors
- has_errors() → bool#
Returns True if any error has been detected, False otherwise
- is_error() → Series#
Return a boolean same-sized object indicating if the values are flagged as errors
- not_error() → Series#
Return a boolean same-sized object indicating if the values are NOT flagged as errors
- report()#
Prints a detection report
- valid()#
Series or DataFrame containing only the valid values
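How these attributes and methods relate to one another can be illustrated with an ordinary boolean mask. The sketch below uses plain pandas only; the mask is a stand-in for what is_error() returns, and the mapping to the detector's members is an assumption based on the descriptions above, not on pdcleaner's source.

```python
import pandas as pd

series = pd.Series(["75013", "78000", "931204", "952684"])

# Stand-in for detector.is_error(): elements whose length differs from 5.
is_error = series.str.len() != 5

not_error = ~is_error                # counterpart of not_error()
detected = series[is_error]          # counterpart of detected()
valid = series[not_error]            # counterpart of valid()
n_errors = int(is_error.sum())       # counterpart of n_errors
has_errors = bool(is_error.any())    # counterpart of has_errors()
error_index = series.index[is_error]  # counterpart of index

print(detected)
print(valid)
```

In this sketch detected and valid partition the original Series, so their lengths always add up to the length of the input.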