length

class pdcleaner.detection.basic.length(obj, detector=None, lower=-inf, upper=inf, value=inf, inclusive='both')

Bases: _SeriesDetector

Detect elements with length outside of given bounds.

Intended to be used through the detect method, either as an attribute or with the keyword ‘length’:

>>> series.cleaner.detect.length(...)
>>> series.cleaner.detect('length',...)

This detection method flags an element of the Series as a potential error when its length falls outside the range between lower and upper. Alternatively, it can be used with a fixed length value, in which case elements of any other length are flagged.

Note

NA values are not treated as errors.
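
The rule described above can be approximated with plain pandas. The sketch below is not pdcleaner's implementation: the helper name `length_outside_bounds` is hypothetical, and measuring an element's length as `len(str(x))` is an assumption about how non-string values are handled.

```python
import math

import pandas as pd


def length_outside_bounds(series, lower=-math.inf, upper=math.inf, inclusive="both"):
    """Hypothetical helper: flag elements whose length is outside the bounds."""
    is_na = series.isna()
    # Assumption: the length of an element is the length of its string form.
    lengths = pd.Series(
        [math.nan if na else len(str(x)) for x, na in zip(series, is_na)],
        index=series.index,
        dtype="float64",
    )
    in_range = lengths.between(lower, upper, inclusive=inclusive)
    # Missing values are never flagged, matching the note above.
    return ~in_range & ~is_na


codes = pd.Series(["1", "001", "01460", pd.NA], dtype="string")
flags = length_outside_bounds(codes, lower=2, upper=6)
print(flags.tolist())  # [True, False, False, False]
```

In this sketch a fixed length is expressed by setting both bounds to the same number, for example `length_outside_bounds(codes, lower=5, upper=5)`.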

Parameters:

lower (float or None (Default)) – Lower bound
upper (float or None (Default)) – Upper bound
value (float or None (Default)) – Specific length of the element
inclusive ({“both”, “neither”, “left”, “right”}, default 'both') – Whether the boundaries are included

Raises:

TypeError – when at least one of lower, upper or value is not a number
ValueError – when lower or upper is specified at the same time as value or when none of the three is given
ValueError – when lower >= upper
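
The argument rules listed above can be sketched as a standalone check. This is an illustration only, not the library's code: the function name `check_length_bounds` and the error messages are hypothetical, and `None` is assumed to mean "not given", following the Parameters list.

```python
from numbers import Real


def check_length_bounds(lower=None, upper=None, value=None):
    """Hypothetical validator mirroring the documented Raises rules."""
    for name, arg in (("lower", lower), ("upper", upper), ("value", value)):
        # Every argument that is given must be a number.
        if arg is not None and not isinstance(arg, Real):
            raise TypeError(f"{name} must be a number, got {type(arg).__name__}")
    # A fixed length cannot be combined with a range.
    if value is not None and (lower is not None or upper is not None):
        raise ValueError("value cannot be combined with lower or upper")
    # At least one of the three arguments is required.
    if lower is None and upper is None and value is None:
        raise ValueError("one of lower, upper or value is required")
    # The range must not be empty or reversed.
    if lower is not None and upper is not None and lower >= upper:
        raise ValueError("lower must be strictly less than upper")


check_length_bounds(lower=2, upper=6)  # valid: returns without raising
try:
    check_length_bounds(lower=6, upper=2)
except ValueError as exc:
    print(exc)  # lower must be strictly less than upper
```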

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import pdcleaner

>>> series = pd.Series(['75013', '78000', '931204', '952684'], dtype='string')
>>> detector = series.cleaner.detect.length(value=5)
>>> print(detector.is_error())
0    False
1    False
2     True
3     True
dtype: bool

With two bounds specified

>>> series = pd.Series(['1', '001', '01460', '0011448'], dtype='string')
>>> detector = series.cleaner.detect.length(lower=2, upper=6)
>>> print(detector.is_error())
0     True
1    False
2    False
3     True
dtype: bool

Can be used with integers

>>> series = pd.Series([1, 1234567, 1460, np.nan])
>>> detector = series.cleaner.detect.length(upper=6)
>>> detector.is_error()
0    False
1     True
2    False
3    False
dtype: bool

and with floats

>>> series = pd.Series([1.007, 1.234567, 1.460], dtype='float64')
>>> detector = series.cleaner.detect.length(upper=6)
>>> detector.is_error()
0    False
1     True
2    False
dtype: bool

Missing values are not treated as errors.

>>> series = pd.Series(['1', '001', '01460', np.nan], dtype='string')
>>> detector = series.cleaner.detect.length(upper=6)
>>> print(detector.is_error())
0    False
1    False
2    False
3    False
dtype: bool

Attributes Summary

`inclusive`	Keyword to indicate if boundaries are included {“both”, “neither”, “left”, “right”}
`index`	Indices of the rows detected as errors
`lower`	Lower bound
`mode`	Checking mode
`n_errors`	Number of rows detected as errors
`name`
`obj`	The object (Series or DataFrame) containing the data to which the detection is applied
`upper`	Upper bound
`value`	Fixed length value

Methods Summary

`detected`()	Series or DataFrame containing only the detected errors
`has_errors`()	Returns True if any error has been detected, False otherwise
`is_error`()	Return a boolean same-sized object indicating if the values are flagged as errors
`not_error`()	Return a boolean same-sized object indicating if the values are NOT flagged as errors
`report`()	Prints a detection report
`valid`()	Series or DataFrame containing only the valid values

Attributes Documentation

inclusive: Keyword to indicate if boundaries are included {“both”, “neither”, “left”, “right”}

index: Indices of the rows detected as errors

lower: Lower bound

mode: Checking mode

n_errors: Number of rows detected as errors

name = 'length'

obj: The object (Series or DataFrame) containing the data to which the detection is applied

upper: Upper bound

value: Fixed length value
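
The four inclusive keywords are the same ones accepted by pandas' `Series.between`. Assuming the detector follows the same convention (the source does not state this), the effect of each keyword on the boundary values can be illustrated with lengths compared against a lower bound of 2 and an upper bound of 6:

```python
import pandas as pd

# Three lengths: one on each boundary and one strictly inside the range.
lengths = pd.Series([2, 4, 6])

results = {
    keyword: lengths.between(2, 6, inclusive=keyword).tolist()
    for keyword in ("both", "neither", "left", "right")
}
for keyword, inside in results.items():
    print(f"{keyword:>8}: {inside}")
```

With "both", all three lengths are inside the range. With "neither", only 4 is. "left" keeps the lower boundary and "right" keeps the upper one.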

Methods Documentation

detected(): Series or DataFrame containing only the detected errors

has_errors() → bool: Returns True if any error has been detected, False otherwise

is_error() → Series: Return a boolean same-sized object indicating if the values are flagged as errors

not_error() → Series: Return a boolean same-sized object indicating if the values are NOT flagged as errors

report(): Prints a detection report

valid(): Series or DataFrame containing only the valid values
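
As a reading aid, the relationship between these methods and attributes can be mimicked with a plain boolean mask. The sketch below uses pandas only and does not call pdcleaner. The mask stands in for the result of is_error(), and the mapping in the comments is an interpretation of the summaries above rather than the library's code.

```python
import pandas as pd

series = pd.Series(["75013", "78000", "931204", "952684"], dtype="string")

# Stand-in for detector.is_error(): True where the length differs from 5.
is_error = (series.str.len() != 5).astype(bool)

not_error = ~is_error                            # counterpart of not_error()
detected = series[is_error]                      # counterpart of detected()
valid = series[not_error]                        # counterpart of valid()
has_errors = bool(is_error.any())                # counterpart of has_errors()
n_errors = int(is_error.sum())                   # counterpart of n_errors
error_index = series.index[is_error.to_numpy()]  # counterpart of index

print(detected.tolist())  # ['931204', '952684']
print(valid.tolist())     # ['75013', '78000']
print(has_errors, n_errors, list(error_index))  # True 2 [2, 3]
```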