length

class pdcleaner.detection.basic.length(obj, detector=None, lower=-inf, upper=inf, value=inf, inclusive='both')

Bases: _SeriesDetector

Detect elements with length outside of given bounds.

Intended to be used through the detect method with the keyword ‘length’:

>>> series.cleaner.detect.length(...)
>>> series.cleaner.detect('length',...)

This detection method flags an element as a potential error when its length falls outside the range between lower and upper. Alternatively, it can check every element against a single fixed length given by value.

Note

NA values are not treated as errors.
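The rule above can be approximated in plain pandas. This is a minimal sketch, not pdcleaner's implementation: the helper name `length_is_error` is hypothetical, and it assumes that an element's length is the number of characters in `str(element)` and that `inclusive` behaves like the keyword of the same name in `pandas.Series.between`.

```python
import math

import pandas as pd


def length_is_error(series, lower=-math.inf, upper=math.inf, inclusive="both"):
    """Hypothetical helper: flag elements whose length is outside the bounds."""
    # Assumption: the length of an element is len(str(element)).
    lengths = pd.Series(
        [len(str(v)) if pd.notna(v) else math.nan for v in series],
        index=series.index,
        dtype="float64",
    )
    in_range = lengths.between(lower, upper, inclusive=inclusive)
    # NA elements have no length and are never flagged.
    return lengths.notna() & ~in_range


codes = pd.Series(["1", "001", "01460", "0011448", None])
flags = length_is_error(codes, lower=2, upper=6)
print(flags.tolist())  # [True, False, False, True, False]
```

The final mask combines two conditions on purpose: comparing a missing length against the bounds yields False, so negating the range test alone would flag NA elements.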

Parameters:
  • lower (float or None (Default)) – Lower bound

  • upper (float or None (Default)) – Upper bound

  • value (float or None (Default)) – Specific length of the element

Raises:
  • TypeError – when at least one of lower, upper or value is not a number

  • ValueError – when lower or upper is specified at the same time as value or when none of the three is given

  • ValueError – when lower >= upper
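The error conditions listed above can be read as a short sequence of argument checks. The function below is a hypothetical sketch of that sequence, not the library's code: the name `check_length_bounds` and the error messages are made up for illustration, and it follows the Parameters list in treating `None` as "not given", whereas the class signature shows infinite defaults.

```python
from numbers import Real


def check_length_bounds(lower=None, upper=None, value=None):
    """Hypothetical validator mirroring the documented error conditions."""
    given = {"lower": lower, "upper": upper, "value": value}

    # TypeError: every argument that is given must be a number.
    for name, arg in given.items():
        if arg is not None and not isinstance(arg, Real):
            raise TypeError(f"{name} must be a number, got {type(arg).__name__}")

    # ValueError: value excludes lower and upper.
    if value is not None and (lower is not None or upper is not None):
        raise ValueError("value cannot be combined with lower or upper")

    # ValueError: at least one of the three must be given.
    if lower is None and upper is None and value is None:
        raise ValueError("give at least one of lower, upper or value")

    # ValueError: the bounds must be strictly ordered.
    if lower is not None and upper is not None and lower >= upper:
        raise ValueError("lower must be strictly less than upper")


check_length_bounds(lower=2, upper=6)  # passes silently
check_length_bounds(value=5)           # passes silently
```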

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import pdcleaner
>>> series = pd.Series(['75013','78000' , '931204', '952684'], dtype='string')
>>> detector = series.cleaner.detect.length(value=5)
>>> print(detector.is_error())
0    False
1    False
2     True
3     True
dtype: bool

With two bounds specified:

>>> series = pd.Series(['1','001' , '01460', '0011448'], dtype='string')
>>> detector = series.cleaner.detect.length(lower=2, upper=6)
>>> print(detector.is_error())
0     True
1    False
2    False
3     True
dtype: bool

Can be used with integers

>>> series = pd.Series([1, 1234567 , 1460, np.nan])
>>> detector = series.cleaner.detect.length(upper=6)
>>> detector.is_error()
0    False
1     True
2    False
3    False
dtype: bool

and with floats

>>> series = pd.Series([1.007, 1.234567 , 1.460], dtype='float64')
>>> detector = series.cleaner.detect.length(upper=6)
>>> detector.is_error()
0    False
1     True
2    False
dtype: bool
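The flagged float is easier to understand if the length is taken to be the number of characters in the value's string form, decimal point included. That reading is an assumption inferred from the output above, not something this page states. The check below uses only built-in Python.

```python
values = [1.007, 1.234567, 1.460]

# Assumption: the length of an element is len(str(element)).
# Note that str(1.460) is '1.46': the trailing zero is not kept.
lengths = [len(str(v)) for v in values]
print(lengths)  # [5, 8, 4]

# Only the second value is longer than upper=6.
too_long = [n > 6 for n in lengths]
print(too_long)  # [False, True, False]
```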

Missing values are not treated as errors.

>>> series = pd.Series(['1','001' , '01460', np.nan], dtype='string')
>>> detector = series.cleaner.detect.length(upper=6)
>>> print(detector.is_error())
0    False
1    False
2    False
3    False
dtype: bool

Attributes Summary

inclusive

Keyword to indicate if boundaries are included {“both”, “neither”, “left”, “right”}

index

Indices of the rows detected as errors

lower

Lower bound

mode

Checking mode

n_errors

Number of rows detected as errors

name

obj

The object (Series or DataFrame) containing the data to which the detection is applied

upper

Upper bound

value

Fixed length value

Methods Summary

detected()

Series or DataFrame containing only the detected errors

has_errors()

Returns True if any error has been detected, False otherwise

is_error()

Return a boolean same-sized object indicating if the values are flagged as errors

not_error()

Return a boolean same-sized object indicating if the values are NOT flagged as errors

report()

Prints a detection report

valid()

Series or DataFrame containing only the valid values

Attributes Documentation

inclusive

Keyword to indicate if boundaries are included {“both”, “neither”, “left”, “right”}

index

Indices of the rows detected as errors

lower

Lower bound

mode

Checking mode

n_errors

Number of rows detected as errors

name = 'length'

obj

The object (Series or DataFrame) containing the data to which the detection is applied

upper

Upper bound

value

Fixed length value

Methods Documentation

detected()

Series or DataFrame containing only the detected errors

has_errors() → bool

Returns True if any error has been detected, False otherwise

is_error() → Series

Return a boolean same-sized object indicating if the values are flagged as errors

not_error() → Series

Return a boolean same-sized object indicating if the values are NOT flagged as errors

report()

Prints a detection report

valid()

Series or DataFrame containing only the valid values
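
The methods and attributes on this page all describe one boolean mask. The sketch below reproduces those relationships with plain pandas rather than with a pdcleaner detector, so it runs without the library. Mapping each line to the corresponding member is an assumption based on the one-line descriptions given here.

```python
import pandas as pd

series = pd.Series(["75013", "78000", "931204", "952684"])

# Stand-in for detector.is_error() with value=5: True where the length is not 5.
is_error = series.str.len() != 5

not_error = ~is_error                   # counterpart of not_error()
detected = series[is_error]             # counterpart of detected()
valid = series[not_error]               # counterpart of valid()
n_errors = int(is_error.sum())          # counterpart of n_errors
has_errors = bool(is_error.any())       # counterpart of has_errors()
error_index = is_error[is_error].index  # counterpart of index

print(detected.tolist())  # ['931204', '952684']
print(valid.tolist())     # ['75013', '78000']
print(n_errors)           # 2
```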