custom

class pdcleaner.detection.basic.custom(obj, detector=None, error_func=None)

Bases: _Detector

Detect errors using a user-defined callable.

Intended to be used by the detect method with the keyword ‘custom’

>>> df.cleaner.detect.custom(...)
>>> df.cleaner.detect('custom',...)
Parameters:

error_func (Callable) – returns a boolean: True if the element/row is an error, False otherwise

Raises:
  • TypeError – when error_func is not a callable

  • ValueError – when the number of arguments of error_func does not match the number of columns

  • TypeError – when error_func does not return a boolean
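
The three conditions above can be pictured with a short standard-library sketch. This is only an illustration of the documented behaviour, not pdcleaner's actual implementation: the helper names check_error_func and check_result are hypothetical, and the library may count arguments or test return types differently (for example, it may accept NumPy booleans).

```python
import inspect


def check_error_func(error_func, n_columns=1):
    """Hypothetical pre-check mirroring the documented TypeError/ValueError cases."""
    if not callable(error_func):
        raise TypeError("error_func must be a callable")
    # Count positional parameters that have no default value.
    positional = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    n_required = sum(
        1
        for p in inspect.signature(error_func).parameters.values()
        if p.kind in positional and p.default is inspect.Parameter.empty
    )
    if n_required != n_columns:
        raise ValueError(
            f"error_func takes {n_required} argument(s), expected {n_columns}"
        )


def check_result(result):
    """Hypothetical post-check: reject return values that are not plain booleans."""
    if not isinstance(result, bool):
        raise TypeError("error_func must return a boolean")
    return result


# A one-argument callable fits a Series; a two-argument one fits two columns.
check_error_func(lambda x: x < 0, n_columns=1)
check_error_func(lambda x, y: x**2 != y, n_columns=2)
```

Under these assumptions, passing a non-callable such as 42 raises TypeError, and passing a one-argument lambda for a two-column DataFrame raises ValueError.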

Examples

>>> import pandas as pd
>>> import pdcleaner

With a lambda function:

>>> series = pd.Series([-1, 2, 3])
>>> detector = series.cleaner.detect('custom', error_func=lambda x: x<0)
>>> print(detector.is_error())
0     True
1    False
2    False
dtype: bool

With a named function:

>>> def f(x) -> bool:
...     if x**2 > 5:
...         return True
...     return False
>>> detector = series.cleaner.detect('custom', error_func=f)
>>> print(detector.is_error())
0    False
1    False
2     True
dtype: bool

With a DataFrame, the callable must take as many arguments as the DataFrame has columns:

>>> df = pd.DataFrame({'col1' : [1,2,3], 'col2' : [1,3,9] })
>>> bad_square = lambda x,y: x**2!=y
>>> df.cleaner.detect('custom', error_func=bad_square).is_error()
0    False
1     True
2    False
dtype: bool
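
For comparison, the same row-wise check can be written in plain pandas, without pdcleaner. This is a sketch of the semantics only (one positional argument per column, one boolean per row); how the detector applies the callable internally is not specified here.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [1, 3, 9]})


def bad_square(x, y) -> bool:
    # True when col2 is not the square of col1
    return x**2 != y


# Unpack each row into positional arguments, one per column.
mask = pd.Series(
    [bool(bad_square(*row)) for row in df.itertuples(index=False, name=None)],
    index=df.index,
)
print(mask.tolist())  # [False, True, False]
```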

Attributes Summary

error_func – Custom error function
index – Indices of the rows detected as errors
n_errors – Number of rows detected as errors
name
obj – The object (Series or DataFrame) containing the data to which the detection is applied

Methods Summary

detected() – Series or DataFrame containing only the detected errors
has_errors() – Returns True if any error has been detected, False otherwise
is_error() – Return a boolean same-sized object indicating if the values are flagged as errors
not_error() – Return a boolean same-sized object indicating if the values are NOT flagged as errors
report() – Prints a detection report
valid() – Series or DataFrame containing only the valid values
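
The relationship between these members can be illustrated with an ordinary pandas boolean mask. The variable names below simply mirror the detector's attribute and method names for readability; this is an assumed plain-pandas equivalent, not the detector's own code.

```python
import pandas as pd

series = pd.Series([-1, 2, 3])

# Boolean mask playing the role of is_error(): True where the value is an error.
is_error = series < 0
not_error = ~is_error              # complement, as in not_error()

detected = series[is_error]        # only the flagged values, as in detected()
valid = series[not_error]          # only the unflagged values, as in valid()

n_errors = int(is_error.sum())     # number of flagged rows
has_errors = bool(is_error.any())  # True if at least one row is flagged
error_index = detected.index       # index labels of the flagged rows

print(detected.tolist(), valid.tolist(), n_errors, has_errors)
# [-1] [2, 3] 1 True
```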

Attributes Documentation

error_func

Custom error function

index

Indices of the rows detected as errors

n_errors

Number of rows detected as errors

name = 'custom'

obj

The object (Series or DataFrame) containing the data to which the detection is applied

Methods Documentation

detected()

Series or DataFrame containing only the detected errors

has_errors() → bool

Returns True if any error has been detected, False otherwise

is_error() → Series

Return a boolean same-sized object indicating if the values are flagged as errors

not_error() → Series

Return a boolean same-sized object indicating if the values are NOT flagged as errors

report()

Prints a detection report

valid()

Series or DataFrame containing only the valid values