bounded

class pdcleaner.detection.basic.bounded(obj, detector=None, lower=-inf, upper=inf, inclusive='both')

Bases: _NumericalSeriesDetector

Detect values outside of given bounds.

Intended to be called through the detect method, either as an attribute or with the keyword 'bounded':

>>> series.cleaner.detect.bounded(...)
>>> series.cleaner.detect('bounded',...)

This detection method flags values as potential errors wherever the corresponding Series element is outside the range between lower and upper.

Note

NA values are not treated as errors.

Parameters:
  • lower (float or -np.inf (Default)) – Lower bound

  • upper (float or np.inf (Default)) – Upper bound

  • inclusive ({“both”, “neither”, “left”, “right”}, default "both") – Which boundaries are included in the allowed range, i.e. whether each bound is closed or open.
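
The effect of inclusive on boundary values can be sketched with plain pandas. This is only an illustration: the helper out_of_bounds is hypothetical (it is not part of pdcleaner), and it assumes the option names carry the same meaning as in pandas' Series.between.

```python
import numpy as np
import pandas as pd


def out_of_bounds(series, lower=-np.inf, upper=np.inf, inclusive="both"):
    """Hypothetical stand-in for the bounded check; not pdcleaner code."""
    # Series.between uses `inclusive` to close or open each end of the range
    in_range = series.between(lower, upper, inclusive=inclusive)
    # NaN never satisfies `between`, so missing values are excluded explicitly
    return ~in_range & series.notna()


series = pd.Series([1, 2, 3, 4, np.nan])
flags = {
    mode: out_of_bounds(series, lower=2, upper=4, inclusive=mode).tolist()
    for mode in ("both", "neither", "left", "right")
}
# flags["both"]    -> [True, False, False, False, False]
# flags["neither"] -> [True, True, False, True, False]
# flags["left"]    -> [True, False, False, True, False]
# flags["right"]   -> [True, True, False, False, False]
```

Under this reading, a value equal to a closed bound is valid, a value equal to an open bound is flagged, and the missing value is never flagged.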

Raises:
  • Warning – when neither lower nor upper is specified

  • ValueError – when lower >= upper
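
The two conditions above can be pictured with a small validation routine. The function check_bounds below is a hypothetical sketch written for this page, not pdcleaner's actual implementation; in particular, the warning category and the messages are assumptions.

```python
import math
import warnings


def check_bounds(lower=-math.inf, upper=math.inf):
    """Hypothetical validation helper; not pdcleaner's actual code."""
    # Both bounds left at their defaults: nothing can ever be flagged
    if lower == -math.inf and upper == math.inf:
        warnings.warn("neither lower nor upper is specified")
    # An empty or inverted range is rejected outright
    if lower >= upper:
        raise ValueError("lower must be strictly less than upper")
    return lower, upper


check_bounds(lower=2, upper=4)  # passes silently
check_bounds(upper=4)           # a single bound is enough to avoid the warning
```

Calling check_bounds() with no arguments emits the warning, and check_bounds(4, 2) or check_bounds(3, 3) raises ValueError.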

Examples

>>> series = pd.Series([1, 2, 100, 3])
>>> detector = series.cleaner.detect.bounded(lower=2, upper=4)
>>> print(detector.is_error())
0     True
1    False
2     True
3    False
dtype: bool

With only one bound specified

>>> series = pd.Series([1, 2, 100, 3])
>>> detector = series.cleaner.detect.bounded(upper=4)
>>> print(detector.is_error())
0    False
1    False
2     True
3    False
dtype: bool

Missing values are not treated as errors.

>>> series = pd.Series([1, np.nan, 100, 3])
>>> detector = series.cleaner.detect.bounded(lower=2, upper=4)
>>> print(detector.is_error())
0     True
1    False
2     True
3    False
dtype: bool

Attributes Summary

inclusive

Keyword to indicate if boundaries are included {“both”, “neither”, “left”, “right”}

index

Indices of the rows detected as errors

lower

Lower bound

n_errors

Number of rows detected as errors

name

obj

The object (Series or DataFrame) containing the data to which the detection is applied

sided

Keyword to indicate if detection is one-sided or two-sided {“both”, “right”, “left”}

upper

Upper bound

Methods Summary

detected()

Series or DataFrame containing only the detected errors

has_errors()

Returns True if any error has been detected, False otherwise

is_error()

Return a boolean same-sized object indicating if the values are flagged as errors

not_error()

Return a boolean same-sized object indicating if the values are NOT flagged as errors

plot([color, errors_color, compact, limits, ...])

Plot an overview of the treated data, colored according to the validity of the values

report()

Print a detection report

valid()

Series or DataFrame containing only the valid values

Attributes Documentation

inclusive

Keyword to indicate if boundaries are included {“both”, “neither”, “left”, “right”}

index

Indices of the rows detected as errors

lower

Lower bound

n_errors

Number of rows detected as errors

name = 'bounded'

obj

The object (Series or DataFrame) containing the data to which the detection is applied

sided

Keyword to indicate if detection is one-sided or two-sided {“both”, “right”, “left”}

upper

Upper bound

Methods Documentation

detected()

Series or DataFrame containing only the detected errors

has_errors() → bool

Return True if any error has been detected, False otherwise

is_error() → Series

Return a boolean same-sized object indicating if the values are flagged as errors

not_error() → Series

Return a boolean same-sized object indicating if the values are NOT flagged as errors

plot(color='green', errors_color='red', compact=False, limits=True, figsize=None)

Plot an overview of the treated data, colored according to the validity of the values. The figure is made up of:

  • a scatter plot representing the values in the treated series.

  • a histogram representing the distribution of values.

  • a kernel density estimate plot visualizing the distribution of values.

  • a boxplot showing the distribution of values.

Parameters:
  • color (palette name (Default: "green")) – Color associated with legitimate values. Must be interpretable by seaborn’s color_palette()

  • errors_color (palette name (Default: "red")) – Color associated with erroneous values. Must be interpretable by seaborn’s color_palette()

  • compact (bool (Default: False)) – If True, compact the plots around valid values and show the number of erroneous values on the scatter plot

  • limits (bool (Default: True)) – If True, draw horizontal lines showing the lower and upper values delimiting the allowed values

  • figsize ((float, float) (Default: None)) – Width and height of the figure.

Returns:

axs – an array of length 4 containing the matplotlib axes representing the plots

Return type:

array of matplotlib.axes._subplots.AxesSubplot

Examples

>>> series = pd.Series([-5, 1, 2, 3, 8, 12])
>>> detector = series.cleaner.detect.bounded(lower=0, upper=10)
>>> detector.plot()
[Figure: ../_static/plot_numseries.png]

report()

Print a detection report

valid()

Series or DataFrame containing only the valid values
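
The attributes and methods documented above can all be read as views of a single boolean mask. The sketch below reproduces that relationship with plain pandas; it is an assumption about how the pieces fit together, not pdcleaner's implementation, and the mask is built by hand with Series.between instead of calling the detector.

```python
import pandas as pd

series = pd.Series([-5, 1, 2, 3, 8, 12])

# Stand-in for detector.is_error() with lower=0, upper=10 (no missing values here)
is_error = ~series.between(0, 10)
not_error = ~is_error                # counterpart of not_error()

detected = series[is_error]          # presumably what detected() returns
valid = series[not_error]            # presumably what valid() returns
n_errors = int(is_error.sum())       # counterpart of n_errors
has_errors = bool(is_error.any())    # counterpart of has_errors()
index = detected.index               # counterpart of index

print(detected.tolist(), valid.tolist(), n_errors, has_errors, list(index))
```

With this data the flagged values are -5 and 12, at positions 0 and 5, and the remaining four values are kept as valid.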