bounded
- class pdcleaner.detection.basic.bounded(obj, detector=None, lower=-inf, upper=inf, inclusive='both')
Bases: _NumericalSeriesDetector
Detect values outside of given bounds.
Intended to be used through the detect accessor, either as an attribute or with the keyword ‘bounded’:
>>> series.cleaner.detect.bounded(...)
>>> series.cleaner.detect('bounded', ...)
This detection method flags a value as a potential error wherever the corresponding Series element lies outside the range between lower and upper. The inclusive parameter controls whether the bounds themselves belong to the allowed range.
Note
NA values are not treated as errors.
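The two call forms above are interchangeable. As a minimal sketch of how such a dual interface can be wired up with pandas' public accessor API (`pd.api.extensions.register_series_accessor`), the example below registers a hypothetical `sketch_cleaner` accessor. The accessor name, the classes `_Detect` and `_SketchCleaner`, and the fact that `bounded` returns a plain boolean mask are all illustrative assumptions; pdcleaner's real `bounded` returns a detector object.

```python
import pandas as pd


class _Detect:
    """Hypothetical dispatcher: usable as detect.bounded(...) or detect('bounded', ...)."""

    def __init__(self, series: pd.Series):
        self._series = series

    def __call__(self, method: str, *args, **kwargs):
        # detect('bounded', ...) is forwarded to detect.bounded(...)
        return getattr(self, method)(*args, **kwargs)

    def bounded(self, lower=float("-inf"), upper=float("inf"), inclusive="both"):
        # Illustrative stand-in: returns the error mask, not a detector object.
        # NA values are excluded from the errors, as documented.
        s = self._series
        return ~s.between(lower, upper, inclusive=inclusive) & s.notna()


@pd.api.extensions.register_series_accessor("sketch_cleaner")
class _SketchCleaner:
    """Hypothetical accessor exposing a `detect` dispatcher."""

    def __init__(self, series: pd.Series):
        self.detect = _Detect(series)


series = pd.Series([1, 2, 100, 3])
a = series.sketch_cleaner.detect.bounded(lower=2, upper=4)
b = series.sketch_cleaner.detect("bounded", lower=2, upper=4)
print(a.equals(b))  # True
```

Routing the keyword form through `getattr` keeps a single implementation per detection method, so both spellings cannot drift apart.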
- Parameters:
lower (float, default -np.inf) – Lower bound
upper (float, default np.inf) – Upper bound
inclusive ({“both”, “neither”, “left”, “right”}, default “both”) – Whether each bound is closed (included in the allowed range) or open (excluded).
- Raises:
Warning – when neither lower nor upper is specified
ValueError – when lower >= upper
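The detection rule and the documented error conditions can be sketched in plain pandas as follows. `bounded_is_error` is a hypothetical helper, not part of pdcleaner; it assumes that `inclusive` has the same meaning as in pandas `Series.between` (the accepted values match) and that the documented Warning is emitted through `warnings.warn`.

```python
import warnings

import numpy as np
import pandas as pd


def bounded_is_error(series, lower=-np.inf, upper=np.inf, inclusive="both"):
    """Hypothetical re-implementation of the documented bounded rule."""
    if lower == -np.inf and upper == np.inf:
        # Assumed counterpart of the documented Warning
        warnings.warn("neither lower nor upper is specified")
    if lower >= upper:
        # Counterpart of the documented ValueError
        raise ValueError("lower must be strictly smaller than upper")
    inside = series.between(lower, upper, inclusive=inclusive)
    # NaN compares as outside any interval, so mask NA values out explicitly
    return ~inside & series.notna()


s = pd.Series([1, np.nan, 100, 3])
print(bounded_is_error(s, lower=2, upper=4).tolist())  # [True, False, True, False]
```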
Examples
>>> series = pd.Series([1, 2, 100, 3])
>>> detector = series.cleaner.detect.bounded(lower=2, upper=4)
>>> print(detector.is_error())
0     True
1    False
2     True
3    False
dtype: bool
With only one bound specified:
>>> series = pd.Series([1, 2, 100, 3])
>>> detector = series.cleaner.detect.bounded(upper=4)
>>> print(detector.is_error())
0    False
1    False
2     True
3    False
dtype: bool
Missing values are not treated as errors:
>>> series = pd.Series([1, np.nan, 100, 3])
>>> detector = series.cleaner.detect.bounded(lower=2, upper=4)
>>> print(detector.is_error())
0     True
1    False
2     True
3    False
dtype: bool
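None of the examples above places a value exactly on a bound, which is where inclusive matters. Assuming inclusive behaves like the parameter of the same name in pandas `Series.between` (an assumption based on the identical set of values, not confirmed here), the four settings flag boundary values as follows:

```python
import pandas as pd

# Values sitting exactly on lower=2 and upper=4, plus one strictly inside
series = pd.Series([2, 3, 4])

flags = {}
for inclusive in ("both", "neither", "left", "right"):
    # A value is flagged when it falls outside the (possibly open) interval
    flags[inclusive] = (~series.between(2, 4, inclusive=inclusive)).tolist()
    print(inclusive, flags[inclusive])
# both [False, False, False]
# neither [True, False, True]
# left [False, False, True]
# right [True, False, False]
```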
Attributes Summary
inclusive – Keyword indicating whether the boundaries are included {“both”, “neither”, “left”, “right”}
index – Indices of the rows detected as errors
lower – Lower bound
n_errors – Number of rows detected as errors
obj – The object (Series or DataFrame) containing the data to which the detection is applied
sided – Keyword indicating whether detection is one-sided or two-sided {“both”, “right”, “left”}
upper – Upper bound
Methods Summary
detected() – Series or DataFrame containing only the detected errors
has_errors() – Return True if any error has been detected, False otherwise
is_error() – Return a boolean object of the same size indicating whether each value is flagged as an error
not_error() – Return a boolean object of the same size indicating whether each value is NOT flagged as an error
plot([color, errors_color, compact, limits, ...]) – Plot an overview of the treated data, colored according to the validity of the values
report() – Print a detection report
valid() – Series or DataFrame containing only the valid values
Attributes Documentation
- inclusive
Keyword indicating whether the boundaries are included {“both”, “neither”, “left”, “right”}
- index
Indices of the rows detected as errors
- lower
Lower bound
- n_errors
Number of rows detected as errors
- name = 'bounded'
- obj
The object (Series or DataFrame) containing the data to which the detection is applied
- sided
Keyword indicating whether detection is one-sided or two-sided {“both”, “right”, “left”}
- upper
Upper bound
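Following the descriptions above, index holds the labels of the flagged rows and n_errors counts them. A minimal pandas sketch of that relationship, assuming the documented rule (NA is never an error); the variable names only mirror the attributes and are not the detector itself:

```python
import numpy as np
import pandas as pd

series = pd.Series([1, np.nan, 100, 3], index=["a", "b", "c", "d"])

# Error mask under the documented rule with lower=2, upper=4, inclusive="both"
is_error = ~series.between(2, 4) & series.notna()

index = is_error[is_error].index  # counterpart of the `index` attribute
n_errors = int(is_error.sum())    # counterpart of the `n_errors` attribute
print(list(index), n_errors)      # ['a', 'c'] 2
```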
Methods Documentation
- detected()
Series or DataFrame containing only the detected errors
- has_errors() → bool
Return True if any error has been detected, False otherwise.
- is_error() → Series
Return a boolean object of the same size indicating whether each value is flagged as an error.
- not_error() → Series
Return a boolean object of the same size indicating whether each value is NOT flagged as an error.
- plot(color='green', errors_color='red', compact=False, limits=True, figsize=None)
Plot an overview of the treated data, colored according to the validity of the values. The figure contains:
a scatter plot representing the values in the treated series.
a histogram representing the distribution of values.
a kernel density estimate plot visualizing the distribution of values.
a boxplot showing the distribution of values.
- Parameters:
color (palette name, default "green") – Color used for legitimate values. Should be a value accepted by seaborn’s color_palette().
errors_color (palette name, default "red") – Color used for erroneous values. Should be a value accepted by seaborn’s color_palette().
compact (bool, default False) – If True, compact the plots around the valid values and show the number of erroneous values on the scatter plot.
limits (bool, default True) – If True, draw horizontal lines at the lower and upper bounds delimiting the allowed values.
figsize ((float, float), default None) – Width and height of the figure.
- Returns:
axs – an array of length 4 containing the matplotlib axes representing the plots
- Return type:
array of matplotlib.axes._subplots.AxesSubplot
Examples
>>> series = pd.Series([-5, 1, 2, 3, 8, 12])
>>> detector = series.cleaner.detect.bounded(lower=0, upper=10)
>>> detector.plot()
- report()
Print a detection report.
- valid()
Series or DataFrame containing only the valid values.
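The selection methods can be pictured as boolean indexing with the error mask. The sketch below uses plain pandas with hypothetical variable names standing in for the methods; it assumes that valid() is the complement of detected(), so NA rows (never errors) end up among the valid values. That is an inference from not_error() being the negation of is_error(), not something stated above.

```python
import numpy as np
import pandas as pd

series = pd.Series([-5, 1, 2, 3, 8, 12, np.nan])

# Hypothetical stand-ins for the detector methods, with lower=0, upper=10
is_error = ~series.between(0, 10) & series.notna()  # is_error()
not_error = ~is_error                               # not_error()
detected = series[is_error]                         # detected()
valid = series[not_error]                           # valid() (assumed to keep NA)
has_errors = bool(is_error.any())                   # has_errors()

print(detected.tolist())  # [-5.0, 12.0]
print(has_errors)         # True
```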