Detection methods
pdcleaner.detection.basic Module
Basic detectors
Classes
Detect values outside of given bounds.
Detect errors using a user-defined callable.
Detect duplicated elements.
Detect elements with length outside of given bounds.
Detect elements containing missing values.
Detect erroneous values in a Series using quantiles.
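To make the bounds and quantile descriptions above concrete, here is a minimal plain-pandas sketch. It does not use pdcleaner's own classes: the helper names `outside_bounds` and `outside_quantiles`, their parameters, and the default quantiles are assumptions chosen for illustration.

```python
import pandas as pd


def outside_bounds(series: pd.Series, lower: float, upper: float) -> pd.Series:
    """Hypothetical helper: True where a value lies outside [lower, upper]."""
    # between() is inclusive on both ends by default; missing values
    # are not "between", so they end up flagged as well.
    return ~series.between(lower, upper)


def outside_quantiles(series: pd.Series, lower_q: float = 0.05,
                      upper_q: float = 0.95) -> pd.Series:
    """Hypothetical helper: True where a value falls outside the two quantiles."""
    low = series.quantile(lower_q)
    high = series.quantile(upper_q)
    return (series < low) | (series > high)


ages = pd.Series([25, 31, -4, 47, 230, 38])
bounds_mask = outside_bounds(ages, lower=0, upper=120)
quantile_mask = outside_quantiles(ages)
print(ages[bounds_mask])
```

Both helpers return a boolean mask aligned with the input, so the flagged rows can be inspected with `ages[mask]` before deciding how to clean them.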
pdcleaner.detection.types Module
Detectors related to element types
Classes
Detect elements that cannot be cast into the target type.
Detect elements with type errors.
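One common way to find elements that cannot be cast is to coerce the column and look at what turned into a missing value. The sketch below does this for a numeric target type with plain pandas; `not_castable_to_numeric` is a hypothetical name, and this is not necessarily how pdcleaner implements the check.

```python
import pandas as pd


def not_castable_to_numeric(series: pd.Series) -> pd.Series:
    """Hypothetical helper: True where a non-missing element fails numeric conversion."""
    converted = pd.to_numeric(series, errors="coerce")
    # Entries that were present before coercion but are NaN afterwards
    # could not be converted; entries that were already missing are not flagged.
    return converted.isna() & series.notna()


raw = pd.Series(["12", "3.5", "abc", None, "7e2", "1,000"])
mask = not_castable_to_numeric(raw)
print(raw[mask])
```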
pdcleaner.detection.datetimes Module
Datetime detectors
Classes
Detect whether a date value falls within a given range.
pdcleaner.detection.values Module
Value detectors
Classes
Detect the least frequent associations between two categorical columns.
Detect class values that appear at most n times.
Detect class values not in a given list.
Detect class values that appear less often than a given frequency.
Detect class values different from a given value.
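The count-based, frequency-based and allowed-list checks described above can be sketched with `value_counts` and `isin`. The helper names `rare_by_count` and `rare_by_freq` and the sample data are assumptions for illustration, not pdcleaner's API.

```python
import pandas as pd


def rare_by_count(series: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: True where the element's value appears at most n times."""
    counts = series.map(series.value_counts())
    return counts <= n


def rare_by_freq(series: pd.Series, freq: float) -> pd.Series:
    """Hypothetical helper: True where the value's relative frequency is below freq."""
    shares = series.map(series.value_counts(normalize=True))
    return shares < freq


colors = pd.Series(["red", "red", "blue", "red", "bleu", "blue", "red"])
count_mask = rare_by_count(colors, n=1)
freq_mask = rare_by_freq(colors, freq=0.3)
# Values outside an explicit list of allowed classes
not_allowed_mask = ~colors.isin(["red", "blue"])
print(colors[count_mask])
```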
pdcleaner.detection.gaussian Module
Gaussian detectors
Classes
Detect outliers as potential errors in a Series using the IQR method.
Detect outliers as potential errors in a Series using the modified Z-score.
Detect outliers as potential errors in a Series using the Z-score method.
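The three methods named above differ only in how they measure distance from the centre of the data. The following is a minimal sketch of the textbook definitions in plain pandas; the function names and the default thresholds (1.5, 3.0, 3.5) are conventional choices assumed here, and may not match pdcleaner's defaults.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies more than k * IQR beyond the quartiles."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where |value - mean| / std exceeds the threshold."""
    z = (s - s.mean()) / s.std(ddof=0)  # population standard deviation
    return z.abs() > threshold


def modified_zscore_outliers(s: pd.Series, threshold: float = 3.5) -> pd.Series:
    """True where the median/MAD-based score exceeds the threshold."""
    median = s.median()
    mad = (s - median).abs().median()
    # Note: mad can be 0 when more than half the values are identical;
    # a real implementation would need to handle that case.
    score = 0.6745 * (s - median) / mad
    return score.abs() > threshold


values = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 95])
print(values[iqr_outliers(values)])
print(values[modified_zscore_outliers(values)])
print(values[zscore_outliers(values)])
```

On this sample the IQR and modified Z-score variants flag 95, while the plain Z-score flags nothing: with nine points and the population standard deviation, |z| cannot exceed sqrt(n - 1) ≈ 2.83, which is below the threshold of 3. This is one reason the median-based variants are often preferred for short Series.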
pdcleaner.detection.strings Module
String detectors
Classes
Detect strings that might be alternative representations of the same thing.
Detect strings that do not match a given pattern.
Detect elements with extra spaces before and/or after the value.
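The pattern and extra-space checks can be approximated with the pandas `.str` accessor. This is a sketch under assumptions: `pattern_mismatch` and `has_extra_spaces` are hypothetical names, and the example pattern is invented for the sample data.

```python
import pandas as pd


def pattern_mismatch(series: pd.Series, pattern: str) -> pd.Series:
    """Hypothetical helper: True where the whole string does not match the regex."""
    # na=False treats missing values as non-matching, so they are flagged too.
    return ~series.str.fullmatch(pattern, na=False)


def has_extra_spaces(series: pd.Series) -> pd.Series:
    """Hypothetical helper: True where stripping whitespace changes the string."""
    return series.notna() & (series != series.str.strip())


codes = pd.Series(["AB-123", " AB-124", "ab-125", "AB-126 "])
pattern_mask = pattern_mismatch(codes, r"[A-Z]{2}-\d{3}")
spaces_mask = has_extra_spaces(codes)
print(codes[spaces_mask])
```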
pdcleaner.detection.web Module
Web-related detection methods:
* email: Detect strings that do not match an email
* url: Detect strings that do not match a URL
* ping: Detect strings that do not match a reachable URL
Classes
'email': Detect strings that do not match an email.
Detect strings that do not match a reachable URL.
Detect strings that do not match a URL.
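As an illustration of the email and URL checks, the sketch below uses a deliberately loose regular expression and the standard library's `urlparse`. The regex, the accepted schemes and the helper names are assumptions, not pdcleaner's actual rules; the reachability check is left out because it requires network access.

```python
import re
from urllib.parse import urlparse

import pandas as pd

# Loose shape check only: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email(text) -> bool:
    """Hypothetical helper: True if text looks like an email address."""
    return isinstance(text, str) and EMAIL_RE.fullmatch(text) is not None


def is_url(text) -> bool:
    """Hypothetical helper: True if text parses as an http(s) URL with a host."""
    if not isinstance(text, str):
        return False
    parts = urlparse(text)
    return parts.scheme in {"http", "https"} and bool(parts.netloc)


contacts = pd.Series(["alice@example.com", "bob@localhost", "not an email"])
links = pd.Series(["https://example.com/page", "example.com", "ftp://example.com"])
bad_emails = ~contacts.map(is_email)
bad_urls = ~links.map(is_url)
print(contacts[bad_emails])
print(links[bad_urls])
```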
pdcleaner.detection.multivariate Module
Multivariate detectors
Classes
Detect outliers in a numeric DataFrame using the DBSCAN clustering algorithm.