Detection methods#

pdcleaner.detection.basic Module#

Basic detectors

Classes#

bounded(obj[, detector, lower, upper, inclusive])

Detect values outside of given bounds.
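The idea of a bounds check can be sketched in plain pandas. This is not pdcleaner's implementation: the helper name `flag_out_of_bounds` is hypothetical, and the reading of `inclusive` (the bounds themselves count as valid when it is true) is an assumption.

```python
import pandas as pd

def flag_out_of_bounds(s, lower=None, upper=None, inclusive=True):
    """Return a boolean mask that is True where a value lies outside the bounds.

    Hypothetical helper, not part of pdcleaner. Assumes inclusive=True means
    the bounds themselves are accepted as valid values.
    """
    mask = pd.Series(False, index=s.index)
    if lower is not None:
        mask |= (s < lower) if inclusive else (s <= lower)
    if upper is not None:
        mask |= (s > upper) if inclusive else (s >= upper)
    return mask

ages = pd.Series([25, -3, 47, 130, 61])
print(flag_out_of_bounds(ages, lower=0, upper=120).tolist())
# [False, True, False, True, False]
```

Missing values compare as False on both sides, so this sketch never flags them.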

custom(obj[, detector, error_func])

Detect errors using a user-defined callable.

duplicated(obj[, detector, subset, keep])

Detect duplicated elements
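The `subset` and `keep` parameter names match those of pandas' own `DataFrame.duplicated`, so the underlying behaviour can be illustrated with pandas alone. Whether pdcleaner forwards them unchanged is an assumption; the sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 2, 3], "name": ["a", "b", "b", "c"]})

# Flag every repeat after the first occurrence, comparing only the "id" column.
repeats = df.duplicated(subset=["id"], keep="first")
print(repeats.tolist())      # [False, False, True, False]

# keep=False flags all members of a duplicated group, including the first.
all_copies = df.duplicated(keep=False)
print(all_copies.tolist())   # [False, True, True, False]
```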

length(obj[, detector, lower, upper, value, ...])

Detect elements with length outside of given bounds.

missing(obj[, detector, how])

Detect elements containing missing values

quantiles(obj[, detector, lowerq, upperq, ...])

Detect erroneous values in a Series using quantiles.
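A quantile-based check can be sketched as follows. The helper name `flag_by_quantiles` and the default cut-offs are hypothetical; only the parameter names `lowerq` and `upperq` come from the signature above, and pdcleaner's actual rule may differ.

```python
import pandas as pd

def flag_by_quantiles(s, lowerq=0.05, upperq=0.95):
    """Flag values below the lowerq quantile or above the upperq quantile.

    Hypothetical helper; the default cut-offs are illustrative only.
    """
    lo, hi = s.quantile(lowerq), s.quantile(upperq)
    return (s < lo) | (s > hi)

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
mask = flag_by_quantiles(values, lowerq=0.1, upperq=0.9)
print(values[mask].tolist())
# [1, 100]
```

Note that a quantile rule always flags roughly the same share of the data, whether or not those values are genuinely wrong.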

pdcleaner.detection.types Module#

Detectors related to element types

Classes#

castable(obj[, detector, target])

Detect elements that cannot be cast to the target type.
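For a numeric target, the notion of "castable" can be approximated with `pandas.to_numeric`. This is a sketch of the concept under that assumption, not the library's code, and it treats missing values as not being cast errors.

```python
import pandas as pd

raw = pd.Series(["1", "2.5", "abc", None, "7"])

# errors="coerce" turns anything that cannot be parsed into NaN.
converted = pd.to_numeric(raw, errors="coerce")

# A cast failure is a value that was present but became NaN.
not_castable = converted.isna() & raw.notna()
print(not_castable.tolist())
# [False, False, True, False, False]
```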

types(obj[, detector, ptype])

Detect elements with type errors.

pdcleaner.detection.datetimes Module#

Datetime detectors

Classes#

date_range(obj[, detector, lower, upper, ...])

Detect whether date values fall within a given range.

pdcleaner.detection.values Module#

Values detectors

Classes#

associations(obj[, detector, count, freq])

Detect the least frequent associations between two categorical columns.

counts(obj[, detector, n])

Detect class values that appear at most n times.

enum(obj[, detector, values, forbidden])

Detect class values not in a given list.

freq(obj[, detector, freq])

Detect class values that appear less often than a given frequency.
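Count-based and frequency-based rarity checks differ only in whether the threshold is an absolute count or a share of the rows. The sketch below shows both with `value_counts`; the thresholds and the sample data are illustrative assumptions, not pdcleaner defaults.

```python
import pandas as pd

colors = pd.Series(
    ["red", "red", "red", "blue", "blue", "grene", "red", "blue", "red", "blue"]
)

# Absolute rule: flag categories that occur at most n times (here n = 1).
occurrences = colors.map(colors.value_counts())
rare_by_count = occurrences <= 1

# Relative rule: flag categories whose share of rows is below a frequency (here 0.2).
shares = colors.map(colors.value_counts(normalize=True))
rare_by_freq = shares < 0.2

print(colors[rare_by_count].tolist())  # ['grene']
print(colors[rare_by_freq].tolist())   # ['grene']
```

Rare categories are often typos of a frequent one, which is why rarity is a useful error signal for categorical data.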

value(obj[, detector, value, check_type, ...])

Detect class values that differ from a given value.

pdcleaner.detection.gaussian Module#

Gaussian detectors

Classes#

iqr(obj[, detector, threshold, inclusive, ...])

Detect outliers as potential errors in a Series using the IQR method.
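The standard IQR rule flags values more than `threshold` interquartile ranges below the first quartile or above the third. A minimal sketch follows; the helper name `flag_iqr` is hypothetical and the default of 1.5 is the conventional Tukey fence, assumed here rather than taken from pdcleaner.

```python
import pandas as pd

def flag_iqr(s, threshold=1.5):
    """Flag values outside [Q1 - threshold*IQR, Q3 + threshold*IQR].

    Hypothetical helper; threshold=1.5 is the conventional choice.
    """
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - threshold * iqr) | (s > q3 + threshold * iqr)

data = pd.Series([10, 12, 11, 13, 12, 11, 95])
print(data[flag_iqr(data)].tolist())
# [95]
```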

modzscore(obj[, detector, threshold, ...])

Detect outliers as potential errors in a Series using the modified Z-score.
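The modified Z-score replaces the mean and standard deviation with the median and the median absolute deviation (MAD), which makes it robust to the outliers it is trying to find. The sketch below uses the common formulation with the 0.6745 constant and a cut-off of 3.5; the helper name is hypothetical and pdcleaner's defaults are not confirmed by this page.

```python
import pandas as pd

def flag_modified_zscore(s, threshold=3.5):
    """Flag values whose modified Z-score exceeds the threshold in absolute value.

    Hypothetical helper using M = 0.6745 * (x - median) / MAD.
    """
    median = s.median()
    mad = (s - median).abs().median()
    if mad == 0:
        # The score is undefined when MAD is zero; flag nothing in this sketch.
        return pd.Series(False, index=s.index)
    scores = 0.6745 * (s - median) / mad
    return scores.abs() > threshold

data = pd.Series([10, 12, 11, 13, 12, 11, 95])
print(data[flag_modified_zscore(data)].tolist())
# [95]
```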

zscore(obj[, detector, threshold, ...])

Detect outliers as potential errors in a Series using the Z-score method.
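A plain Z-score check measures how many standard deviations a value lies from the mean. The following sketch assumes a cut-off of 3 and the population standard deviation; both are illustrative choices, and `flag_zscore` is a hypothetical name.

```python
import pandas as pd

def flag_zscore(s, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean.

    Hypothetical helper; uses the population standard deviation (ddof=0).
    """
    z = (s - s.mean()) / s.std(ddof=0)
    return z.abs() > threshold

# Twenty observations: nineteen close together and one extreme value.
data = pd.Series([10, 12, 11, 13, 12, 11] * 3 + [12, 95])
print(data[flag_zscore(data)].tolist())
# [95]
```

On very small samples an extreme value inflates the standard deviation so much that its own Z-score stays below 3, which is one reason to prefer the IQR or modified Z-score variants there.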

pdcleaner.detection.strings Module#

String detectors

Classes#

alternatives(obj[, detector, keys, keep])

Detect strings that might be alternative representations of the same thing.

pattern(obj[, detector, pattern, mode, ...])

Detect strings that do not match a given pattern.

spaces(obj[, detector, side])

Detect elements with extra spaces before and/or after the value.
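Surrounding whitespace can be found by comparing each string with its stripped form. In this sketch the helper name and the accepted `side` values ("leading", "trailing", "both") are assumptions; only the parameter name `side` appears in the signature above.

```python
import pandas as pd

def flag_spaces(s, side="both"):
    """Flag strings with leading and/or trailing whitespace.

    Hypothetical helper; the `side` values are assumed, not pdcleaner's.
    Note that str.strip removes all whitespace, including tabs and newlines.
    """
    if side == "leading":
        stripped = s.str.lstrip()
    elif side == "trailing":
        stripped = s.str.rstrip()
    else:
        stripped = s.str.strip()
    # Exclude missing values, which would otherwise compare as unequal.
    return (s != stripped) & s.notna()

names = pd.Series(["Alice", " Bob", "Carol ", "  Dan  "])
print(flag_spaces(names).tolist())                  # [False, True, True, True]
print(flag_spaces(names, side="leading").tolist())  # [False, True, False, True]
```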

pdcleaner.detection.web Module#

Web-related detection methods:

* email: Detect strings that do not match an email
* url: Detect strings that do not match a url
* ping: Detect strings that do not match a reachable url

Classes#

email(obj[, detector])

Detect strings that do not match an email.

ping(obj[, detector])

Detect strings that do not match a reachable url.

url(obj[, detector, check_protocol])

Detect strings that do not match a url.

pdcleaner.detection.multivariate Module#

Multivariate detectors

Classes#

outliers(obj[, detector, eps, min_samples])

Detect outliers in a numeric DataFrame using the DBSCAN clustering algorithm.
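DBSCAN labels points that belong to no dense cluster as noise, and those noise points can be read as multivariate outliers. The sketch below uses scikit-learn's `DBSCAN` directly; that pdcleaner relies on scikit-learn is an assumption, and the `eps` and `min_samples` values are chosen for this toy data only.

```python
import pandas as pd
from sklearn.cluster import DBSCAN

points = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 0.0, 0.1, 5.0],
        "y": [0.0, 0.1, 0.0, 0.2, 0.2, 5.0],
    }
)

# fit_predict returns one cluster label per row; -1 marks noise points.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(points)
is_outlier = pd.Series(labels == -1, index=points.index)
print(is_outlier.tolist())
# [False, False, False, False, False, True]
```

Because `eps` is a distance, columns on very different scales usually need to be standardised before clustering.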