alternatives
- pdcleaner.cleaning.cleaning.alternatives(self, detector, inplace=False)
Clean by merging values that have been identified as alternative representations of the same object by a key collision detector.
Note
This method can only be applied with an ‘alternatives’ detector.
- Parameters:
detector (Detector object) – The detector that identifies the entries to clean.
inplace (bool (Default: False)) – Whether to perform the operation in place on the data.
- Returns:
The modified data, or None if inplace is True.
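The inplace flag follows the usual pandas convention: with inplace=True the caller's object is changed and None is returned, otherwise a new object is returned and the input is left alone. As a minimal sketch of that contract, assuming a hypothetical helper apply_mapping that is not part of pdcleaner and that the replacements keep the Series dtype, a cleaning step honouring it could look like this:

```python
from typing import Optional

import pandas as pd


def apply_mapping(series: pd.Series, mapping: dict, inplace: bool = False) -> Optional[pd.Series]:
    """Hypothetical helper illustrating the documented inplace/return contract."""
    cleaned = series.replace(mapping)
    if inplace:
        # Overwrite the caller's Series element by element and return nothing;
        # .to_numpy() avoids any index alignment during the assignment.
        series[:] = cleaned.to_numpy()
        return None
    # Leave the input untouched and hand back a new Series.
    return cleaned


s = pd.Series(["Linus Torvalds", "linus.torvalds"])
print(apply_mapping(s, {"linus.torvalds": "Linus Torvalds"}).tolist())
print(s.tolist())  # unchanged, because inplace defaulted to False
```

Writing through series[:] keeps the caller's object identity, which is what makes returning None meaningful.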
Example
Detect alternative formulations for the same name
>>> import pandas as pd
>>> import pdcleaner  # assumed to register the .cleaner accessor
>>> series = pd.Series(['Linus Torvalds', 'linus.torvalds', 'Torvalds, Linus',
...                     'Linus Torvalds', 'Bill Gates'])
>>> detector = series.cleaner.detect.alternatives()
>>> print(detector.is_error())
0    False
1     True
2     True
3    False
4    False
dtype: bool
Clean by standardizing each alternative to a single form:
>>> print(series.cleaner.clean('alternatives', detector))
0    Linus Torvalds
1    Linus Torvalds
2    Linus Torvalds
3    Linus Torvalds
4        Bill Gates
dtype: object
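For intuition, the behaviour in this example can be approximated in plain pandas with key-collision clustering. This is a sketch under stated assumptions, not pdcleaner's implementation: the keying function fingerprint (lowercase, split on non-alphanumeric characters, sort and deduplicate the tokens) and the rule of keeping the most frequent value in each cluster are both hypothetical choices that happen to reproduce the output above.

```python
import re
from collections import Counter

import pandas as pd


def fingerprint(value: str) -> str:
    """Hypothetical key: lowercase, split on non-alphanumerics, sort unique tokens."""
    tokens = re.split(r"[^0-9a-z]+", value.lower())
    return " ".join(sorted({t for t in tokens if t}))


def merge_alternatives(series: pd.Series) -> pd.Series:
    """Replace each value with the most frequent value that shares its key."""
    keys = series.map(fingerprint)
    canonical = {}
    for key, group in series.groupby(keys):
        # most_common breaks ties by first occurrence within the group.
        canonical[key] = Counter(group).most_common(1)[0][0]
    return keys.map(canonical)


def flag_alternatives(series: pd.Series) -> pd.Series:
    """Mark entries that differ from the canonical form of their cluster."""
    return series != merge_alternatives(series)


names = pd.Series(["Linus Torvalds", "linus.torvalds", "Torvalds, Linus",
                   "Linus Torvalds", "Bill Gates"])
print(flag_alternatives(names).tolist())
# [False, True, True, False, False]
print(merge_alternatives(names).tolist())
# ['Linus Torvalds', 'Linus Torvalds', 'Linus Torvalds', 'Linus Torvalds', 'Bill Gates']
```

All three spellings of the first name collapse to the key "linus torvalds", while "Bill Gates" forms a cluster of one and passes through unchanged. The sketch assumes every entry is a string, and its regular expression only treats ASCII letters and digits as token characters.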