alternatives

pdcleaner.cleaning.cleaning.alternatives(self, detector, inplace=False)

Clean by merging values that have been identified as alternative representations of the same object by a key collision detector.

Note

This method can only be applied with an ‘alternatives’ detector.

Parameters:
  • detector (Detector) – The detector object that identifies the entries to clean.

  • inplace (bool, default False) – Whether to perform the operation in place on the data.

Return type:

The modified data, or None if inplace is True.

Example

Detect alternative formulations for the same name

>>> import pandas as pd
>>> import pdcleaner
>>> series = pd.Series(['Linus Torvalds','linus.torvalds','Torvalds, Linus',
...                     'Linus Torvalds', 'Bill Gates'])
>>> detector = series.cleaner.detect.alternatives()
>>> print(detector.is_error())
0    False
1     True
2     True
3    False
4    False
dtype: bool

Clean by standardizing

>>> print(series.cleaner.clean('alternatives', detector))
0    Linus Torvalds
1    Linus Torvalds
2    Linus Torvalds
3    Linus Torvalds
4        Bill Gates
dtype: object
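
Sketch of the underlying idea

The following is a minimal, standalone sketch of how a key collision approach could produce the result above. It is not pdcleaner's implementation: the functions `fingerprint` and `merge_alternatives` are hypothetical names, the key function (lowercase, split on non-alphanumeric characters, sort the unique tokens) is an assumption, and so is the choice of the most frequent spelling as the canonical value. It works on a plain list rather than a Series to stay dependency-free.

```python
import re
from collections import Counter, defaultdict


def fingerprint(value):
    """Hypothetical key function: values with the same key are treated as alternatives."""
    # Lowercase, then split on any run of characters that is not a letter or digit.
    tokens = re.split(r"[^0-9a-z]+", value.lower())
    # Drop empty tokens, deduplicate, and sort so that word order does not matter.
    return " ".join(sorted({t for t in tokens if t}))


def merge_alternatives(values):
    """Replace each value with the most frequent spelling among its key group."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    # Assumption: the canonical form is the spelling that occurs most often.
    canonical = {
        key: Counter(members).most_common(1)[0][0]
        for key, members in groups.items()
    }
    return [canonical[fingerprint(value)] for value in values]


names = ['Linus Torvalds', 'linus.torvalds', 'Torvalds, Linus',
         'Linus Torvalds', 'Bill Gates']
merged = merge_alternatives(names)
print(merged)
```

Here the three spellings of the first name all map to the key 'linus torvalds', so they collide and are merged into 'Linus Torvalds', while 'Bill Gates' has its own key and is left unchanged.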