replace

pdcleaner.cleaning.cleaning.replace(self, detector, value=None, inplace=False)

Clean by replacing detected errors with a value.

Parameters:
  • detector (Detector object) – The detector object that identifies the entries to clean.

  • value (string, numeric, dict or callable) –

    The replacement strategy. If a single value is given, errors will be replaced with that value.

    If a dictionary is given, each erroneous value is looked up as a key and replaced with the corresponding dictionary value. An erroneous value with no matching key in the dictionary is replaced with NaN.

    If a callable is given, it is called on the Series and should return a scalar or a Series. It must not modify the input Series.

  • inplace (bool (Default: False)) – Whether to perform the operation in place on the data.

Return type:

The modified data or None if inplace is True
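
The return contract described above (a modified copy by default, None when inplace is True) can be illustrated with a minimal plain-pandas sketch. The helper replace_flagged and the boolean mask is_error are hypothetical names used only for illustration; they are not part of pdcleaner and do not show how the library implements this.

```python
import pandas as pd


def replace_flagged(series, is_error, value, inplace=False):
    """Hypothetical helper (not pdcleaner code): overwrite flagged entries."""
    # Work on the caller's object when inplace is True, otherwise on a copy.
    target = series if inplace else series.copy()
    target.loc[is_error] = value
    # Mirror the documented contract: None when the data was modified in place.
    return None if inplace else target


series = pd.Series([0.0, -5.0, 4.0, 100.0])
# Assumption for this sketch: values outside [0, 10] are the errors.
is_error = (series < 0) | (series > 10)

cleaned = replace_flagged(series, is_error, 5.0)
untouched = series.tolist()  # the original is unchanged at this point

result = replace_flagged(series, is_error, 5.0, inplace=True)
```

With inplace=False the caller keeps the original data and receives a new Series; with inplace=True the original is overwritten and nothing is returned.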

Examples

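>>> import numpy as np
>>> import pandas as pd
>>> import pdcleaner  # assumed to register the .cleaner accessor on import
>>> series = pd.Series([np.nan, 0, -5, 4, 6, 100])
>>> detector = series.cleaner.detect.bounded(lower=0, upper=10)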
>>> series.cleaner.clean.replace(detector, value=5)
0    NaN
1    0.0
2    5.0
3    4.0
4    6.0
5    5.0
dtype: float64

Replace using a dictionary

>>> series.cleaner.clean.replace(detector, value={100:10})
0     NaN
1     0.0
2     NaN
3     4.0
4     6.0
5    10.0
dtype: float64

Replace using a lambda (the lambda applies to the series of erroneous entries)

>>> series.cleaner.clean.replace(detector,
...                              value=lambda s: s.clip(lower=0) / 10)
0     NaN
1     0.0
2     0.0
3     4.0
4     6.0
5    10.0
dtype: float64
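
For readers without pdcleaner at hand, the three replacement strategies can be approximated in plain pandas. This is only a sketch under stated assumptions: replace_errors and is_error are hypothetical names, the error mask is built by hand instead of by a detector, and the callable is applied to the erroneous entries only, as in the lambda example above. It is not the library's implementation.

```python
import numpy as np
import pandas as pd


def replace_errors(series, is_error, value):
    """Hypothetical helper (not part of pdcleaner): replace flagged entries."""
    errors = series[is_error]
    if callable(value):
        # Assumption: the callable receives only the erroneous entries.
        replacement = value(errors)
    elif isinstance(value, dict):
        # Series.map yields NaN for erroneous values with no matching key.
        replacement = errors.map(value)
    else:
        # A single value is used for every erroneous entry.
        replacement = value
    # Series.mask swaps in the replacement where is_error is True and
    # aligns a Series replacement on the index.
    return series.mask(is_error, replacement)


series = pd.Series([np.nan, 0, -5, 4, 6, 100])
# Entries outside [0, 10] count as errors; NaN compares False and is kept.
is_error = (series < 0) | (series > 10)

by_scalar = replace_errors(series, is_error, 5)
by_dict = replace_errors(series, is_error, {100: 10})
by_callable = replace_errors(series, is_error, lambda s: s.clip(lower=0) / 10)
```

The three results carry the same values as the outputs shown in the examples above: only the entries at positions 2 and 5 change, and the unmapped value -5 becomes NaN in the dictionary case.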