cast

pdcleaner.cleaning.cleaning.cast(self, detector, **kwargs)

Clean by casting each value to the target type specified by the detector. Values that cannot be cast are replaced with a missing value (NaN, or <NA> / NaT depending on the target type). This method works only with the castable detector.

When casting to a date, all parameters of pd.to_datetime are accepted as keyword arguments. See https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html

Parameters:

detector (a Detector object) – The detector object that identifies the entries to clean

Return type:

The modified data

Examples

>>> series = pd.Series(['100 000', '154,5', '9 000', '250,12'], dtype='object')
>>> detector = series.cleaner.detect.castable(target='float', thousands=' ', decimal=',')
>>> series.cleaner.clean('cast', detector)
0    100000.00
1       154.50
2      9000.00
3       250.12
dtype: float64
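
For comparison, the same float conversion can be approximated in plain pandas. This is only a sketch and not pdcleaner's implementation: the helper name cast_to_float is hypothetical, and it assumes that stripping the thousands separator and swapping the decimal mark before calling pd.to_numeric with errors="coerce" gives equivalent results.

```python
import pandas as pd


def cast_to_float(series, thousands=" ", decimal=","):
    """Hypothetical helper: cast strings to float64, NaN when not castable."""
    # Normalize the number format: drop the thousands separator and
    # use "." as the decimal mark.
    normalized = (
        series.str.replace(thousands, "", regex=False)
        .str.replace(decimal, ".", regex=False)
    )
    # errors="coerce" replaces anything that is not a number with NaN.
    return pd.to_numeric(normalized, errors="coerce").astype("float64")


series = pd.Series(['100 000', '154,5', 'abc', '250,12'], dtype='object')
as_float = cast_to_float(series)
```

The entry 'abc' was added here to show the fallback: it becomes NaN, while the other entries are converted.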

Casting into int

>>> detector = series.cleaner.detect.castable(target='int', thousands=' ', decimal=',')
>>> series.cleaner.clean('cast', detector)
0    100000
1      <NA>
2      9000
3      <NA>
dtype: Int32
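
The <NA> entries above come from values that parse as numbers but are not whole. A plain-pandas sketch of that behavior follows; the helper name cast_to_nullable_int is hypothetical, and the approach (parse as float, keep only whole numbers, then convert to the nullable Int32 dtype) is an assumption about how such a cast could work, not a description of pdcleaner's internals.

```python
import pandas as pd


def cast_to_nullable_int(series, thousands=" ", decimal=","):
    """Hypothetical helper: cast strings to Int32, <NA> when not an integer."""
    normalized = (
        series.str.replace(thousands, "", regex=False)
        .str.replace(decimal, ".", regex=False)
    )
    values = pd.to_numeric(normalized, errors="coerce").astype("float64")
    # Keep whole numbers only; fractional and unparseable values become NaN.
    whole = values.where(values % 1 == 0)
    # The nullable Int32 dtype stores the missing entries as <NA>.
    return whole.astype("Int32")


series = pd.Series(['100 000', '154,5', '9 000', '250,12'], dtype='object')
as_int = cast_to_nullable_int(series)
```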

Casting into date

>>> series = pd.Series(['1.05', '154', '15/05/2022', 'Alice'], dtype='object')
>>> detector = series.cleaner.detect.castable(target='date')
>>> series.cleaner.clean('cast', detector, format="%d/%m/%Y")
0           NaT
1           NaT
2    2022-05-15
3           NaT
dtype: datetime64[ns]
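
Because the keyword arguments follow pd.to_datetime, the date cast above can be compared with a direct pandas call. The sketch below assumes the cast behaves like pd.to_datetime with errors="coerce"; that equivalence is an assumption made for illustration, not something the pdcleaner documentation states.

```python
import pandas as pd

series = pd.Series(['1.05', '154', '15/05/2022', 'Alice'], dtype='object')

# With an explicit format, entries that do not match it cannot be parsed;
# errors="coerce" turns them into NaT instead of raising.
dates = pd.to_datetime(series, format="%d/%m/%Y", errors="coerce")
```

Only '15/05/2022' matches the day/month/year format, so it is the only entry that is not NaT.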