Saturday, 11 June 2016

Present pandas column assignment in one line based on if conditions


This comes from a separate but related question:


I need to change a column of a pandas DataFrame, but the solution I found requires a lot of brute force. Because the frame has a timedelta index, I have to set up the conditions separately for every assignment, which is not versatile. Since several stages of the data collection each need their own label, I was hoping for a cleaner option.

Here is the rundown:

I have several steps, each of which needs boundaries. I would like each assignment to fit on one line, but currently each one builds index keys for its start and stop, handles time deltas, and only then assigns the values.

I would like all 7 to look like this:

    df['proc'] = np.where((df['press']>1100),'gas soak','pressurize')
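For more than two stages, one possible way to keep a single assignment is `np.select`, which takes a list of boolean conditions and a parallel list of labels. The sketch below is only illustrative: the toy data, the second threshold, and the `'open'` default are assumptions, not the real stage boundaries.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real frame (values are made up).
df = pd.DataFrame(
    {"press": [50.0, 600.0, 1200.0, 1150.0, 400.0, 20.0]},
    index=pd.to_timedelta(np.arange(6), unit="s"),
)

# Hypothetical rules: for each row, the first condition that matches wins.
conditions = [
    df["press"] > 1100,
    df["press"] > 100,
]
labels = ["gas soak", "pressurize"]

# Rows matching no condition get the default label.
df["proc"] = np.select(conditions, labels, default="open")
```

This only covers stages that can be written as conditions on the columns; stages defined by a time offset from another stage would still need the index labels.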

Instead, I first look up the index keys:

    idxPnotT = df[df.proc == 'gas soak'].index.tolist()
    idxHS = idxPnotT[0]
    idxDil0 = idxPnotT[0] + pd.Timedelta(minutes=1)
    ...
    idxPnot100 = df[(df['press'] > 100)].index.tolist()
    idxPnot100 = idxPnot100[-1]

Then I use the index keys for the assignment:

    df.loc[idxHS:idxDil0].proc = 'gas soak'
    ...
    df.loc[idxPostHS:idxPnot100].proc = 'vent'
    df.loc[idxPnot100:].proc = 'open'
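The assignments above set `.proc` on the result of `df.loc[...]`, which is chained assignment and a common trigger for `SettingWithCopyWarning`. A minimal sketch of the single-call form, `df.loc[rows, 'proc'] = value`, follows. The toy data and the one-second offset are assumptions for illustration; the variable names simply mirror the ones in the question.

```python
import numpy as np
import pandas as pd

# Toy frame with a TimedeltaIndex (values are made up).
df = pd.DataFrame(
    {"press": [50.0, 1200.0, 1150.0, 900.0, 400.0, 20.0]},
    index=pd.to_timedelta(np.arange(6), unit="s"),
)
df["proc"] = "pressurize"

# First and last index labels matching a condition, without tolist().
idxHS = df.index[df["press"] > 1100][0]
idxPnot100 = df.index[df["press"] > 100][-1]

# Hypothetical offset; the question uses pd.Timedelta(minutes=1).
idxDil0 = idxHS + pd.Timedelta(seconds=1)

# Row slice and column label in one .loc call, so pandas writes to df itself.
df.loc[idxHS:idxDil0, "proc"] = "gas soak"
df.loc[idxPnot100:, "proc"] = "open"
```

Label slices on a sorted `TimedeltaIndex` include both endpoints, so adjacent stages that share a boundary label will overwrite each other at that row.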

The dataset:

    df.info()
    <class 'pandas.core.frame.DataFrame'>
    TimedeltaIndex: 3383 entries, 00:00:00 to 00:56:25
    Data columns (total 5 columns):
    time     3383 non-null object
    mass     3383 non-null float64
    temp     3383 non-null float64
    press    3383 non-null float64
    proc     3383 non-null object
    dtypes: float64(3), object(2)
    memory usage: 158.6+ KB

    df.index
    Out[138]:
    TimedeltaIndex(['00:00:00', '00:00:01', '00:00:02', '00:00:03', '00:00:04',
                    '00:00:05', '00:00:06', '00:00:07', '00:00:08', '00:00:09',
                    ...
                    '00:56:16', '00:56:17', '00:56:18', '00:56:19', '00:56:20',
                    '00:56:21', '00:56:22', '00:56:23', '00:56:24', '00:56:25'],
                   dtype='timedelta64[ns]', name='time', length=3383, freq=None)

The code isn't pretty and lacks the fluency Python allows. I also keep getting the warning below, and I don't really know where it comes from, even after trying both options from the caveats section of the pandas documentation:

    SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      self[name] = value

Thanks again for all of the help!

