This follows on from a separate but related question.
I need to change a column of a pandas DataFrame, but the solution I found relies on brute force. Because the frame has a timedelta index, I have to set up the conditions separately for every assignment, so it is not very versatile. Since several conditions need to be assigned to stages of the data collection, I was hoping for a cleaner option.
Here is the rundown:
I have several steps that need to be given boundaries. I would like each one to fit on a single line, but at the moment each step creates index keys for its start and stop, handles the time deltas, and only then assigns the values.
I would like all 7 to look like this:
df['proc'] = np.where((df['press']>1100),'gas soak','pressurize')
Instead, they first look up the index keys:
idxPnotT = df[df.proc == 'gas soak'].index.tolist()
idxHS = idxPnotT[0]
idxDil0 = idxPnotT[0] + pd.Timedelta(minutes=1)
...
idxPnot100 = df[(df['press'] > 100)].index.tolist()
idxPnot100 = idxPnot100[-1]
Then they use the index keys for assignment.
df.loc[idxHS:idxDil0].proc = 'gas soak'
...
df.loc[idxPostHS:idxPnot100].proc = 'vent'
df.loc[idxPnot100:].proc = 'open'
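For comparison, here is a minimal sketch of how the np.where pattern might extend to several stages in one expression, using np.select. The toy pressure values, the three extra stage rules and the helper names (soak, last_soak, after_soak) are assumptions made up for illustration; the real boundaries would have to replace them.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; the pressure values are invented.
idx = pd.timedelta_range(start="0s", periods=8, freq="1s", name="time")
df = pd.DataFrame(
    {"press": [50.0, 600.0, 1200.0, 1300.0, 1250.0, 400.0, 120.0, 20.0]},
    index=idx,
)

# Boolean masks as plain NumPy arrays (hypothetical stage rules).
soak = (df["press"] > 1100).to_numpy()
last_soak = df.index[soak][-1]        # Timedelta label of the last soak row
after_soak = df.index > last_soak     # rows later than the soak stage

# np.select checks the conditions in order; the first match wins.
conditions = [
    soak,
    after_soak & (df["press"] > 100).to_numpy(),
    after_soak,
]
choices = ["gas soak", "vent", "open"]
df["proc"] = np.select(conditions, choices, default="pressurize")
```

Because the first matching condition wins, the order of the list plays the role that overwriting later slices plays in the index-key version.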
The dataset:
df.info()
<class 'pandas.core.frame.DataFrame'>
TimedeltaIndex: 3383 entries, 00:00:00 to 00:56:25
Data columns (total 5 columns):
time 3383 non-null object
mass 3383 non-null float64
temp 3383 non-null float64
press 3383 non-null float64
proc 3383 non-null object
dtypes: float64(3), object(2)
memory usage: 158.6+ KB
df.index
Out[138]:
TimedeltaIndex(['00:00:00', '00:00:01', '00:00:02', '00:00:03', '00:00:04',
'00:00:05', '00:00:06', '00:00:07', '00:00:08', '00:00:09',
...
'00:56:16', '00:56:17', '00:56:18', '00:56:19', '00:56:20',
'00:56:21', '00:56:22', '00:56:23', '00:56:24', '00:56:25'],
dtype='timedelta64[ns]', name='time', length=3383, freq=None)
The code isn't pretty and lacks the smoothness Python allows. I also keep getting the warning below without really knowing where it comes from, even after testing both options from the caveats on the pandas page:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self[name] = value
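The form the warning recommends passes the row slice and the column name to a single .loc call, instead of slicing rows first and then setting an attribute on the result. Below is a minimal sketch on a toy frame. The data, the one-second offset and the stages list are assumptions for illustration, not the real boundaries.

```python
import pandas as pd

# Toy frame with a TimedeltaIndex; the values are invented for illustration.
idx = pd.timedelta_range(start="0s", periods=6, freq="1s", name="time")
df = pd.DataFrame(
    {"press": [1200.0, 1150.0, 900.0, 300.0, 150.0, 50.0], "proc": "pressurize"},
    index=idx,
)

start = df.index[0]
stop = start + pd.Timedelta(seconds=1)
# Label of the last row whose pressure is above 100.
last_above_100 = df.index[(df["press"] > 100).to_numpy()][-1]

# Hypothetical (start, stop, label) table; None leaves the slice open-ended.
stages = [
    (start, stop, "gas soak"),
    (stop, last_above_100, "vent"),
    (last_above_100, None, "open"),
]

for lo, hi, label in stages:
    # Rows and column in ONE .loc call, so the assignment hits df itself.
    # Label slices are inclusive at both ends, so later stages overwrite
    # the shared boundary row of the previous stage.
    df.loc[lo:hi, "proc"] = label
```

Keeping the boundaries in one list means a new stage is one extra tuple rather than another pair of index lookups and assignments.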
Thanks again for all of the help!