samedi 11 juin 2016

Pandas - Conditional Calculation Based on Shift Values from Two Other Columns


I'm sure this question is easy, but it has been stumping me for too long, so would REALLY appreciate some direction

I'm looking to add a column to a dataframe based on the results of two other columns

I want to identify if the stock is equal to the stock in the prior row and the date is equal to the date in the prior row.

I am looking to get the running count I tried something along the lines of the following

df['DayCount']=np.where(df['ticker'] ==df['ticker'].shift()) & np.where(df['trade_date']==df['trade_date'].shift() ,  1, 0)

and

df['DayCount'] = df.where(df['ticker'] ==df['ticker'].shift() &    df['trade_date']==df['trade_date'].shift(),1,0)

Sample input

Stock, Date, Time, Price 
IBM, 2014-09-01, 12:30:01, 50.5
IBM, 2014-09-01, 12:30:02, 50.7
IBM, 2014-09-01, 12:30:03, 50.9
IBM, 2014-09-02, 09:57:02, 52.7
IBM, 2014-09-02, 09:57:03, 52.9
AAPL, 2014-11-02, 09:57:02, 520.31
AAPL, 2014-11-02, 09:57:03, 520.92

And output:

Stock, Date,Time, Price, DayCount 
IBM, 2014-09-01, 12:30:01, 50.5,1
IBM, 2014-09-01, 12:30:02, 50.7,2
IBM, 2014-09-01, 12:30:03, 50.9,3
IBM, 2014-09-02, 09:57:02, 52.7,1
IBM, 2014-09-02, 09:57:03, 52.9,2
AAPL, 2014-11-02, 09:57:02, 520.31,1
AAPL, 2014-11-02, 09:57:03, 520.92,2

I got errors like

TypeError: unsupported operand type(s) for &: 'str' and 'bool'

And then applying a cumulative count.

First, and this is most important to me, how do you write the initial statement so that you can do the compare over multiple columns

Second, how would you add in the cumulative count?

Thank you so much for helping

Expanding on the original post, here is another question.. Assume now that the data set is slightly different

Stock, Date, Time, Price,BidOffer
IBM, 2014-09-01, 12:30:01, 50.5, bid
IBM, 2014-09-01, 12:30:02, 50.7, offer
IBM, 2014-09-01, 12:30:03, 50.9, bid
IBM, 2014-09-02, 09:57:02, 52.7, bid
IBM, 2014-09-02, 09:57:03, 52.9, bid
AAPL, 2014-11-02, 09:57:02, 520.31, offer
AAPL, 2014-11-02, 09:57:03, 520.92, offer

And we are looking to see how many times in a row stocks traded on the bid or offer, so the output would be:

Stock, Date, Time, Price,BidOffer,Count
IBM, 2014-09-01, 12:30:01, 50.5, bid, 1 
IBM, 2014-09-01, 12:30:02, 50.7, offer, 1
IBM, 2014-09-01, 12:30:03, 50.9, bid,1
IBM, 2014-09-02, 09:57:02, 52.7, bid,1
IBM, 2014-09-02, 09:57:03, 52.9, bid,2
AAPL, 2014-11-02, 09:57:02, 520.31, offer,1
AAPL, 2014-11-02, 09:57:03, 520.92, offer,2

Groupings are effectively Stock and Date, time is just used to determine sequence.. any help much appreciated on this expansion


Aucun commentaire:

Enregistrer un commentaire