I have a pandas DataFrame and I want to compute days_until_next_event for it:
df = pd.DataFrame({'message_count': [1, 3, 5, 6, 2, 8, 10, 2], 'event_date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-13', '2016-01-13', '2016-01-13', '2016-01-28', '2016-01-28'], 'message_date': ['2016-01-05', '2016-01-06', '2016-01-10', '2016-01-13', '2016-01-16', '2016-01-22', '2016-01-28', '2016-01-30']})
event_date message_count message_date
2016-01-05 1 2016-01-05
2016-01-05 3 2016-01-06
2016-01-05 5 2016-01-10
2016-01-13 6 2016-01-13
2016-01-13 2 2016-01-16
2016-01-13 8 2016-01-22
2016-01-28 10 2016-01-28
2016-01-28 2 2016-01-30
The expected DataFrame looks like this:
days_until_next_event event_date message_count message_date
0 days 2016-01-05 1 2016-01-05
7 days 2016-01-05 3 2016-01-06
3 days 2016-01-05 5 2016-01-10
0 days 2016-01-13 6 2016-01-13
12 days 2016-01-13 2 2016-01-16
6 days 2016-01-13 8 2016-01-22
0 days 2016-01-28 10 2016-01-28
NaT 2016-01-28 2 2016-01-30
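days_until_next_event is the difference between message_date and the next new event_date. If message_date and event_date are the same, the value is 0. I can get the number of days since the last event like this: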
df2['days_since_last_dte'] = [(message - event) for message, event in zip(df2['message_date'], df2['event_date'])]
But I can't work out the last part: comparing each message_date against the next "new" event_date.
Consider the following approach:
IIUC (PS: this assumes your df is sorted; if it is not, call sort_values first)
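# Assumes event_date and message_date are datetime64 columns (e.g. converted with pd.to_datetime)
df['New']=df.event_date.map(pd.Series(df.event_date.unique()[1:],index=df.event_date.unique()[:-1]))
# Gap from each message to the next distinct event_date (the DiffDays values in the output below)
df['DiffDays']=df['New']-df['message_date']
df.loc[df.groupby('event_date').head(1).index,'DiffDays']=0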
df
Out[1191]:
event_date message_count message_date New DiffDays
0 2016-01-05 1 2016-01-05 2016-01-13 0
1 2016-01-05 3 2016-01-06 2016-01-13 7 days 00:00:00
2 2016-01-05 5 2016-01-10 2016-01-13 3 days 00:00:00
3 2016-01-13 6 2016-01-13 2016-01-28 0
4 2016-01-13 2 2016-01-16 2016-01-28 12 days 00:00:00
5 2016-01-13 8 2016-01-22 2016-01-28 6 days 00:00:00
6 2016-01-28 10 2016-01-28 NaT 0
7 2016-01-28 2 2016-01-30 NaT NaT
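
For reference, here is a self-contained sketch of the same idea that starts from the string dates in the question and produces a single timedelta column. The names events, next_event, next_event_date and same_day are made up for this example. It also differs from the snippet above in one respect: it sets the value to 0 where message_date equals event_date (as the question describes) rather than on the first row of each group, which gives the same result for this data but may not for other inputs.

```python
import pandas as pd

df = pd.DataFrame({
    'message_count': [1, 3, 5, 6, 2, 8, 10, 2],
    'event_date': ['2016-01-05', '2016-01-05', '2016-01-05', '2016-01-13',
                   '2016-01-13', '2016-01-13', '2016-01-28', '2016-01-28'],
    'message_date': ['2016-01-05', '2016-01-06', '2016-01-10', '2016-01-13',
                     '2016-01-16', '2016-01-22', '2016-01-28', '2016-01-30'],
})

# Parse the string columns so that subtracting dates yields Timedelta values.
df['event_date'] = pd.to_datetime(df['event_date'])
df['message_date'] = pd.to_datetime(df['message_date'])
df = df.sort_values(['event_date', 'message_date']).reset_index(drop=True)

# Lookup table: each distinct event_date -> the following distinct event_date.
# The last event has no successor, so it maps to NaT.
events = pd.Series(df['event_date'].unique())
next_event = pd.Series(events.shift(-1).values, index=events.values)
df['next_event_date'] = df['event_date'].map(next_event)

# Time from each message to the next event; stays NaT when there is no next event.
df['days_until_next_event'] = df['next_event_date'] - df['message_date']

# A message sent on its own event date counts as 0 days.
same_day = df['message_date'] == df['event_date']
df.loc[same_day, 'days_until_next_event'] = pd.Timedelta(0)

print(df[['days_until_next_event', 'event_date', 'message_count', 'message_date']])
```

Using pd.Timedelta(0) instead of the integer 0 keeps the column as a timedelta dtype, so .dt.days can be applied afterwards if plain day counts are needed.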