I have a DataFrame (df) like this:

  Date                values
20190101000000  1384.4801224435887
20190101000001  1384.5053056232982
20190101000002  1384.5304889818935
20190101000003  1384.5556725193492
20190101000004  1384.5808562356392
20190101000005  1384.606040130739
20190101000006  1384.631224204622
20190101000007  1384.6564084572635
20190101000008  1384.6815928886372
20190101000009  1384.7067774987179
20190101000010  1384.7319622874802
20190101000011  1384.757147254898
20190101000012  1384.7823324009464
20190101000013  1384.8075177255998
20190101000014  1384.8327032288325
20190101000015  1384.8578889106184
20190101000016  1384.8830747709321
20190101000017  1384.9082608097488
20190101000018  1384.9334470270423
20190101000019  1384.958633422787
20190101000020  1384.9838199969574
20190101000021  1385.0090067495285
20190101000022  1385.034193680474
20190101000023  1385.0593807897685
20190101000024  1385.0845680773864
20190101000025  1385.1097555433028
20190101000026  1385.134943187491
20190101000027  1385.160131009926
20190101000028  1385.1853190105826
20190101000029  1385.2105071894343
20190101000030  1385.2356955464566

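The Date column is in %Y%m%d%H%M%S format. I take a start date and an end date as user input and generate the timestamps between them at a 1-second frequency.

Now I want to take a second frequency value from the user and pick the value from the values column at each of those moments.

Example:

If the second resolution is 10 seconds, the output must be: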

start           end             values
20190101000000  20190101000010  1384.7319622874802
20190101000011  20190101000020  1384.9838199969574
20190101000021  20190101000030  1385.2356955464566

As the df above shows, with a resolution of 10 seconds I need the value at every 10th second.

If the second resolution is 15 minutes, the output must be:

start           end             values
20190101000000  20190101001500  1407.2142300429964
20190101001501  20190101003000  1416.6996533329484
20190101003001  20190101004500  1424.2467631293005

How can I do this?

My code so far:

import pandas as pd

# input() already returns a string, so no str() wrapper is needed
START_DATE = input('Enter start date in %Y-%m-%d %H:%M:%S format: ')
END_DATE = input('Enter end date in %Y-%m-%d %H:%M:%S format: ')
RESOLUTION = 's'  # 1 second; the uppercase alias 'S' is deprecated in recent pandas

dates = pd.date_range(START_DATE, END_DATE, freq = RESOLUTION)
dates = pd.DataFrame(pd.Series(dates).dt.strftime('%Y%m%d%H%M%S'), columns = ['Date'])
Answer

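Convert the datetimes to Unix-epoch integers (nanoseconds), take them modulo the Timedelta of the resolution, and keep the rows where the remainder is 0. Then create the new column with DataFrame.insert and Series.shift, and finally remove the first row with iloc: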

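import numpy as np

res = '10s'
# Parse with an explicit format, then get nanoseconds since the Unix epoch
dt = pd.to_datetime(df['Date'], format='%Y%m%d%H%M%S')
ns = dt.to_numpy().astype('datetime64[ns]').astype(np.int64)
m = ns % pd.Timedelta(res).value == 0   # True on multiples of res

df = df[m].rename(columns={'Date':'end'})
df.insert(0, 'start', df['end'].shift())   # previous kept timestamp
df = df.iloc[1:]                           # first row has no predecessor
print(df)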
             start             end              values
10  20190101000000  20190101000010  1384.7319622874802
20  20190101000010  20190101000020  1384.9838199969574
30  20190101000020  20190101000030  1385.2356955464566
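
To make it easier to see which rows the mask keeps, here is a small self-contained sketch. The frame `demo` and its `values` are placeholders I made up (not the data from the question), and the comparison against `Series.dt.floor` is an alternative formulation I am adding for illustration, not part of the steps above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 31 one-second timestamps with placeholder values.
dates = pd.date_range('2019-01-01 00:00:00', periods=31, freq='s')
demo = pd.DataFrame({'Date': dates.strftime('%Y%m%d%H%M%S'),
                     'values': np.arange(31, dtype=float)})

res = '10s'
dt = pd.to_datetime(demo['Date'], format='%Y%m%d%H%M%S')

# Modulo mask: nanoseconds since the Unix epoch divisible by the resolution.
ns = dt.to_numpy().astype('datetime64[ns]').astype(np.int64)
mask_mod = ns % pd.Timedelta(res).value == 0

# Same idea without integers: a timestamp lies on the grid
# if flooring it to the resolution leaves it unchanged.
mask_floor = (dt == dt.dt.floor(res)).to_numpy()

selected = demo.loc[mask_mod, 'Date'].tolist()
print(selected)
```

Both masks should pick the same rows for fixed frequencies such as '10s' or '15min', because both are aligned to the Unix epoch.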

Finally, add 1 second to every start value except the first:

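# Skip the first row so the first interval still starts at the original start time
df.loc[df.index[1:], 'start'] = (pd.to_datetime(df.loc[df.index[1:], 'start'],
                                                format='%Y%m%d%H%M%S') +
                                    pd.Timedelta('1s')).dt.strftime('%Y%m%d%H%M%S')
print(df)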
             start             end              values
10  20190101000000  20190101000010  1384.7319622874802
20  20190101000011  20190101000020  1384.9838199969574
30  20190101000021  20190101000030  1385.2356955464566
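
If the resolution comes from user input, the steps above can be wrapped in one function. This is only a sketch under some assumptions: the name `sample_at_resolution` is hypothetical, the grid test uses `Series.dt.floor` instead of the integer modulo, and the demo frame uses made-up placeholder values rather than the data from the question:

```python
import numpy as np
import pandas as pd

FMT = '%Y%m%d%H%M%S'

def sample_at_resolution(df, res):
    """Hypothetical helper: keep rows on the `res` grid and label each interval."""
    dt = pd.to_datetime(df['Date'].astype(str), format=FMT)
    on_grid = dt == dt.dt.floor(res)           # rows that are multiples of res
    out = pd.DataFrame({
        'start': dt[on_grid].shift(),          # previous grid point
        'end': dt[on_grid],
        'values': df.loc[on_grid, 'values'],
    }).iloc[1:].copy()                         # first row has no previous grid point
    # Add 1 second to every start except the first one.
    offsets = np.where(np.arange(len(out)) == 0, 0, 1)
    out['start'] = out['start'] + pd.Series(pd.to_timedelta(offsets, unit='s'),
                                            index=out.index)
    out['start'] = out['start'].dt.strftime(FMT)
    out['end'] = out['end'].dt.strftime(FMT)
    return out.reset_index(drop=True)

# Placeholder data: 31 one-second rows, values are just the second offsets.
dates = pd.date_range('2019-01-01 00:00:00', periods=31, freq='s')
demo = pd.DataFrame({'Date': dates.strftime(FMT),
                     'values': np.arange(31, dtype=float)})

result = sample_at_resolution(demo, '10s')
print(result)
```

Passing '15min' instead of '10s' should produce the 15-minute intervals from the question, provided the frame covers a long enough time span.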