I have a DataFrame df that looks like this:
Date values
20190101000000 1384.4801224435887
20190101000001 1384.5053056232982
20190101000002 1384.5304889818935
20190101000003 1384.5556725193492
20190101000004 1384.5808562356392
20190101000005 1384.606040130739
20190101000006 1384.631224204622
20190101000007 1384.6564084572635
20190101000008 1384.6815928886372
20190101000009 1384.7067774987179
20190101000010 1384.7319622874802
20190101000011 1384.757147254898
20190101000012 1384.7823324009464
20190101000013 1384.8075177255998
20190101000014 1384.8327032288325
20190101000015 1384.8578889106184
20190101000016 1384.8830747709321
20190101000017 1384.9082608097488
20190101000018 1384.9334470270423
20190101000019 1384.958633422787
20190101000020 1384.9838199969574
20190101000021 1385.0090067495285
20190101000022 1385.034193680474
20190101000023 1385.0593807897685
20190101000024 1385.0845680773864
20190101000025 1385.1097555433028
20190101000026 1385.134943187491
20190101000027 1385.160131009926
20190101000028 1385.1853190105826
20190101000029 1385.2105071894343
20190101000030 1385.2356955464566
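The Date column is in %Y%m%d%H%M%S format. I take a start date and an end date as user input and split that range at a 1-second frequency.
Now I want to take a second frequency value from the user and, at each of those moments, pick the matching value from the values column.
Example:
If the second resolution is 10 seconds, the output must look like this: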
start end values
20190101000000 20190101000010 1384.7319622874802
20190101000011 20190101000020 1384.9838199969574
20190101000021 20190101000030 1385.2356955464566
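As you can see from the df above, with a resolution of 10 seconds I need the value at every 10th second.
If the second resolution is 15 minutes, the output must look like this: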
start end values
20190101000000 20190101001500 1407.2142300429964
20190101001501 20190101003000 1416.6996533329484
20190101003001 20190101004500 1424.2467631293005
How can I do this?
My code so far is:
import pandas as pd

# Read the range boundaries from the user
START_DATE = input('Enter start date in %Y-%m-%d %H:%M:%S format: ')
END_DATE = input('Enter end date in %Y-%m-%d %H:%M:%S format: ')
RESOLUTION = 's'  # 1-second frequency (lowercase alias; uppercase 'S' is deprecated in recent pandas)
dates = pd.date_range(START_DATE, END_DATE, freq=RESOLUTION)
# Store the timestamps as %Y%m%d%H%M%S strings in a single Date column
dates = pd.DataFrame({'Date': dates.strftime('%Y%m%d%H%M%S')})
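Answer
Convert the Date column to datetimes, view them as integer nanoseconds since the Unix epoch, and keep the rows where that integer modulo the Timedelta of the resolution is 0. Then rename Date to end, create the start column with DataFrame.insert and Series.shift, and finally drop the first row (which has no previous end) with iloc: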
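import numpy as np

res = '10s'
# Date is assumed to hold strings; parse them with an explicit format
dt = pd.to_datetime(df['Date'], format='%Y%m%d%H%M%S')
# nanoseconds since the epoch; keep rows that are a whole multiple of res
ns = dt.to_numpy().astype('datetime64[ns]').astype(np.int64)
m = ns % pd.Timedelta(res).value == 0
df = df[m].rename(columns={'Date': 'end'})
# start of each interval is the previous end
df.insert(0, 'start', df['end'].shift())
df = df.iloc[1:]
print(df)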
start end values
10 20190101000000 20190101000010 1384.7319622874802
20 20190101000010 20190101000020 1384.9838199969574
30 20190101000020 20190101000030 1385.2356955464566
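The modulo test behind the mask can be checked in isolation. The sketch below is only an illustration: the three timestamps and the helper name on_grid are made up for the example. It assumes the timestamps are strings in the question's format and that the desired grid is aligned to the Unix epoch (which midnight-based resolutions such as 10 seconds or 15 minutes are).

```python
import numpy as np
import pandas as pd

# Three sample timestamps (illustrative, in the question's %Y%m%d%H%M%S format)
stamps = pd.Series(['20190101000009', '20190101000010', '20190101001500'])
parsed = pd.to_datetime(stamps, format='%Y%m%d%H%M%S')

# Force nanosecond resolution so the integers match Timedelta.value (also ns)
ns = parsed.to_numpy().astype('datetime64[ns]').astype(np.int64)

def on_grid(res):
    """Hypothetical helper: True where a timestamp is a whole multiple of res."""
    return ns % pd.Timedelta(res).value == 0

mask_10s = on_grid('10s')
mask_15min = on_grid('15min')
print(mask_10s)    # [False  True  True]
print(mask_15min)  # [False False  True]
```

Because the remainder is taken relative to the epoch, the grid is anchored at 1970-01-01 00:00:00 rather than at the user's start date. A start date that is not itself a multiple of the resolution would therefore not appear as the first row.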
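Finally, add 1 second to every start except the first one:
# df.index[1:] skips the first row, whose start stays on the grid
idx = df.index[1:]
df.loc[idx, 'start'] = (pd.to_datetime(df.loc[idx, 'start'], format='%Y%m%d%H%M%S')
                        + pd.Timedelta('1s')).dt.strftime('%Y%m%d%H%M%S')
print(df)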
start end values
10 20190101000000 20190101000010 1384.7319622874802
20 20190101000011 20190101000020 1384.9838199969574
30 20190101000021 20190101000030 1385.2356955464566
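The two steps can also be folded into one function that takes the resolution as a parameter, so the same code serves '10s' and '15min'. This is a rough sketch, not the only way to do it: the function name sample_at_resolution is hypothetical, the Date column is assumed to hold strings, and the values in the demo frame are synthetic (0.0 to 30.0), not the ones from the question.

```python
import numpy as np
import pandas as pd

FMT = '%Y%m%d%H%M%S'

def sample_at_resolution(df, res):
    """Hypothetical helper: rows of df on the res grid, as start/end/values."""
    dt = pd.to_datetime(df['Date'], format=FMT)
    ns = dt.to_numpy().astype('datetime64[ns]').astype(np.int64)
    grid = df[ns % pd.Timedelta(res).value == 0]

    ends = pd.to_datetime(grid['Date'], format=FMT).reset_index(drop=True)
    # Each interval starts 1 second after the previous grid point ...
    starts = ends.shift() + pd.Timedelta('1s')
    # ... except the first interval, which starts on the grid itself
    if len(starts) > 1:
        starts.iloc[1] = ends.iloc[0]

    out = pd.DataFrame({
        'start': starts.dt.strftime(FMT),
        'end': ends.dt.strftime(FMT),
        'values': grid['values'].to_numpy(),
    })
    # The first grid point has no previous end, so drop it
    return out.iloc[1:].reset_index(drop=True)

# Demo frame: 31 one-second rows with synthetic values
dates = pd.date_range('2019-01-01 00:00:00', periods=31, freq='s')
df = pd.DataFrame({'Date': dates.strftime(FMT),
                   'values': np.arange(len(dates), dtype=float)})

result = sample_at_resolution(df, '10s')
print(result)
```

Calling sample_at_resolution(df, '15min') on a frame that covers at least 15 minutes should give the second layout from the question in the same way, since only the Timedelta changes.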