The data in test.csv looks like this:

device_id,upload_time
id1,2020-06-01 07:46:30+00:00
id2,2020-06-05 16:04:32+00:00

I want to remove the +00:00 from upload_time, add 8 hours to it, and store the result in a new column, new_upload_time.

I use this code to do that.

import pandas as pd
from datetime import datetime, timedelta

df = pd.read_csv(r'E:/test.csv',parse_dates=[1], encoding='utf-8')
df['new_upload_time'] = pd.DatetimeIndex(df['upload_time'].dt.strftime('%Y-%m-%d %H:%M:%S'))+timedelta(hours=8)
df.to_csv(r'E:/result.csv', index=False, mode='w', header=True)

result.csv:

device_id,upload_time,new_upload_time
id1,2020-06-01 07:46:30+00:00,2020-06-01 15:46:30
id2,2020-06-05 16:04:32+00:00,2020-06-06 00:04:32

Although this works, the code feels more complicated than it needs to be.

Is there a simpler way?

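Answer

Add the 8 hours first and format afterwards. Your code formats the timestamps as strings, parses them back into a DatetimeIndex, and only then adds the offset. It is simpler to add the offset while the column is still a datetime, and then use strftime to format it as a string: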

df['upload_time'] = pd.to_datetime(df['upload_time'])
df['new_upload_time'] = df['upload_time'] + pd.Timedelta(hours=8)
df['new_upload_time'] = df['new_upload_time'].dt.strftime('%Y/%m/%d %H:%M:%S')
df['upload_time'] = df['upload_time'].dt.strftime('%Y/%m/%d %H:%M:%S')  # pass whatever format you want to `strftime` here
df
Out[1]:
  device_id          upload_time      new_upload_time
0       id1  2020/06/01 07:46:30  2020/06/01 15:46:30
1       id2  2020/06/05 16:04:32  2020/06/06 00:04:32
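
If you would rather keep new_upload_time as a real datetime column instead of a string, here is a minimal sketch of the same idea. It reads the two sample rows from an in-memory string so it runs without E:/test.csv; the use of io.StringIO, utc=True, and dt.tz_localize(None) to drop the offset are my assumptions, not part of the code above.

```python
import io

import pandas as pd

# Sample rows from the question, held in memory instead of a file on disk.
csv_text = (
    "device_id,upload_time\n"
    "id1,2020-06-01 07:46:30+00:00\n"
    "id2,2020-06-05 16:04:32+00:00\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Parse the strings as timezone-aware UTC timestamps.
df["upload_time"] = pd.to_datetime(df["upload_time"], utc=True)

# Add the 8 hours while the values are still datetimes, then drop the
# timezone so the "+00:00" suffix disappears. The column stays a datetime.
df["new_upload_time"] = (
    df["upload_time"] + pd.Timedelta(hours=8)
).dt.tz_localize(None)

print(df)
```

Because new_upload_time is still a datetime here, you can keep doing date arithmetic on it and only format it when you export.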

You can also specify date_format when exporting:

df['upload_time'] = pd.to_datetime(df['upload_time'])
df['new_upload_time'] = df['upload_time'] + pd.Timedelta(hours=8)
df.to_csv('test.csv', index=False, date_format='%Y-%m-%d %H:%M:%S')
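
To check what date_format does without touching any file, here is a small sketch that writes the CSV into an in-memory buffer. The buffer, the sample string, and utc=True are assumptions for illustration. Since the format string has no %z, the offset should be left out of both datetime columns.

```python
import io

import pandas as pd

# Sample rows from the question, held in memory instead of a file on disk.
csv_text = (
    "device_id,upload_time\n"
    "id1,2020-06-01 07:46:30+00:00\n"
    "id2,2020-06-05 16:04:32+00:00\n"
)

df = pd.read_csv(io.StringIO(csv_text))
df["upload_time"] = pd.to_datetime(df["upload_time"], utc=True)
df["new_upload_time"] = df["upload_time"] + pd.Timedelta(hours=8)

# Write to a string buffer rather than a path; date_format is applied to
# every datetime column at export time.
buffer = io.StringIO()
df.to_csv(buffer, index=False, date_format="%Y-%m-%d %H:%M:%S")
csv_out = buffer.getvalue()

print(csv_out)
```

Note that the snippet above writes to test.csv, which is also the input file name in the question; pass a different path such as result.csv if you want to keep the original file.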