I have a CSV file with a Year column (int) and a month-name column, MonthLong (str).
Year | MonthLong |
---|---|
1984 | January |
1984 | January |
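I want to put the last day of each month in a separate column (dd/mm/yy).
I've tried a lot of silly things (I'm just learning, and I still reach for a lot of the Excel tricks I'm good at), but now I'm stuck.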
```python
import pandas as pd

# tax_cols is the DataFrame loaded from the CSV
months = {'January':1, 'February':2, 'March':3, 'April':4, 'May':5, 'June':6, 'July':7, 'August':8, 'September':9, 'October':10, 'November':11, 'December':12}
# Creates a column with month as a number
tax_cols['Month'] = tax_cols['MonthLong'].map(months)
# Adds a day column to get to_datetime to work - day is wrong
tax_cols['Day'] = tax_cols['MonthLong'].map(months)
# Creates a full date column - day is wrong
tax_cols['FullDate'] = pd.to_datetime(tax_cols[['Year', 'Month', 'Day']])
```
From here I tried datetime/relativedelta and monthrange, but without success. Any ideas to get me moving again?
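The question mentions `calendar.monthrange`, so here is a minimal sketch of that route for comparison. It is an alternative to the answer below and does not come from the original post. `monthrange(year, month)` returns a tuple whose second element is the number of days in the month, and that number is also the month's last day. The small DataFrame is made-up sample data standing in for the CSV. The month mapping is built from `calendar.month_name`, which assumes the default (English) locale.

```python
import calendar

import pandas as pd

# Made-up sample data standing in for the CSV (same column names as the question)
tax_cols = pd.DataFrame({'Year': [1984, 1984, 1988],
                         'MonthLong': ['January', 'January', 'February']})

# Month name -> month number; month_name[0] is '' so it is skipped
months = {name: num for num, name in enumerate(calendar.month_name) if name}

tax_cols['Month'] = tax_cols['MonthLong'].map(months)
# monthrange(year, month)[1] is the number of days in that month, i.e. its last day
tax_cols['Day'] = [calendar.monthrange(int(y), int(m))[1]
                   for y, m in zip(tax_cols['Year'], tax_cols['Month'])]
tax_cols['FullDate'] = pd.to_datetime(tax_cols[['Year', 'Month', 'Day']])
print(tax_cols)
```

The 1988 row shows that leap years are handled: `monthrange` returns 29 days for February 1988.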
Answer
Use `to_datetime` with the format `%Y%B` (a four-digit year followed by the full month name). To get the last day of the month, add `offsets.MonthEnd`:
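```python
import pandas as pd
# Year + full month name (%Y%B), e.g. '1984January' -> 1984-01-01
date = pd.to_datetime(tax_cols['Year'].astype(str) + tax_cols['MonthLong'], format='%Y%B')
# Roll each first-of-month date forward to that month's last day
tax_cols['FullDate'] = date + pd.offsets.MonthEnd()
print(tax_cols)
```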
```
   Year MonthLong   FullDate
0  1984   January 1984-01-31
1  1984   January 1984-01-31
```
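The question asks for the date as dd/mm/yy. Here is a minimal sketch that assumes a text column is acceptable: `Series.dt.strftime('%d/%m/%y')` renders the month-end datetimes in that layout. The column name `FullDateStr` and the sample rows are hypothetical. The result is a string column, so keep `FullDate` if you still need date arithmetic.

```python
import pandas as pd

# Hypothetical sample data with the question's column names
tax_cols = pd.DataFrame({'Year': [1984, 1984, 2000],
                         'MonthLong': ['January', 'January', 'February']})

# Year + full month name -> first day of that month, then roll forward to month end
date = pd.to_datetime(tax_cols['Year'].astype(str) + tax_cols['MonthLong'], format='%Y%B')
tax_cols['FullDate'] = date + pd.offsets.MonthEnd()

# Render as dd/mm/yy text (FullDateStr is a hypothetical column name)
tax_cols['FullDateStr'] = tax_cols['FullDate'].dt.strftime('%d/%m/%y')
print(tax_cols)
```

`MonthEnd()` moves a date that is already a month end on to the following month end. That cannot happen here, because `format='%Y%B'` always produces the first day of the month.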
If some rows may be in a different format, add `errors='coerce'` so that values that cannot be parsed become `NaT` instead of raising an error:
```python
# errors='coerce' turns strings that don't match %Y%B into NaT
date = pd.to_datetime(tax_cols['Year'].astype(str) + tax_cols['MonthLong'], format='%Y%B', errors='coerce')
tax_cols['FullDate'] = date + pd.offsets.MonthEnd()
print(tax_cols)
```
```
   Year MonthLong   FullDate
0  1984   January 1984-01-31
1  1984   January 1984-01-31
```
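The next sketch shows what `errors='coerce'` does with a value that does not match `%Y%B`. One row is deliberately malformed: `'Janury'` is a made-up typo, and the data is hypothetical. That row becomes `NaT`. Adding `MonthEnd` to `NaT` leaves it as `NaT`, and the other rows are unaffected.

```python
import pandas as pd

# Hypothetical data; 'Janury' is a deliberate typo that %B cannot parse
tax_cols = pd.DataFrame({'Year': [1984, 1984, 1985],
                         'MonthLong': ['January', 'Janury', 'March']})

date = pd.to_datetime(tax_cols['Year'].astype(str) + tax_cols['MonthLong'],
                      format='%Y%B', errors='coerce')
tax_cols['FullDate'] = date + pd.offsets.MonthEnd()

# Rows that could not be parsed are NaT; list them so the source data can be fixed
print(tax_cols[tax_cols['FullDate'].isna()])
```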