I have a CSV file that contains a Year column (int) and a month-name column, MonthLong (str).

Year MonthLong
1984 January
1984 January

I want to put the last day of each month in a separate column, formatted as dd/mm/yy.

I've tried a lot of silly things (I'm still learning, and still do a lot of my work in Excel, which I'm good at), but now I'm at a loss.

import pandas as pd

months = {'January':1, 'February':2, 'March':3, 'April':4, 'May':5, 'June':6, 'July':7, 'August':8, 'September':9, 'October':10, 'November':11, 'December':12}

#Creates a column with month as a number
tax_cols['Month'] = tax_cols['MonthLong'].map(months)

#Adds a day column to get to_datetime to work - day is wrong
tax_cols['Day'] = tax_cols['MonthLong'].map(months)

#Creates a full date column - day is wrong
tax_cols['FullDate'] = pd.to_datetime(tax_cols[['Year', 'Month', 'Day']])

From here I've tried datetime/relativedelta and monthrange, without success. Any ideas on how to move forward?
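For reference, the calendar.monthrange route mentioned in the question can also work: monthrange(year, month) returns a tuple whose second item is the number of days in that month, which is exactly the last day. The sketch below assumes English month names (calendar.month_name follows the current locale, English by default) and uses a small hypothetical DataFrame in place of the real CSV; last_day is an illustrative helper, not part of any library.

```python
import calendar

import pandas as pd

# Hypothetical sample data mirroring the question's columns (stands in for the CSV).
tax_cols = pd.DataFrame({"Year": [1984, 1984, 2000],
                         "MonthLong": ["January", "January", "February"]})

# Map full month names to numbers 1-12; calendar.month_name[0] is '' so skip it.
months = {name: num for num, name in enumerate(calendar.month_name) if name}

def last_day(year, month_name):
    """Hypothetical helper: return a Timestamp for the last day of year/month."""
    month = months[month_name]
    # monthrange returns (weekday of the 1st, number of days in the month).
    _, n_days = calendar.monthrange(int(year), month)
    return pd.Timestamp(year=int(year), month=month, day=n_days)

tax_cols["FullDate"] = [last_day(y, m)
                        for y, m in zip(tax_cols["Year"], tax_cols["MonthLong"])]
print(tax_cols)
```

This loops row by row in Python, so on large files the vectorized to_datetime approach is usually preferable.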

Answer

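Use to_datetime with format='%Y%B', where %Y matches the four-digit year (YYYY) and %B the full month name. Then add offsets.MonthEnd to move each date to the last day of its month: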

date = pd.to_datetime(tax_cols['Year'].astype(str) + tax_cols['MonthLong'], format='%Y%B')
tax_cols['FullDate'] = date + pd.offsets.MonthEnd()
print (tax_cols)
   Year MonthLong   FullDate
0  1984   January 1984-01-31
1  1984   January 1984-01-31
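FullDate above is a datetime64 column, which pandas displays as YYYY-MM-DD. If the dd/mm/yy text from the question is needed, one option (a sketch, with a small hypothetical DataFrame standing in for the CSV) is Series.dt.strftime, which produces strings rather than datetimes:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's CSV.
tax_cols = pd.DataFrame({"Year": [1984, 1984], "MonthLong": ["January", "January"]})

date = pd.to_datetime(tax_cols["Year"].astype(str) + tax_cols["MonthLong"], format="%Y%B")
tax_cols["FullDate"] = date + pd.offsets.MonthEnd()

# Render as dd/mm/yy text; the new column holds strings, not datetimes.
tax_cols["FullDateStr"] = tax_cols["FullDate"].dt.strftime("%d/%m/%y")
print(tax_cols)
```

Keeping FullDate as a real datetime and formatting only on output (for example DataFrame.to_csv(..., date_format="%d/%m/%y")) preserves date arithmetic and sorting.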

If some values may be in a different format, pass errors='coerce' so that unparseable datetimes become NaT instead of raising an error:

date = pd.to_datetime(tax_cols['Year'].astype(str) + tax_cols['MonthLong'], format='%Y%B', errors='coerce')
tax_cols['FullDate'] = date + pd.offsets.MonthEnd()
print (tax_cols)
   Year MonthLong   FullDate
0  1984   January 1984-01-31
1  1984   January 1984-01-31
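To see what errors='coerce' does with a bad value, here is a small sketch using hypothetical data in which one month name is misspelled. The unparseable row becomes NaT, adding MonthEnd to NaT leaves it NaT, and isna() can then pick out the rows that need fixing:

```python
import pandas as pd

# Hypothetical data: "Janury" is misspelled and cannot be parsed with %B.
tax_cols = pd.DataFrame({"Year": [1984, 1984], "MonthLong": ["January", "Janury"]})

date = pd.to_datetime(tax_cols["Year"].astype(str) + tax_cols["MonthLong"],
                      format="%Y%B", errors="coerce")
tax_cols["FullDate"] = date + pd.offsets.MonthEnd()

# Rows whose date could not be parsed (NaT) are easy to isolate for cleanup.
bad_rows = tax_cols[tax_cols["FullDate"].isna()]
print(tax_cols)
print(bad_rows)
```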