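I have a column that cycles through the values 1 to 3. I need a cycle number, shown in the middle column below. How can I compute that second column with pandas?
Here is the table: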
column | I need |Note
-----------------------------------------------------------------------
2 | 1 |first cycle although not starting from 1
3 | 1 |first cycle although not starting from 1
-----------------------------------------------------------------------
1 | 2 |second cycle
2 | 2 |second cycle
3 | 2 |second cycle
-----------------------------------------------------------------------
1 | 3 |
2 | 3 |
3 | 3 |
-----------------------------------------------------------------------
1 | 4 |
2 | 4 |
3 | 4 |
-----------------------------------------------------------------------
1 | 5 |
2 | 5 |
3 | 5 |
-----------------------------------------------------------------------
1 | 6 |
2 | 6 |
3 | 6 |
-----------------------------------------------------------------------
1 | 7 |7th cycle and does not have to end in 3
2 | 7 |
Answer
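For your sample data, take the first difference with Series.diff, test where it is less than 0 with Series.lt (a negative difference marks the start of a new cycle), and finally take the cumulative sum with Series.cumsum: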
df['new'] = df['column'].diff().lt(0).cumsum() + 1
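To make the steps visible, here is a minimal self-contained sketch that rebuilds the sample column from the question and splits the one-liner into its parts. The intermediate names `step` and `restart` are only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample column from the question: starts at 2 and ends at 2 (19 rows)
values = [2, 3] + [1, 2, 3] * 5 + [1, 2]
df = pd.DataFrame({'column': values})

# First difference: NaN for the first row, negative where the value drops (3 -> 1)
step = df['column'].diff()

# True only on the first row of each new cycle; NaN compares as False
restart = step.lt(0)

# Running count of restarts, shifted by 1 so the first cycle is numbered 1
df['new'] = restart.cumsum() + 1

print(df['new'].tolist())
```

Because only a drop in value counts as a restart, this works even when the first cycle does not start at 1 or the last one does not reach 3.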
If the values are strings, you can first encode them as numbers by passing a dictionary to Series.map:
df['new'] = df['column'].map({'1':0, '2':1, '3':2}).diff().lt(0).cumsum() + 1
print (df)
column I need new
0 2 1 1
1 3 1 1
2 1 2 2
3 2 2 2
4 3 2 2
5 1 3 3
6 2 3 3
7 3 3 3
8 1 4 4
9 2 4 4
10 3 4 4
11 1 5 5
12 2 5 5
13 3 5 5
14 1 6 6
15 2 6 6
16 3 6 6
17 1 7 7
18 2 7 7
EDIT: You can use enumerate to build the mapping dictionary for all values of a cycle:
d = {v:k for k, v in enumerate(['1','2','3'])}
#if possible, build the mapping from all unique values - check their order first
#print(df['column'].unique())
#d = {v:k for k, v in enumerate(df['column'].unique())}
df['new'] = df['column'].map(d).diff().lt(0).cumsum() + 1
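The reason for checking the order can be sketched with a short example. It assumes the values are the strings '1', '2', '3' and that sorting the unique values gives the cycle order; that holds for this toy data but is an assumption, not something guaranteed for arbitrary labels.

```python
import pandas as pd

# Toy data (illustrative): string values, first cycle starts at '2'
df = pd.DataFrame({'column': ['2', '3', '1', '2', '3', '1', '2']})

# unique() keeps the order of first appearance, which here is NOT the cycle order
print(df['column'].unique().tolist())  # ['2', '3', '1']

# Assumption: sorting the unique values recovers the cycle order
order = sorted(df['column'].unique())
d = {v: k for k, v in enumerate(order)}  # {'1': 0, '2': 1, '3': 2}

df['new'] = df['column'].map(d).diff().lt(0).cumsum() + 1
print(df['new'].tolist())  # [1, 1, 2, 2, 2, 3, 3]
```

If the labels do not sort into cycle order, writing the list out by hand as in the line above is the safer choice.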