How to label values with their cycle number in Python pandas

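I have a column that cycles through the values 1 to 3. I need the cycle number for each row, as shown in the middle column of the table. How can I compute that second column with pandas?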

Here is the table:

column  | I need   |Note
-----------------------------------------------------------------------
2       | 1        |first cycle although not starting from 1
3       | 1        |first cycle although not starting from 1
-----------------------------------------------------------------------
1       | 2        |second cycle
2       | 2        |second cycle
3       | 2        |second cycle
-----------------------------------------------------------------------
1       | 3        |
2       | 3        |
3       | 3        |
-----------------------------------------------------------------------
1       | 4        |
2       | 4        |
3       | 4        |
-----------------------------------------------------------------------
1       | 5        |
2       | 5        |
3       | 5        |
-----------------------------------------------------------------------
1       | 6        |
2       | 6        |
3       | 6        |
-----------------------------------------------------------------------
1       | 7        |7th cycle and does not have to end in 3
2       | 7        |

Answer

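For your sample data, take the first difference with Series.diff, flag the rows where it is less than 0 with Series.lt (a drop marks the start of a new cycle), then number the cycles with the running total from Series.cumsum: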

df['new'] = df['column'].diff().lt(0).cumsum() + 1
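To make the one-liner easier to follow, here is a minimal step-by-step sketch. The DataFrame is rebuilt from the table in the question, and the intermediate names `step` and `is_reset` are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data from the question: the first cycle starts at 2, the last one ends at 2.
df = pd.DataFrame({'column': [2, 3] + [1, 2, 3] * 5 + [1, 2]})

# Difference to the previous row; the first row has no predecessor, so it is NaN.
step = df['column'].diff()

# True where the value drops (3 -> 1), i.e. where a new cycle begins.
# NaN compares as False, so the first row does not count as a reset.
is_reset = step.lt(0)

# Running count of resets, shifted so that numbering starts at 1.
df['new'] = is_reset.cumsum() + 1

print(df.head(6))
```

Because only a drop in value triggers a new number, the first cycle may start anywhere and the last cycle may stop early.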

If the values are strings, you can encode them as numbers with a dictionary passed to Series.map:

df['new'] = df['column'].map({'1':1, '2':2, '3':3}).diff().lt(0).cumsum() + 1

print (df)
    column  I need  new
0        2       1    1
1        3       1    1
2        1       2    2
3        2       2    2
4        3       2    2
5        1       3    3
6        2       3    3
7        3       3    3
8        1       4    4
9        2       4    4
10       3       4    4
11       1       5    5
12       2       5    5
13       3       5    5
14       1       6    6
15       2       6    6
16       3       6    6
17       1       7    7
18       2       7    7
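As a quick sanity check, the string variant can be compared against the expected "I need" column. This is a sketch under the assumption that the column holds the strings '1', '2' and '3'; the names `values`, `expected` and `order` are made up for the example.

```python
import pandas as pd

# String version of the sample column, plus the expected cycle numbers.
values = ['2', '3'] + ['1', '2', '3'] * 5 + ['1', '2']
expected = [1, 1] + [c for c in range(2, 7) for _ in range(3)] + [7, 7]
df = pd.DataFrame({'column': values, 'I need': expected})

# Encode each string as a number that preserves the order within a cycle.
order = {'1': 1, '2': 2, '3': 3}
df['new'] = df['column'].map(order).diff().lt(0).cumsum() + 1

# Element-wise comparison of the computed and the expected cycle numbers.
matches = (df['new'] == df['I need']).all()
print(matches)
```

Only the relative order of the mapped numbers matters, so any increasing encoding of the cycle values should give the same result.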

Edit: You can use enumerate to build the mapping dictionary for all values in one cycle:

d = {v:k for k, v in enumerate(['1','2','3'])}
#if possible, build the dictionary from all unique values - check their order first
#print (df['column'].unique())
#d = {v:k for k, v in enumerate(df['column'].unique())}
df['new'] = df['column'].map(d).diff().lt(0).cumsum() + 1
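The order check matters because unique() returns values in order of first appearance, which here is not the cycle order. A minimal sketch, assuming that sorting the labels yields the correct order within a cycle (true for '1', '2', '3', but not for arbitrary labels); the shortened sample data and the name `labels` are illustrative.

```python
import pandas as pd

# Shortened string sample that, like the question's data, starts mid-cycle.
df = pd.DataFrame({'column': ['2', '3'] + ['1', '2', '3'] * 2 + ['1']})

# unique() keeps the order of first appearance: '2', '3', '1'.
print(df['column'].unique())

# Assumption: sorted order equals cycle order for these labels.
labels = sorted(df['column'].unique())
d = {v: k for k, v in enumerate(labels)}

df['new'] = df['column'].map(d).diff().lt(0).cumsum() + 1
print(df)
```

If the labels do not sort into cycle order, it is safer to write the list passed to enumerate by hand, as in the answer.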
