I have detailed invoice data.
+------------+---------------+--------+-----+-------+------------+-------------+
| Invoice No | Invoice Total | Item # | qty | price | Item Total | Inventory # |
+------------+---------------+--------+-----+-------+------------+-------------+
| 1 | 42 | 123 | 1 | 10 | 10 | 0 |
| 1 | 42 | 234 | 2 | 12 | 24 | 10 |
| 1 | 42 | 345 | 1 | 8 | 8 | 0 |
| 2 | 224 | 123 | 3 | 10 | 30 | 4 |
| 2 | 220 | 234 | 2 | 12 | 24 | 3 |
| 2 | 220 | 345 | 8 | 1 | 8 | 0 |
| 2 | 220 | 456 | 10 | 12 | 120 | 2 |
| 2 | 220 | 567 | 7 | 6 | 42 | 4 |
| 3 | 34 | 123 | 1 | 10 | 10 | 10 |
| 3 | 34 | 234 | 2 | 12 | 24 | 0 |
| 4 | 30 | 123 | 1 | 10 | 10 | 0 |
| 4 | 30 | 234 | 2 | 12 | 24 | 3 |
+------------+---------------+--------+-----+-------+------------+-------------+
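For each unique Invoice No, I want to concatenate the Inventory # values and replace the column with the concatenated, sorted values (ascending from left to right). Any duplicate values should also be removed. For example, Invoice No 2 has Inventory # 4 repeated twice.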
The result I want is as follows:
+------------+---------------+--------+-----+-------+------------+-------------+
| Invoice No | Invoice Total | Item # | qty | price | Item Total | Inventory # |
+------------+---------------+--------+-----+-------+------------+-------------+
| 1 | 42 | 123 | 1 | 10 | 10 | 0,10 |
| 1 | 42 | 234 | 2 | 12 | 24 | 0,10 |
| 1 | 42 | 345 | 1 | 8 | 8 | 0,10 |
| 2 | 224 | 123 | 3 | 10 | 30 | 0,2,3,4 |
| 2 | 220 | 234 | 2 | 12 | 24 | 0,2,3,4 |
| 2 | 220 | 345 | 8 | 1 | 8 | 0,2,3,4 |
| 2 | 220 | 456 | 10 | 12 | 120 | 0,2,3,4 |
| 2 | 220 | 567 | 7 | 6 | 42 | 0,2,3,4 |
| 3 | 34 | 123 | 1 | 10 | 10 | 0,10 |
| 3 | 34 | 234 | 2 | 12 | 24 | 0,10 |
| 4 | 30 | 123 | 1 | 10 | 10 | 0,3 |
| 4 | 30 | 234 | 2 | 12 | 24 | 0,3 |
+------------+---------------+--------+-----+-------+------------+-------------+
Please guide me in solving this.
Answer
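I would use transform: set removes the duplicates, sorted puts the remaining values in ascending order (a set on its own has no guaranteed order), and then all that is left is a join. Note that the column is selected as 'Inventory #', matching the header in the table.
# Per invoice: de-duplicate, sort numerically, then join as text.
df['Inventory #'] = df.groupby('Invoice No')['Inventory #'].\
    transform(lambda x: ','.join(map(str, sorted(set(x)))))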
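To check this against the sample data, here is a minimal self-contained sketch. The DataFrame is retyped by hand from the question's table and keeps only the three columns that matter here. The helper name join_sorted_unique is my own and not part of pandas.

```python
import pandas as pd

# Sample rows retyped from the question (only the columns needed here).
df = pd.DataFrame({
    'Invoice No':  [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4],
    'Item #':      [123, 234, 345, 123, 234, 345, 456, 567, 123, 234, 123, 234],
    'Inventory #': [0, 10, 0, 4, 3, 0, 2, 4, 10, 0, 0, 3],
})

def join_sorted_unique(values):
    """Hypothetical helper: drop duplicates, sort numerically, join with commas."""
    return ','.join(str(v) for v in sorted(set(values)))

# transform broadcasts the single string back onto every row of its group,
# so the result has the same length and index as the original column.
df['Inventory #'] = (
    df.groupby('Invoice No')['Inventory #'].transform(join_sorted_unique)
)

print(df)
```

Sorting happens while the values are still numbers, before they are turned into strings. Sorting the strings instead would compare them character by character, which would place '10' ahead of '2'.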