Given a pandas DataFrame with three columns

Input: a pd.DataFrame, for example

input = pd.DataFrame({
    'A': [
        'asset_one',
        'asset_one',
        'asset_two',
    ],
    'B': [
        'item_one',
        'item_two',
        'item_three',
    ],
    'C': ['feature_a', 'feature_b', 'feature_c'],
})

How can I get the desired output:

output = {
      'asset_one': {'item_one': 'feature_a', 'item_two': 'feature_b'},
      'asset_two': {'item_three': 'feature_c'},
}

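In other words, I want to group by the first column and return a dict whose keys are the groups and whose values are the "sub-dataframes" of each group, converted to dicts that map column B to column C.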

Answer

Use a lambda function with zip and dict in GroupBy.apply:

out = input.groupby('A').apply(lambda x: dict(zip(x.B, x.C))).to_dict()
print(out)
{'asset_one': {'item_one': 'feature_a', 'item_two': 'feature_b'},
 'asset_two': {'item_three': 'feature_c'}}

Or use a dict comprehension:

out = {n: dict(zip(x.B, x.C)) for n, x in input.groupby('A')}
print(out)
{'asset_one': {'item_one': 'feature_a', 'item_two': 'feature_b'},
 'asset_two': {'item_three': 'feature_c'}}
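
For a version that runs on its own, the dict comprehension can be wrapped in a small helper. This is only a sketch under a few assumptions: the name nested_dict and its parameters are hypothetical and not part of pandas, and the frame is called df here rather than input so that the Python built-in is not shadowed.

```python
import pandas as pd

# Same sample data as in the question, under a different variable name.
df = pd.DataFrame({
    'A': ['asset_one', 'asset_one', 'asset_two'],
    'B': ['item_one', 'item_two', 'item_three'],
    'C': ['feature_a', 'feature_b', 'feature_c'],
})


def nested_dict(frame, outer, inner, value):
    """Hypothetical helper: map each `outer` group to a {inner: value} dict."""
    return {
        key: dict(zip(group[inner], group[value]))  # pair up the two columns row by row
        for key, group in frame.groupby(outer)      # iterate over (group key, sub-frame)
    }


out = nested_dict(df, 'A', 'B', 'C')
print(out)
```

If a value in column B repeats within one group, dict keeps only the last matching value from column C, so this approach fits best when B is unique per group.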
