I have a list of numbers that I split into bins using pandas.cut(). How do I select one category of the bins?
manhattanBedrmsPrice.head()
0 859
5 1055
9 615
11 663
13 1317
Name: Price Value, dtype: int64
bins = [400,600,800,1000,1200, 1400,1600,1800,2000,2200,2400,2600,2800,3000]
manPriceCategories = pd.cut(manhattanBedrmsPrice, bins)
I get the following categories:
Categories (13, interval[int64]): [(400, 600] < (600, 800] < (800, 1000] < (1000, 1200] ... (2200, 2400] < (2400, 2600] < (2600, 2800] < (2800, 3000]]
How do I select a specific category?
Answer
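You can assign the categories to a variable (for example, cats) and then use boolean indexing to check which values in the series equal the category of interest.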
>>> cats = manPriceCategories.cat.categories
>>> manPriceCategories.loc[manPriceCategories.eq(cats[1])]
9 (600, 800]
11 (600, 800]
Name: Price Value, dtype: category
Categories (13, interval[int64]): [(400, 600] < (600, 800] < (800, 1000] < (1000, 1200] ... (2200, 2400] < (2400, 2600] < (2600, 2800] < (2800, 3000]]
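The selection above can be reproduced with a small self-contained sketch. It assumes only the five rows shown by head() in the question; the names prices and price_categories are hypothetical stand-ins for manhattanBedrmsPrice and manPriceCategories. It also applies the same boolean mask to the original series, which may be what you want if you need the prices rather than the interval labels.

```python
import pandas as pd

# Only the five rows shown in the question; the full data set is not available here.
prices = pd.Series(
    [859, 1055, 615, 663, 1317],
    index=[0, 5, 9, 11, 13],
    name="Price Value",
)
bins = list(range(400, 3001, 200))  # 400, 600, ..., 3000 (same edges as in the question)
price_categories = pd.cut(prices, bins)

cats = price_categories.cat.categories  # IntervalIndex holding the 13 intervals
mask = price_categories.eq(cats[1])     # True where the value fell into (600, 800]

labels_in_bin = price_categories.loc[mask]  # the interval labels, as in the answer
prices_in_bin = prices.loc[mask]            # the underlying prices for the same rows
print(prices_in_bin)
```

Because the mask shares the index of the original series, it can be reused on any other column or series aligned to the same rows.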
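You can use a dictionary comprehension to enumerate the categories so that you know their index positions (for example, 1 for (600, 800] above).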
>>> {n: cat for n, cat in enumerate(cats)}
{0: Interval(400, 600, closed='right'),
1: Interval(600, 800, closed='right'),
2: Interval(800, 1000, closed='right'),
3: Interval(1000, 1200, closed='right'),
4: Interval(1200, 1400, closed='right'),
5: Interval(1400, 1600, closed='right'),
6: Interval(1600, 1800, closed='right'),
7: Interval(1800, 2000, closed='right'),
8: Interval(2000, 2200, closed='right'),
9: Interval(2200, 2400, closed='right'),
10: Interval(2400, 2600, closed='right'),
11: Interval(2600, 2800, closed='right'),
12: Interval(2800, 3000, closed='right')}
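If you would rather not read the position off the enumeration, one possible alternative is to look it up from the interval itself. The sketch below is an illustration, not part of the original answer: it again uses only the five rows from the question, and prices, price_categories, and target are hypothetical names. It relies on IntervalIndex.get_loc and on the integer codes exposed by the .cat accessor.

```python
import pandas as pd

# Only the five rows shown in the question; names here are illustrative.
prices = pd.Series(
    [859, 1055, 615, 663, 1317],
    index=[0, 5, 9, 11, 13],
    name="Price Value",
)
bins = list(range(400, 3001, 200))
price_categories = pd.cut(prices, bins)
cats = price_categories.cat.categories

# Build the interval of interest explicitly; pd.cut produces right-closed
# intervals by default, so closed="right" has to match.
target = pd.Interval(600, 800, closed="right")

# Position of that exact interval among the categories.
position = cats.get_loc(target)

# A plain number also works: get_loc returns the interval that contains it.
position_of_700 = cats.get_loc(700)

# Each row stores the integer code of its category, so comparing codes with
# the position gives the same boolean mask as comparing with the interval.
prices_in_bin = prices[price_categories.cat.codes == position]
print(position, prices_in_bin.tolist())
```

Comparing codes avoids constructing Interval objects for every check, while comparing against the interval directly (as in the answer) is arguably easier to read.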