I have a list of numbers that I split into bins with pandas.cut(). How do I select one category of the bins?

manhattanBedrmsPrice.head()

0      859
5     1055
9      615
11     663
13    1317
Name: Price Value, dtype: int64


bins = [400,600,800,1000,1200, 1400,1600,1800,2000,2200,2400,2600,2800,3000]

manPriceCategories = pd.cut(manhattanBedrmsPrice, bins)

I get the following categories:

Categories (13, interval[int64]): [(400, 600] < (600, 800] < (800, 1000] < (1000, 1200] ... (2200, 2400] < (2400, 2600] < (2600, 2800] < (2800, 3000]]

How do I select a specific category?

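Answer

You can assign the categories to a variable (e.g. cats) and then use boolean indexing to keep the values in the series that equal the category of interest.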

>>> cats = manPriceCategories.cat.categories
>>> manPriceCategories.loc[manPriceCategories.eq(cats[1])]
9     (600, 800]
11    (600, 800]
Name: Price Value, dtype: category
Categories (13, interval[int64]): [(400, 600] < (600, 800] < (800, 1000] < (1000, 1200] ... (2200, 2400] < (2400, 2600] < (2600, 2800] < (2800, 3000]]
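
The same boolean mask can also be applied to the original prices to get the numbers that fall in the bin rather than the bin labels. The following is a minimal self-contained sketch; the series is rebuilt from the five rows shown by head() in the question, and the names prices, price_bins, mask and selected_prices are illustrative only.

```python
import pandas as pd

# Rebuild the five rows shown by head() in the question (illustrative data only).
prices = pd.Series([859, 1055, 615, 663, 1317],
                   index=[0, 5, 9, 11, 13], name="Price Value")

# Same bin edges as in the question: 400, 600, ..., 3000.
bins = list(range(400, 3001, 200))
price_bins = pd.cut(prices, bins)

cats = price_bins.cat.categories   # IntervalIndex of the 13 bins
mask = price_bins.eq(cats[1])      # True where the bin is (600, 800]

# Index the original series with the mask to recover the raw prices.
selected_prices = prices[mask]
print(selected_prices)
```

Because price_bins keeps the index of prices, the mask lines up row by row with the original data.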

You can use a dictionary comprehension to enumerate the categories so that you know each one's index position (e.g. 1 for (600, 800] above).

>>> {n: cat for n, cat in enumerate(cats)}
{0: Interval(400, 600, closed='right'),
 1: Interval(600, 800, closed='right'),
 2: Interval(800, 1000, closed='right'),
 3: Interval(1000, 1200, closed='right'),
 4: Interval(1200, 1400, closed='right'),
 5: Interval(1400, 1600, closed='right'),
 6: Interval(1600, 1800, closed='right'),
 7: Interval(1800, 2000, closed='right'),
 8: Interval(2000, 2200, closed='right'),
 9: Interval(2200, 2400, closed='right'),
 10: Interval(2400, 2600, closed='right'),
 11: Interval(2600, 2800, closed='right'),
 12: Interval(2800, 3000, closed='right')}
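
If you would rather not read the position off the enumeration, the categories object is an IntervalIndex, so its get_loc method can look up the position of the bin that contains a given number. The sketch below assumes the same illustrative five-row series as the question's head() output; the names prices, price_bins, pos and in_bin are hypothetical, and comparing against cat.codes is one possible way to select the rows, not the only one.

```python
import pandas as pd

# Illustrative data: the five rows shown by head() in the question.
prices = pd.Series([859, 1055, 615, 663, 1317],
                   index=[0, 5, 9, 11, 13], name="Price Value")
bins = list(range(400, 3001, 200))
price_bins = pd.cut(prices, bins)

cats = price_bins.cat.categories   # IntervalIndex of the 13 bins

# Position of the bin that contains the value 700, i.e. (600, 800].
pos = cats.get_loc(700)

# Each row stores an integer code pointing into cats; keep rows whose code matches.
in_bin = price_bins[price_bins.cat.codes == pos]
print(pos, cats[pos])
print(in_bin)
```

Note that get_loc raises a KeyError for a number outside every bin (for example 100 here), so guard the lookup if the value may be out of range.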