Question

I'm new to pandas. Here are the details of my problem. The dataset is created from a Postgres DB table using this query:

SELECT e_code, e_name, status,date FROM {table_name} WHERE month=(%s) AND year=(%s) AND status !='P' AND status !='OFF' GROUP BY ("e_code","status","e_name","date") ORDER BY("e_code") ;

This results in the following dataframe:

   e_code  e_name  date  status
26    40        A 2023-04-21      H
24    40        A 2023-04-07      H
25    40        A 2023-04-14      H
28    42        B 2023-04-14      H
29    42        B 2023-04-21      H
..    ...      ...        ...     ...
79    80        S 2023-04-21      H
16    50        T 2023-04-10      1AL
80    50        T 2023-04-07      H
81    50        T 2023-04-14      H
82    50        T 2023-04-21      H

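I am trying to count each employee's leaves based on the status in that employee's rows. For example, if status == "H", holiday_count should be incremented by 1; if status == *AL, other_leave_count should be incremented by * (the number in front of AL) and the date stored in a column for that leave. I need to do this for all employees in the dataset.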
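A minimal sketch of this counting step, assuming the non-holiday statuses always look like "<number>AL" (as in "1AL") and that one output row per employee is wanted. The sample frame and the column name other_leave_date are made up for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring a few rows of the frame shown above.
df = pd.DataFrame({
    "e_code": [40, 40, 40, 50, 50, 50, 50],
    "e_name": ["A", "A", "A", "T", "T", "T", "T"],
    "date": pd.to_datetime(["2023-04-21", "2023-04-07", "2023-04-14",
                            "2023-04-10", "2023-04-07", "2023-04-14",
                            "2023-04-21"]),
    "status": ["H", "H", "H", "1AL", "H", "H", "H"],
})

# Sort first so the joined dates come out in chronological order.
df = df.sort_values(["e_code", "date"])

is_holiday = df["status"].eq("H")
# Assumption: the numeric prefix of "<number>AL" is the amount of leave taken.
al_amount = pd.to_numeric(df["status"].str.extract(r"^(\d+(?:\.\d+)?)AL$")[0])

df["holiday_count"] = is_holiday.astype(int)
df["other_leave_count"] = al_amount.fillna(0)

# Keep the date only in the column that matches the kind of leave.
date_str = df["date"].dt.strftime("%d.%m.%Y")
df["holiday_date"] = date_str.where(is_holiday)
df["other_leave_date"] = date_str.where(al_amount.notna())


def join_dates(s):
    """Join the non-missing dates of one employee into a single string."""
    return " ".join(s.dropna())


# One row per employee: counts are summed, dates are concatenated.
summary = df.groupby(["e_code", "e_name"], as_index=False).agg(
    holiday_date=("holiday_date", join_dates),
    holiday_count=("holiday_count", "sum"),
    other_leave_date=("other_leave_date", join_dates),
    other_leave_count=("other_leave_count", "sum"),
)
print(summary)
```

Grouping on e_code and e_name is what ties rows with the same identifier together, so no row-by-row bookkeeping is needed.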

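After that, I need to subtract the total number of leaves taken by each employee from that employee's total allowance (holiday_count and other_leave_count are handled separately here) and write the whole dataset to a file.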
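The subtraction and the export could be sketched as follows, starting from a frame that already has one row per employee. The allowances 14 and 4 are taken from the expected-output row; applying the same allowance to every employee, the column name total_other_leaves, and the file name are assumptions.

```python
import pandas as pd

# Hypothetical per-employee counts (one row per employee), e.g. from a groupby.
summary = pd.DataFrame({
    "e_code": [40, 50],
    "e_name": ["A", "T"],
    "holiday_count": [3, 3],
    "other_leave_count": [0, 1],
})

# Assumption: every employee has the same allowance of 14 holidays and 4 other leaves.
summary["total_holiday_leaves"] = 14
summary["total_other_leaves"] = 4

# Remaining leaves, computed separately for each kind of leave.
summary["available_holiday_leaves"] = (
    summary["total_holiday_leaves"] - summary["holiday_count"]
)
summary["available_other_leaves"] = (
    summary["total_other_leaves"] - summary["other_leave_count"]
)

# Hypothetical output file name.
summary.to_csv("leave_summary.csv", index=False)
```

Because the subtraction runs on the aggregated frame, each employee's total is reduced once by the full count rather than once per row.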

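What should I do to achieve this?

Since the full problem is complicated for me right now, the goal is to count both holidays and other leaves, but at the moment I only calculate the remaining holidays. So far I have tried:

Adding new columns holiday_date and other_leave_date and applying a mask to decide which kind of date each row's leave falls under, then counting the leave for that row, which I then subtract from the total.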

temp_df["holiday_date"]= ""
temp_df["casual_date"]=""
temp_df["holiday_count"]=0
temp_df["other_leave_count"]=0

temp_df["holiday_date"].mask(temp_df["status"]=="H", temp_df["date"], inplace=True)

# set count of holidays
temp_df.loc[temp_df["holiday_date"]!=0,"holiday_count"]=1

# set count of other leaves
temp_df.loc[temp_df["casual_date"]!=0,"other_leave_count"]=1


#final calculation

temp_df["holiday_leaves_taken"] = None
temp_df["holiday_leaves_taken"] = temp_df["total_no_holiday_leaves"]-temp_df["holiday_count"]

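This gives wrong final values, because holiday_count and other_leave_count are both handled per row.

I tried using apply to create and check rows with status "H" and status != "H", which gives the same result as above.
I tried iterating over the rows with df.iterrows(), looking up the value in the status column and doing the calculation that way, but I could not distinguish the rows to tell whether they are records of the same employee or of different employees.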
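For illustration, one way to tell rows of the same employee apart is to iterate over groups instead of single rows. This is only a sketch on a made-up frame; it handles the holiday count alone.

```python
import pandas as pd

# Hypothetical sample with repeated employee codes.
df = pd.DataFrame({
    "e_code": [40, 40, 42, 50, 50],
    "status": ["H", "H", "H", "1AL", "H"],
})

# Each iteration yields one employee code and all of that employee's rows.
holiday_totals = {}
for e_code, rows in df.groupby("e_code"):
    holiday_totals[int(e_code)] = int((rows["status"] == "H").sum())

# The same per-employee total, broadcast back onto every row of that employee.
df["holiday_count"] = (
    df["status"].eq("H").groupby(df["e_code"]).transform("sum")
)
```

Unlike iterrows(), the loop body sees all rows of one employee at once, and transform avoids the explicit loop when the total is needed on every row.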

Expected output:

| Employee name | | total_holiday_leaves | holiday_leave_count | available_holiday_leaves | Period | Total leaves | other_leave_count | available_other_leaves |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| x | 07.04.2023 14.04.2023 21.04.2023 | 14 | 3 | 11 | NA | 4 | 0 | 4 |

How to perform calculations on values spread across multiple rows with duplicate identifiers