I have a DataFrame, and I want to compute the standard deviation of a specific column and then add the result back to the original data.
import pandas as pd
raw_data = {'patient': [242, 151, 111,122, 342],
'obs': [1, 2, 3, 1, 2],
'treatment': [0, 1, 0, 1, 0],
'score': ['strong', 'weak', 'weak', 'weak', 'strong']}
df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])
df
patient obs treatment score
0 242 1 0 strong
1 151 2 1 weak
2 111 3 0 weak
3 122 1 1 weak
4 342 2 0 strong
So I want to get the standard deviation of the patient column, grouped by the score column.
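My plan is to scan the columns, find the patient column, check that it is numeric (I hope to handle additional columns later), compute the standard deviation, and finally add the result to the original df.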
I tried this:
std_dev_patient = []
for col in df.keys():
    df=df.groupby("score")
    if df[col]=='patient':
        np.std(col).append(std_dev_patient)
    else:
        pass
df.concat([df,std_dev_patient], axis =1)
df
TypeError: 'str' object is not callable
Is there an efficient way to do this?
Thanks
Expected output:
patient obs treatment score std_dev_patient std_dev_obs
0 242 1 0 strong 70.71 ..
1 151 2 1 weak 20.66 ..
2 111 3 0 weak 20.66 ..
3 122 1 1 weak 20.66 ..
4 342 2 0 strong 70.71 ..
Answer
Use pandas.DataFrame.groupby with transform:
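# Broadcast each score group's standard deviation back onto its rows
df['std_dev_patient'] = df.groupby('score')['patient'].transform('std')
print(df)
# List the numeric columns; the string 'number' avoids needing numpy here
# (this second printout is not shown in the output below)
print(df.select_dtypes('number').dtypes)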
Output:
patient obs treatment score std_dev_patient
0 242 1 0 strong 70.710678
1 151 2 1 weak 20.663978
2 111 3 0 weak 20.663978
3 122 1 1 weak 20.663978
4 342 2 0 strong 70.710678
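To see why transform is used here rather than a plain aggregation, the following minimal sketch compares the two on the same data. It rebuilds the question's frame so that it runs on its own; the names per_group and per_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'patient': [242, 151, 111, 122, 342],
    'obs': [1, 2, 3, 1, 2],
    'treatment': [0, 1, 0, 1, 0],
    'score': ['strong', 'weak', 'weak', 'weak', 'strong'],
})

# Aggregation: one value per group, indexed by the group key ('strong', 'weak').
per_group = df.groupby('score')['patient'].std()

# transform: the same per-group value repeated for every row of that group,
# so the result shares df's index and can be assigned directly as a column.
per_row = df.groupby('score')['patient'].transform('std')

print(per_group)
print(per_row)
```

The aggregated result has two entries, while the transformed result has five, one per original row, which is what makes the column assignment line up without a merge.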
To check the dtype, use pandas.DataFrame.select_dtypes together with numpy.number:
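import numpy as np
# Assumes df holds only the original four columns; if std_dev_patient was
# already added, the loop would also create std_dev_std_dev_patient.
g = df.groupby('score')
# The column list is evaluated once, before any new columns are added
for c in df.select_dtypes(np.number).columns:
    df['std_dev_%s' % c] = g[c].transform('std')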
Output:
patient obs treatment score std_dev_patient std_dev_obs \
0 242 1 0 strong 70.710678 0.707107
1 151 2 1 weak 20.663978 1.000000
2 111 3 0 weak 20.663978 1.000000
3 122 1 1 weak 20.663978 1.000000
4 342 2 0 strong 70.710678 0.707107
std_dev_treatment
0 0.00000
1 0.57735
2 0.57735
3 0.57735
4 0.00000
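The loop above can be wrapped in a small helper. This is only a sketch: add_group_std and its ddof parameter are hypothetical names, not part of pandas, and by is assumed to be a single column name. It also makes explicit that the pandas std computes the sample standard deviation (ddof=1) by default, whereas numpy.std, used in the question's attempt, defaults to ddof=0, so the two give different numbers on the same data.

```python
import numpy as np
import pandas as pd


def add_group_std(df, by, ddof=1):
    """Return a copy of df with one std_dev_<col> column per numeric column.

    Hypothetical helper for illustration; ddof=1 matches the pandas default.
    """
    out = df.copy()
    grouped = df.groupby(by)
    # The numeric column list is taken from the input, before columns are added.
    for col in df.select_dtypes(np.number).columns:
        if col == by:
            continue  # do not compute a spread for the grouping key itself
        # The lambda returns one scalar per group; transform broadcasts it
        # to every row of that group.
        out['std_dev_%s' % col] = grouped[col].transform(
            lambda s: s.std(ddof=ddof)
        )
    return out


df = pd.DataFrame({
    'patient': [242, 151, 111, 122, 342],
    'obs': [1, 2, 3, 1, 2],
    'treatment': [0, 1, 0, 1, 0],
    'score': ['strong', 'weak', 'weak', 'weak', 'strong'],
})

result = add_group_std(df, 'score')              # sample std, as in the answer
population = add_group_std(df, 'score', ddof=0)  # population std, as numpy.std
print(result)
```

Because the helper works on a copy, the original df is left unchanged, which avoids the problem of previously added std_dev_ columns being picked up on a second run.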