I have two pandas DataFrames:
import pandas as pd
import numpy as np
import datetime
# initialise data of lists.
data = {'group': ["A","A","A","B","B","B","B"],
        'val': ["AA","AB","AC","B1","B2","AA","AB"],
        'cal1': [4,5,7,6,5,8,9],
        'cal2': [10,100,100,10,1,10,100]
       }
# Create DataFrame
df1 = pd.DataFrame(data)
df1
  group val  cal1  cal2
0     A  AA     4    10
1     A  AB     5   100
2     A  AC     7   100
3     B  B1     6    10
4     B  B2     5     1
5     B  AA     8    10
6     B  AB     9   100
# initialise data of lists.
data = {'group': ["A","A","A","B","B","B","B"],
        'flag': [1,0,0,1,0,0,0],
        'var1': [1,2,3,7,8,9,10]
       }
# Create DataFrame
df2 = pd.DataFrame(data)
df2
  group  flag  var1
0     A     1     1
1     A     0     2
2     A     0     3
3     B     1     7
4     B     0     8
5     B     0     9
6     B     0    10
Step 1: Create new columns in df2 from the unique "val" values in df1, as follows:
unique_val = df1['val'].unique().tolist()
new_cols = [t + '_new' for t in unique_val]
for i in new_cols:
    df2[i] = 0
df2
  group  flag  var1  AA_new  AB_new  AC_new  B1_new  B2_new
0     A     1     1       0       0       0       0       0
1     A     0     2       0       0       0       0       0
2     A     0     3       0       0       0       0       0
3     B     1     7       0       0       0       0       0
4     B     0     8       0       0       0       0       0
5     B     0     9       0       0       0       0       0
6     B     0    10       0       0       0       0       0
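As a side note, step 1 can also be written without an explicit loop. The following is only a sketch of one possible variant, assuming the same data as above (only the columns needed for this step are rebuilt here); it passes a dict of constant columns to DataFrame.assign.

```python
import pandas as pd

# Only the columns needed for this step; values copied from the question.
df1 = pd.DataFrame({"val": ["AA", "AB", "AC", "B1", "B2", "AA", "AB"]})
df2 = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B"],
    "flag": [1, 0, 0, 1, 0, 0, 0],
    "var1": [1, 2, 3, 7, 8, 9, 10],
})

# unique() keeps first-appearance order: AA, AB, AC, B1, B2
new_cols = [f"{v}_new" for v in df1["val"].unique()]

# Add every new column in one call, each filled with 0.
df2 = df2.assign(**dict.fromkeys(new_cols, 0))
print(df2.columns.tolist())
```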
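Step 2: For rows where flag = 1, AA_new should be computed as var1 (from df2) * the 'cal1' value in df1 for group "A" and val "AA" * the 'cal2' value in df1 for group "A" and val "AA". Similarly, AB_new should be computed as var1 (from df2) * the 'cal1' value in df1 for group "A" and val "AB" * the 'cal2' value in df1 for group "A" and val "AB".
My expected output should look like this: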
  group  flag  var1  AA_new  AB_new  AC_new  B1_new  B2_new
0     A     1     1      40     500     700       0       0
1     A     0     2       0       0       0       0       0
2     A     0     3       0       0       0       0       0
3     B     1     7     560    6300       0     420      35
4     B     0     8       0       0       0       0       0
5     B     0     9       0       0       0       0       0
6     B     0    10       0       0       0       0       0
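The rule in step 2 can be checked by hand for a single cell. The sketch below does that with plain boolean indexing; expected_cell is a hypothetical helper introduced only for this illustration, and it assumes exactly one flagged row per group and one df1 row per (group, val) pair.

```python
import pandas as pd

df1 = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B"],
    "val": ["AA", "AB", "AC", "B1", "B2", "AA", "AB"],
    "cal1": [4, 5, 7, 6, 5, 8, 9],
    "cal2": [10, 100, 100, 10, 1, 10, 100],
})
df2 = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B"],
    "flag": [1, 0, 0, 1, 0, 0, 0],
    "var1": [1, 2, 3, 7, 8, 9, 10],
})


def expected_cell(group, val):
    """Hypothetical helper: value of '<val>_new' on the flagged row of `group`."""
    # var1 of the row in df2 that has flag == 1 for this group
    var1 = df2.loc[(df2["group"] == group) & (df2["flag"] == 1), "var1"].iloc[0]
    # the matching (group, val) row in df1
    row = df1[(df1["group"] == group) & (df1["val"] == val)].iloc[0]
    return int(var1 * row["cal1"] * row["cal2"])


print(expected_cell("A", "AA"))  # 1 * 4 * 10 = 40
print(expected_cell("B", "AA"))  # 7 * 8 * 10 = 560
```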
Question source: stackoverflow
Use DataFrame.pivot_table together with GroupBy.bfill, and then multiply with DataFrame.mul.
df2.assign(**df1.pivot_table(columns='val',
                             values='cal1',
                             index=['group', df2.index])
                .add_suffix('_new')
                .groupby(level=0)
                #.apply(lambda x: x.bfill().ffill())  # may be necessary instead of bfill
                .bfill()
                .reset_index(level='group', drop=True)
                .fillna(0)
                .mul(df2['var1'], axis=0)
                .where(df2['flag'].eq(1), 0)
                #.astype(int)  # if you want int
           )
Output
  group  flag  var1  AA_new  AB_new  AC_new  B1_new  B2_new
0     A     1     1     4.0     5.0     7.0     0.0     0.0
1     A     0     2     0.0     0.0     0.0     0.0     0.0
2     A     0     3     0.0     0.0     0.0     0.0     0.0
3     B     1     7    56.0    63.0     0.0    42.0    35.0
4     B     0     8     0.0     0.0     0.0     0.0     0.0
5     B     0     9     0.0     0.0     0.0     0.0     0.0
6     B     0    10     0.0     0.0     0.0     0.0     0.0
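What GroupBy.bfill contributes to the chain can be seen on a small hand-built frame. The frame below is a hypothetical miniature of the pivoted data (one non-NaN value per row, with group as the index), not the actual intermediate result.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the pivoted frame: one non-NaN value per row.
wide = pd.DataFrame(
    {"AA_new": [4, np.nan, np.nan, 8], "AB_new": [np.nan, 5, 9, np.nan]},
    index=pd.Index(["A", "A", "B", "B"], name="group"),
)

# Back-fill inside each group only, so values never cross from B into A.
filled = wide.groupby(level=0).bfill()
print(filled)
```

After the back-fill, only the first row of each group is guaranteed to hold all of that group's values; later rows can still contain NaN. This is presumably why the approach works when the flag = 1 row is the first row of its group, and why the commented-out bfill().ffill() variant may be needed otherwise.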
Edit: to include cal2 as well, multiply cal1 by cal2 before pivoting:
df2.assign(**df1.assign(mul_cal=df1['cal1'].mul(df1['cal2']))
                .pivot_table(columns='val',
                             values='mul_cal',
                             index=['group', df2.index])
                .add_suffix('_new')
                .groupby(level=0)
                #.apply(lambda x: x.bfill().ffill())  # may be necessary instead of bfill
                .bfill()
                .reset_index(level='group', drop=True)
                .fillna(0)
                .mul(df2['var1'], axis=0)
                .where(df2['flag'].eq(1), 0)
                #.astype(int)  # if you want int
           )
  group  flag  var1  AA_new  AB_new  AC_new  B1_new  B2_new
0     A     1     1    40.0   500.0   700.0     0.0     0.0
1     A     0     2     0.0     0.0     0.0     0.0     0.0
2     A     0     3     0.0     0.0     0.0     0.0     0.0
3     B     1     7   560.0  6300.0     0.0   420.0    35.0
4     B     0     8     0.0     0.0     0.0     0.0     0.0
5     B     0     9     0.0     0.0     0.0     0.0     0.0
6     B     0    10     0.0     0.0     0.0     0.0     0.0
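The chain above depends on df1 and df2 having the same number of rows and on the flagged row coming first in each group. A possible alternative that avoids both assumptions is sketched below; it is not the answer author's method. It pivots df1 by group only and then looks the factors up per df2 row, assuming each (group, val) pair occurs at most once in df1. The names factors, aligned and scale are introduced only for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B"],
    "val": ["AA", "AB", "AC", "B1", "B2", "AA", "AB"],
    "cal1": [4, 5, 7, 6, 5, 8, 9],
    "cal2": [10, 100, 100, 10, 1, 10, 100],
})
df2 = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B"],
    "flag": [1, 0, 0, 1, 0, 0, 0],
    "var1": [1, 2, 3, 7, 8, 9, 10],
})

# One row per group, one column per val, holding cal1 * cal2.
factors = (
    df1.assign(mul_cal=df1["cal1"] * df1["cal2"])
       .pivot(index="group", columns="val", values="mul_cal")
       .add_suffix("_new")
)

# Repeat each group's factors for every df2 row of that group;
# (group, val) pairs missing from df1 become 0.
aligned = (
    factors.reindex(index=df2["group"])
           .set_axis(df2.index, axis=0)
           .fillna(0)
)

# var1 on flagged rows, 0 everywhere else.
scale = df2["var1"].where(df2["flag"].eq(1), 0)

result = df2.join(aligned.mul(scale, axis=0).astype(int))
print(result)
```

Because the lookup is keyed on the group label rather than on row position, the flagged row can sit anywhere inside its group.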
Answer source: stackoverflow