I have the following dataset:
Subject Student ID Student Number
0 Cit11 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 45
1 EngLang11 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 45
2 EngLit11 [S110, S111, S112, S113, S114, S115, S116, S11... 21
3 Fre11 [S95, S96, S97, S99, S100, S101, S102, S103, S... 26
4 Ger11 [S114, S115, S116, S117, S118, S124, S125, S12... 13
5 His11 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 45
6 Mat11 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 45
7 Spa11 [S95, S97, S98, S99, S100, S102, S103, S104, S... 23
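Here 'Student Number' is the total count of 'Student ID' values in each 'Subject'.
Suppose the maximum 'Student Number' should be 30 (the value of classroom_Max_Capacity). The code below returns the indices of the rows whose 'Student Number' exceeds that maximum.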
idx = filtered_Group[filtered_Group['Student Number'] > classroom_Max_Capacity].index.tolist()
Output: [0, 1, 5, 6]
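I would like to know whether these rows can be split in two to fit the maximum number of students, by changing the 'Subject' name and the 'Student ID' list; for example,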
Subject Student ID Student Number
0 Cit11_1 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 30
1 Cit11_2 [S110, S115, S116... 15
2 EngLang11_1 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 30
3 EngLang11_2 [S110, S115, S116... 15
4 EngLit11 [S110, S111, S112, S113, S114, S115, S116, S11... 21
5 Fre11 [S95, S96, S97, S99, S100, S101, S102, S103, S... 26
6 Ger11 [S114, S115, S116, S117, S118, S124, S125, S12... 13
7 His11_1 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 30
8 His11_2 [S110, S115, S116... 15
9 Mat11_1 [S95, S96, S97, S98, S99, S100, S101, S102, S1... 30
10 Mat11_2 [S110, S115, S116... 15
11 Spa11 [S95, S97, S98, S99, S100, S102, S103, S104, S... 23
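Is it possible to add these rows to the dataframe without writing out each modified 'Subject' name by hand?
I tried to solve this by doing something like the following.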
filtered = filtered_Group.iloc[idx]
student_list = filtered['Student ID'].explode().str.split(', ')
subject_list = filtered['Subject']
for i in idx:
    for number in range(classroom_Max_Capacity):
        df.append({temp_subject_list[i]: temp_student_list[number]})
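However, of course, this does not work, so any help would be appreciated.
Question source: stackoverflow
You can use explode to get one row per student, number the students within each subject, and then use groupby: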
import numpy as np
import pandas as pd

# random data
np.random.seed(1)
df = pd.DataFrame({
    'Subject': list('abcdef'),
    # each row holds an array of 3 to 9 distinct student IDs
    'Student Number': [np.random.choice(np.arange(20),
                                        np.random.randint(3, 10),
                                        replace=False)
                       for _ in range(6)]
})
# maximum number of students allowed
max_students = 5
# number the students within each subject, then cut them into sections
(df.explode('Student Number')
   .assign(section=lambda x: x.groupby('Subject')
                              .cumcount()//max_students + 1
          )
   .groupby(['Subject','section'])
   ['Student Number'].agg([list, 'count'])
)
Output:
                                list  count
Subject section
a       1        [15, 10, 3, 18, 17]      5
        2                [14, 16, 4]      3
b       1           [3, 2, 5, 8, 17]      5
        2                   [13, 10]      2
c       1        [11, 18, 2, 12, 16]      5
        2                 [17, 0, 4]      3
d       1               [16, 19, 11]      3
e       1         [16, 5, 4, 12, 15]      5
        2                       [19]      1
f       1          [18, 17, 3, 0, 1]      5
        2                [9, 14, 13]      3
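To get closer to the layout asked for in the question (suffixes such as Cit11_1 and Cit11_2 only on subjects that were actually split), the same explode and cumcount idea can be wrapped in a small helper. This is only a sketch under assumptions: the function name split_oversized, the tiny sample frame, and the cap of 3 are made up for illustration, 'Student ID' is assumed to hold real Python lists rather than strings, and subject names are assumed to be unique.

```python
import pandas as pd

# Hypothetical miniature of the question's frame; IDs and cap are made up.
filtered_Group = pd.DataFrame({
    "Subject": ["Cit11", "Ger11"],
    "Student ID": [["S1", "S2", "S3", "S4", "S5"], ["S6", "S7"]],
})
classroom_Max_Capacity = 3


def split_oversized(df, cap):
    """Split each subject into sections of at most `cap` students."""
    # One row per (Subject, student); explode repeats the index, so reset it.
    rows = (df[["Subject", "Student ID"]]
            .explode("Student ID")
            .reset_index(drop=True))
    # Position of each student within its subject -> 1-based section number.
    rows["section"] = rows.groupby("Subject").cumcount() // cap + 1
    # Number of sections per subject, broadcast back onto every row.
    n_sections = rows.groupby("Subject")["section"].transform("max")
    # Keep the original name when there is a single section, else add a suffix.
    suffixed = rows["Subject"] + "_" + rows["section"].astype(str)
    rows["Subject"] = rows["Subject"].where(n_sections == 1, suffixed)
    # Collect the students of each (possibly renamed) subject back into lists.
    out = (rows.groupby("Subject", sort=False)["Student ID"]
               .agg(list)
               .reset_index())
    out["Student Number"] = out["Student ID"].str.len()
    return out


result = split_oversized(filtered_Group, classroom_Max_Capacity)
print(result)
```

Like the example in the question, this fills the first section to capacity and puts the remainder in the next one. If balanced sections are preferred, the section number would have to be computed differently.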
Answer source: stackoverflow