
Split one row into several rows in a pandas DataFrame

I have the following dataset:

     Subject                                         Student ID  Student Number
0      Cit11  [S95, S96, S97, S98, S99, S100, S101, S102, S1...              45
1  EngLang11  [S95, S96, S97, S98, S99, S100, S101, S102, S1...              45
2   EngLit11  [S110, S111, S112, S113, S114, S115, S116, S11...              21
3      Fre11  [S95, S96, S97, S99, S100, S101, S102, S103, S...              26
4      Ger11  [S114, S115, S116, S117, S118, S124, S125, S12...              13
5      His11  [S95, S96, S97, S98, S99, S100, S101, S102, S1...              45
6      Mat11  [S95, S96, S97, S98, S99, S100, S101, S102, S1...              45
7      Spa11  [S95, S97, S98, S99, S100, S102, S103, S104, S...              23

Here, 'Student Number' is the total number of 'Student ID' entries for each 'Subject'.
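Because 'Student Number' is just the length of each 'Student ID' list, it can be derived from the lists instead of being maintained by hand. A minimal sketch, assuming each 'Student ID' cell holds a Python list of ID strings (the two-row frame below is hypothetical toy data, not the asker's dataset):

```python
import pandas as pd

# Hypothetical toy data: each 'Student ID' cell holds a Python list of IDs
df = pd.DataFrame({
    "Subject": ["Ger11", "Spa11"],
    "Student ID": [["S114", "S115", "S116"], ["S95", "S97"]],
})

# len() of each list is the head count for that subject
df["Student Number"] = df["Student ID"].apply(len)
print(df)
```

Recomputing the column this way keeps it consistent after the lists have been split.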

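Suppose the maximum 'Student Number' should be 30 (the value of classroom_Max_Capacity). The code below returns the indices of the rows whose 'Student Number' exceeds that maximum: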

idx = filtered_Group[filtered_Group['Student Number'] > classroom_Max_Capacity].index.tolist()
Output: [0, 1, 5, 6]

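I would like to know whether these rows can be split into two, by changing the 'Subject' name and dividing the 'Student ID' list, so that each part fits the maximum class size. For example: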

       Subject                                         Student ID  Student Number
0      Cit11_1  [S95, S96, S97, S98, S99, S100, S101, S102, S1...              30
1      Cit11_2  [S110, S115, S116...                                           15
2  EngLang11_1  [S95, S96, S97, S98, S99, S100, S101, S102, S1...              30
3  EngLang11_2  [S110, S115, S116...                                           15
4     EngLit11  [S110, S111, S112, S113, S114, S115, S116, S11...              21
5        Fre11  [S95, S96, S97, S99, S100, S101, S102, S103, S...              26
6        Ger11  [S114, S115, S116, S117, S118, S124, S125, S12...              13
7      His11_1  [S95, S96, S97, S98, S99, S100, S101, S102, S1...              30
8      His11_2  [S110, S115, S116...                                           15
9      Mat11_1  [S95, S96, S97, S98, S99, S100, S101, S102, S1...              30
10     Mat11_2  [S110, S115, S116...                                           15
11       Spa11  [S95, S97, S98, S99, S100, S102, S103, S104, S...              23

Is it possible to do this, and add the new rows to the DataFrame, without writing out each modified 'Subject' name by hand?
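The target table above follows a simple rule: a subject with n students and a capacity of c needs ceil(n / c) sections, all full except possibly the last, so 45 students with a capacity of 30 become 30 + 15, and the names get a _1, _2 suffix only when a split happens. A small sketch of that rule and naming scheme (section_sizes and section_names are hypothetical helpers, not pandas functions):

```python
import math

def section_sizes(n_students, capacity):
    """Sizes of the sections when students are assigned in order,
    capacity at a time: every section is full except possibly the last."""
    n_sections = math.ceil(n_students / capacity)
    return [min(capacity, n_students - i * capacity) for i in range(n_sections)]

def section_names(subject, n_sections):
    """Keep the original name when no split is needed, else add _1, _2, ..."""
    if n_sections == 1:
        return [subject]
    return [f"{subject}_{k}" for k in range(1, n_sections + 1)]

sizes = section_sizes(45, 30)
print(sizes)                               # [30, 15]
print(section_names("Cit11", len(sizes)))  # ['Cit11_1', 'Cit11_2']
print(section_sizes(21, 30))               # [21]
```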

  • Edit

I tried to solve it with something like this:

filtered = filtered_Group.iloc[idx]

student_list = filtered['Student ID'].explode().str.split(', ')
subject_list = filtered['Subject']

for i in idx:
    for number in range(classroom_Max_Capacity):
        df.append({temp_subject_list[i]: temp_student_list[number]})

But of course this doesn't work, so any help would be appreciated.
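In the attempt above, temp_subject_list and temp_student_list are never defined, and DataFrame.append returns a new frame instead of modifying df in place (it was removed entirely in pandas 2.0). A loop can still work if each 'Student ID' list is sliced into chunks of classroom_Max_Capacity and the new rows are collected as plain dicts, then turned into a DataFrame once. A minimal sketch, assuming the question's column names; the two-subject filtered_Group below is a hypothetical stand-in with generated IDs:

```python
import pandas as pd

classroom_Max_Capacity = 30

# Hypothetical stand-in for filtered_Group: the IDs are generated, not real data
filtered_Group = pd.DataFrame({
    "Subject": ["Cit11", "Ger11"],
    "Student ID": [[f"S{i}" for i in range(95, 140)],    # 45 students
                   [f"S{i}" for i in range(114, 127)]],  # 13 students
})

rows = []
for subject, students in zip(filtered_Group["Subject"], filtered_Group["Student ID"]):
    # Slice the list into consecutive chunks of at most classroom_Max_Capacity
    chunks = [students[i:i + classroom_Max_Capacity]
              for i in range(0, len(students), classroom_Max_Capacity)]
    for k, chunk in enumerate(chunks, start=1):
        # Suffix _1, _2, ... only when the subject actually had to be split
        name = f"{subject}_{k}" if len(chunks) > 1 else subject
        rows.append({"Subject": name, "Student ID": chunk,
                     "Student Number": len(chunk)})

# Build the DataFrame once from the collected rows
result = pd.DataFrame(rows)
print(result[["Subject", "Student Number"]])
```

Rows already within capacity pass through unchanged, so with this approach the idx filter is not needed.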

Question source: Stack Overflow

Posted by is大龙, 2020-03-24 20:35:10
1 answer
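  • You can use explode to give each student its own row, number the students within each subject with groupby().cumcount() to assign sections, and then groupby again to collect each section: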

    import numpy as np
    import pandas as pd

    # random data
    np.random.seed(1)
    df = pd.DataFrame({
        'Subject': list('abcdef'),
        'Student ID': [np.random.choice(np.arange(20),
                                        np.random.randint(3, 10),
                                        replace=False)
                       for _ in range(6)]
    })

    # maximum number of students allowed per section
    max_students = 5

    # explode gives each student its own row; cumcount() numbers the students
    # 0, 1, 2, ... within each subject, so every max_students of them start a new section
    out = (df.explode('Student ID')
             .assign(section=lambda x: x.groupby('Subject')
                                        .cumcount() // max_students + 1)
             .groupby(['Subject', 'section'])
             ['Student ID'].agg([list, 'count'])
          )
    print(out)
    

    Output:

                                    list  count
    Subject section                            
    a       1        [15, 10, 3, 18, 17]      5
            2                [14, 16, 4]      3
    b       1           [3, 2, 5, 8, 17]      5
            2                   [13, 10]      2
    c       1        [11, 18, 2, 12, 16]      5
            2                 [17, 0, 4]      3
    d       1               [16, 19, 11]      3
    e       1         [16, 5, 4, 12, 15]      5
            2                       [19]      1
    f       1          [18, 17, 3, 0, 1]      5
            2                [9, 14, 13]      3
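The result above keeps Subject and section as a two-level index. To get the layout asked for in the question (one row per section, with a _1, _2 suffix only on subjects that had to be split), the grouped result can be flattened with reset_index. A minimal sketch of that extra step, using a small hypothetical frame in the question's column layout and the same max_students idea:

```python
import pandas as pd

max_students = 5

# Hypothetical toy data in the question's layout (lists of IDs per subject)
df = pd.DataFrame({
    "Subject": ["a", "b"],
    "Student ID": [["S1", "S2", "S3", "S4", "S5", "S6", "S7"], ["S8", "S9"]],
})

# Same steps as the answer, then reset_index() to turn Subject/section into columns
out = (df.explode("Student ID")
         .assign(section=lambda x: x.groupby("Subject").cumcount() // max_students + 1)
         .groupby(["Subject", "section"])["Student ID"]
         .agg([list, "count"])
         .reset_index())

# Number of sections per subject, repeated on each of that subject's rows
n_sections = out.groupby("Subject")["section"].transform("max")

# Keep the plain name for unsplit subjects, add _<section> for split ones
out["Subject"] = out["Subject"].where(
    n_sections == 1, out["Subject"] + "_" + out["section"].astype(str))

out = (out.rename(columns={"list": "Student ID", "count": "Student Number"})
          .drop(columns="section"))
print(out)
```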
    

    Answer source: Stack Overflow

    2020-03-24 20:35:17