hands-on-data-analysis 第二单元 2,3节

简介: 数据合并——concat横向合并

hands-on-data-analysis 第二单元 2,3节

第二节 数据重构

万事开头记得导入基本的库:

# 导入基本库
import numpy as np
import pandas as pd

2.1.数据合并——concat横向合并

官方文档:

pandas.concat — pandas 1.4.2 documentation (pydata.org)

pandas中DataFrame数据合并连接(merge、join、concat

text_left_up,text_right_up两张表,如果横向合并为一张表(就是列与列拼接在一起)

text_left_up

PassengerId Survived Pclass Name
0 1 0 3 Braund, Mr. Owen Harris
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 3 1 3 Heikkinen, Miss. Laina
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 5 0 3 Allen, Mr. William Henry

text_right_up:

Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 male 22.0 1.0 0.0 A/5 21171 7.2500 NaN S
1 female 38.0 1.0 0.0 PC 17599 71.2833 C85 C
2 female 26.0 0.0 0.0 STON/O2. 3101282 7.9250 NaN S
3 female 35.0 1.0 0.0 113803 53.1000 C123 S
4 male 35.0 0.0 0.0 373450 8.0500 NaN S
list_up = [text_left_up,text_right_up]
result_up = pd.concat(list_up,axis=1)
result_up.head()

得到:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1.0 0.0 3.0 Braund, Mr. Owen Harris male 22.0 1.0 0.0 A/5 21171 7.2500 NaN S
1 2.0 1.0 1.0 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1.0 0.0 PC 17599 71.2833 C85 C
2 3.0 1.0 3.0 Heikkinen, Miss. Laina female 26.0 0.0 0.0 STON/O2. 3101282 7.9250 NaN S
3 4.0 1.0 1.0 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1.0 0.0 113803 53.1000 C123 S
4 5.0 0.0 3.0 Allen, Mr. William Henry male 35.0 0.0 0.0 373450 8.0500 NaN S

就好比,把带有小明的学号的表和带有小明成绩的表合在一起。

2.2.数据合并——concat纵向合并

官方文档:

pandas.concat — pandas 1.4.2 documentation (pydata.org)

pandas中DataFrame数据合并连接(merge、join、concat

将train-left-down和train-right-down横向合并为一张表,并保存这张表为result_down。然后将上边的result_up和result_down纵向合并为result。

text_left_down的数据为:

PassengerId Survived Pclass Name
0 440 0 2 Kvillner, Mr. Johan Henrik Johannesson
1 441 1 2 Hart, Mrs. Benjamin (Esther Ada Bloomfield)
2 442 0 3 Hampe, Mr. Leon
3 443 0 3 Petterson, Mr. Johan Emil
4 444 1 2 Reynaldo, Ms. Encarnacion

text_right_down数据为:

Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 male 31.0 0 0 C.A. 18723 10.500 NaN S
1 female 45.0 1 1 F.C.C. 13529 26.250 NaN S
2 male 20.0 0 0 345769 9.500 NaN S
3 male 25.0 1 0 347076 7.775 NaN S
4 female 28.0 0 0 230434 13.000 NaN S
list_down=[text_left_down,text_right_down]
result_down = pd.concat(list_down,axis=1)
result = pd.concat([result_up,result_down])
result.head()

合并后的表为:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1.0 0.0 3.0 Braund, Mr. Owen Harris male 22.0 1.0 0.0 A/5 21171 7.2500 NaN S
1 2.0 1.0 1.0 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1.0 0.0 PC 17599 71.2833 C85 C
2 3.0 1.0 3.0 Heikkinen, Miss. Laina female 26.0 0.0 0.0 STON/O2. 3101282 7.9250 NaN S
3 4.0 1.0 1.0 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1.0 0.0 113803 53.1000 C123 S
4 5.0 0.0 3.0 Allen, Mr. William Henry male 35.0 0.0 0.0 373450 8.0500 NaN S

2.3.数据合并——join

官方文档:
pandas.DataFrame.join — pandas 1.4.2 documentation (pydata.org)

Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

从官方文档上可以知道,join的方式比较灵活。

可以在 索引 上将 列 与其他 DataFrame 连接。 也可以通过传递一个列表,一次有效地按索引连接多个 DataFrame 对象。

参数有:

DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

2.4. concat 与 join 比较

concat、join等的比较

Merge, join, concatenate and compare — pandas 1.4.2 documentation (pydata.org)

第三节 GroupBy 接口

官方文档:

pandas.DataFrame.groupby — pandas 1.4.2 documentation (pydata.org)

目录
相关文章
|
13天前
|
算法 数据挖掘 数据处理
文献解读-Sentieon DNAscope LongRead – A highly Accurate, Fast, and Efficient Pipeline for Germline Variant Calling from PacBio HiFi reads
PacBio® HiFi 测序是第一种提供经济、高精度长读数测序的技术,其平均读数长度超过 10kb,平均碱基准确率达到 99.8% 。在该研究中,研究者介绍了一种准确、高效的 DNAscope LongRead 管道,用于从 PacBio® HiFi 读数中调用胚系变异。DNAscope LongRead 是对 Sentieon 的 DNAscope 工具的修改和扩展,该工具曾获美国食品药品管理局(FDA)精密变异调用奖。
22 2
文献解读-Sentieon DNAscope LongRead – A highly Accurate, Fast, and Efficient Pipeline for Germline Variant Calling from PacBio HiFi reads
|
算法 Linux Shell
SGAT丨Single Gene Analysis Tool
SGAT丨Single Gene Analysis Tool
|
自然语言处理 算法 知识图谱
DEGREE: A Data-Efficient Generation-Based Event Extraction Model论文解读
事件抽取需要专家进行高质量的人工标注,这通常很昂贵。因此,学习一个仅用少数标记示例就能训练的数据高效事件抽取模型已成为一个至关重要的挑战。
146 0
《Data infrastructure architecture for a medium size organization tips for collecting, storing and analysis》电子版地址
Data infrastructure architecture for a medium size organization: tips for collecting, storing and analysis
86 0
《Data infrastructure architecture for a medium size organization tips for collecting, storing and analysis》电子版地址
《40 Must Know Questions to test a data scientist on Dimensionality Reduction techniques》电子版地址
40 Must Know Questions to test a data scientist on Dimensionality Reduction techniques
94 0
《40 Must Know Questions to test a data scientist on Dimensionality Reduction techniques》电子版地址
《Distributed End-to-End Drug Similarity Analytics and Visualization Workflow》电子版地址
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow
76 0
《Distributed End-to-End Drug Similarity Analytics and Visualization Workflow》电子版地址
《J.P.Morgan's massive guide to machine learning and big data jobs in finance》电子版地址
J.P.Morgan's massive guide to machine learning and big data jobs in finance
107 0
《J.P.Morgan's massive guide to machine learning and big data jobs in finance》电子版地址
|
机器学习/深度学习 算法 数据挖掘
Re18:读论文 GCI Everything Has a Cause: Leveraging Causal Inference in Legal Text Analysis
Re18:读论文 GCI Everything Has a Cause: Leveraging Causal Inference in Legal Text Analysis
Re18:读论文 GCI Everything Has a Cause: Leveraging Causal Inference in Legal Text Analysis
|
机器学习/深度学习 数据采集 人工智能
Re10:读论文 Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous gr
Re10:读论文 Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous gr
Re10:读论文 Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous gr
|
自然语言处理 算法 数据可视化
Re21:读论文 MSJudge Legal Judgment Prediction with Multi-Stage Case Representation Learning in the Real
Re21:读论文 MSJudge Legal Judgment Prediction with Multi-Stage Case Representation Learning in the Real
Re21:读论文 MSJudge Legal Judgment Prediction with Multi-Stage Case Representation Learning in the Real