hands-on-data-analysis: Unit 2, Sections 2 and 3
Section 2: Data Restructuring
As always, start by importing the basic libraries:
# Import the basic libraries
import numpy as np
import pandas as pd
2.1. Merging data: horizontal concatenation with concat
Official documentation:
pandas.concat — pandas 1.4.2 documentation (pydata.org)
Given the two tables text_left_up and text_right_up, we want to merge them horizontally into a single table (that is, place their columns side by side).
text_left_up:
| | PassengerId | Survived | Pclass | Name |
---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
4 | 5 | 0 | 3 | Allen, Mr. William Henry |
text_right_up:
| | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|
0 | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |
1 | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
2 | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
4 | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
# Put both tables in a list and concatenate along the columns (axis=1)
list_up = [text_left_up, text_right_up]
result_up = pd.concat(list_up, axis=1)
result_up.head()
The result:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 0.0 | 3.0 | Braund, Mr. Owen Harris | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2.0 | 1.0 | 1.0 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
2 | 3.0 | 1.0 | 3.0 | Heikkinen, Miss. Laina | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4.0 | 1.0 | 1.0 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
4 | 5.0 | 0.0 | 3.0 | Allen, Mr. William Henry | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
This is like combining a table that holds Xiao Ming's student ID with a table that holds Xiao Ming's grades.
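To see how `pd.concat(..., axis=1)` lines rows up, here is a minimal sketch on two small made-up tables. The names `ids` and `scores` are hypothetical and not part of the Titanic data. The sketch assumes the rows are matched by index label: a label that appears in only one table is filled with NaN, and an integer column that receives NaN is upcast to float.

```python
import pandas as pd

# Hypothetical toy tables, standing in for the student-ID / grades analogy
ids = pd.DataFrame({"student_id": [101, 102, 103]}, index=[0, 1, 2])
scores = pd.DataFrame({"score": [88, 92]}, index=[0, 1])

# axis=1 places the columns side by side, aligning rows on the index
combined = pd.concat([ids, scores], axis=1)

# Row 2 has no score, so it becomes NaN and the column turns into float
print(combined)
print(combined.dtypes)
```

An index mismatch of this kind is one possible reason why PassengerId appears as 1.0 rather than 1 in the output above, although the source does not say so.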
2.2. Merging data: vertical concatenation with concat
Official documentation:
pandas.concat — pandas 1.4.2 documentation (pydata.org)
Merge text_left_down and text_right_down horizontally into one table and save it as result_down. Then concatenate result_up from above and result_down vertically into result.
The data in text_left_down:
| | PassengerId | Survived | Pclass | Name |
---|---|---|---|---|
0 | 440 | 0 | 2 | Kvillner, Mr. Johan Henrik Johannesson |
1 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) |
2 | 442 | 0 | 3 | Hampe, Mr. Leon |
3 | 443 | 0 | 3 | Petterson, Mr. Johan Emil |
4 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion |
The data in text_right_down:
| | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|
0 | male | 31.0 | 0 | 0 | C.A. 18723 | 10.500 | NaN | S |
1 | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.250 | NaN | S |
2 | male | 20.0 | 0 | 0 | 345769 | 9.500 | NaN | S |
3 | male | 25.0 | 1 | 0 | 347076 | 7.775 | NaN | S |
4 | female | 28.0 | 0 | 0 | 230434 | 13.000 | NaN | S |
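# Build the lower half by concatenating along the columns (axis=1)
list_down = [text_left_down, text_right_down]
result_down = pd.concat(list_down, axis=1)
# Stack the upper and lower halves along the rows (default axis=0)
result = pd.concat([result_up, result_down])
result.head()
The merged table:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |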
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 0.0 | 3.0 | Braund, Mr. Owen Harris | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2.0 | 1.0 | 1.0 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
2 | 3.0 | 1.0 | 3.0 | Heikkinen, Miss. Laina | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4.0 | 1.0 | 1.0 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
4 | 5.0 | 0.0 | 3.0 | Allen, Mr. William Henry | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
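A minimal sketch of vertical concatenation on made-up data (the names `top` and `bottom` are hypothetical). By default `pd.concat` keeps each table's original index labels, so the stacked result can contain duplicate labels; passing `ignore_index=True` renumbers the rows instead.

```python
import pandas as pd

# Hypothetical toy tables with the same columns
top = pd.DataFrame({"PassengerId": [1, 2], "Survived": [0, 1]})
bottom = pd.DataFrame({"PassengerId": [440, 441], "Survived": [0, 1]})

# Default axis=0 stacks rows; the original index labels are kept
stacked = pd.concat([top, bottom])

# ignore_index=True renumbers the rows from 0 to n-1
renumbered = pd.concat([top, bottom], ignore_index=True)

print(stacked)
print(renumbered)
```

Whether to renumber depends on whether the original labels carry meaning; the code in this section keeps them.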
2.3. Merging data: join
Official documentation:
pandas.DataFrame.join — pandas 1.4.2 documentation (pydata.org)
Join columns of another DataFrame. Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.
As the official documentation shows, join is fairly flexible: it can join on the index or on a key column, and it can join several DataFrame objects by index in one call when given a list.
Its parameters are:
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
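The parameters above can be illustrated with a short sketch on made-up tables (`left`, `right`, and `other` are hypothetical names). It shows a join on a key column, a join on the index with `how='inner'`, and the use of `lsuffix`/`rsuffix` when column names overlap.

```python
import pandas as pd

# Hypothetical toy tables
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"y": [10, 20]}, index=["a", "b"])

# Match the "key" column of `left` against the index of `right`;
# how='left' (the default) keeps every row of `left`
by_key = left.join(right, on="key")

# Join purely on the index, keeping only labels present in both tables
by_index = left.set_index("key").join(right, how="inner")

# Overlapping column names need lsuffix/rsuffix to tell them apart
other = pd.DataFrame({"x": [7, 8, 9]})
suffixed = left.join(other, lsuffix="_left", rsuffix="_right")

print(by_key, by_index, suffixed, sep="\n\n")
```

Without the suffixes, the last call would raise an error because both tables have a column named `x`.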
2.4. Comparing concat and join
For a comparison of concat, join, and related methods, see:
Merge, join, concatenate and compare — pandas 1.4.2 documentation (pydata.org)
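As a rough illustration of how the two relate, the sketch below combines the same two made-up tables (`a` and `b` are hypothetical) once with `pd.concat` and once with `DataFrame.join`. For a simple index-aligned case like this one they produce the same rows; the main differences are the call style and the defaults (`concat` defaults to an outer alignment, `join` to a left join).

```python
import pandas as pd

# Hypothetical toy tables sharing some index labels
a = pd.DataFrame({"x": [1, 2, 3]}, index=["r1", "r2", "r3"])
b = pd.DataFrame({"y": [10, 30]}, index=["r1", "r3"])

# concat is a function that takes a list of tables
via_concat = pd.concat([a, b], axis=1, join="inner")

# join is a method called on one table
via_join = a.join(b, how="inner")

# Compare the two results
print(via_concat)
print(via_join)
print(via_concat.equals(via_join))
```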
Section 3: The GroupBy Interface
Official documentation:
pandas.DataFrame.groupby — pandas 1.4.2 documentation (pydata.org)
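A minimal sketch of the groupby pattern, assuming a small made-up table with Titanic-style column names (the values are invented for illustration, not taken from the dataset). It splits the rows by `Sex` and then aggregates within each group.

```python
import pandas as pd

# Hypothetical toy data using Titanic-style column names
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.25, 8.0, 8.75],
    "Survived": [0, 1, 1, 0],
})

# Split rows by Sex, then take the mean Fare within each group
mean_fare = df.groupby("Sex")["Fare"].mean()

# Apply a different aggregation to each column with agg
summary = df.groupby("Sex").agg({"Fare": "mean", "Survived": "sum"})

print(mean_fare)
print(summary)
```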