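&
: logical AND (element-wise)
|
: logical OR (element-wise)
Wrap each condition in its own parentheses. In Python, & and | bind more tightly than comparison operators such as > and <, so without parentheses the conditions are combined in the wrong order. Use & and | rather than the keywords and / or, which try to reduce a whole Series to a single True/False.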
import pandas as pd

# Build the data as a dict
dic = {
    "name": ["shanjialan", "shanyanhong", "luckyapple"],
    "age": [21, 23, 12],
    "hobby": ["sports", "music", "programming"],
}
# Create a DataFrame from the dict
df = pd.DataFrame(dic)
# Simple conditions
print(df[df["age"] > 18])
print(df[df["name"].str.len() > 10])
# Compound condition: each comparison in its own parentheses
print(df[(df["age"] > 18) & (df["age"] < 22)])
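A short sketch of why the parentheses matter, reusing the same three-row DataFrame. The variable names (adults_under_22, young_or_old, raised) are illustrative only. Without parentheses, Python parses the filter as df["age"] > (18 & df["age"]) < 22, a chained comparison that needs the truth value of a whole Series, so pandas raises ValueError.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "name": ["shanjialan", "shanyanhong", "luckyapple"],
    "age": [21, 23, 12],
    "hobby": ["sports", "music", "programming"],
})

# Correct: every comparison is parenthesized before combining with & or |
adults_under_22 = df[(df["age"] > 18) & (df["age"] < 22)]
young_or_old = df[(df["age"] < 18) | (df["age"] > 22)]
print(adults_under_22)
print(young_or_old)

# Wrong: & is evaluated first, producing a chained comparison on Series
raised = False
try:
    df[df["age"] > 18 & df["age"] < 22]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```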
pandas string methods
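The original list of string methods was an image that has not survived. The sketch below only illustrates a few common methods of the Series .str accessor (len, startswith, contains, upper); the selection is an assumption, not a reproduction of the lost table. Any boolean result can be used as a filter mask in the same way as the comparisons above.

```python
import pandas as pd

names = pd.Series(["shanjialan", "shanyanhong", "luckyapple"])

lengths = names.str.len()                   # number of characters in each string
starts_shan = names.str.startswith("shan")  # element-wise prefix test
has_apple = names.str.contains("apple")     # substring (regex by default) match
upper = names.str.upper()                   # vectorized upper-casing

# A boolean result works as a row filter
print(names[starts_shan])
```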
Handling missing data in pandas
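pd.isnull(df)
: a boolean DataFrame with the same shape as df, True where a value is missing
pd.notnull(df)
: the inverse, True where a value is present
Missing data includes both np.nan and None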
import pandas as pd
import numpy as np

# Build the data as a dict; one age is missing (np.nan)
dic = {
    "name": ["shanjialan", "shanyanhong", "luckyapple", "hunvibe", "chenwenhao"],
    "age": [21, 23, 0, np.nan, 21],
    "hobby": ["sports", "music", "programming", "eating", "basketball"],
}
# Create a DataFrame from the dict
df = pd.DataFrame(dic)
print(pd.isnull(df))
print(pd.notnull(df))
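Because the result of pd.isnull is itself a boolean DataFrame, it can be aggregated. This minimal sketch, on a small made-up DataFrame, counts missing values per column and selects the rows that contain any missing value; it also shows None in a string column being detected alongside np.nan in a numeric one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["shanjialan", "shanyanhong", None],  # None in an object column
    "age": [21, np.nan, 12],                      # np.nan in a numeric column
})

mask = pd.isnull(df)                      # True where a value is missing
per_column = mask.sum()                   # True counts as 1: missing values per column
rows_with_missing = df[mask.any(axis=1)]  # rows with at least one missing value
print(per_column)
print(rows_with_missing)
```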
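There are two ways to handle missing values: drop them or fill them.
df.dropna(how='all' / 'any', inplace=True / False, axis=n)
: drop rows or columns that contain missing values
how
: 'any' drops a row (or column) if it contains at least one NaN; 'all' drops it only if every value is NaN
inplace
: a boolean, not a string; True modifies df in place and returns None, while the default False returns a new DataFrame and leaves df unchanged
axis
: 0 (the default) drops rows, 1 drops columns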
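A small sketch contrasting the dropna options on a made-up 3x3 DataFrame: how='any' versus how='all', rows versus columns, and the fact that inplace=True returns None while changing the original object.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, np.nan],
})

any_rows = df.dropna(how="any", axis=0)  # rows with at least one NaN are dropped -> row 0 left
all_rows = df.dropna(how="all", axis=0)  # only all-NaN rows are dropped -> rows 0 and 1 left
any_cols = df.dropna(how="any", axis=1)  # every column has a NaN here, so all are dropped

result = df.dropna(how="all", inplace=True)  # modifies df itself and returns None
print(result, df.shape)
```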
df.fillna(value)
: fill missing values with value (a scalar, or a dict mapping column names to fill values)
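# Drop every row that contains at least one missing value; df itself is unchanged
print(df.dropna(how='any', axis=0, inplace=False))
# Fill the missing ages with the mean age
print(df["age"].fillna(value=df['age'].mean()))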
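fillna also accepts a dict, which lets each column get its own fill value. A minimal sketch with made-up data: a scalar fill for one column, and a dict that fills a text column with a placeholder string and a numeric column with its mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["shanjialan", None, "luckyapple"],
    "age": [21.0, np.nan, 12.0],
})

# A scalar fills every missing cell with the same value
filled_scalar = df["age"].fillna(0)

# A dict fills each column with its own value (keys are column names)
filled_dict = df.fillna(value={"name": "unknown", "age": df["age"].mean()})
print(filled_dict)
```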
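Note: pandas aggregation methods such as mean() skip NaN by default (skipna=True), so missing values are left out of both the sum and the count rather than treated as 0. NumPy is different: np.mean returns nan if the array contains nan; np.nanmean ignores it. (Series.sum() does return 0 for an all-NaN Series, because its min_count defaults to 0.)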
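A quick check of the difference on made-up values: pandas skips the NaN and divides by 3, NumPy propagates it, and actually replacing NaN with 0 gives yet another result because the count becomes 4.

```python
import numpy as np
import pandas as pd

values = [21.0, 23.0, np.nan, 12.0]

pandas_mean = pd.Series(values).mean()                  # NaN skipped: (21 + 23 + 12) / 3
pandas_mean_nan = pd.Series(values).mean(skipna=False)  # NaN propagates
numpy_mean = np.mean(np.array(values))                  # NaN propagates in NumPy
numpy_nanmean = np.nanmean(np.array(values))            # NumPy's NaN-ignoring variant
zero_filled_mean = pd.Series(values).fillna(0).mean()   # NaN as 0: 56 / 4
```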
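Handling 0 values: when 0 is a placeholder for missing data (for example, an age of 0), convert it to NaN first so that the methods above treat it as missing
t[t == 0] = np.nan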
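A minimal sketch of the zero-to-NaN step, assuming t is a DataFrame with float data (the name t and the values are illustrative). One caveat: if t is an integer NumPy array, assigning np.nan raises ValueError, so convert it to float first. DataFrame.replace is shown as an alternative that returns a new object and upcasts an integer column to float.

```python
import numpy as np
import pandas as pd

# Float ages, where 0 stands for "unknown"
t = pd.DataFrame({"age": [21.0, 23.0, 0.0, 21.0]})

t[t == 0] = np.nan        # boolean-mask assignment, as above
print(t["age"].mean())    # the 0 no longer drags the mean down: (21 + 23 + 21) / 3

# Alternative that returns a new DataFrame; the int column becomes float
t2 = pd.DataFrame({"age": [21, 23, 0, 21]}).replace(0, np.nan)
```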