Python Hypothesis Testing Example | Study Notes

2022-11-13


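Summary: A quick walkthrough of a hypothesis testing example in Python.

Study notes for the Developer Academy course "Essential Foundations for Artificial Intelligence: Probability Theory and Mathematical Statistics: Python Hypothesis Testing Example". The notes follow the course closely to help readers pick up the material quickly.

Course link: https://developer.aliyun.com/learning/course/545/detail/7452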

Contents

I. Download the dataset

II. Import the data

III. Plot the distribution

IV. Normality tests

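I. Download the dataset

Dataset download page: https://ww2.amstat.org/publications/jse/jse_data_archive.htm

Dataset description: http://ww2.amstat.org/publications/jse/datasets/normtemp.txt

The dataset contains 130 records. We mainly use the body temperature and gender columns in the experiments below.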

II. Import the data

import math
import warnings
import numpy as np
import pandas as pd
import pylab
import matplotlib.pyplot as plt
import scipy.stats
from scipy.stats import norm
# Jupyter magic: render plots inline (only valid inside a notebook)
%matplotlib inline
warnings.filterwarnings("ignore")
# normtemp.txt is whitespace-delimited and has no header row
df = pd.read_csv('normtemp.txt', sep=r'\s+', names=['Temperature', 'Gender', 'Heart Rate'])
df.describe()
df.head()

(The imported data)
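As a quick check of the loading step, here is a minimal sketch that parses a few rows from an in-memory string instead of the real file. The four rows and the name `df_demo` are hypothetical and only mimic the three-column, whitespace-delimited layout assumed for normtemp.txt; they are not taken from the dataset.

```python
import io
import pandas as pd

# Hypothetical rows in the assumed layout: temperature, gender code, heart rate.
# These values are illustrative only, NOT the real dataset.
sample_text = """\
96.3    1    70
97.1    1    73
98.4    2    84
99.0    2    79
"""

df_demo = pd.read_csv(
    io.StringIO(sample_text),
    sep=r"\s+",  # one or more whitespace characters between columns
    names=["Temperature", "Gender", "Heart Rate"],
)

# Sanity checks worth running right after loading
print(df_demo.dtypes)
print(df_demo["Gender"].value_counts().sort_index())
```

Checking the dtypes and the per-gender counts right after loading is a cheap way to catch a wrong separator, which would typically leave everything in a single text column.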

III. Plot the distribution

# Select the Temperature column and sort it
observed_temperatures = df['Temperature'].sort_values()
bin_val = np.arange(start=observed_temperatures.min(), stop=observed_temperatures.max(), step=.05)
# Compute the mean and standard deviation
mu, std = np.mean(observed_temperatures), np.std(observed_temperatures)
# Normal density with the same mean and standard deviation
p = norm.pdf(observed_temperatures, mu, std)
# density=True replaces normed=True, which newer Matplotlib versions no longer accept
plt.hist(observed_temperatures, bins=bin_val, density=True, stacked=True)
plt.plot(observed_temperatures, p, color='red')
plt.xticks(np.arange(95.75, 101.25, 0.25), rotation=90)
plt.title('Human Body Temperature Distributions')
plt.xlabel('human body temperature')
plt.show()
print('Average (Mu): ' + str(mu) + ' / ' + 'Standard Deviation: ' + str(std))

(The resulting plot: histogram of the temperatures with the fitted normal curve)
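The red curve in the plot is the normal density evaluated with the sample mean and standard deviation. As a sketch of what `norm.pdf` computes, the snippet below compares it with the closed-form density. The helper `normal_pdf` and the parameter values are assumptions chosen for illustration, not values from the course.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Closed-form density of a normal distribution N(mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Illustrative parameters (assumed), on a body-temperature-like scale
mu_demo, sigma_demo = 98.2, 0.7
grid = np.linspace(mu_demo - 3 * sigma_demo, mu_demo + 3 * sigma_demo, 7)

manual = normal_pdf(grid, mu_demo, sigma_demo)
library = norm.pdf(grid, mu_demo, sigma_demo)

# The two computations should agree up to floating-point error
print(np.allclose(manual, library))
```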

IV. Normality tests

x = observed_temperatures
# Shapiro-Wilk Test: https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test
shapiro_test, shapiro_p = scipy.stats.shapiro(x)
print("Shapiro-Wilk Stat:", shapiro_test, "Shapiro-Wilk p-Value:", shapiro_p)
# D'Agostino-Pearson test (combines skewness and kurtosis)
k2, p = scipy.stats.normaltest(observed_temperatures)
print('p:', p)
# Another method of determining normality is through Quantile-Quantile Plots
scipy.stats.probplot(observed_temperatures, dist="norm", plot=pylab)
pylab.show()

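p-values from the two tests:

Shapiro-Wilk Stat: 0.9865769743919373 Shapiro-Wilk p-Value: 0.2331680953502655

p: 0.258747986349

In the Q-Q plot, the blue points lie almost exactly on the red line.

Both p-values are well above 0.05, so neither test rejects normality, and the Q-Q plot agrees. All three methods indicate that the imported data are consistent with a normal distribution.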
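To make the decision rule explicit, here is a minimal sketch that wraps the two tests in a helper and applies it to a clearly non-normal sample. The function name `normality_report`, the threshold `alpha=0.05`, and the synthetic exponential data are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def normality_report(sample, alpha=0.05):
    """Run Shapiro-Wilk and D'Agostino-Pearson tests; H0 is 'the data are normal'."""
    _, p_shapiro = stats.shapiro(sample)
    _, p_dagostino = stats.normaltest(sample)
    return {
        "shapiro_p": p_shapiro,
        "normaltest_p": p_dagostino,
        # A small p-value is evidence against normality; a large one is only
        # a lack of evidence against it, not proof of normality.
        "reject_normality": bool(p_shapiro < alpha or p_dagostino < alpha),
    }

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1.0, size=500)  # synthetic, strongly right-skewed

report = normality_report(skewed)
print(report)
```

For the temperature data, both p-values reported in these notes exceed 0.05, so the same rule would not reject normality.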

Plot the ECDF

def ecdf(data):
    # Compute ECDF
    n = len(data)
    x = np.sort(data)
    y = np.arange(1, n + 1) / n
    return x, y

# Compute empirical mean and standard deviation
# Number of samples
n = len(df['Temperature'])
# Sample mean
mu = np.mean(df['Temperature'])
# Sample standard deviation (np.std uses ddof=0 by default)
std = np.std(df['Temperature'])
print('Mean temperature: ', mu, 'with standard deviation of +/-', std)

# Random sampling of the data based off of the mean of the data.
normalized_sample = np.random.normal(mu, std, size=10000)
x_temperature, y_temperature = ecdf(df['Temperature'])
normalized_x, normalized_y = ecdf(normalized_sample)

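In the resulting ECDF plot, the yellow points and the blue line largely coincide, which again supports the conclusion that the imported data follow a normal distribution.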
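The notes do not include the plotting step for the ECDF comparison. The following is a minimal sketch of one way to draw it, using synthetic stand-in data so that it runs on its own. The sample parameters, axis labels, and legend text are assumptions, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

def ecdf(data):
    """Return the sorted values and their cumulative proportions."""
    n = len(data)
    return np.sort(data), np.arange(1, n + 1) / n

# Synthetic stand-ins (assumed values) for the temperature column
# and for the large normal reference sample
rng = np.random.default_rng(42)
sample = rng.normal(98.2, 0.7, size=130)
reference = rng.normal(np.mean(sample), np.std(sample), size=10000)

x_sample, y_sample = ecdf(sample)
x_ref, y_ref = ecdf(reference)

fig, ax = plt.subplots(figsize=(8, 5))
# Reference ECDF drawn as a line, observed ECDF drawn as points
ax.plot(x_ref, y_ref, label="Normal distribution")
ax.plot(x_sample, y_sample, marker=".", linestyle="none", label="Sample data")
ax.set_xlabel("Temperature")
ax.set_ylabel("ECDF")
ax.legend()
```

With Matplotlib's default color cycle, the first series is drawn in blue and the second in orange, which matches the blue line and yellow points described above. In an interactive session, call `plt.show()` to display the figure.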

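Hypothesis tests

1. Some researchers claim that 98.6 is the mean human body temperature. Should we accept this?

Here we use a t-test, because the population standard deviation is unknown and can only be estimated from the sample.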

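from scipy import stats
# Hypothesized population mean
CW_mu = 98.6
stats.ttest_1samp(df['Temperature'], CW_mu, axis=0)

Result of the t-test:

Ttest_1sampResult(statistic=-5.4548232923640771, pvalue=2.4106320415610081e-07)

The t-statistic is -5.454 and the p-value is close to 0, so we should reject the hypothesis that the mean body temperature is 98.6.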
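To show what the one-sample t-test computes, here is a minimal sketch that evaluates t = (mean - mu0) / (s / sqrt(n)) by hand and compares it with `scipy.stats.ttest_1samp`. The helper `one_sample_t` and the synthetic data are assumptions for illustration; the data are not the course dataset.

```python
import numpy as np
from scipy import stats

def one_sample_t(sample, mu0):
    """One-sample t statistic and two-sided p-value, with s computed using ddof=1."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
    # Two-sided p-value from the t distribution with n - 1 degrees of freedom
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value

rng = np.random.default_rng(1)
demo = rng.normal(98.2, 0.7, size=130)  # synthetic data with assumed parameters

t_manual, p_manual = one_sample_t(demo, 98.6)
t_scipy, p_scipy = stats.ttest_1samp(demo, 98.6)

# The manual computation should match SciPy's result
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```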

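2. Is there a significant difference between male and female body temperature?

Two independent samples t-test. H0: there is no significant difference. H1: there is a significant difference.

# Gender codes in the dataset: 1 = male, 2 = female
female_temp = df.Temperature[df.Gender == 2]
male_temp = df.Temperature[df.Gender == 1]
mean_female_temp = np.mean(female_temp)
mean_male_temp = np.mean(male_temp)
print('Average female body temperature = ' + str(mean_female_temp))
print('Average male body temperature = ' + str(mean_male_temp))
# Compute independent t-test by passing in the two columns
stats.ttest_ind(female_temp, male_temp, axis=0)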

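Average female body temperature = 98.39384615384616 Average male body temperature = 98.1046153846154

Ttest_indResult(statistic=2.2854345381654984, pvalue=0.02393188312240236)

Since the p-value = 0.024 < 0.05, we reject the null hypothesis: at the 5% significance level (95% confidence), male and female body temperatures differ.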
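By default, `scipy.stats.ttest_ind` assumes equal variances in the two groups (Student's t-test with a pooled variance). The sketch below reproduces that statistic by hand and also runs Welch's variant via `equal_var=False`, which drops the equal-variance assumption. The helper `pooled_t` and the synthetic groups are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def pooled_t(a, b):
    """Student's two-sample t statistic and two-sided p-value (equal variances assumed)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    # Pooled variance: each group's ddof=1 variance weighted by its degrees of freedom
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    t_stat = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))
    p_value = 2 * stats.t.sf(abs(t_stat), df=na + nb - 2)
    return t_stat, p_value

rng = np.random.default_rng(2)
group_a = rng.normal(98.4, 0.7, size=65)  # synthetic groups with assumed parameters
group_b = rng.normal(98.1, 0.7, size=65)

t_manual, p_manual = pooled_t(group_a, group_b)
res_student = stats.ttest_ind(group_a, group_b)                 # equal_var=True by default
res_welch = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test

print(t_manual, res_student.statistic, res_welch.statistic)
```

When the two groups have the same size, Student's and Welch's t statistics coincide and only the degrees of freedom (and therefore the p-value) can differ. With unequal group sizes or clearly different variances, Welch's variant is often the safer choice.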
