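Beta Distribution

Study notes for the Developer Academy course "Essential Foundations for Artificial Intelligence: Probability Theory and Mathematical Statistics: Beta Distribution". The notes follow the course closely so that readers can pick up the material quickly.

Course link: https://developer.aliyun.com/learning/course/545/detail/7427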

Contents

1. Definition

2. Formula

3. Parameters

1. Definition

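The beta distribution can be viewed as a probability distribution over probabilities: when you do not know the exact probability of something, it describes how plausible each possible value of that probability is.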

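Here is a simple example. Anyone familiar with baseball knows the batting average: the number of hits a player gets divided by his total number of at-bats. A batting average of 0.266 is generally considered normal, while 0.3 is considered excellent. Suppose we want to predict a player's batting average for the current season. You could compute it directly as hits divided by at-bats, but if the player has batted only once and got a hit, his batting average would be 100%, which is clearly unreasonable: from baseball history we know that a batting average should fall between about 0.215 and 0.36. One of the best ways to handle this is the beta distribution, which expresses that we already have a rough range in mind before we have seen the player bat. The domain of the beta distribution is (0, 1), the same as the range of a probability. Next we convert this prior information into the parameters of a beta distribution. We expect the batting average to be around 0.27 on average, with a range of about 0.21 to 0.35, so we can take α = 81 and β = 219 (81 hits and 219 misses).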

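These two parameters are chosen because:

The mean of the beta distribution is α / (α + β) = 81 / (81 + 219) = 0.27.

The plot of this distribution shows that it lies mainly within (0.2, 0.35), which is the reasonable range known from experience.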
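The two claims about Beta(81, 219) can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `beta_pdf` and `beta_mass` are illustrative and not part of the course material, and the midpoint rule is just one simple way to approximate the integral.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_mass(lo, hi, a, b, steps=10_000):
    """Approximate P(lo < X < hi) for X ~ Beta(a, b) with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(beta_pdf(lo + (i + 0.5) * width, a, b) for i in range(steps)) * width

a, b = 81, 219
prior_mean = a / (a + b)                      # closed-form mean of the prior
mass_in_range = beta_mass(0.2, 0.35, a, b)    # share of the prior inside (0.2, 0.35)

print(f"mean = {prior_mean:.3f}")
print(f"P(0.2 < X < 0.35) = {mass_in_range:.4f}")
```

The printed mass should be very close to 1, which is what "lies mainly within (0.2, 0.35)" means in practice.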

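In this example the x-axis shows the possible values of the batting average, and the y value for each x is the probability density assigned to that batting average. In other words, the beta distribution can be viewed as a probability distribution over probabilities.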

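After new at-bats are observed, the updated distribution is Beta(α0 + hits, β0 + misses), where α0 and β0 are the initial parameters, here 81 and 219. Suppose the player bats once and gets a hit: α increases by 1 (one hit) and β does not increase (no misses). The new beta distribution is Beta(81+1, 219).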

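This distribution has barely changed, because a single at-bat tells us very little. With more data, however, the picture changes. Suppose the player has 300 at-bats in total, with 100 hits and 200 misses. The new distribution is then: Beta(81+100, 219+200)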

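Notice that the curve is now sharper and has shifted to the right, indicating a batting average above the prior average. So when we do not know a probability but have some reasonable guesses about it, the beta distribution works well as a probability distribution over that probability.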
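The update rule used in this example is simple enough to sketch in a few lines. The function names `update_beta` and `beta_mean` are illustrative, not from the course; the numbers are the ones used in the baseball example.

```python
def update_beta(alpha, beta, hits, misses):
    """Return the parameters of the beta distribution after observing new at-bats."""
    return alpha + hits, beta + misses

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return alpha / (alpha + beta)

prior = (81, 219)
after_one_hit = update_beta(*prior, hits=1, misses=0)
after_300 = update_beta(*prior, hits=100, misses=200)

print(after_one_hit, round(beta_mean(*after_one_hit), 4))  # (82, 219) 0.2724
print(after_300, round(beta_mean(*after_300), 4))          # (181, 419) 0.3017
```

One hit moves the expected batting average only from 0.27 to about 0.272, while 300 at-bats move it to about 0.302, which matches the shift described in the text.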

2. Formula

α and β can be interpreted as the numbers of successes and failures, respectively.
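The density formula itself is not reproduced in these notes. For reference, the standard probability density function of the beta distribution, together with its mean and variance, can be written as follows (B denotes the beta function and Γ the gamma function):

```latex
\begin{aligned}
f(x;\alpha,\beta) &= \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},
  \qquad 0 < x < 1,\ \alpha > 0,\ \beta > 0 \\
B(\alpha,\beta) &= \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)} \\
\mathbb{E}[X] &= \frac{\alpha}{\alpha+\beta},
  \qquad
  \operatorname{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}
\end{aligned}
```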

In [1]:
# IMPORTS
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import matplotlib.style as style
from IPython.display import HTML

# PLOTTING CONFIG
%matplotlib inline
style.use('fivethirtyeight')
plt.rcParams["figure.figsize"] = (14, 7)
plt.figure(dpi=100)

# PDF
plt.plot(np.linspace(0, 1, 100),
         stats.beta.pdf(np.linspace(0, 1, 100), a=2, b=2))
print(stats.beta.pdf(np.linspace(0, 1, 100), a=2, b=2))
plt.fill_between(np.linspace(0, 1, 100),
                 stats.beta.pdf(np.linspace(0, 1, 100), a=2, b=2),
                 alpha=.15)

# CDF
plt.plot(np.linspace(0, 1, 100),
         stats.beta.cdf(np.linspace(0, 1, 100), a=2, b=2))

# LEGEND
plt.text(x=0.1, y=.7, s="pdf (normed)", rotation=52, alpha=.75, weight="bold", color="#008fd5")
plt.text(x=0.45, y=.5, s="cdf", rotation=40, alpha=.75, weight="bold", color="#fc4f30")

# TICKS
plt.tick_params(axis='both', which='major', labelsize=18)
plt.axhline(y=0, color='black', linewidth=1.3, alpha=.7)

# TITLE, SUBTITLE & FOOTER
plt.text(x=-.125, y=1.85, s="Beta Distribution - Overview",
         fontsize=26, weight='bold', alpha=.75)
plt.text(x=-.125, y=1.6,
         s='Depicted below are the normed probability density function (pdf) and the cumulative density\nfunction (cdf) of a beta distributed random variable $ y \\sim Beta(\\alpha, \\beta)$, given $ \\alpha = 2 $ and $ \\beta = 2$',
         fontsize=19, alpha=.85)

Out[1]: Text(-0.125, 1.6, 'Depicted below are the normed probability density function (pdf) and the cumulative density\nfunction (cdf) of a beta distributed random variable $ y \\sim Beta(\\alpha, \\beta)$, given $ \\alpha = 2 $ and $ \\beta = 2$')

[Figure: "Beta Distribution - Overview". Normed probability density function (pdf) and cumulative density function (cdf) of a beta distributed random variable y ~ Beta(α, β), given α = 2 and β = 2.]

The CDF gets closer to 1 the further right you go along the x-axis.
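For α = 2 and β = 2 both curves have simple closed forms, which makes the behaviour of the CDF easy to see. The sketch below assumes those closed forms (density 6x(1 - x), CDF 3x² - 2x³, obtained by integrating the density); the function names are illustrative.

```python
def beta22_pdf(x):
    """Density of Beta(2, 2): 6 * x * (1 - x) on [0, 1]."""
    return 6 * x * (1 - x)

def beta22_cdf(x):
    """CDF of Beta(2, 2): the integral of the density from 0 to x."""
    return 3 * x**2 - 2 * x**3

for x in (0.0, 0.3, 0.5, 1.0):
    print(f"x = {x:.1f}  pdf = {beta22_pdf(x):.3f}  cdf = {beta22_cdf(x):.3f}")
```

The CDF rises from 0 at x = 0 through 0.5 at x = 0.5 to exactly 1 at x = 1, while the density peaks at 1.5 in the middle.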

3. Parameters

In [2]:
plt.figure(dpi=100)

# A = B = 1 (the arguments a and b correspond to α and β)
plt.plot(np.linspace(0, 1, 200),
         stats.beta.pdf(np.linspace(0, 1, 200), a=1, b=1))
plt.fill_between(np.linspace(0, 1, 200),
                 stats.beta.pdf(np.linspace(0, 1, 200), a=1, b=1),
                 alpha=.15)

# A = B = 10
plt.plot(np.linspace(0, 1, 200),
         stats.beta.pdf(np.linspace(0, 1, 200), a=10, b=10))
plt.fill_between(np.linspace(0, 1, 200),
                 stats.beta.pdf(np.linspace(0, 1, 200), a=10, b=10),
                 alpha=.15)

# A = B = 100
plt.plot(np.linspace(0, 1, 200),
         stats.beta.pdf(np.linspace(0, 1, 200), a=100, b=100))
plt.fill_between(np.linspace(0, 1, 200),
                 stats.beta.pdf(np.linspace(0, 1, 200), a=100, b=100),
                 alpha=.15)

# LEGEND
plt.text(x=0.1, y=1.45, s=r"$\alpha = 1, \beta = 1$", alpha=.75, weight="bold", color="#008fd5")
plt.text(x=0.325, y=3.5, s=r"$\alpha = 10, \beta = 10$", rotation=35, alpha=.75, weight="bold", color="#fc4f30")
plt.text(x=0.4125, y=5, s=r"$\alpha = 100, \beta = 100$", rotation=80, alpha=.75, weight="bold", color="#e5ae38")

# TICKS
plt.tick_params(axis='both', which='major', labelsize=18)
plt.axhline(y=0, color='black', linewidth=1.3, alpha=.7)

# TITLE, SUBTITLE & FOOTER
plt.text(x=-.1, y=13.75, s=r"Beta Distribution - constant $\frac{\alpha}{\beta}$, varying $\alpha + \beta$",
         fontsize=26, weight='bold', alpha=.75)
plt.text(x=-.1, y=12,
         s='Depicted below are three beta distributed random variables with ' + r'equal $\frac{\alpha}{\beta}$ and varying $\alpha+\beta$' + '.\n' + r'As one can see the sum of $\alpha + \beta$ (mainly) sharpens the distribution (the bigger the sharper).',
         fontsize=19, alpha=.85)
Out[2]: Text(-0.1, 12, 'Depicted below are three beta distributed random variables with equal $\\frac{\\alpha}{\\beta}$ and varying $\\alpha+\\beta$.\nAs one can see the sum of $\\alpha + \\beta$ (mainly) sharpens the distribution (the bigger the sharper).')

[Figure: "Beta Distribution - constant α/β, varying α+β". Three beta distributed random variables with equal α/β and varying α+β. The sum α+β (mainly) sharpens the distribution (the bigger the sharper).]

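With α = 1 and β = 1 the density is a flat line: there is too little data to capture any pattern.

With α = 10, β = 10 or α = 100, β = 100, the expected success rate is 0.5 in both cases. The larger α and β are, the sharper the curve becomes: more of the density concentrates around 0.5 and the spread shrinks. With less data the curve is not very sharp. In other words, for a beta distribution with the same ratio of successes to failures, the more data there is (the larger α and β), the more confident we are in the estimate.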
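The sharpening effect can be quantified with the closed-form mean and variance of the beta distribution. This is a small sketch using only the standard library; `beta_mean_sd` is an illustrative helper name.

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of Beta(a, b) from the closed-form expressions."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)

results = {ab: beta_mean_sd(*ab) for ab in [(1, 1), (10, 10), (100, 100)]}
for (a, b), (mean, sd) in results.items():
    print(f"a = {a:>3}, b = {b:>3}: mean = {mean:.2f}, sd = {sd:.4f}")
```

All three distributions have mean 0.5, but the standard deviation falls from about 0.29 to about 0.11 and then to about 0.035 as α + β grows.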

In [3]:
plt.figure(dpi=100)

# A / B = 1 / 3
plt.plot(np.linspace(0, 1, 200),
         stats.beta.pdf(np.linspace(0, 1, 200), a=25, b=75))
plt.fill_between(np.linspace(0, 1, 200),
                 stats.beta.pdf(np.linspace(0, 1, 200), a=25, b=75),
                 alpha=.15)

# A / B = 1
plt.plot(np.linspace(0, 1, 200),
         stats.beta.pdf(np.linspace(0, 1, 200), a=50, b=50))
plt.fill_between(np.linspace(0, 1, 200),
                 stats.beta.pdf(np.linspace(0, 1, 200), a=50, b=50),
                 alpha=.15)

# A / B = 3
plt.plot(np.linspace(0, 1, 200),
         stats.beta.pdf(np.linspace(0, 1, 200), a=75, b=25))
plt.fill_between(np.linspace(0, 1, 200),
                 stats.beta.pdf(np.linspace(0, 1, 200), a=75, b=25),
                 alpha=.15)

# LEGEND
plt.text(x=0.15, y=5, s=r"$\alpha = 25, \beta = 75$", rotation=80, alpha=.75, weight="bold", color="#008fd5")
plt.text(x=0.39, y=5, s=r"$\alpha = 50, \beta = 50$", rotation=80, alpha=.75, weight="bold", color="#fc4f30")
plt.text(x=0.65, y=5, s=r"$\alpha = 75, \beta = 25$", rotation=80, alpha=.75, weight="bold", color="#e5ae38")

# TICKS
plt.tick_params(axis='both', which='major', labelsize=18)
plt.axhline(y=0, color='black', linewidth=1.3, alpha=.7)

# TITLE, SUBTITLE & FOOTER
plt.text(x=-.1, y=11.75, s=r"Beta Distribution - constant $\alpha + \beta$ and varying $\frac{\alpha}{\beta}$",
         fontsize=26, weight='bold', alpha=.75)
plt.text(x=-.1, y=10,
         s='Depicted below are three beta distributed random variables with ' + r'equal $\alpha+\beta$ and varying $\frac{\alpha}{\beta}$' + '.\n' + r'As one can see the fraction of $\frac{\alpha}{\beta}$ (mainly) shifts the distribution ($\alpha$ towards 1, $\beta$ towards 0).',
         fontsize=19, alpha=.85)
Out[3]: Text(-0.1, 10, 'Depicted below are three beta distributed random variables with equal $\\alpha+\\beta$ and varying $\\frac{\\alpha}{\\beta}$.\nAs one can see the fraction of $\\frac{\\alpha}{\\beta}$ (mainly) shifts the distribution ($\\alpha$ towards 1, $\\beta$ towards 0).')

[Figure: "Beta Distribution - constant α+β, varying α/β". Three beta distributed random variables with equal α+β and varying α/β. The fraction α/β (mainly) shifts the distribution (α towards 1, β towards 0).]

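As the values of α and β change, the resulting distribution changes. The ratio of α to β determines where the distribution sits on the x-axis.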
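To make the shift concrete, the mean and the location of the peak can be computed directly for the three parameter pairs plotted above. This is a minimal sketch; `beta_mean` and `beta_mode` are illustrative helper names, and the mode formula assumes α > 1 and β > 1.

```python
def beta_mean(a, b):
    """Mean of Beta(a, b)."""
    return a / (a + b)

def beta_mode(a, b):
    """Mode (location of the peak) of Beta(a, b); valid when a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

shapes = [(25, 75), (50, 50), (75, 25)]
summary = [(a, b, beta_mean(a, b), beta_mode(a, b)) for a, b in shapes]
for a, b, mean, mode in summary:
    print(f"a = {a}, b = {b}: mean = {mean:.2f}, peak at x = {mode:.3f}")
```

With α + β fixed at 100, the means are 0.25, 0.50 and 0.75: a larger share of α pushes the distribution towards 1, and a larger share of β pushes it towards 0.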

In [4]:
from scipy.stats import beta

# draw a single sample
print(beta.rvs(a=2, b=2), end="\n\n")

# draw 10 samples
print(beta.rvs(a=2, b=2, size=10))

0.212420441349

[0.1707566  0.82840325 0.53855684 0.5391192  0.60484497 0.7118628
 0.65452413 0.50019168 0.42672272 0.53547624]
Probability Density Function
In [5]:
from scipy.stats import beta

# additional imports for plotting
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams["figure.figsize"] = (14, 7)

# continuous pdf for the plot
x_s = np.linspace(0, 1, 100)
y_s = beta.pdf(a=2, b=2, x=x_s)
plt.scatter(x_s, y_s);

Cumulative Probability Density Function
In [6]:
from scipy.stats import beta

# probability of x less than or equal to 0.3
print("P(X < 0.3) = {:.3}".format(beta.cdf(a=2, b=2, x=0.3)))

# probability of x in [-0.2, +0.2]; the beta distribution has support [0, 1], so the cdf at -0.2 is 0
print("P(-0.2 < X < 0.2) = {:.3}".format(beta.cdf(a=2, b=2, x=0.2) - beta.cdf(a=2, b=2, x=-0.2)))

P(X < 0.3) = 0.216
P(-0.2 < X < 0.2) = 0.104
