Naive Bayes is a probability-based classification algorithm that predicts the class of a given sample by combining each feature's contribution to the class probabilities. It is a supervised learning algorithm used for classification problems.
The core idea of Naive Bayes is Bayes' theorem: the probability of a class given the features can be computed from the probability of the features given the class, together with the class prior. The algorithm assumes that the features are conditionally independent given the class, which is the "naive" part of Naive Bayes. It then applies Bayes' theorem to compute a score for each class and selects the class with the highest score as the prediction.
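To make the theorem concrete, here is a tiny numeric sketch with made-up probabilities (the values 0.8, 0.3, and 0.4 are hypothetical, chosen only for illustration):

```python
# Bayes' theorem: P(C|x) = P(x|C) * P(C) / P(x)
p_x_given_c = 0.8   # likelihood: probability of the features given class C (assumed)
p_c = 0.3           # prior: probability of class C (assumed)
p_x = 0.4           # evidence: overall probability of the features (assumed)

posterior = p_x_given_c * p_c / p_x
print(posterior)  # ~0.6
```

In practice the evidence P(x) is the same for every class, so Naive Bayes can skip it and simply compare the unnormalized products P(x|C) * P(C) across classes.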
Below is a Python implementation:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from collections import Counter
# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Compute the prior probability of each class
counter = Counter(y_train)
priors = {k: v / len(y_train) for k, v in counter.items()}
# Compute the per-feature mean and standard deviation for each class
means = {}
stds = {}
for i in range(3):
    X_i = X_train[y_train == i]
    means[i] = X_i.mean(axis=0)
    stds[i] = X_i.std(axis=0)
# Predict
def predict(X_test, means, stds, priors):
    n_samples, n_features = X_test.shape
    y_pred = np.zeros(n_samples)
    for i in range(n_samples):
        posteriors = []
        for j in range(3):
            # Gaussian probability density of each feature under class j
            densities = np.exp(-(X_test[i] - means[j]) ** 2 / (2 * stds[j] ** 2)) / (np.sqrt(2 * np.pi) * stds[j])
            # Unnormalized posterior: product of feature likelihoods times the class prior
            posterior = np.prod(densities) * priors[j]
            posteriors.append(posterior)
        y_pred[i] = np.argmax(posteriors)
    return y_pred
y_pred = predict(X_test, means, stds, priors)
# Evaluate
accuracy = np.mean(y_pred == y_test)
print("Accuracy: {:.2f}%".format(accuracy * 100))
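As a sanity check, the manual implementation above can be compared against scikit-learn's `GaussianNB`, which makes the same Gaussian-likelihood assumption. This is a minimal sketch using the same dataset and split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data and split as the manual version
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Fit scikit-learn's Gaussian Naive Bayes and score it on the test set
clf = GaussianNB()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print("Accuracy: {:.2f}%".format(accuracy * 100))
```

If the two implementations agree closely, that is good evidence the manual density and prior computations are correct; small differences can arise because `GaussianNB` applies variance smoothing for numerical stability.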