Developer Academy course "Essential Foundations of Artificial Intelligence: Probability Theory and Mathematical Statistics — Case Study: Solving the Regression" study notes. The notes follow the course closely so learners can pick up the material quickly.
Course URL: https://developer.aliyun.com/learning/course/545/detail/7442
Case Study: Solving the Regression
1. Lasso Regression
Consider x = [1, 1, 1, 1] with w1 = [1/4, 1/4, 1/4, 1/4] and w2 = [1, 0, 0, 0]: both weight vectors give the same prediction, since x·w1 = x·w2 = 1. The difference is how the weight is distributed: w1 spreads it evenly, so it behaves more stably and can handle more kinds of data.
Lasso adds an absolute-value (L1) penalty term to punish overly large coefficients; alpha controls the penalty strength, and with alpha = 0 the model is just ordinary least squares.
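For reference, the objective minimized by scikit-learn's Lasso is the least-squares loss plus exactly this absolute-value (L1) penalty, where n is the number of samples:

\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1

Larger values of alpha drive more coefficients exactly to zero, which is why Lasso ends up eliminating features later in these notes.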
In [1078]: # logarithmic scale: log base 2
           # high values zero out more variables
           alphas = 2. ** np.arange(2, 12)           # try a range of penalty strengths
           scores = np.empty_like(alphas)
           for i, a in enumerate(alphas):
               lasso = Lasso(random_state=seed)      # specify the model
               lasso.set_params(alpha=a)             # set the penalty strength
               lasso.fit(X_train, y_train)
               scores[i] = lasso.score(X_test, y_test)
(The score is computed on the test set: the training set only builds the equation, and the test set uses that equation to evaluate the result. Here the test set is effectively used just to choose the parameter.)
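These cells assume that the imports, the random seed and the train/test split were already prepared earlier in the course. A minimal, self-contained sketch of such a setup with synthetic data (the dataset, the plot color c and the split ratio here are illustrative assumptions, not the course's actual data):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import make_regression

seed = 42            # fixed random seed, reused by every model below
c = "steelblue"      # plot color used in the figures

# synthetic stand-in for the course dataset (65 features, matching the output below)
X, y = make_regression(n_samples=200, n_features=65, noise=10.0, random_state=seed)
features = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
target = pd.Series(y, name="target")

# hold out a test set: the training set fits the model, the test set evaluates it
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=seed)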
# cross validation
lassocv = LassoCV(cv=10, random_state=seed)        # cv sets the number of folds
lassocv.fit(features, target)                      # fit on the original data
lassocv_score = lassocv.score(features, target)    # get the score
lassocv_alpha = lassocv.alpha_                     # extract the selected alpha value
plt.figure(figsize=(10, 4))
plt.plot(alphas, scores, '-ko')
plt.axhline(lassocv_score, color=c)
plt.xlabel(r'$\alpha$')
plt.ylabel('CV Score')
plt.xscale('log', basex=2)
sns.despine(offset=15)
print('CV results:', lassocv_score, lassocv_alpha)
CV results: 0.826897171636 225.757779669
It's already a good result: an R squared of 0.82. Let's take a look at the features contributing to the model.
In [1115]: # lassocv coefficients
           coefs = pd.Series(lassocv.coef_, index=features.columns)   # pull out all the coefficients
           # print the number of picked/eliminated features
           print("Lasso picked " + str(sum(coefs != 0)) + " features and eliminated the other " +
                 str(sum(coefs == 0)) + " features.")
           # keep the 5 smallest and 5 largest coefficients
           coefs = pd.concat([coefs.sort_values().head(5), coefs.sort_values().tail(5)])
           plt.figure(figsize=(10, 4))
           coefs.plot(kind="barh", color=c)
           plt.title("Coefficients in the Lasso Model")
           plt.show()

Lasso picked 13 features and eliminated the other 52 features.
In [1080]: model_l1 = LassoCV(alphas=alphas, cv=10, random_state=seed).fit(X_train, y_train)
           y_pred_l1 = model_l1.predict(X_test)
           model_l1.score(X_test, y_test)
Out[1080]: 0.83307445226244159

We get a higher score on the test set than on the train set, which shows that the model can probably generalize well on unseen data.

In [1113]: # residual plot
           plt.rcParams['figure.figsize'] = (6.0, 6.0)
           preds = pd.DataFrame({"preds": model_l1.predict(X_train), "true": y_train})   # collect the current results
           preds["residuals"] = preds["true"] - preds["preds"]
           preds.plot(x="preds", y="residuals", kind="scatter", color=c)
Out[1113]: <matplotlib.axes._subplots.AxesSubplot at 0x15c9f48d0>
In this scatter plot, the x-axis shows the predicted values and the y-axis shows the residual, i.e. the difference between the true value and the prediction.
In [1082]: def MSE(y_true, y_pred):
               mse = mean_squared_error(y_true, y_pred)
               print('MSE: %2.3f' % mse)
               return mse

           def R2(y_true, y_pred):
               r2 = r2_score(y_true, y_pred)
               print('R2: %2.3f' % r2)
               return r2

           MSE(y_test, y_pred_l1);   # mean squared error
           R2(y_test, y_pred_l1);

MSE: 3870543.789
R2: 0.833

In [1083]: # predictions
           d = {'true': list(y_test), 'predicted': pd.Series(y_pred_l1)}
           pd.DataFrame(d).head()

Out[1083]:
       predicted     true
0    8698.454622   8499.0
1   16848.107734  17450.0
2   11050.616354   9279.0
3   10177.257093   7975.0
4    6505.098638   6692.0
How to use scikit-learn:
Go to the Linear Models section of the documentation. What we just used is linear_model.Lasso([alpha, fit_intercept, …]) for building the model directly, and linear_model.LassoCV([eps, n_alphas, …]) for cross-validated model selection. If you are unfamiliar with a module, just click into its page: it shows the formula and how it is computed, explains the role of α, and gives examples of building the regression model with scikit-learn. The scikit-learn API reference can help you find whatever you need. A sketch of the two APIs follows below.
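A quick illustrative sketch of the difference between the two, reusing the variables defined earlier (the fixed alpha value here is an arbitrary example):

# Lasso: you pick the penalty strength alpha yourself
lasso = Lasso(alpha=1.0, random_state=seed)          # alpha chosen manually
lasso.fit(X_train, y_train)
print("Lasso test score:", lasso.score(X_test, y_test))

# LassoCV: alpha is chosen automatically by cross-validation over a grid
lasso_cv = LassoCV(alphas=alphas, cv=10, random_state=seed)
lasso_cv.fit(X_train, y_train)
print("alpha selected by CV:", lasso_cv.alpha_)
print("LassoCV test score:", lasso_cv.score(X_test, y_test))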


