IA3 Fine-Tuning of RoBERTa-Large - Alibaba Cloud Developer Community

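1. Preface

Assignment requirements

Task: on the ModelArts platform, use the MindSpore NLP component to fine-tune the Roberta-Large model with IA3.
Dataset: GLUE-MRPC
Specific requirements: use the MindSpore NLP component to load the Roberta-Large model, set up the IA3 algorithm configuration and initialize the fine-tuning model, load the dataset and run fine-tuning, and finally evaluate the fine-tuned model on the validation set.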

Preparing the battlefield

Hardware Environment (Ascend/GPU/CPU): Ascend: 1*ascend-snt9b1 | ARM: 24 cores, 192 GB
Software Environment (Mandatory):
ModelArts, Southwest China (Guiyang-1) region
- mindspore_2.3.0
- cann_8.0.rc2
- py_3.9
- euler_2.10.7
- aarch64-snt9b
Image description: Ascend Snt9B + ARM base image for algorithm development and training, with MindSpore preinstalled as the AI engine
- MindNLP_0.4.1

In the terminal (Bash):

    pip uninstall mindformers
    pip install git+https://github.com/mindspore-lab/mindnlp.git
    pip show mindnlp   # check that the version is the latest, 0.4.1

    # to try going back to version 0.4.0
    pip install mindnlp==0.4.0
    pip show mindnlp
    # note the returned value
    # Location: /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages

    # Install "common" manually: log in to GitHub, open the MindNLP repository and download the whole thing.
    # Upload the entire "common" folder from mindnlp to the Notebook folder /home/ma-user/work/
    cp -r /home/ma-user/work/common /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindnlp/

In the Notebook (Python), verify that the installation succeeded:

    import mindnlp.common
    print(mindnlp.common.__file__)

Later on, the "core" folder will also turn out to be missing (Bash):

    cp -r /home/ma-user/work/core /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindnlp/

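Reference solution: IA3_sequence_classification.ipynb

The reference solution for the assignment was actually in the repository from the very beginning:

https://github.com/mindspore-lab/mindnlp/blob/master/llm/peft/ia3/sequence_classification.ipynb

This reference solution loads RoBERTa-Large through AutoModelForSequenceClassification from the mindnlp.transformers library and, using the IA3 fine-tuning mode of the mindnlp.peft library, builds a training pipeline that fine-tunes the model on the MRPC sequence classification task.

Tips: right at the time of MindNLP 0.4.1, the code had a functional problem. Model parameters were not being updated, so the accuracy and F1 printed for every epoch were exactly identical. After it was reported, it was fixed quickly.

However, in one later run with the number of epochs set to 20, the problem showed up again. Since then I have kept the number of epochs at 12 or below, and as far as I can tell it should not be triggered again.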


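As shown in the figure, every epoch returned the same accuracy (68.38%) and the same F1 (81.22%).

Something is clearly wrong here. Results that are exactly identical for every epoch suggest that the parameters were not updated at all. Only that would explain why inference in evaluation mode gives exactly the same result in every epoch.

More experienced readers will notice that the 68% figure is telling: it is the label distribution of the MRPC evaluation set (68% of the labels are "1", the rest are "0"). If the outputs of the evaluation loop were printed, they would probably all be "1", which means the model parameters have not changed since initialization. (The dataset section below comes back to this.)

The predictions printed from outputs.logits are indeed uniformly [1, 1, 1, 1, 1, 1, 1, 1] (batch_size=8).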
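The link between the 68% label share and the two stuck metrics can be checked by hand. The sketch below scores a predictor that always answers "1". The split size (408 pairs, 279 of them positive) is an assumption taken from the commonly cited MRPC validation split, not from this post.

```python
# Majority-class baseline: predict label 1 for every validation pair.
# ASSUMPTION: 408 validation pairs with 279 positives (commonly cited MRPC split).
num_examples = 408
num_positive = 279

predictions = [1] * num_examples
references = [1] * num_positive + [0] * (num_examples - num_positive)

pairs = list(zip(predictions, references))
true_pos = sum(p == 1 and r == 1 for p, r in pairs)
false_pos = sum(p == 1 and r == 0 for p, r in pairs)
false_neg = sum(p == 0 and r == 1 for p, r in pairs)

accuracy = sum(p == r for p, r in pairs) / num_examples
precision = true_pos / (true_pos + false_pos)
recall = true_pos / (true_pos + false_neg)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f}, f1={f1:.4f}")  # accuracy=0.6838, f1=0.8122
```

Under that assumption, both numbers match the values reported for every epoch, which is consistent with a model that outputs "1" for everything.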


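Going one step further and printing outputs.logits before the argmax (which picks the largest value and returns its label, "0" or "1"), the pattern is as follows: the score for "0" is always negative and the score for "1" is always positive. Strictly speaking these are raw logits rather than probabilities.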
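A minimal sketch of that argmax step in plain Python follows. The logit values are made up for illustration; only their sign pattern (negative for "0", positive for "1") follows the description above.

```python
def argmax(row):
    """Return the index of the largest score in a list."""
    return max(range(len(row)), key=lambda i: row[i])

# HYPOTHETICAL logits for a batch of 4 pairs: [score for label 0, score for label 1].
logits = [[-0.41, 0.37], [-0.39, 0.35], [-0.44, 0.40], [-0.40, 0.36]]

predictions = [argmax(row) for row in logits]
print(predictions)  # [1, 1, 1, 1]
```

As long as the second score is larger in every row, every prediction is "1", regardless of the input pair.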


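While stitching together the DIY solution below, I faced the same annoying problem. I spent two days and two nights changing things, and only got the notebook to run after the official fix was in. By that point it had been modified so much that it hardly differed from the reference solution.

DIY solution: peft_ia3_s2c_diy.ipynb

At the very beginning I had not actually found the reference solution above, so I set out to adapt a notebook myself from related code. I relied on the following two pieces of code.

Code 1. The IA3 fine-tuning code for a sequence-to-sequence generation task in the official MindNLP repository:

https://github.com/mindspore-lab/mindnlp/tree/master/llm/peft/ia3/seq_2_seq

Code 2. "Prompt Tuning of the Roberta Model Based on MindSpore NLP", published by the official MindSpore account on CSDN:

https://blog.csdn.net/Kenji_Shinji/article/details/144395136

Stitching approach: use Code 1 for the IA3 fine-tuning configuration, and Code 2 for the RoBERTa model configuration and the data processing. To complete the stitched notebook I made the following changes: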

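1. Configured the roberta-large model and used AutoModelForSequenceClassification for the sequence classification task.

2. Set up the fine-tuning configuration with peft.IA3Config.

3. Added the data processing pipeline for the mrpc dataset.

4. Unified variable names, for example datasets['train'].

5. Changed the label handling from generated labels to classification labels, and removed the batch_decode and .generate steps that belonged to the generation task.

6. Changed the data format from str to int.

7. Rewrote the label text as 'Dissimilar', 'Similar', following the definition of the MRPC dataset.

8. Switched to the reference solution's optimizer and peft calls, and removed the disabling of gradient computation in evaluation mode (I do not understand why, but it works).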
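Changes 5 to 7 amount to treating labels as integer class ids and mapping them to text only for display. A minimal sketch of that mapping follows. The helper name `to_text` is hypothetical; the label strings are the ones listed above.

```python
# Integer class ids as used by the classifier, with the display text from change 7.
id2label = {0: "Dissimilar", 1: "Similar"}

def to_text(label_ids):
    """Map integer class ids (e.g. argmax results) to readable label text."""
    return [id2label[int(i)] for i in label_ids]

print(to_text([1, 0, 1]))  # ['Similar', 'Dissimilar', 'Similar']
```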

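Compared with the reference solution, the resulting code is less concise and less direct, but it additionally prints ppl and the loss value, prints many intermediate values along the way, saves the model after training, and reloads it to run inference and verify the output text labels. Picking through the code again and again deepened my understanding of it.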

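2. Dataset: GLUE-MRPC

0 means not similar; 1 means similar.

Each sample in this task's dataset contains two sentences. The sentences are quite long, and the data is imbalanced: positive samples make up 68% and negative samples only 32%.

MRPC (The Microsoft Research Paraphrase Corpus) is a similarity and paraphrase task. It is a corpus of sentence pairs automatically extracted from online news sources, with human annotations indicating whether the two sentences in each pair are semantically equivalent. The classes are imbalanced (68% positive samples), so, following common practice, both accuracy and F1 are reported.

It is a text file containing 5,800 pairs of sentences extracted from news sources on the web, with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship. Last published: March 3, 2005.

Original link: https://blog.csdn.net/qq_33583069/article/details/115734097

The MRPC dataset is widely used in natural language processing, especially for training and evaluating paraphrase identification models. Researchers and developers can use it to train machine learning models that recognize paraphrase relationships between sentence pairs. It can also be used to validate and compare the performance of different paraphrase identification algorithms. In practice, users can choose a suitable model and algorithm for their needs and evaluate the model with methods such as cross-validation.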

    {
        'sentence1': Tensor(shape=[], dtype=String, value= 'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .'),
        'sentence2': Tensor(shape=[], dtype=String, value= 'Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .'),
        'label': Tensor(shape=[], dtype=Int64, value= 1),
        'idx': Tensor(shape=[], dtype=Int64, value= 0)
    }

    from mindnlp.dataset import load_dataset

    datasets = load_dataset("glue", "mrpc")
    print(next(datasets['train'].create_dict_iterator()))

    from mindnlp.dataset import BaseMapFunction

    class MapFunc(BaseMapFunction):
        def __call__(self, sentence1, sentence2, label, idx):
            # Tokenize the sentence pair; `tokenizer` is the global created in section 3.
            outputs = tokenizer(sentence1, sentence2, truncation=True, max_length=None)
            return outputs['input_ids'], outputs['attention_mask'], label

    def get_dataset(dataset, tokenizer):
        input_columns = ['sentence1', 'sentence2', 'label', 'idx']
        output_columns = ['input_ids', 'attention_mask', 'labels']
        dataset = dataset.map(MapFunc(input_columns, output_columns), input_columns, output_columns)
        # `batch_size` is the global defined in section 3.
        # input_ids are padded with the pad token id, attention_mask with 0.
        dataset = dataset.padded_batch(batch_size, pad_info={'input_ids': (None, tokenizer.pad_token_id),
                                                             'attention_mask': (None, 0)})
        return dataset

    train_dataset = get_dataset(datasets['train'], tokenizer)
    eval_dataset = get_dataset(datasets['validation'], tokenizer)
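To make the effect of the pad_info argument concrete, here is a plain-Python sketch of what padding one batch amounts to, assuming that a pad shape of None means "pad to the longest sequence in the batch". The helper `pad_batch` and the token ids are hypothetical. The pad id of 1 is an assumption based on RoBERTa's usual vocabulary.

```python
def pad_batch(rows, pad_value):
    """Right-pad every row to the length of the longest row in the batch."""
    longest = max(len(row) for row in rows)
    return [row + [pad_value] * (longest - len(row)) for row in rows]

pad_token_id = 1  # ASSUMPTION: RoBERTa's <pad> id

# HYPOTHETICAL token ids for two tokenized sentence pairs of different length.
input_ids = [[0, 100, 200, 2], [0, 300, 2]]
attention_mask = [[1] * len(row) for row in input_ids]

padded_ids = pad_batch(input_ids, pad_token_id)
padded_mask = pad_batch(attention_mask, 0)

print(padded_ids)   # [[0, 100, 200, 2], [0, 300, 2, 1]]
print(padded_mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

The attention mask is padded with 0 so that the model ignores the padded positions.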

3. Model: RoBERTa-Large

FacebookAI/xlm-roberta-large

https://huggingface.co/FacebookAI/xlm-roberta-large

Model size: 561M params; Tensor type: F32

Institutions: Facebook & University of Washington
Authors: Yinhan Liu, Myle Ott, et al.
Published on: arXiv
Paper: https://arxiv.org/abs/1907.11692
Code: https://github.com/pytorch/fairseq

https://blog.csdn.net/Decennie/article/details/120010025

The model is loaded through AutoModelForSequenceClassification for the sequence classification task.

    from tqdm import tqdm
    import mindspore
    from mindnlp.core.optim import AdamW
    from mindnlp.dataset import load_dataset
    import mindnlp.peft as peft
    import mindnlp.evaluate as evaluate
    from mindnlp.common.optimization import get_linear_schedule_with_warmup
    from mindnlp.transformers import AutoModelForSequenceClassification, AutoTokenizer

    batch_size = 8
    model_name_or_path = "roberta-large"
    task = "mrpc"
    peft_type = peft.PeftType.IA3
    num_epochs = 12

    # Decoder-only models are padded on the left; RoBERTa is padded on the right.
    if any(k in model_name_or_path for k in ("gpt", "opt", "bloom")):
        padding_side = "left"
    else:
        padding_side = "right"

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, padding_side=padding_side)
    if getattr(tokenizer, "pad_token_id") is None:
        tokenizer.pad_token_id = tokenizer.eos_token_id

    # peft_config must exist before get_peft_model is called;
    # this is the same IA3Config that section 4 uses.
    peft_config = peft.IA3Config(task_type="SEQ_CLS", inference_mode=False)

    model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path, return_dict=True)
    model = peft.get_peft_model(model, peft_config)
    model.print_trainable_parameters()

Interestingly, when we print the model's parameter count, it does not match the figure given on Hugging Face.


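Why is the total parameter count here only 356M?

RobertaForSequenceClassification: the model used here is RobertaForSequenceClassification, a RoBERTa model for sequence classification tasks. It adds a classification head (classifier) on top of the base RoBERTa model to handle a specific downstream task such as text classification or sentiment analysis. This head contains additional parameters, for example classifier.dense.bias, classifier.dense.weight, classifier.out_proj.bias and classifier.out_proj.weight, and 356M is the parameter count of this model. Note, however, that the 561M figure quoted above comes from the model card of FacebookAI/xlm-roberta-large, the multilingual XLM-RoBERTa checkpoint, whereas the code loads "roberta-large". These are different checkpoints: the much larger vocabulary of XLM-RoBERTa, and with it the embedding matrix, accounts for most of the gap, while the classification head contributes only about 1M parameters.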
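A rough back-of-the-envelope check of the two figures is sketched below. It rests on the observation that the 561M figure is quoted from the xlm-roberta-large model card, while the code loads "roberta-large". The vocabulary sizes (50265 and 250002) and the hidden size (1024) are assumptions taken from the public configs of the two checkpoints, not from this post.

```python
# ASSUMPTIONS: values from the public model configs, not from this post.
HIDDEN_SIZE = 1024
ROBERTA_LARGE_VOCAB = 50265
XLM_ROBERTA_LARGE_VOCAB = 250002
NUM_LABELS = 2  # MRPC is a binary task

def embedding_params(vocab_size, hidden_size=HIDDEN_SIZE):
    """Number of weights in a token embedding matrix."""
    return vocab_size * hidden_size

# Extra embedding weights that the multilingual vocabulary brings along.
embedding_gap = embedding_params(XLM_ROBERTA_LARGE_VOCAB) - embedding_params(ROBERTA_LARGE_VOCAB)

# Classification head: dense (hidden -> hidden) plus out_proj (hidden -> labels), with biases.
classifier_head = (HIDDEN_SIZE * HIDDEN_SIZE + HIDDEN_SIZE) + (HIDDEN_SIZE * NUM_LABELS + NUM_LABELS)

print(f"embedding gap: {embedding_gap / 1e6:.1f}M")             # embedding gap: 204.5M
print(f"classification head: {classifier_head / 1e6:.2f}M")     # classification head: 1.05M
```

Under these assumptions the embedding matrices alone differ by roughly 205M weights, which is about the distance between 561M and 356M, whereas the head is on the order of 1M.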

RobertaForSequenceClassification

https://github.com/mindspore-lab/mindnlp/blob/master/mindnlp/transformers/models/roberta/modeling_roberta.py

The list of related components can be seen in the model code:

    __all__ = [
        "RobertaForCausalLM",
        "RobertaForMaskedLM",
        "RobertaForMultipleChoice",
        "RobertaForQuestionAnswering",
        "RobertaForSequenceClassification",
        "RobertaForTokenClassification",
        "RobertaModel",
        "RobertaPreTrainedModel",
    ]

The code of RobertaForSequenceClassification:

    class RobertaForSequenceClassification(RobertaPreTrainedModel):
        """
        This class represents a Roberta model for sequence classification tasks. It is a subclass of
        RobertaPreTrainedModel and is specifically designed for sequence classification tasks.

        The class's code includes an initialization method (__init__) and a forward method.

        The __init__ method initializes the RobertaForSequenceClassification object by taking a config argument.
        It calls the super() method to initialize the parent class (RobertaPreTrainedModel) with the provided config.
        It also initializes other attributes such as num_labels and classifier.

        The forward method takes several input arguments and returns either a tuple of tensors or a
        SequenceClassifierOutput object. It performs the main computation of the model. It first calls the
        roberta() method of the parent class to obtain the sequence output. Then, it passes the sequence output
        to the classifier to obtain the logits. If labels are provided, it calculates the loss based on the
        problem type specified in the config. The loss and other outputs are returned as per the value of the
        return_dict parameter.

        It is important to note that this class is specifically designed for sequence classification tasks,
        where the labels can be used to compute either a regression loss (Mean-Square loss) or a classification
        loss (Cross-Entropy). The problem type is determined automatically based on the number of labels and
        the dtype of the labels tensor.

        For more details on the usage and functionality of this class, please refer to the
        RobertaForSequenceClassification documentation.
        """

        def __init__(self, config):
            """
            Initializes a new instance of the RobertaForSequenceClassification class.

            Args:
                self: The instance of the class.
                config (RobertaConfig): The configuration object for the Roberta model.
                    It contains the model configuration settings such as num_labels, which is the number of
                    labels for classification.
                    This parameter is required for configuring the model initialization.

            Returns:
                None.

            Raises:
                None.
            """
            super().__init__(config)
            self.num_labels = config.num_labels
            self.config = config

            self.roberta = RobertaModel(config, add_pooling_layer=False)
            self.classifier = RobertaClassificationHead(config)

            # Initialize weights and apply final processing
            self.post_init()

        def forward(
            self,
            input_ids: Optional[mindspore.Tensor] = None,
            attention_mask: Optional[mindspore.Tensor] = None,
            token_type_ids: Optional[mindspore.Tensor] = None,
            position_ids: Optional[mindspore.Tensor] = None,
            head_mask: Optional[mindspore.Tensor] = None,
            inputs_embeds: Optional[mindspore.Tensor] = None,
            labels: Optional[mindspore.Tensor] = None,
            output_attentions: Optional[bool] = None,
            output_hidden_states: Optional[bool] = None,
            return_dict: Optional[bool] = None,
        ) -> Union[Tuple[mindspore.Tensor], SequenceClassifierOutput]:
            r"""
            Args:
                labels (`mindspore.Tensor` of shape `(batch_size,)`, *optional*):
                    Labels for computing the sequence classification/regression loss.
                    Indices should be in `[0, ..., config.num_labels - 1]`.
                    If `config.num_labels == 1` a regression loss is computed (Mean-Square loss),
                    If `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
            """
            return_dict = (
                return_dict if return_dict is not None else self.config.use_return_dict
            )

            outputs = self.roberta(
                input_ids,
                attention_mask=attention_mask,
                token_type_ids=token_type_ids,
                position_ids=position_ids,
                head_mask=head_mask,
                inputs_embeds=inputs_embeds,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
                return_dict=return_dict,
            )
            sequence_output = outputs[0]
            logits = self.classifier(sequence_output)

            loss = None
            if labels is not None:
                if self.config.problem_type is None:
                    if self.num_labels == 1:
                        self.config.problem_type = "regression"
                    elif self.num_labels > 1 and (
                        labels.dtype in (mindspore.int32, mindspore.int64)
                    ):
                        self.config.problem_type = "single_label_classification"
                    else:
                        self.config.problem_type = "multi_label_classification"

                if self.config.problem_type == "regression":
                    if self.num_labels == 1:
                        loss = F.mse_loss(logits.squeeze(), labels.squeeze())
                    else:
                        loss = F.mse_loss(logits, labels)
                elif self.config.problem_type == "single_label_classification":
                    loss = F.cross_entropy(
                        logits.view(-1, self.num_labels), labels.view(-1)
                    )
                elif self.config.problem_type == "multi_label_classification":
                    loss = F.binary_cross_entropy_with_logits(logits, labels)

            if not return_dict:
                output = (logits,) + outputs[2:]
                return ((loss,) + output) if loss is not None else output

            return SequenceClassifierOutput(
                loss=loss,
                logits=logits,
                hidden_states=outputs.hidden_states,
                attentions=outputs.attentions,
            )

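4. Fine-tuning with IA3

Put simply, IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations) is a parameter-efficient fine-tuning method: the parameters of the large model stay frozen, and only a small set of learned vectors is trained. These vectors rescale, element by element, the keys and values of the attention layers and the intermediate activations of the feed-forward layers. It is usually contrasted with in-context learning, where, also with the model frozen, a few samples with data and labels are given in the input together with the item to be predicted, and the model outputs the prediction for that item without any parameter changing. In-context learning needs no parameter update when applied to a downstream task and can therefore be extended to all kinds of task scenarios; IA3 does update parameters, but only the small rescaling vectors.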
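The core idea of IA3, element-wise rescaling of a frozen layer's activations by a small learned vector, can be sketched in a few lines of plain Python. This is a toy illustration with made-up numbers, not the MindNLP implementation; the names `linear` and `ia3_linear` are hypothetical.

```python
def linear(weight, x):
    """Frozen linear layer without bias: returns weight @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def ia3_linear(weight, ia3_vector, x):
    """Frozen linear layer whose output is rescaled element-wise by a learned vector."""
    return [scale * h for scale, h in zip(ia3_vector, linear(weight, x))]

# HYPOTHETICAL frozen weight and input.
frozen_weight = [[1.0, 2.0], [0.5, -1.0]]
x = [3.0, 1.0]

baseline = linear(frozen_weight, x)                    # [5.0, 0.5]
at_init = ia3_linear(frozen_weight, [1.0, 1.0], x)     # vector of ones: output unchanged
tuned = ia3_linear(frozen_weight, [0.5, 2.0], x)       # [2.5, 1.0]

print(baseline, at_init, tuned)
```

Because the vector starts at all ones, the adapted model initially behaves exactly like the frozen base model; training then only has to learn one scale per activation dimension, while frozen_weight never changes.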

https://blog.csdn.net/qq_44665283/article/details/139423937

PEFT configuration

https://mindnlp-ai.readthedocs.io/en/latest/zh/tutorials/peft/

IA3 configuration

https://mindnlp-ai.readthedocs.io/en/stable/zh/api/peft/tuners/ia3/

# mindnlp.peft.tuners.ia3.config.IA3Config
```py
>>> from transformers import AutoModelForSeq2SeqLM, ia3Config
>>> from peft import IA3Model, IA3Config
>>> config = IA3Config(
...     peft_type="IA3",
...     task_type="SEQ_2_SEQ_LM",
...     target_modules=["k", "v", "w0"],
...     feedforward_modules=["w0"],
... )
```

With peft_config set up through IA3Config, define the training and evaluation process:

    import mindnlp.peft as peft

    peft_config = peft.IA3Config(task_type="SEQ_CLS", inference_mode=False)

    from mindnlp.core import value_and_grad

    def forward_fn(**batch):
        outputs = model(**batch)
        loss = outputs.loss
        return loss

    grad_fn = value_and_grad(forward_fn, tuple(model.trainable_params()))

    # Note: `optimizer`, `lr_scheduler` and `metric` are used below,
    # but the code that creates them is not shown in this post.
    for epoch in range(num_epochs):
        # Training pass
        model.set_train()
        train_total_size = train_dataset.get_dataset_size()
        for step, batch in enumerate(tqdm(train_dataset.create_dict_iterator(), total=train_total_size)):
            optimizer.zero_grad()
            loss = grad_fn(**batch)
            optimizer.step()
            lr_scheduler.step()

        # Evaluation pass on the validation set
        model.set_train(False)
        eval_total_size = eval_dataset.get_dataset_size()
        for step, batch in enumerate(tqdm(eval_dataset.create_dict_iterator(), total=eval_total_size)):
            outputs = model(**batch)
            predictions = outputs.logits.argmax(axis=-1)
            predictions, references = predictions, batch["labels"]
            metric.add_batch(
                predictions=predictions,
                references=references,
            )

        eval_metric = metric.compute()
        print(f"epoch {epoch}:", eval_metric)
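The loop calls lr_scheduler.step() after every optimizer step, and the imports in section 3 include get_linear_schedule_with_warmup. As a rough illustration of what such a schedule does, the sketch below computes the learning-rate multiplier per step, assuming the usual definition (linear ramp from 0 to 1 during warmup, then linear decay to 0). The function name and the step counts are hypothetical, and the actual MindNLP implementation may differ in detail.

```python
def linear_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Learning-rate multiplier: linear warmup to 1.0, then linear decay to 0.0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

# HYPOTHETICAL schedule: 2 warmup steps out of 10 training steps.
factors = [linear_warmup_factor(s, 2, 10) for s in range(11)]
print(factors)
```

The effective learning rate at a given step is the base learning rate multiplied by this factor, so it peaks right after warmup and reaches zero at the final step.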

5. Miscellaneous

AdamW

https://github.com/mindspore-lab/mindnlp/blob/master/mindnlp/core/optim/adamw.py

lr_scheduler

https://github.com/mindspore-lab/mindnlp/blob/master/mindnlp/core/optim/lr_scheduler.py

————————————————

Original link: https://blog.csdn.net/a1966565/article/details/144751602
