1 Abstract
2 About ONNX
3 The onnx package in transformers
3.1 Introduction to the onnx package
transformers provides the transformers.onnx package.
3.2 ONNX configuration classes
- Encoder-based models inherit from OnnxConfig
- Decoder-based models inherit from OnnxConfigWithPast
- Encoder-decoder models inherit from OnnxSeq2SeqConfigWithPast
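As a rough illustration of the list above, the mapping can be written down as a lookup table. This is only a minimal sketch: the family labels and the helper names base_config_name and resolve_base_config are made up for this example, and resolve_base_config assumes a transformers version that still ships the transformers.onnx package.

```python
# Architecture family -> name of the base ONNX config class, as listed above.
# The family labels ("encoder", ...) are hypothetical keys chosen for this sketch.
BASE_CONFIG_BY_FAMILY = {
    "encoder": "OnnxConfig",
    "decoder": "OnnxConfigWithPast",
    "encoder-decoder": "OnnxSeq2SeqConfigWithPast",
}


def base_config_name(family: str) -> str:
    """Return the base ONNX config class name for an architecture family."""
    try:
        return BASE_CONFIG_BY_FAMILY[family]
    except KeyError:
        raise ValueError(f"unknown architecture family: {family!r}") from None


def resolve_base_config(family: str):
    """Look the class up in transformers.onnx (needs transformers installed)."""
    import transformers.onnx as onnx_pkg  # imported lazily on purpose

    return getattr(onnx_pkg, base_config_name(family))


if __name__ == "__main__":
    for family in BASE_CONFIG_BY_FAMILY:
        print(family, "->", base_config_name(family))
```

The lazy import keeps the lookup table usable even where transformers is not installed; only resolve_base_config touches the library.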
4 Exporting to ONNX with transformers: an example
4.1 Installing dependencies
pip install transformers[onnx]
(tutorial-env) (base) [root@xxx onnx]# python -m transformers.onnx --help
2023-07-09 16:50:52.082389: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-09 16:50:52.965206: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
usage: Hugging Face Transformers ONNX exporter [-h] -m MODEL [--feature FEATURE] [--opset OPSET] [--atol ATOL] [--framework {pt,tf}] [--cache_dir CACHE_DIR]
                                               [--preprocessor {auto,tokenizer,feature_extractor,processor}] [--export_with_transformers]
                                               output

positional arguments:
  output                Path indicating where to store generated ONNX model.

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Model ID on huggingface.co or path on disk to load model from.
  --feature FEATURE     The type of features to export the model with.
  --opset OPSET         ONNX opset version to export the model with.
  --atol ATOL           Absolute difference tolerance when validating the model.
  --framework {pt,tf}   The framework to use for the ONNX export. If not provided, will attempt to use the local checkpoint's original framework or what is available in the environment.
  --cache_dir CACHE_DIR
                        Path indicating where to store cache.
  --preprocessor {auto,tokenizer,feature_extractor,processor}
                        Which type of preprocessor to use. 'auto' tries to automatically detect it.
  --export_with_transformers
                        Whether to use transformers.onnx instead of optimum.exporters.onnx to perform the ONNX export. It can be useful when exporting a model supported in transformers but not in optimum, otherwise it is not recommended.
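The options listed in the help output can also be assembled from Python, which may be convenient when exporting several models in a script. The following is a minimal sketch under that assumption: build_export_command is a hypothetical helper, not part of transformers, and it only uses flags that appear in the help text above.

```python
import sys
from typing import List, Optional


def build_export_command(
    model: str,
    output: str,
    feature: Optional[str] = None,
    opset: Optional[int] = None,
    atol: Optional[float] = None,
    framework: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `python -m transformers.onnx`."""
    if framework not in (None, "pt", "tf"):
        raise ValueError("framework must be 'pt' or 'tf'")
    cmd = [sys.executable, "-m", "transformers.onnx", f"--model={model}"]
    if feature is not None:
        cmd.append(f"--feature={feature}")
    if opset is not None:
        cmd.append(f"--opset={opset}")
    if atol is not None:
        cmd.append(f"--atol={atol}")
    if framework is not None:
        cmd.append(f"--framework={framework}")
    cmd.append(output)  # positional argument: where the ONNX model is stored
    return cmd


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(cmd, check=True) to run it.
    print(" ".join(build_export_command("distilbert-base-uncased", "onnx/")))
```

Returning a list rather than a single string avoids shell quoting issues when the list is handed to subprocess.run.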
4.2 Export command
python -m transformers.onnx --model=distilbert-base-uncased onnx/
2023-07-09 16:48:37.895868: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-09 16:48:38.785971: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Framework not requested. Using torch to export to ONNX.
Loading TensorFlow model in PyTorch before exporting to ONNX.
Downloading tf_model.h5: 100%|███████████████████████████████████████████████████████| 363M/363M [00:36<00:00, 9.96MB/s]
2023-07-09 16:49:20.811614: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-07-09 16:49:20.813190: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
All TF 2.0 model weights were used when initializing DistilBertModel.
All the weights of DistilBertModel were initialized from the TF 2.0 model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use DistilBertModel for predictions without further training.
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████| 28.0/28.0 [00:00<00:00, 200kB/s]
Downloading (…)solve/main/vocab.txt: 100%|████████████████████████████████████████████| 232k/232k [00:00<00:00, 600kB/s]
Downloading (…)/main/tokenizer.json: 100%|███████████████████████████████████████████| 466k/466k [00:00<00:00, 1.96MB/s]
Using framework PyTorch: 2.0.1+cu117
/root/onnx/tutorial-env/lib/python3.10/site-packages/transformers/models/distilbert/modeling_distilbert.py:223: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask, torch.tensor(torch.finfo(scores.dtype).min)
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
Validating ONNX model...
  -[✓] ONNX model output names match reference model ({'last_hidden_state'})
  - Validating ONNX Model output "last_hidden_state":
    -[✓] (3, 9, 768) matches (3, 9, 768)
    -[✓] all values close (atol: 1e-05)
All good, model saved at: onnx/model.onnx
/root/onnx/tutorial-env/lib/python3.10/site-packages/transformers/onnx/__main__.py:178: FutureWarning: The export was done by transformers.onnx which is deprecated and will be removed in v5. We recommend using optimum.exporters.onnx in future. You can find more information here: https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model.
  warnings.warn(
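Apart from some warnings and the download of the model's files (config.json and the like), the output is essentially the same as in the official example. The command above exports the ONNX graph of the checkpoint given by the --model argument. In this example it is distilbert-base-uncased, but it can be any checkpoint on the Hugging Face Hub or one stored locally.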
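The "all values close (atol: 1e-05)" line in the export log refers to comparing the reference model's outputs with the ONNX model's outputs under an absolute tolerance, which is what the --atol option controls. The sketch below shows the idea in plain Python on nested lists. It is a simplification written for this article, not the exporter's code: the function names are hypothetical, and the real check may differ in detail (for example, by also applying a relative tolerance).

```python
def flatten(values):
    """Yield every scalar in an arbitrarily nested list or tuple."""
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from flatten(item)
    else:
        yield float(values)


def max_abs_diff(reference, candidate) -> float:
    """Largest element-wise absolute difference between two nested lists."""
    ref = list(flatten(reference))
    cand = list(flatten(candidate))
    if len(ref) != len(cand):
        raise ValueError(f"size mismatch: {len(ref)} vs {len(cand)} values")
    return max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)


def all_close(reference, candidate, atol: float = 1e-5) -> bool:
    """True when no element differs by more than the absolute tolerance."""
    return max_abs_diff(reference, candidate) <= atol


if __name__ == "__main__":
    reference = [[0.10, 0.20], [0.30, 0.40]]
    exported = [[0.10, 0.20], [0.30, 0.400001]]
    print(all_close(reference, exported))             # within the default 1e-5
    print(all_close(reference, exported, atol=1e-8))  # a stricter tolerance fails
```

If a real export fails this kind of check only marginally, loosening --atol is one option, though it is worth first confirming that the difference is numerical noise rather than a genuine mismatch.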
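4.3 Loading the model
Once the export finishes, model.onnx appears in the onnx/ directory under the current directory. The model.onnx file can then be run on any of the many accelerators that support the ONNX standard. For example, we can load and run the model with ONNX Runtime as follows (mind the directory from which you run the code):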
from transformers import AutoTokenizer
from onnxruntime import InferenceSession

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
session = InferenceSession("onnx/model.onnx")
# ONNX Runtime expects NumPy arrays as input
inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
print(outputs)
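The required output names (such as ["last_hidden_state"]) can be found in each model's ONNX configuration. For DistilBERT, for example:
from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

config = DistilBertConfig()
onnx_config = DistilBertOnnxConfig(config)
print(list(onnx_config.outputs.keys()))
# Output: ["last_hidden_state"]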
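The input and output names can also be read from the exported graph itself through ONNX Runtime's session.get_inputs() and session.get_outputs(). The sketch below uses them to build the input feed, so that any tokenizer field the graph does not declare is dropped before the call. restrict_feed and run_onnx are hypothetical helpers written for this article; passing providers=["CPUExecutionProvider"] is an assumption that CPU execution is wanted.

```python
from typing import Any, Dict, Iterable


def restrict_feed(encoded: Dict[str, Any], input_names: Iterable[str]) -> Dict[str, Any]:
    """Keep only the entries that the ONNX graph declares as inputs."""
    names = list(input_names)
    missing = [name for name in names if name not in encoded]
    if missing:
        raise KeyError(f"tokenizer output lacks required ONNX inputs: {missing}")
    return {name: encoded[name] for name in names}


def run_onnx(model_path: str, text: str, checkpoint: str = "distilbert-base-uncased"):
    """Tokenize `text` and run it through the exported model, returning outputs by name."""
    # Imported lazily so restrict_feed stays usable without these packages.
    from onnxruntime import InferenceSession
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    session = InferenceSession(model_path, providers=["CPUExecutionProvider"])

    input_names = [node.name for node in session.get_inputs()]
    output_names = [node.name for node in session.get_outputs()]

    # ONNX Runtime expects NumPy arrays as input.
    encoded = dict(tokenizer(text, return_tensors="np"))
    outputs = session.run(output_names, restrict_feed(encoded, input_names))
    return dict(zip(output_names, outputs))


if __name__ == "__main__":
    result = run_onnx("onnx/model.onnx", "Using DistilBERT with ONNX Runtime!")
    for name, value in result.items():
        print(name, value.shape)
```

Reading the names from the session avoids hard-coding "last_hidden_state", which changes when a model is exported with a different feature.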
The process is the same for TensorFlow checkpoints on the Hub. For example, we can export a pure TensorFlow checkpoint from the Keras organization as follows:
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
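To export a model that is stored locally, its weights and tokenizer files need to be saved in one directory. For example, we can load and save a checkpoint as follows:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and PyTorch weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
pt_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
# Save them to local disk
tokenizer.save_pretrained("local-pt-checkpoint")
pt_model.save_pretrained("local-pt-checkpoint")
Then export the checkpoint to ONNX by pointing the --model argument at that directory: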
python -m transformers.onnx --model=local-pt-checkpoint onnx/
A TensorFlow checkpoint can be saved locally in the same way:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Load the tokenizer and TensorFlow weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
tf_model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
# Save them to local disk
tokenizer.save_pretrained("local-tf-checkpoint")
tf_model.save_pretrained("local-tf-checkpoint")
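Before pointing --model at a local directory, it can help to confirm that the directory actually contains a saved checkpoint and to see which framework wrote it. The following is a minimal sketch of such a check. detect_framework is a hypothetical helper, and the weight file names are an assumption: they are the names save_pretrained commonly writes, but the exact file depends on the framework and library version.

```python
from pathlib import Path
from typing import Optional

# Weight file names commonly written by save_pretrained (an assumption, see above).
WEIGHT_FILES = {
    "pt": ("pytorch_model.bin", "model.safetensors"),
    "tf": ("tf_model.h5",),
}


def detect_framework(checkpoint_dir: str) -> Optional[str]:
    """Return "pt" or "tf" if the directory looks like a saved checkpoint, else None."""
    root = Path(checkpoint_dir)
    if not (root / "config.json").is_file():
        return None  # without config.json the model cannot be loaded
    for framework, names in WEIGHT_FILES.items():
        if any((root / name).is_file() for name in names):
            return framework
    return None


if __name__ == "__main__":
    for directory in ("local-pt-checkpoint", "local-tf-checkpoint"):
        print(directory, "->", detect_framework(directory))
```

The returned value matches the choices of the --framework option shown in the help output, so it could be passed along when the framework should be stated explicitly rather than auto-detected.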
5 Summary