一、引言
pipeline(管道)是huggingface transformers库中一种极简方式使用大模型推理的抽象,将所有大模型分为语音(Audio)、计算机视觉(Computer vision)、自然语言处理(NLP)、多模态(Multimodal)等4大类,28小类任务(tasks)。共计覆盖32万个模型
本文对pipeline进行整体介绍,之后本专栏以每个task为主题,分别介绍各种task使用方法。
二、pipeline库
2.1 概述
管道是一种使用模型进行推理的简单而好用的方法。这些管道是从库中抽象出大部分复杂代码的对象,提供了专用于多项任务的简单 API,包括命名实体识别、掩码语言建模、情感分析、特征提取和问答
。在使用上,主要有2种方法
- 使用task实例化pipeline对象
- 使用model实例化pipeline对象
2.2 使用task实例化pipeline对象
2.2.1 基于task实例化“自动语音识别”
自动语音识别的task为automatic-speech-recognition:
import os os.environ["HF_ENDPOINT"] = "https://hf-mirror.com" os.environ["CUDA_VISIBLE_DEVICES"] = "2" from transformers import pipeline speech_file = "./output_video_enhanced.mp3" pipe = pipeline(task="automatic-speech-recognition") result = pipe(speech_file) print(result)
2.2.2 task列表
task共计28类,按首字母排序,列表如下,直接替换2.2.1代码中的pipeline的task即可应用:
"audio-classification"
:将返回一个AudioClassificationPipeline。"automatic-speech-recognition"
:将返回一个AutomaticSpeechRecognitionPipeline。"depth-estimation"
:将返回一个DepthEstimationPipeline。"document-question-answering"
:将返回一个DocumentQuestionAnsweringPipeline。"feature-extraction"
:将返回一个FeatureExtractionPipeline。"fill-mask"
:将返回一个FillMaskPipeline:。"image-classification"
:将返回一个ImageClassificationPipeline。"image-feature-extraction"
:将返回一个ImageFeatureExtractionPipeline。"image-segmentation"
:将返回一个ImageSegmentationPipeline。"image-to-image"
:将返回一个ImageToImagePipeline。"image-to-text"
:将返回一个ImageToTextPipeline。"mask-generation"
:将返回一个MaskGenerationPipeline。"object-detection"
:将返回一个ObjectDetectionPipeline。"question-answering"
:将返回一个QuestionAnsweringPipeline。"summarization"
:将返回一个SummarizationPipeline。"table-question-answering"
:将返回一个TableQuestionAnsweringPipeline。"text2text-generation"
:将返回一个Text2TextGenerationPipeline。"text-classification"
("sentiment-analysis"
可用别名):将返回一个 TextClassificationPipeline。"text-generation"
:将返回一个TextGenerationPipeline:。"text-to-audio"
("text-to-speech"
可用别名):将返回一个TextToAudioPipeline:。"token-classification"
("ner"
可用别名):将返回一个TokenClassificationPipeline。"translation"
:将返回一个TranslationPipeline。"translation_xx_to_yy"
:将返回一个TranslationPipeline。"video-classification"
:将返回一个VideoClassificationPipeline。"visual-question-answering"
:将返回一个VisualQuestionAnsweringPipeline。"zero-shot-classification"
:将返回一个ZeroShotClassificationPipeline。"zero-shot-image-classification"
:将返回一个ZeroShotImageClassificationPipeline。"zero-shot-audio-classification"
:将返回一个ZeroShotAudioClassificationPipeline。"zero-shot-object-detection"
:将返回一个ZeroShotObjectDetectionPipeline。
2.2.3 task默认模型
针对每一个task,pipeline默认配置了模型,可以通过pipeline源代码查看:
SUPPORTED_TASKS = { "audio-classification": { "impl": AudioClassificationPipeline, "tf": (), "pt": (AutoModelForAudioClassification,) if is_torch_available() else (), "default": {"model": {"pt": ("superb/wav2vec2-base-superb-ks", "372e048")}}, "type": "audio", }, "automatic-speech-recognition": { "impl": AutomaticSpeechRecognitionPipeline, "tf": (), "pt": (AutoModelForCTC, AutoModelForSpeechSeq2Seq) if is_torch_available() else (), "default": {"model": {"pt": ("facebook/wav2vec2-base-960h", "55bb623")}}, "type": "multimodal", }, "text-to-audio": { "impl": TextToAudioPipeline, "tf": (), "pt": (AutoModelForTextToWaveform, AutoModelForTextToSpectrogram) if is_torch_available() else (), "default": {"model": {"pt": ("suno/bark-small", "645cfba")}}, "type": "text", }, "feature-extraction": { "impl": FeatureExtractionPipeline, "tf": (TFAutoModel,) if is_tf_available() else (), "pt": (AutoModel,) if is_torch_available() else (), "default": { "model": { "pt": ("distilbert/distilbert-base-cased", "935ac13"), "tf": ("distilbert/distilbert-base-cased", "935ac13"), } }, "type": "multimodal", }, "text-classification": { "impl": TextClassificationPipeline, "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (), "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (), "default": { "model": { "pt": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"), "tf": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"), }, }, "type": "text", }, "token-classification": { "impl": TokenClassificationPipeline, "tf": (TFAutoModelForTokenClassification,) if is_tf_available() else (), "pt": (AutoModelForTokenClassification,) if is_torch_available() else (), "default": { "model": { "pt": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"), "tf": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"), }, }, "type": "text", }, "question-answering": { "impl": QuestionAnsweringPipeline, "tf": (TFAutoModelForQuestionAnswering,) if is_tf_available() else (), "pt": (AutoModelForQuestionAnswering,) if is_torch_available() else (), "default": { "model": { "pt": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"), "tf": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"), }, }, "type": "text", }, "table-question-answering": { "impl": TableQuestionAnsweringPipeline, "pt": (AutoModelForTableQuestionAnswering,) if is_torch_available() else (), "tf": (TFAutoModelForTableQuestionAnswering,) if is_tf_available() else (), "default": { "model": { "pt": ("google/tapas-base-finetuned-wtq", "69ceee2"), "tf": ("google/tapas-base-finetuned-wtq", "69ceee2"), }, }, "type": "text", }, "visual-question-answering": { "impl": VisualQuestionAnsweringPipeline, "pt": (AutoModelForVisualQuestionAnswering,) if is_torch_available() else (), "tf": (), "default": { "model": {"pt": ("dandelin/vilt-b32-finetuned-vqa", "4355f59")}, }, "type": "multimodal", }, "document-question-answering": { "impl": DocumentQuestionAnsweringPipeline, "pt": (AutoModelForDocumentQuestionAnswering,) if is_torch_available() else (), "tf": (), "default": { "model": {"pt": ("impira/layoutlm-document-qa", "52e01b3")}, }, "type": "multimodal", }, "fill-mask": { "impl": FillMaskPipeline, "tf": (TFAutoModelForMaskedLM,) if is_tf_available() else (), "pt": (AutoModelForMaskedLM,) if is_torch_available() else (), "default": { "model": { "pt": ("distilbert/distilroberta-base", "ec58a5b"), "tf": ("distilbert/distilroberta-base", "ec58a5b"), } }, "type": "text", }, "summarization": { "impl": SummarizationPipeline, "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (), "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (), "default": { "model": {"pt": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"), "tf": ("google-t5/t5-small", "d769bba")} }, "type": "text", }, # This task is a special case as it's parametrized by SRC, TGT languages. "translation": { "impl": TranslationPipeline, "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (), "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (), "default": { ("en", "fr"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}}, ("en", "de"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}}, ("en", "ro"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}}, }, "type": "text", }, "text2text-generation": { "impl": Text2TextGenerationPipeline, "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (), "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (), "default": {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}}, "type": "text", }, "text-generation": { "impl": TextGenerationPipeline, "tf": (TFAutoModelForCausalLM,) if is_tf_available() else (), "pt": (AutoModelForCausalLM,) if is_torch_available() else (), "default": {"model": {"pt": ("openai-community/gpt2", "6c0e608"), "tf": ("openai-community/gpt2", "6c0e608")}}, "type": "text", }, "zero-shot-classification": { "impl": ZeroShotClassificationPipeline, "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (), "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (), "default": { "model": { "pt": ("facebook/bart-large-mnli", "c626438"), "tf": ("FacebookAI/roberta-large-mnli", "130fb28"), }, "config": { "pt": ("facebook/bart-large-mnli", "c626438"), "tf": ("FacebookAI/roberta-large-mnli", "130fb28"), }, }, "type": "text", }, "zero-shot-image-classification": { "impl": ZeroShotImageClassificationPipeline, "tf": (TFAutoModelForZeroShotImageClassification,) if is_tf_available() else (), "pt": (AutoModelForZeroShotImageClassification,) if is_torch_available() else (), "default": { "model": { "pt": ("openai/clip-vit-base-patch32", "f4881ba"), "tf": ("openai/clip-vit-base-patch32", "f4881ba"), } }, "type": "multimodal", }, "zero-shot-audio-classification": { "impl": ZeroShotAudioClassificationPipeline, "tf": (), "pt": (AutoModel,) if is_torch_available() else (), "default": { "model": { "pt": ("laion/clap-htsat-fused", "973b6e5"), } }, "type": "multimodal", }, "image-classification": { "impl": ImageClassificationPipeline, "tf": (TFAutoModelForImageClassification,) if is_tf_available() else (), "pt": (AutoModelForImageClassification,) if is_torch_available() else (), "default": { "model": { "pt": ("google/vit-base-patch16-224", "5dca96d"), "tf": ("google/vit-base-patch16-224", "5dca96d"), } }, "type": "image", }, "image-feature-extraction": { "impl": ImageFeatureExtractionPipeline, "tf": (TFAutoModel,) if is_tf_available() else (), "pt": (AutoModel,) if is_torch_available() else (), "default": { "model": { "pt": ("google/vit-base-patch16-224", "3f49326"), "tf": ("google/vit-base-patch16-224", "3f49326"), } }, "type": "image", }, "image-segmentation": { "impl": ImageSegmentationPipeline, "tf": (), "pt": (AutoModelForImageSegmentation, AutoModelForSemanticSegmentation) if is_torch_available() else (), "default": {"model": {"pt": ("facebook/detr-resnet-50-panoptic", "fc15262")}}, "type": "multimodal", }, "image-to-text": { "impl": ImageToTextPipeline, "tf": (TFAutoModelForVision2Seq,) if is_tf_available() else (), "pt": (AutoModelForVision2Seq,) if is_torch_available() else (), "default": { "model": { "pt": ("ydshieh/vit-gpt2-coco-en", "65636df"), "tf": ("ydshieh/vit-gpt2-coco-en", "65636df"), } }, "type": "multimodal", }, "object-detection": { "impl": ObjectDetectionPipeline, "tf": (), "pt": (AutoModelForObjectDetection,) if is_torch_available() else (), "default": {"model": {"pt": ("facebook/detr-resnet-50", "2729413")}}, "type": "multimodal", }, "zero-shot-object-detection": { "impl": ZeroShotObjectDetectionPipeline, "tf": (), "pt": (AutoModelForZeroShotObjectDetection,) if is_torch_available() else (), "default": {"model": {"pt": ("google/owlvit-base-patch32", "17740e1")}}, "type": "multimodal", }, "depth-estimation": { "impl": DepthEstimationPipeline, "tf": (), "pt": (AutoModelForDepthEstimation,) if is_torch_available() else (), "default": {"model": {"pt": ("Intel/dpt-large", "e93beec")}}, "type": "image", }, "video-classification": { "impl": VideoClassificationPipeline, "tf": (), "pt": (AutoModelForVideoClassification,) if is_torch_available() else (), "default": {"model": {"pt": ("MCG-NJU/videomae-base-finetuned-kinetics", "4800870")}}, "type": "video", }, "mask-generation": { "impl": MaskGenerationPipeline, "tf": (), "pt": (AutoModelForMaskGeneration,) if is_torch_available() else (), "default": {"model": {"pt": ("facebook/sam-vit-huge", "997b15")}}, "type": "multimodal", }, "image-to-image": { "impl": ImageToImagePipeline, "tf": (), "pt": (AutoModelForImageToImage,) if is_torch_available() else (), "default": {"model": {"pt": ("caidas/swin2SR-classical-sr-x2-64", "4aaedcb")}}, "type": "image", }, }
2.3 使用model实例化pipeline对象
2.3.1 基于model实例化“自动语音识别”
如果不想使用task中默认的模型,可以指定huggingface中的模型:
import os os.environ["HF_ENDPOINT"] = "https://hf-mirror.com" os.environ["CUDA_VISIBLE_DEVICES"] = "2" from transformers import pipeline speech_file = "./output_video_enhanced.mp3" #transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-medium") pipe = pipeline(model="openai/whisper-medium") result = pipe(speech_file) print(result)
2.3.2 查看model与task的对应关系
可以登录https://huggingface.co/tasks查看
三、总结
本文为transformers之pipeline专栏的第0篇,后面会以每个task为一篇,共计讲述28+个tasks的用法,通过28个tasks的pipeline使用学习,可以掌握语音、计算机视觉、自然语言处理、多模态乃至强化学习等30w+个huggingface上的开源大模型。让你成为大模型领域的专家!