"在modelscope社区,阿里云弹性加速计算EAIS 执行错误,怎么解决?:
from swift.llm import (
    ModelType, get_vllm_engine, get_default_template_type,
    get_template, inference_vllm
)
import torch
import os

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

model_type = ModelType.qwen1half_1_8b_chat
llm_engine = get_vllm_engine(model_type)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)
llm_engine.generation_config.max_new_tokens = 2048

request_list = [{'query': '蚂蚁'}, {'query': '大象'}]  # "ant", "elephant"
resp_list = inference_vllm(llm_engine, template, request_list)
for request, resp in zip(request_list, resp_list):
    print(f"query: {request['query']}")
    print(f"response: {resp['response']}")
Error: ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla P100-PCIE-16GB GPU has compute capability 6.0. You can use float16 instead by explicitly setting the dtype flag in CLI, for example: --dtype=half.
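This first error is expected on this hardware: bfloat16 requires compute capability 8.0+ (Ampere or newer), while the P100 is Pascal at 6.0. A minimal sanity check in plain PyTorch (nothing swift-specific assumed):

import torch

# Compute capability as a (major, minor) tuple; a P100 reports (6, 0).
print(torch.cuda.get_device_capability(0))
# False on pre-Ampere GPUs such as the P100.
print(torch.cuda.is_bf16_supported())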
Change: pass float16 explicitly through the Python API, as the error message suggests:
llm_engine = get_vllm_engine(model_type, torch_dtype=torch.float16)
Error: CUDA error: no kernel image is available for execution on the device. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
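This second error is the real blocker: vLLM's prebuilt CUDA kernels are compiled for compute capability 7.0 and newer (Volta onwards), so even with float16 there is no kernel image built for the P100's sm_60 architecture. The practical workaround is to skip vLLM on this card and use swift's native PyTorch inference path instead. A hedged sketch, assuming the same ms-swift release that provides inference_vllm (get_model_tokenizer and inference are its non-vLLM counterparts):

from swift.llm import (
    ModelType, get_model_tokenizer, get_default_template_type,
    get_template, inference
)
import torch

model_type = ModelType.qwen1half_1_8b_chat
template_type = get_default_template_type(model_type)
# Plain transformers/PyTorch loading; float16 runs fine on Pascal.
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 2048
template = get_template(template_type, tokenizer)

response, history = inference(model, template, '蚂蚁')
print(f'response: {response}')

This is slower than vLLM, but it has no minimum compute-capability floor beyond what PyTorch itself supports.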
"