When exporting a model from the swift webui I get `KeyError: 'base_model.model.model.layers.0.self_attn.k_proj.base_layer'`. Does anyone know what causes this?
[INFO:swift] Loading the model using model_dir: /mnt/workspace/output/v3-20250116-173230/checkpoint-537
[INFO:swift] Successfully loaded /mnt/workspace/output/v3-20250116-173230/checkpoint-537/args.json.
[INFO:swift] Successfully registered []
[INFO:swift] rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Loading the model using model_dir: Qwen2.5-7B-Instruct-GPTQ-Int4
/usr/local/lib/python3.10/site-packages/gradio/blocks.py:1780: UserWarning: A function (export_model) returned too many output values (needed: 2, returned: 3). Ignoring extra values.
Output components:
[accordion, dropdown]
Output values returned:
[{'open': True, 'type': 'update'}, {'choices': ['pid:7794/create:2025-01-16, 19:10/running:2s/cmd:/usr/local/bin/python /usr/local/bin/swift export --model_type qwen2_5 --template qwen2_5 --quant_bits 4 --quant_method gptq --output_dir /mnt/workspace/export/1 --dataset datajson/sft.json --ckpt_dir /mnt/workspace/output/v3-20250116-173230/checkpoint-537 --log_file /mnt/workspace/output/qwen2_5-2025116191045/run_export.log --ignore_args_error true'], 'value': 'pid:7794/create:2025-01-16, 19:10/running:2s/cmd:/usr/local/bin/python /usr/local/bin/swift export --model_type qwen2_5 --template qwen2_5 --quant_bits 4 --quant_method gptq --output_dir /mnt/workspace/export/1 --dataset datajson/sft.json --ckpt_dir /mnt/workspace/output/v3-20250116-173230/checkpoint-537 --log_file /mnt/workspace/output/qwen2_5-2025116191045/run_export.log --ignore_args_error true', 'type': 'update'}, [None]]
warnings.warn(
The log contents are as follows:
```
Quantizing layers inside the block: 100%|██████████| 14/14 [00:20<00:00,  1.17s/it]
Quantizing base_model.model.model.layers blocks : 100%|██████████| 28/28 [09:29<00:00, 20.34s/it]
Using Exllamav2 backend will reorder the weights offline, thus you will not be able to save the model with the right weights. Setting `disable_exllama=True`. You should only use Exllamav2 backend for inference.
/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py:5055: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/swift/cli/export.py", line 5, in <module>
    export_main()
  File "/usr/local/lib/python3.10/site-packages/swift/llm/export/export.py", line 41, in export_main
    return SwiftExport(args).main()
  File "/usr/local/lib/python3.10/site-packages/swift/llm/base.py", line 45, in main
    result = self.run()
  File "/usr/local/lib/python3.10/site-packages/swift/llm/export/export.py", line 26, in run
    quantize_model(args)
  File "/usr/local/lib/python3.10/site-packages/swift/llm/export/quant.py", line 227, in quantize_model
    QuantEngine(args).quantize()
  File "/usr/local/lib/python3.10/site-packages/swift/llm/export/quant.py", line 39, in quantize
    gptq_quantizer = self.gptq_model_quantize()
  File "/usr/local/lib/python3.10/site-packages/swift/llm/export/quant.py", line 221, in gptq_model_quantize
    gptq_quantizer.quantize_model(self.model, self.tokenizer)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 569, in quantize_model
    self.pack_model(model=model, quantizers=quantizers)
  File "/usr/local/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 644, in pack_model
    quantizers[name], scale, zero, g_idx = quantizers[name]
KeyError: 'base_model.model.model.layers.0.self_attn.k_proj.base_layer'
```
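My current guess (not verified): the key that fails still carries the PEFT/LoRA wrapper naming (`base_model.model....base_layer`), which suggests the GPTQ packer in `optimum` is walking a model that still has the un-merged adapter on it, while its `quantizers` dict was built under different module names. On top of that, my base model here is `Qwen2.5-7B-Instruct-GPTQ-Int4`, which is already GPTQ-quantized, and as far as I know LoRA weights cannot be folded back into already-quantized GPTQ linear layers. Below is a minimal sketch of what I plan to try instead, assuming a full-precision base; the base model id and the output dir are placeholders:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: start from the full-precision base, not the *-GPTQ-Int4 one,
# since LoRA weights cannot be merged into already-quantized GPTQ layers.
base_id = "Qwen/Qwen2.5-7B-Instruct"
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Load the LoRA checkpoint and fold the adapter into the base weights;
# after merge_and_unload() the module names no longer carry the
# `base_model.model...base_layer` wrapper that the KeyError complains about.
model = PeftModel.from_pretrained(base, "/mnt/workspace/output/v3-20250116-173230/checkpoint-537")
model = model.merge_and_unload()

merged_dir = "/mnt/workspace/output/merged"  # placeholder output dir
model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)
```

Then I would point the GPTQ export at `merged_dir` instead of the raw LoRA `ckpt_dir`, so the quantizer sees a plain `transformers` model. (ms-swift also documents a `--merge_lora true` flag for `swift export`, though I haven't checked it against this version.) Does that sound right, or is there a webui option I missed?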