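The Cantonese (Guangdong) speech synthesis model fails to run, while the Sichuanese, Shanghainese, and Mandarin models all work in the same environment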

OS: Ubuntu 22.04.1 LTS
Errors:
The first is: ERROR:root:Language Cantonese not supported. Using PinYin as default
The second is: size mismatch for text_encoder.sy_emb.weight: copying a param with shape torch.Size([107, 512]) from checkpoint, the shape in current model is torch.Size([147, 512]).
size mismatch for text_encoder.tone_emb.weight: copying a param with shape torch.Size([14, 512]) from checkpoint, the shape in current model is torch.Size([10, 512]).

(maas) user@user-virtual-machine:~/MyProjects/KAN-TTS-main$ python yueyu.py 
2023-10-16 15:01:14,671 - modelscope - INFO - PyTorch version 1.13.1 Found.
2023-10-16 15:01:14,671 - modelscope - INFO - Loading ast index from /home/user/.cache/modelscope/ast_indexer
2023-10-16 15:01:15,039 - modelscope - INFO - Loading done! Current index file version is 1.9.2, with md5 32146877e89460b4730c48c414aaea6c and a total number of 941 components indexed
2023-10-16 15:01:17,399 - modelscope - INFO - Model revision not specified, use revision: v1.0.4
2023-10-16 15:01:22,417 - modelscope - INFO - initiate model from /home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k
2023-10-16 15:01:22,418 - modelscope - INFO - initiate model from location /home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k.
2023-10-16 15:01:22,420 - modelscope - INFO - initialize model from /home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k
2023-10-16 15:01:22,427 - modelscope - INFO - am_config=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/am/config.yaml voc_config=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/voc/config.yaml
2023-10-16 15:01:22,427 - modelscope - INFO - audio_config=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/audio_config.yaml
2023-10-16 15:01:22,427 - modelscope - INFO - am_ckpts=OrderedDict([(980000, '/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/am/ckpt/checkpoint_980000.pth')])
2023-10-16 15:01:22,427 - modelscope - INFO - voc_ckpts=OrderedDict([(360000, '/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/voc/ckpt/checkpoint_360000.pth')])
2023-10-16 15:01:22,427 - modelscope - INFO - se_path=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/am/se.npy se_model_path=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/se/ckpt/se.onnx
2023-10-16 15:01:22,427 - modelscope - INFO - mvn_path=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/am/mvn.npy
ERROR:root:Language Cantonese not supported. Using PinYin as default
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
text.cc: festival_Text_init
2023-10-16 15:01:34,330 - modelscope - WARNING - No preprocessor field found in cfg.
2023-10-16 15:01:34,331 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2023-10-16 15:01:34,331 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k'}. trying to build by task and model information.
2023-10-16 15:01:34,331 - modelscope - WARNING - No preprocessor key ('sambert-hifigan', 'text-to-speech') found in PREPROCESSOR_MAP, skip building preprocessor.
2023-10-16 15:01:34,332 - modelscope - INFO - cuda is not available, using cpu instead.
Traceback (most recent call last):
  File "yueyu.py", line 8, in <module>
    output = sambert_hifigan_tts(input=text)
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/pipelines/base.py", line 219, in __call__
    output = self._process_single(input, *args, **kwargs)
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/pipelines/base.py", line 254, in _process_single
    out = self.forward(out, **forward_params)
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/pipelines/audio/text_to_speech_pipeline.py", line 38, in forward
    output_wav = self.model.forward(input, forward_params.get('voice'))
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/models/audio/tts/sambert_hifi.py", line 272, in forward
    audio = self.synthesis_one_sentences(voice, line[1])
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/models/audio/tts/sambert_hifi.py", line 192, in synthesis_one_sentences
    return self.voices[voice_name].forward(text)
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/models/audio/tts/voice.py", line 650, in forward
    self.load_am()
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/models/audio/tts/voice.py", line 251, in load_am
    self.am.load_state_dict(state_dict['model'], strict=False)
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1672, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for KanTtsSAMBERT:
    size mismatch for text_encoder.sy_emb.weight: copying a param with shape torch.Size([107, 512]) from checkpoint, the shape in current model is torch.Size([147, 512]).
    size mismatch for text_encoder.tone_emb.weight: copying a param with shape torch.Size([14, 512]) from checkpoint, the shape in current model is torch.Size([10, 512]).

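The maas virtual environment itself should be fine: it was created directly from the environment.yaml in KAN-TTS, and the ModelScope Library is installed as well, including the core and audio components. In this environment I can successfully synthesize Sichuanese, Shanghainese, and Mandarin (speaker Zhiyan).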

I tried deleting the downloaded model from the cache and downloading it again, but the problem persists.
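
For anyone trying to narrow this down: the two errors may be related. If the front end falls back to PinYin, the model might be built with PinYin-sized symbol and tone tables (147 and 10) while the Cantonese checkpoint stores 107 and 14. That is only a guess from the log. The sketch below is a small helper for listing which parameters disagree in shape. The function name find_shape_mismatches and the way the shape maps are built are my own assumptions, not part of ModelScope or KAN-TTS; the shapes are copied from the error message above.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for every parameter
    that exists in both mappings but with a different shape."""
    mismatches = []
    for name, ckpt_shape in sorted(checkpoint_shapes.items()):
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shape)))
    return mismatches


# With PyTorch available, the shape maps could be built roughly like this
# (assumption: the checkpoint keeps its weights under the 'model' key, as
# the traceback's state_dict['model'] suggests):
#   state_dict = torch.load(ckpt_path, map_location='cpu')
#   checkpoint_shapes = {k: tuple(v.shape) for k, v in state_dict['model'].items()}
#   model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}

# Shapes copied from the error message in this post.
checkpoint_shapes = {
    'text_encoder.sy_emb.weight': (107, 512),
    'text_encoder.tone_emb.weight': (14, 512),
}
model_shapes = {
    'text_encoder.sy_emb.weight': (147, 512),
    'text_encoder.tone_emb.weight': (10, 512),
}

for name, ckpt, model in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f'{name}: checkpoint {ckpt} vs model {model}')
```

If only the text-encoder embedding tables show up in such a list, that would point at the language front end (symbol and tone inventories) rather than at a corrupted download.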
