Qwen3-LiveTranslate-Flash: A Full-Modality Simultaneous Interpretation Model That Sees, Hears, and Speaks

2025-11-10


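Summary: Tongyi Qianwen's Qwen3-LiveTranslate-Flash introduces real-time multimodal simultaneous interpretation. It supports 18 languages and multiple dialects, draws on visual information to improve comprehension, and delivers high-accuracy speech translation with latency as low as 3 seconds, making it suited to cross-language communication in complex environments.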

News Today

Qwen3-LiveTranslate-Flash: Real‑Time Multimodal Interpretation — See It, Hear It, Speak It

Qwen3-LiveTranslate-Flash delivers high‑precision, lightning‑fast and ultra‑reliable real‑time multilingual audio and video interpretation. With the extensive capabilities of Qwen3‑Omni and training on millions of hours of multimodal data, it enables both offline and live translation in 18 languages, making cross‑language communication seamless.

Key Features:

Multilingual and Dialect Coverage: Supports major official languages including Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Indonesian, Thai, Vietnamese, Arabic, Hindi, Greek, and Turkish, as well as dialect and accent translation for Mandarin, Cantonese, Beijing, Wu, Sichuan, and Tianjin dialects.
Vision‑Enhanced Comprehension: For the first time, Qwen3‑LiveTranslate‑Flash incorporates visual context augmentation, enabling it to not only understand what it hears but also understand what it sees. By detecting and interpreting lip movements, gestures, on‑screen text, and real‑world entities, the system robustly handles noisy audio environments and resolves ambiguities in terms with multiple meanings.
3s Latency: A lightweight mixture‑of‑experts architecture, coupled with dynamic sampling, enables simultaneous interpretation with latency as low as three seconds.
Lossless Interpretation: Utilizes semantic unit prediction to mitigate cross‑lingual reordering challenges in translation, achieving real‑time translation quality that is close to offline translation.
Natural Voice Quality: With training on massive speech datasets, the model delivers lifelike voices whose tone and expressiveness naturally follow the meaning of the source speech.
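
To make the semantic unit idea from the list above more concrete, here is a toy sketch of streaming segmentation. It does not reflect the model's actual mechanism, which this announcement does not describe. It assumes a hypothetical rule in which a unit closes at a punctuation token or after a maximum number of tokens, and the names `segment_units`, `BOUNDARY_TOKENS`, and `max_len` are illustrative only.

```python
from typing import Iterable, Iterator, List

# Hypothetical boundary markers: an assumption made for illustration only,
# not something taken from Qwen3-LiveTranslate-Flash.
BOUNDARY_TOKENS = {",", ".", "?", "!", ";"}


def segment_units(tokens: Iterable[str], max_len: int = 8) -> Iterator[List[str]]:
    """Group a token stream into units that close at a boundary token
    or once max_len tokens have accumulated, whichever comes first."""
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token)
        if token in BOUNDARY_TOKENS or len(buffer) >= max_len:
            yield buffer
            buffer = []
    if buffer:  # flush whatever remains at the end of the stream
        yield buffer


# A downstream translator would receive each unit as soon as it closes,
# instead of waiting for the full sentence.
stream = "when the meeting ends , we will send the report .".split()
units = list(segment_units(stream))
for unit in units:
    print(" ".join(unit))
```

Emitting at unit boundaries rather than word by word is one common way to trade a little delay for less reordering trouble between languages with different word order. A real system would predict boundaries from the model rather than from punctuation.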

Performance

Qwen3‑LiveTranslate‑Flash achieves significantly higher accuracy than strong large-scale models, including Gemini‑2.5‑Flash, GPT‑4o‑Audio‑Preview, and Voxtral Small‑24B, on public benchmarks for Chinese, English and multilingual speech translation.

Qwen3‑LiveTranslate‑Flash consistently delivers leading translation performance across different domains and under challenging acoustic conditions.

Semantic unit prediction technology alleviates cross-lingual reordering issues, enabling real-time simultaneous interpretation to significantly reduce latency while maintaining over 94% of the accuracy achieved by non-real-time translation.
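
The "over 94%" figure above reads as a retention ratio: the real-time score divided by the offline score on the same test set. A minimal sketch of that arithmetic follows. The announcement does not name the metric, so the scores below are made-up placeholders and `retention` is a hypothetical helper.

```python
def retention(streaming_score: float, offline_score: float) -> float:
    """Fraction of the offline translation score kept in real-time mode."""
    if offline_score <= 0:
        raise ValueError("offline_score must be positive")
    return streaming_score / offline_score


# Placeholder scores for illustration only; these are not published results.
offline = 40.0
streaming = 38.0

ratio = retention(streaming, offline)
print(f"retention: {ratio:.1%}")
print("meets 94% threshold:", ratio > 0.94)
```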

Visual enhancement technology further improves Qwen3-LiveTranslate-Flash’s translation precision in challenging scenarios such as noisy audio, ambiguous word meanings, and proper noun translation. In real-time settings, visual information compensates for missing speech context, making its advantages even more pronounced.

Examples

1 Speech‑to‑Speech Simultaneous Translation

Local API Test: real‑time interpretation | English → Chinese

2 Vision‑Enhanced Speech Translation

Homophones / Ambiguous Terms | English → Chinese

What's Next

The Qwen team will keep advancing the accuracy, naturalness, and emotional fidelity of its speech translation, extend coverage to more languages, and reinforce robustness across varied and challenging acoustic environments. The goal is to bridge linguistic divides, enabling conversations to flow as smoothly and naturally as if speaking face to face.

/ END /

Source | Alibaba Cloud International official WeChat account
