Qwen3-LiveTranslate-Flash: A Fully Multimodal Simultaneous Interpretation Model That Sees, Hears, and Speaks

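Summary: Tongyi Qianwen's Qwen3-LiveTranslate-Flash introduces real-time multimodal simultaneous interpretation. It supports 18 languages and multiple dialects, uses visual information to improve comprehension, and delivers high-accuracy speech translation with ultra-low latency of 3 seconds, making it suited to cross-language communication in complex environments.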

Qwen3-LiveTranslate-Flash: Real‑Time Multimodal Interpretation — See It, Hear It, Speak It

Qwen3-LiveTranslate-Flash delivers high-precision, low-latency, and reliable real-time multilingual interpretation of audio and video. Building on the capabilities of Qwen3-Omni and trained on millions of hours of multimodal data, it supports both offline and live translation in 18 languages, making cross-language communication seamless.

Key Features:

  • Multilingual and Dialect Coverage: Supports major official languages including Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Indonesian, Thai, Vietnamese, Arabic, Hindi, Greek, and Turkish, and also translates dialects and accents including Mandarin, Cantonese, Beijing, Wu, Sichuan, and Tianjin.
  • Vision‑Enhanced Comprehension: For the first time, Qwen3‑LiveTranslate‑Flash incorporates visual context augmentation, enabling it to not only understand what it hears but also understand what it sees. By detecting and interpreting lip movements, gestures, on‑screen text, and real‑world entities, the system robustly handles noisy audio environments and resolves ambiguities in terms with multiple meanings.
  • 3s Latency: A lightweight mixture‑of‑experts architecture, coupled with dynamic sampling, enables simultaneous interpretation with latency as low as three seconds.
  • Lossless Interpretation: Utilizes semantic unit prediction to mitigate cross‑lingual reordering challenges in translation, achieving real‑time translation quality that is close to offline translation.
  • Natural Voice Quality: With training on massive speech datasets, the model delivers lifelike voices whose tone and expressiveness naturally follow the meaning of the source speech.
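
The idea behind emitting translation per semantic unit, rather than per word or per full sentence, can be illustrated with a toy chunker. This is only a sketch under stated assumptions: the article does not describe how Qwen3-LiveTranslate-Flash predicts unit boundaries, so the function `semantic_units`, the `BOUNDARY_TOKENS` set, and the `max_len` parameter below are hypothetical stand-ins that use punctuation where a real system would use a learned predictor.

```python
from typing import Iterable, Iterator, List

# Hypothetical boundary markers. A real system would predict unit
# boundaries with a model; punctuation is only a stand-in here.
BOUNDARY_TOKENS = {",", ".", "?", "!", ";"}


def semantic_units(tokens: Iterable[str], max_len: int = 8) -> Iterator[List[str]]:
    """Group a token stream into units, emitting each unit as soon as it closes."""
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token)
        # Close the unit at a boundary marker, or when the buffer is full
        # so that latency stays bounded even without a boundary.
        if token in BOUNDARY_TOKENS or len(buffer) >= max_len:
            yield buffer
            buffer = []
    if buffer:
        # Flush whatever remains when the stream ends.
        yield buffer


stream = "yesterday , I met the client in Paris .".split()
units = list(semantic_units(stream))
for unit in units:
    # Each unit would be handed to the translator as one piece.
    print(" ".join(unit))
```

Waiting for a complete unit lets a translator reorder words inside that unit for the target language, while the `max_len` cap limits how long the listener waits.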

Performance

Qwen3‑LiveTranslate‑Flash achieves significantly higher accuracy than strong large-scale models, including Gemini‑2.5‑Flash, GPT‑4o‑Audio‑Preview, and Voxtral Small‑24B, on public benchmarks for Chinese, English and multilingual speech translation.

Qwen3‑LiveTranslate‑Flash consistently delivers leading translation performance across different domains and under challenging acoustic conditions.

Semantic unit prediction technology alleviates cross-lingual reordering issues, enabling real-time simultaneous interpretation to significantly reduce latency while maintaining over 94% of the accuracy achieved by non-real-time translation.
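
One way to read the retention figure is as the ratio of a streaming translation score to the offline score on the same test set. The sketch below assumes that reading. The function name `accuracy_retention` and both scores are hypothetical and chosen only to show the arithmetic; they are not results from the article.

```python
def accuracy_retention(streaming_score: float, offline_score: float) -> float:
    """Return the fraction of offline quality kept in streaming mode."""
    if offline_score <= 0:
        raise ValueError("offline_score must be positive")
    return streaming_score / offline_score


# Hypothetical scores on the same test set, for illustration only.
offline = 50.0
streaming = 47.0
retention = accuracy_retention(streaming, offline)
print(f"retention: {retention:.1%}")
```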

Visual enhancement technology further improves Qwen3-LiveTranslate-Flash’s translation precision in challenging scenarios such as noisy audio, ambiguous word meanings, and proper noun translation. In real-time settings, visual information compensates for missing speech context, making its advantages even more pronounced.

Examples

1 Speech‑to‑Speech Simultaneous Translation

Local API Test: real‑time interpretation | English → Chinese

2 Vision‑Enhanced Speech Translation

Homophones / Ambiguous Terms | English → Chinese

What's Next

Qwen will keep advancing the accuracy, naturalness, and emotional fidelity of its speech translation, extend coverage to more languages, and reinforce its robustness across varied and challenging acoustic environments. The goal is to bridge linguistic divides so that conversations flow as smoothly and naturally as if the speakers were face to face.

Source | Alibaba Cloud International WeChat official account

