Alibaba Cloud Native API Gateway Helps Industries Connect to DeepSeek Safely and Reliably

Summary: Since DeepSeek has open-sourced the complete DeepSeek-R1 model weights, enterprises can deploy the model within their own networks, keeping the entire AI application data flow under their control.

Background

1.png

Currently, demands related to DeepSeek can be summarized into two categories:

  • Because the official app and web services frequently fail to return results, cloud vendors and hardware/software companies now offer full-strength or distilled versions of the model as API plus computing-power services. Numerous local deployment solutions based on open-source tooling and consumer-grade compute and storage have also emerged to relieve the load on DeepSeek's official servers.
  • Various industries start to call DeepSeek APIs for designing large model applications serving both internal and external purposes, focusing on the efficiency and stability of application construction.

Previously, we have published numerous cloud-based and local deployment solutions addressing the first demand; this article discusses engineering solutions for traffic management that address the second.
DeepSeek Deployment
Since DeepSeek has open-sourced the complete DeepSeek-R1 model weights, enterprises can deploy the model within their networks, thus keeping the entire AI application data flow under their control.

  • Model Weights Download: Available through the ModelScope community (https://modelscope.cn/).
  • Considering the full DeepSeek-R1 model has 671 billion parameters, running it requires substantial GPU resources. Quantization methods like int8/int4 can be considered for inference. Meanwhile, DeepSeek has released several distilled models of different specifications that can be deployed on machines with lower configurations.

    2.png
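As a rough sanity check on the hardware estimates above, the following back-of-the-envelope sketch estimates how much GPU memory the 671B weights alone would occupy at different precisions. The figures are illustrative only; real deployments also need memory for activations, the KV cache, and runtime overhead.

```python
# Approximate GPU memory needed just to store 671B parameters.
# Back-of-the-envelope figures only; excludes activations, KV cache, overhead.

PARAMS = 671e9  # parameter count of the full DeepSeek-R1 model

BYTES_PER_PARAM = {"fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(precision: str) -> float:
    """Approximate weight storage in GB for the given precision."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9

for p in ("fp16", "int8", "int4"):
    print(f"{p}: ~{weight_memory_gb(p):.0f} GB")
```

Even at int4, the full model needs hundreds of gigabytes of GPU memory, which is why the distilled, smaller-parameter variants are attractive for lower-spec machines.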

Deployment Solutions
Alibaba Cloud officially provides multiple deployment options, including PAI, Bailian, GPU + ACK, ModelScope + FC, and Spring AI Alibaba + Ollama. Details are available via the links below:
PAI: https://mp.weixin.qq.com/s/Ly9bseQxhmunlbePphRsnA
Bailian: https://mp.weixin.qq.com/s/UgB90HfKlMDfarMugc5F5w
Container (GPU + ACK): https://mp.weixin.qq.com/s/SSGD5G7KL8iYLy2jxh9FOg
Serverless (ModelScope + FC): https://mp.weixin.qq.com/s/yk5t0oIv7XQR0ky6phiq6g
Local Deployment (Spring AI Alibaba + Ollama + Higress): https://mp.weixin.qq.com/s/-8z9OFHvn0A1ga2rFsmeww
Common Requirements and Engineering Challenges During Large Model Application Implementation
Similar to web applications, large model applications face challenges at deployment time such as sudden traffic surges and overload, network fluctuation and latency, security and compliance, invocation quotas and cost control, and faults introduced by releases. However, because large model applications differ architecturally from web applications, the corresponding solutions differ as well.
The importance of traffic management for the engineering of large model applications was shared in "A Comprehensive View of Large Model Inference", where AI gateways have become a standard feature for large model applications. By registering deployed models as services through an AI gateway and exposing APIs to callers, capabilities such as rate limiting, authentication, and statistics are enabled.

Higress is a high-performance gateway open-sourced by Alibaba Cloud, designed for the deployment of web applications and large model applications. It also offers commercial services through the Alibaba Cloud Native API Gateway. This article will demonstrate using the console of the Alibaba Cloud Native API Gateway.

3.png

Specific Needs and Solutions

Fallback Strategies for Self-built DeepSeek Services: Using smaller-parameter models such as DeepSeek-R1-Distill-Qwen-32B as a fallback.

Given the vast 671 billion parameters of DeepSeek-R1, deploying it incurs significant costs. It is advisable to deploy some distilled models from the R1 series as a fallback option. For instance, the DeepSeek-R1-Distill-Qwen-32B, trained based on the Qwen model, is an excellent alternative.
The AI Gateway within Alibaba Cloud's Native API Gateway supports configuring multiple backend model services and includes a fallback capability to reschedule failed requests. If a call to a self-deployed DeepSeek-R1 fails, the request can be routed to models with fewer parameters. Additionally, routing to online API services, such as DeepSeek-V3 or Qwen-max, can be selected to ensure comprehensive service capabilities.
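The fallback behavior described above can be sketched as plain client-side logic. The model names and the call interface below are illustrative stand-ins, not the gateway's actual API; on the gateway itself, fallback is a configuration setting.

```python
# Sketch of the fallback idea: try the primary self-hosted model first,
# then fall back to a smaller distilled model (or an online API service).
from typing import Callable, List

def call_with_fallback(prompt: str,
                       backends: List[tuple]) -> tuple:
    """Return (model_name, answer) from the first backend that succeeds."""
    last_error = None
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as e:  # a real gateway matches specific failures/timeouts
            last_error = e
    raise RuntimeError(f"all backends failed: {last_error}")

# Stub backends standing in for real model services.
def primary(prompt):            # e.g. self-hosted DeepSeek-R1
    raise TimeoutError("R1 overloaded")

def distilled(prompt):          # e.g. DeepSeek-R1-Distill-Qwen-32B
    return f"[distill-32b] answer to: {prompt}"

model, answer = call_with_fallback("hello", [("deepseek-r1", primary),
                                             ("r1-distill-qwen-32b", distilled)])
print(model, answer)
```

The same pattern extends to routing failed requests onward to online services such as DeepSeek-V3 or Qwen-max by appending them to the backend list.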

4.png

Content Security Assurance for Self-built DeepSeek Services: Ensuring real-time processing and blocking of sensitive content using Alibaba Cloud Content Security.

5.png

The output style of DeepSeek's R1-series open-source models tends to be relatively unconstrained. If these models are used to provide external services, content security becomes a concern: should the model answer a sensitive question, the enterprise may face additional liability for the response.
Alibaba Cloud Native API Gateway integrates with Alibaba Cloud Content Security, offering real-time processing and content blocking for large model requests/responses. Alibaba Cloud Content Security has received certification from the China Academy of Information and Communications Technology (CAICT), providing robust AI-based content security assurance.
Once content security is enabled, a user who sends inappropriate content receives a response indicating that the content violates policy, so potential breaches are handled promptly and the service is safeguarded against inappropriate or harmful output. With these capabilities, enterprises can keep their AI applications compliant and minimize the risk of inappropriate responses from large model deployments. A blocked request returns a response like the following:
{
  "id": "chatcmpl-E45zRLc5hUCxhsda4ODEhjvkEycC9",
  "object": "chat.completion",
  "model": "from-security-guard",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "我不能处理隐私信息"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
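Given the sample response above, a client can detect that the security guard intercepted a request, for example by checking the `model` field. This is an assumed heuristic based on the sample shown here, not a documented contract.

```python
# Detect a content-security block on the client side: in the sample above,
# the gateway answers with model "from-security-guard" and zero token usage.
import json

def is_security_blocked(response_body: str) -> bool:
    data = json.loads(response_body)
    return data.get("model") == "from-security-guard"

sample = '''{
  "id": "chatcmpl-E45zRLc5hUCxhsda4ODEhjvkEycC9",
  "object": "chat.completion",
  "model": "from-security-guard",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "我不能处理隐私信息"},
               "logprobs": null, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
}'''

print(is_security_blocked(sample))
```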

Additionally, the content security console provides audit logs for each request, so potentially inappropriate or sensitive content can be reviewed and managed after the fact. These logs give enterprises insight into the kinds of requests being made and the responses generated, supporting content security management and compliance with relevant regulations.

6.png

Authorization and Quota Control for API Users: Issuing API keys to control permissions and quota usage.

7.png

The consumer authentication capability of Alibaba Cloud Native API Gateway supports multi-tenancy for model services: users can issue their own API keys on the gateway, just as model service providers do, to control consumers' invocation permissions and quotas. Combined with the observability features, this also enables monitoring and statistics on each consumer's token usage.
For online model services, this masks the original provider's API key, achieving multi-tenant API key management: the original key is never exposed, and each tenant can be allocated specific access rights and usage limits. This approach suits enterprises that want to offer controlled access to AI models while retaining oversight of security, usage, and cost.
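The per-consumer quota idea can be sketched as simple in-memory accounting. This is purely illustrative logic, not the gateway's implementation, which handles this through configuration and observability features.

```python
# Sketch of per-consumer API-key authorization and token quota accounting.
from collections import defaultdict

class QuotaManager:
    def __init__(self):
        self.limits = {}              # api_key -> allowed tokens
        self.used = defaultdict(int)  # api_key -> consumed tokens

    def issue_key(self, api_key: str, token_limit: int) -> None:
        self.limits[api_key] = token_limit

    def authorize(self, api_key: str) -> bool:
        """Reject unknown keys and keys that have exhausted their quota."""
        return api_key in self.limits and self.used[api_key] < self.limits[api_key]

    def record_usage(self, api_key: str, tokens: int) -> None:
        self.used[api_key] += tokens

qm = QuotaManager()
qm.issue_key("tenant-a", 1000)
assert qm.authorize("tenant-a")
qm.record_usage("tenant-a", 1000)
assert not qm.authorize("tenant-a")  # quota exhausted
assert not qm.authorize("tenant-b")  # unknown key
```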

8.png

Traffic Distribution Between Different LLMs: Supporting gradual traffic switching for model migration.

9.png

Alibaba Cloud Native API Gateway supports model traffic shifting, facilitating smooth transitions between models. As illustrated, 90% of request traffic can be routed to OpenAI while 10% is routed to DeepSeek. Subsequent adjustments to the traffic split for canary (gray) release require only configuration changes and a republish, with no code-level modifications.
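The 90/10 split can be illustrated with weighted random selection. This is a client-side sketch of the idea only; the gateway applies the split through route configuration.

```python
# Weighted traffic splitting: route ~90% of requests to one backend, ~10%
# to another, as in the canary-release scenario described above.
import random

def pick_backend(weights: dict, rng: random.Random) -> str:
    """Choose a backend with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

rng = random.Random(42)  # fixed seed so the simulation is reproducible
split = {"openai": 90, "deepseek": 10}
counts = {"openai": 0, "deepseek": 0}
for _ in range(10_000):
    counts[pick_backend(split, rng)] += 1

print(counts)  # roughly 9000 vs. 1000
```

Shifting the split (say, to 50/50 mid-migration) is just a change to the `split` weights, mirroring how the gateway needs only a configuration change.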

10.png

Cost Reduction Through Caching Common Requests: Reducing backend model load by caching frequent requests.

11.png

Alibaba Cloud Native API Gateway supports caching Large Language Model (LLM) responses. Once caching is enabled, common requests, such as greetings or questions about product capabilities, can be answered directly from the cache without reaching the backend model, reserving valuable inference resources for non-routine queries.
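A minimal exact-match cache illustrates the idea. This is a hypothetical sketch; the gateway's feature operates at the proxy layer and, combined with semantic indexing, can also match similar (not just identical) prompts, which this sketch does not attempt.

```python
# Minimal exact-match response cache for common prompts (greetings, FAQs).
import hashlib

class ResponseCache:
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize trivially (case/whitespace) before hashing.
        return hashlib.sha256(prompt.strip().lower().encode()).hexdigest()

    def get_or_call(self, prompt: str, model_call):
        k = self._key(prompt)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        answer = model_call(prompt)
        self._store[k] = answer
        return answer

cache = ResponseCache()
fake_model = lambda p: f"answer({p})"
cache.get_or_call("Hello!", fake_model)   # miss: goes to the model
cache.get_or_call("hello!", fake_model)   # hit: served from cache
assert (cache.hits, cache.misses) == (1, 1)
```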

These capabilities are complemented by rich observability features, including monitoring of content security, rate limiting, caching, etc., along with advanced semantic vector indexing functions for further improving model application performance.

12.png

In addition to the aforementioned capabilities, Alibaba Cloud Native API Gateway, in conjunction with SLS (Simple Logging Service), provides advanced functionalities such as semantic vector indexing and semantic enrichment based on large model dialogues. These features enable topic clustering, intent recognition, sentiment analysis, quality assessment, and more, assisting users in progressively enhancing the effectiveness of their model applications.

13.png
