Interview with Alibaba Cloud's Heterogeneous Computing Director: A Future Dominated by a Trio of GPU, FPGA, and ASIC Chips

Summary: From October 11 to 14, 2017, The Computing Conference will once again be held in Hangzhou's Yunqi Town.


Editor's Note: From October 11 to 14, 2017, The Computing Conference will once again be held in Hangzhou's Yunqi Town (get your tickets now!). As one of the world's most influential technology expos, this conference will feature brilliant lectures by many Alibaba Group experts and industry leaders. Starting today, the Yunqi Community will interview a series of conference guests.

Today, we interviewed Dr. Zhang Xiantao, director of Alibaba Cloud's virtualization platform. During the Computing Conference in October, Dr. Zhang will give a talk on heterogeneous computing and its future prospects.

In the world of IT, heterogeneous computing is not some new buzzword.

Over the past 10 years, the computing industry has gone through a series of architectural transitions: from 32-bit to x86-64, then to multi-core processors, and then to heterogeneous computing built on general-purpose GPUs (GPGPU), most notably the CPU-GPU heterogeneous designs of 2010. In recent years, the rise of computing-intensive fields such as artificial intelligence, high-performance data analytics, and financial analysis has put heterogeneous computing in the spotlight.

Since traditional general-purpose computing can no longer satisfy our computing power needs, heterogeneous computing is now recognized as a key technology that can play a pivotal role in boosting computing power. To cope with such demands, Alibaba Cloud has developed several heterogeneous computing products and solutions. At the helm of Alibaba Cloud's heterogeneous computing team is Dr. Zhang Xiantao.

Dr. Zhang Xiantao, nicknamed Xuqing, received his PhD in information security from Wuhan University. Before joining Alibaba, he worked at Intel's Asia-Pacific R&D Center and was a primary contributor to multiple open-source projects, such as Xen and KVM. He served as a maintainer for the Xen/IOMMU and KVM/IA64 projects. He was also one of the main authors of and contributors to the Intel HAXM accelerator, for which he received the Intel Achievement Award.

In 2014, Dr. Zhang officially joined Alibaba as a senior expert. He is currently responsible for Alibaba's virtualization technology, high-performance computing products, and heterogeneous computing products, and he leads the technology and R&D teams for several innovative products.

In this interview, Dr. Zhang spoke about the pain points companies face when using heterogeneous computing solutions and gave a deep dive into Alibaba Cloud's work on elastic heterogeneous computing resources.

Opportunities and Challenges for Heterogeneous Computing

Heterogeneous computing refers to computing systems composed of computing units with different instruction sets and system architectures. At present, "CPU+GPU" and "CPU+FPGA" are the most popular heterogeneous computing architectures in the industry. The primary advantage of heterogeneous computing is that it delivers more efficient, lower-latency computing performance than parallel computing on traditional CPUs. Given the industry's demand for ever-greater computing performance, heterogeneous computing is becoming increasingly important. Its progress is the result of collective effort across the computing industry ecosystem: chip companies are investing heavily in hardware R&D, standards for heterogeneous programming are gradually maturing, and mainstream cloud service providers are making vigorous strategic moves in the field. In time, it looks as if heterogeneous computing will replace traditional homogeneous computing.

Dr. Zhang stated that heterogeneous computing can meet the needs of computing-intensive fields such as artificial intelligence, high-performance data analytics, and financial analysis, and that it will gradually replace general-purpose computing in the areas where general-purpose computing holds no advantage.

However, despite all the hype, the difficulty of procuring, deploying, and using heterogeneous computing still puts it out of reach for the vast majority of companies. On this topic, Dr. Zhang described the following pain points users face:

1. High procurement costs: Users making small purchases have almost no bargaining power. For FPGA boards in particular, the unit cost of buying in small quantities can be very high.

2. Long delivery cycle: A typical user spends several months moving through requirements decisions, model selection, hardware architecture design, supplier selection, data center selection, and financial approval.

3. No elasticity: The number of GPUs/FPGAs purchased completely determines a system's capacity. As a result, GPU/FPGA resources sit idle during low-traffic periods, yet the same quantity may be insufficient during peak hours.

4. No hardware dividend: Once a particular model has been purchased, it cannot be changed. When a new GPU/FPGA architecture is released, users have to make another purchase, because the performance of the old architecture cannot keep up with new applications.

5. Data islands: Offline GPUs/FPGAs cannot communicate with online services.
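The "no elasticity" pain point is essentially an arithmetic trade-off. The Python sketch below makes it concrete under purely hypothetical assumptions: an invented six-hour demand curve measured in GPUs, and a fixed fleet sized either near average load or for peak load. The function name `fixed_capacity_report` and all numbers are illustrative only and do not describe any Alibaba Cloud product or customer.

```python
from typing import Sequence, Tuple


def fixed_capacity_report(demand: Sequence[int], capacity: int) -> Tuple[int, int]:
    """Compare hourly GPU demand against a fixed, purchased fleet.

    Returns (idle_gpu_hours, unmet_gpu_hours):
    - idle: purchased GPU-hours with no work to do (wasted spend)
    - unmet: GPU-hours of work that could not be served (peak shortfall)
    """
    if capacity < 0 or any(d < 0 for d in demand):
        raise ValueError("capacity and demand must be non-negative")
    idle = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return idle, unmet


if __name__ == "__main__":
    # Hypothetical hourly demand (GPUs needed in each hour); not real data.
    demand = [2, 2, 3, 8, 12, 6]

    # Fleet sized near average load: wasted off-peak AND short at peak.
    print(fixed_capacity_report(demand, capacity=6))   # (11, 8)

    # Fleet sized for peak load: never short, but many idle GPU-hours.
    print(fixed_capacity_report(demand, capacity=12))  # (39, 0)
```

In this idealized model, capacity that tracks demand hour by hour would report (0, 0), which is the gap that on-demand, elastically scaled GPU/FPGA instances aim to close.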

In addition, he noted that the greatest challenge for FPGA products is the underdeveloped overall FPGA ecosystem: few clients are capable of FPGA development, particularly for computing acceleration. Therefore, Dr. Zhang and his team plan to build a cloud-based IP development marketplace, bring in a series of FPGA IP partners, and promote development standards for cloud-based FPGAs. Through these efforts, they aim to entice more IP developers and partners to release their IPs on the marketplace to serve their end users, and ultimately to enrich the overall FPGA development ecosystem.

Within a short timeframe, Alibaba Cloud introduced both elastic GPU and elastic FPGA heterogeneous computing solutions. The aim is to lower the barriers to heterogeneous computing resources, so that companies that require high-performance computing can easily buy and use resources as needed.

Yunqi Community learned that Alibaba Cloud's elastic GPU product is primarily designed for scenarios such as artificial intelligence, data analytics, scientific computing, film rendering, video and image processing, and video transcoding. It has already been applied successfully to behavior data analysis, targeted advertising, facial recognition, video recognition, image recognition, and object classification. Alibaba Cloud's elastic FPGA product is primarily oriented toward artificial intelligence, semiconductor design, genomics computing, video and image processing, data analytics for decision making, and similar scenarios. It has already been applied successfully to deep learning inference, deep learning model pruning, irregular data computing, video and image processing, and semiconductor hardware design.

Alibaba Cloud's Journey in Heterogeneous Computing

It is widely recognized that GPUs and FPGAs have many advantages over CPUs. GPUs offer better parallel performance, higher peak computing power per machine, and more efficient computing. The main advantages of FPGAs are higher performance per watt, better performance on irregular data computing, stronger hardware acceleration, and lower inter-device latency.

When delivered as cloud services, these advantages become even more apparent. Dr. Zhang explained that Alibaba Cloud's GPU and FPGA heterogeneous computing solutions offer the following features:

1. GPU/FPGA resources are ready for use upon purchase and can be elastically scaled.

2. The ultra-large-scale resource pool can meet the need for more GPU/FPGA resources during business peaks.

3. Users enjoy heterogeneous computing hardware dividends, with performance growth in excess of Moore's Law. They can use GPU/FPGA instances with improved performance at the same price.

4. Alibaba Cloud offers the most comprehensive line of heterogeneous products, meeting the needs of scenarios such as artificial intelligence training and inference, image processing, and video processing.

5. Product integration: These products are deeply integrated into the overall Alibaba Cloud product system, allowing data to flow between products.

These features provide a solution to the difficulties users encounter with heterogeneous computing. Dr. Zhang also revealed that training a model on a single machine takes most customers several weeks to a month. To shorten this training time, Alibaba Cloud plans to release an ultra-high-performance heterogeneous cluster product.

"This product interconnects GPUs and FPGAs directly over 25/100 Gbit/s RoCE (RDMA over Converged Ethernet). It can train a model in a multi-host, multi-chip fashion across a large cluster of GPU/FPGA devices. This approach significantly reduces the required training time from several weeks or a month to a day or even a few hours."
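Why can spreading training across many devices cut time from weeks to hours? A rough first-order estimate is Amdahl's law, a standard textbook model rather than Alibaba Cloud's own method: if a fraction `s` of the single-device runtime cannot be parallelized (here assumed to stand in for synchronization and communication that the interconnect does not hide), then n devices need about T1 * (s + (1 - s) / n). The sketch below assumes a hypothetical three-week (504-hour) single-device job and a hypothetical s = 0.001; both values are invented for illustration.

```python
def amdahl_time(t_single: float, serial_fraction: float, n_devices: int) -> float:
    """Estimated wall-clock time on n_devices under Amdahl's law.

    t_single: runtime on one device (any time unit)
    serial_fraction: share of t_single that does not speed up with more devices
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if n_devices < 1:
        raise ValueError("n_devices must be >= 1")
    return t_single * (serial_fraction + (1.0 - serial_fraction) / n_devices)


if __name__ == "__main__":
    t_single_hours = 21 * 24  # hypothetical 3-week single-device training job
    s = 0.001                 # hypothetical non-parallelizable fraction
    for n in (1, 8, 64, 256):
        print(f"{n:>4} devices: {amdahl_time(t_single_hours, s, n):7.1f} hours")
```

Under these assumptions the estimate falls from 504 hours on one device to roughly 2.5 hours on 256 devices. Amdahl's law also shows the ceiling: the speedup can never exceed 1/s, which is one reason a low-latency RDMA interconnect matters for keeping the non-overlapped communication share small.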

It is worth noting that Alibaba Cloud's heterogeneous computing solutions also provide a friendlier experience for developers.

For GPU programming, Alibaba Cloud will release a distributed multi-host, multi-chip training framework and other GPU performance optimization services. These will significantly lower the barrier for clients to adopt a multi-host, multi-chip approach, and thereby reduce their cloud-based deep learning model training time.

As for FPGAs, Alibaba Cloud will establish an IP development marketplace and encourage a series of FPGA IP partners to release their proprietary IP product lines. With a thriving IP marketplace, Alibaba hopes to give more end users the opportunity to benefit from FPGA performance acceleration.

In addition, Alibaba Cloud has already released IaaS+ services, including an E-HPC product for heterogeneous cluster resource scheduling, account management, and elastic scaling. With Alibaba Cloud's Container Service, users get one-click deployment, distributed training, and elastic scaling, and can use XDL to analyze behavior data. Finally, users can take advantage of Alibaba Cloud's self-developed GPU assembler to optimize application performance, increase the utilization of heterogeneous computing equipment, and reduce resource procurement costs.

A Future Dominated by a Trio of GPU, FPGA, and ASIC

Artificial intelligence and other emerging application fields demand computing capacity that already outpaces the growth of CPU power described by Moore's Law, whereas the performance growth of heterogeneous computing can keep up with these emerging needs. It is likely that heterogeneous computing will occupy an increasing share of data center resources in the future.

At the macro level, the development of heterogeneous computing benefits from the driving force of national strategies. For instance, China has recently issued an artificial intelligence development plan, elevating AI to the level of national strategy. This is bound to stimulate demand for heterogeneous computing. That said, Dr. Zhang acknowledged that although demand for heterogeneous computing is growing, there will always be a need for general-purpose computing as well. The two computing models will coexist over the long term.

Without doubt, GPUs have already become mainstream in heterogeneous computing. Regarding future trends, however, Dr. Zhang stated that "as the FPGA ecosystem takes shape and continues to develop, and as ASIC chip technology gradually matures, heterogeneous computing will be divided three ways among GPU, FPGA, and ASIC chip technologies. Each will build on its own unique advantages and applications, as well as its own customer base."

This trend is also of great interest to Dr. Zhang and his team. In the future, the team will release 8-card/16-card GPU products, next-gen Volta architecture GPU products, and a new generation of FPGA products. In addition, R&D is underway for cloud-based ASIC chip products.

At present, Dr. Zhang's team has two main goals. First, they are committed to transforming heterogeneous computing into a computing resource users can easily purchase and use. To this end, they seek to provide the most comprehensive range of heterogeneous computing products and solutions. Second, they are committed to giving users the means to make good use of heterogeneous resources. In this way, users can take full advantage of the processing capabilities of these resources to make their products more competitive. The team is promoting the transformation of heterogeneous computing into a universal computing capability.

Computing Conference Highlights

During the Hangzhou Computing Conference, there will be special forums on heterogeneous computing/high-performance computing and on virtualization technology, and Dr. Zhang will deliver keynote speeches at both. Ahead of the conference, he revealed important news to the Yunqi Community: Alibaba Cloud will launch several important products spanning heterogeneous computing, general-purpose computing, high-performance computing, and other fields. He explained that these products are designed to solve difficulties users face on Alibaba Cloud, including cluster management and scheduling, licensing issues when elastically using paid software in the cloud, the need for instances that combine the flexibility of virtual hosts with the performance of physical hosts, and shortening training times through multi-host, multi-card distributed training.

The following is the transcript of our interview with Dr. Zhang:

Yunqi: Heterogeneous computing can provide more efficient and lower latency computing performance compared with parallel computing using traditional CPUs. Does this mean that it will replace CPU computing? What do you think about the future trends of these two technologies?

Dr. Zhang: The demand for general-purpose computing will continue to exist alongside the demand for heterogeneous computing. General-purpose computing will not be completely replaced. However, as heterogeneous computing can better meet the computing needs of emerging computing-intensive fields, such as artificial intelligence, high-performance data analytics, and financial analysis, heterogeneous computing will gradually replace general-purpose computing in areas where the traditional approach does not have an edge. Going with the tide, Alibaba Cloud released elastic GPU and FPGA heterogeneous computing solutions, so as to better meet the increasing heterogeneous computing demands of artificial intelligence, data analytics, and business intelligence. These products allow customers to easily purchase and use the resources they need. In this way, heterogeneous computing will no longer be an extremely expensive resource, but become a universal basic computing resource. This will promote the development of artificial intelligence and other such industries.

Yunqi: In January of this year, Alibaba Cloud released elastic GPU and FPGA heterogeneous computing solutions. What application scenarios are these solutions designed for? At present, which fields have seen successful applications of these solutions?

Dr. Zhang: First, compared with CPUs, GPUs offer better parallel performance, higher maximum computing power for each machine, and more efficient computing. Alibaba Cloud's elastic GPU product is primarily designed for scenarios such as artificial intelligence, data analytics, scientific computing, film rendering, video and image processing, and video transcoding. It has already been applied successfully in behavior data analysis, targeted advertising, facial recognition, video recognition, image recognition, and object classification.

Second, the main advantages of FPGAs are higher performance per watt, better performance on irregular data computing, stronger hardware acceleration, and lower inter-device latency. Alibaba Cloud's elastic FPGA product is primarily oriented toward artificial intelligence, semiconductor design, genomics computing, video and image processing, data analytics for decision making, and similar scenarios. It has already been applied successfully to deep learning inference, deep learning model pruning, irregular data computing, video and image processing, and semiconductor hardware design.

In addition, training a model on a single machine takes most clients several weeks to a month. To help these customers, we plan to release an ultra-high-performance heterogeneous cluster product. This product interconnects GPUs and FPGAs directly over 25/100 Gbit/s RoCE (RDMA over Converged Ethernet). It can train a model in a multi-host, multi-chip fashion across a large cluster of GPU/FPGA devices. This approach significantly reduces the required training time from several weeks or a month to a day or even a few hours.

Yunqi: Heterogeneous computing solutions have distinct advantages, but they are still in the development stage. What are the greatest challenges heterogeneous computing modes face at present?

Dr. Zhang: Currently, the greatest pain points suffered by users who have already purchased heterogeneous computing products include:

(1) High procurement costs: Users making small purchases have almost no bargaining power. For FPGA boards in particular, the unit cost of buying in small quantities can be very high.

(2) Long delivery cycle: A typical user spends several months moving through requirements decisions, model selection, hardware architecture design, supplier selection, data center selection, and financial approval.

(3) No elasticity: The number of GPUs/FPGAs purchased completely determines a system's capacity. When there are few computing tasks, GPU/FPGA resources are wasted, yet the same quantity may be insufficient during peak hours.

(4) No hardware dividend: Once a particular model has been purchased, it cannot be changed. When a new GPU/FPGA architecture is released, users have to make another purchase, because the performance of the old architecture cannot keep up with applications.

(5) Data islands: Offline GPUs/FPGAs cannot communicate with online services.

To resolve these problems, Alibaba Cloud released elastic heterogeneous computing solutions that offer the following features: (1) GPU/FPGA resources are ready for use upon purchase and can be elastically scaled. (2) The ultra-large-scale resource pool can meet the need for more GPU/FPGA resources during business peaks. (3) Users enjoy heterogeneous computing hardware dividends, with performance growth in excess of Moore's Law. They can use GPU/FPGA instances with improved performance at the same price. (4) Alibaba Cloud offers the most comprehensive line of heterogeneous products, meeting the needs of scenarios such as artificial intelligence training and inference, image processing, and video processing. (5) Product integration: These products are deeply integrated into the overall Alibaba Cloud product system, allowing data to flow between products.

In addition, the biggest challenge for FPGA products is the underdeveloped overall FPGA ecosystem: few customers are capable of FPGA development, particularly for computing acceleration. Therefore, we plan to build a cloud-based IP development marketplace, bring in a series of FPGA IP partners, and promote development standards for cloud-based FPGAs. Through these efforts, we aim to entice more IP developers and partners to release their IPs on the marketplace to serve their end users, and ultimately to enrich the overall FPGA development ecosystem.

Yunqi: For developers, heterogeneous computing programming is difficult and the development costs are high. How does Alibaba Cloud address this?

Dr. Zhang: Concerning GPU programming, Alibaba Cloud will release a distributed multi-host, multi-chip training framework and other GPU performance optimization services. These will significantly reduce the difficulty of adopting a multi-host, multi-chip approach, and thereby reduce customers' cloud-based deep learning model training time. As for FPGAs, Alibaba Cloud will establish an IP development marketplace and encourage a series of FPGA IP partners to release their proprietary IP product lines. With a thriving IP marketplace, Alibaba hopes to give more end users the opportunity to benefit from FPGA performance acceleration. In addition, Alibaba Cloud has already released IaaS+ services, including an E-HPC product for heterogeneous cluster resource scheduling, account management, and elastic scaling. With Alibaba Cloud's Container Service, users get one-click deployment, distributed training, and elastic scaling, and can use XDL to analyze behavior data. Finally, users can take advantage of Alibaba Cloud's self-developed GPU assembler to optimize application performance, increase the utilization of heterogeneous computing equipment, and reduce resource procurement costs.

Yunqi: Can you share some of your thoughts about heterogeneous computing? What valuable experience have you gained at work?

Dr. Zhang: Artificial intelligence and other emerging application fields demand computing capacity that already outpaces the growth of CPU power described by Moore's Law, whereas the performance growth of heterogeneous computing can keep up with these emerging needs. It is likely that heterogeneous computing will occupy an increasing share of data center resources in the future. China has recently issued an artificial intelligence development plan, elevating AI to the level of national strategy. This will promote comprehensive, AI-driven upgrades to national industries and social progress. Naturally, such activities will involve heterogeneous computing, as this technology is essential to artificial intelligence. In our work, we have two main goals. First, we are committed to transforming heterogeneous computing into a computing resource users can easily purchase and use. To this end, we want to provide the most comprehensive range of heterogeneous computing products and solutions. Second, we are committed to giving users the means to make good use of heterogeneous resources, so that they can take full advantage of the processing capabilities of these resources to make their products more competitive. We want to promote the transformation of heterogeneous computing into a universal computing capability. In doing so, we will also spur the development of artificial intelligence, thereby driving industrial upgrades and social progress. In the end, this will change how we make things and how we live our lives.

Yunqi: In your opinion, what new changes will the heterogeneous computing field embrace in the future?

Dr. Zhang: GPUs are the mainstream in the heterogeneous computing field. Going forward, as the FPGA ecosystem takes shape and continues to grow, and as ASIC chip technology gradually matures, heterogeneous computing will be divided three ways among GPU, FPGA, and ASIC chip technologies. Each has its own unique advantages and applications, as well as its own customer base. In the future, Alibaba Cloud will release more products to expand the heterogeneous computing product family. These will include 8-card/16-card GPU products, next-gen Volta architecture GPU products, and a new generation of FPGA products. In addition, R&D is underway for cloud-based ASIC chip products.

Yunqi: What do you want to share with attendees during this Computing Conference? Can you give us a preview of the topics you will discuss and tell us why you chose them?

Dr. Zhang: During this Computing Conference, we will launch several important products in heterogeneous computing, general-purpose computing, high-performance computing, and other fields. These products are designed to give users a better experience by resolving some of their challenges, including cluster management and scheduling, licensing issues when elastically using paid software in the cloud, the need for instances that combine the flexibility of virtual hosts with the performance of physical hosts, and shortening training times through multi-host, multi-card distributed training. I hope those interested will pay close attention to the special forums on heterogeneous computing, virtualization technology, and elastic computing to be held during the Computing Conference.
