PostgreSQL 10.0 preview performance enhancement: libpq pipeline/batch mode reduces network round trips

Introduction

Tags

PostgreSQL, 10.0, libpq, pipeline, batch


Background

A proposed patch for PostgreSQL 10.0 adds a pipelined batch mode to libpq: the client can queue multiple queries into a pipeline and submit them to the server side as a single batch, reducing the number of client/server network round trips.

In environments with poor network conditions, and cloud environments in particular, this can improve performance dramatically.

+   <application>libpq</application> supports queueing up multiple queries into  
+   a pipeline to be executed as a batch on the server. Batching queries allows  
+   applications to avoid a client/server round-trip after each query to get  
+   the results before issuing the next query.  

See the proposal on pgsql-hackers:

Hi all  

Following on from the foreign table batch inserts thread[1], here's a patch  
to add support for pipelining queries into asynchronous batches in libpq.  

Attached, and also available at  
https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch (subject  
to rebasing and force pushes).  

It's cleaned up over the draft I posted on that thread and has error  
recovery implemented. I've written and included the SGML docs for it. The  
test program is now pretty comprehensive, more so than for anything else in  
libpq anyway. I'll submit it to the next CF as a 9.7/10.0 candidate.  

I'm measuring 300x (not %) performance improvements doing batches on  
servers over the Internet, so this seems pretty worthwhile. It turned out  
to be way less invasive than I expected too.  

(I intentionally didn't add any way for clients to annotate each work-item  
in a batch with their own private data. I think that'd be really useful and  
would make implementing clients easier, but should be a separate patch).  

This should be very useful for optimising FDWs, Postgres-XC, etc.  


[1]  
http://www.postgresql.org/message-id/CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com  

--   
 Craig Ringer                   http://www.2ndQuadrant.com/  
 PostgreSQL Development, 24x7 Support, Training & Services  

Examples

+  <para>  
+   An example of batch use may be found in the source distribution in  
+   <filename>src/test/examples/libpqbatch.c</filename>.  
+  </para>  
+  
+  <sect2>  
+   <title>When to use batching</title>  
+  
+   <para>  
+    Much like asynchronous query mode, there is no performance disadvantage to  
+    using batching and pipelining. It somewhat increases client application  
+    complexity, and extra caution is required to prevent client/server network  
+    deadlocks, but it can offer considerable performance improvements.  
+   </para>  
+  
+   <para>  
+    Batching is most useful when the server is distant, i.e. network latency  
+    ("ping time") is high, and when many small operations are being performed in  
+    rapid sequence. There is usually less benefit in using batches when each  
+    query takes many multiples of the client/server round-trip time to execute.  
+    A 100-statement operation run on a server 300ms round-trip-time away would take  
+    30 seconds in network latency alone without batching; with batching it may spend  
+    as little as 0.3s waiting for results from the server.  
+   </para>  
+  
+   <para>  
+    Use batches when your application does lots of small  
+    <literal>INSERT</literal>, <literal>UPDATE</literal> and  
+    <literal>DELETE</literal> operations that can't easily be transformed into  
+    operations on sets or into a  
+    <link linkend="libpq-copy"><literal>COPY</literal></link> operation.  
+   </para>  
+  
+   <para>  
+    Batching is less useful when information from one operation is required by the  
+    client before it knows enough to send the next operation. The client must  
+    introduce a synchronisation point and wait for a full client/server  
+    round-trip to get the results it needs. However, it's often possible to  
+    adjust the client design to exchange the required information server-side.  
+    Read-modify-write cycles are especially good candidates; for example:  
+    <programlisting>  
+     BEGIN;  
+     SELECT x FROM mytable WHERE id = 42 FOR UPDATE;  
+     -- result: x=2  
+     -- client adds 1 to x:  
+     UPDATE mytable SET x = 3 WHERE id = 42;  
+     COMMIT;  
+    </programlisting>  
+    could be much more efficiently done with:  
+    <programlisting>  
+     UPDATE mytable SET x = x + 1 WHERE id = 42;  
+    </programlisting>  
+   </para>  
+  
+   <note>  
+    <para>  
+     The batch API was introduced in PostgreSQL 9.6, but clients using it can  
+     use batches on server versions 8.4 and newer. Batching works on any server  
+     that supports the v3 extended query protocol.  
+    </para>  
+   </note>  
+  
+  </sect2>  
+  
+  <sect2 id="libpq-batch-using">  
+   <title>Using batch mode</title>  
+  
+   <para>  
+    To issue batches the application must switch  
+    <application>libpq</application> into batch mode. Enter batch mode with <link  
+    linkend="libpq-pqbeginbatchmode"><function>PQbeginBatchMode(conn)</function></link> or test  
+    whether batch mode is active with <link  
+    linkend="libpq-pqisinbatchmode"><function>PQisInBatchMode(conn)</function></link>. In batch mode only <link  
+    linkend="libpq-async">asynchronous operations</link> are permitted, and  
+    <literal>COPY</literal> is not allowed. (The restriction on <literal>COPY</literal> is an implementation  
+    limit; the PostgreSQL protocol and server can support batched <literal>COPY</literal>).  
+   </para>  
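A minimal entry/exit sketch of the mode switching described above. Note this uses the function names proposed by this patch, which were never part of a released libpq, so the fragment is illustrative only; return values are assumed to follow the usual libpq convention of nonzero on success:

```c
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Batch mode should be combined with non-blocking mode (see the
     * deadlock note below). */
    PQsetnonblocking(conn, 1);

    /* Switch the connection into batch mode. */
    if (!PQbeginBatchMode(conn))
    {
        fprintf(stderr, "could not enter batch mode\n");
        return 1;
    }

    /* ... dispatch queries, end batches, process results ... */

    /* Leave batch mode once every queued result has been consumed. */
    if (!PQendBatchMode(conn))
        fprintf(stderr, "could not exit batch mode: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}
```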
+  
+   <para>  
+    The client uses libpq's asynchronous query functions to dispatch work,  
+    marking the end of each batch with <function>PQsendEndBatch</function>.  
+    Concurrently, it uses <function>PQgetResult</function> and  
+    <function>PQgetNextQuery</function> to get results. It may eventually exit  
+    batch mode with <function>PQendBatchMode</function> once all results are  
+    processed.  
+   </para>  
+  
+   <note>  
+    <para>  
+     It is best to use batch mode with <application>libpq</application> in  
+     <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in  
+     blocking mode it is possible for a client/server deadlock to occur. The  
+     client will block trying to send queries to the server, but the server will  
+     block trying to send results from queries it's already processed to the  
+     client. This only occurs when the client sends enough queries to fill its  
+     output buffer and the server's receive buffer before switching to  
+     processing input from the server, but it's hard to predict exactly when  
+     that'll happen so it's best to always use non-blocking mode.  
+    </para>  
+   </note>  
+  
+   <sect3 id="libpq-batch-sending">  
+    <title>Issuing queries</title>  
+  
+    <para>  
+     After entering batch mode the application dispatches requests  
+     using normal asynchronous <application>libpq</application> functions like  
+     <function>PQsendQueryParams</function>, <function>PQsendPrepare</function>,  
+     etc. The asynchronous requests are followed by a <link  
+     linkend="libpq-pqsendendbatch"><function>PQsendEndBatch(conn)</function></link> call to mark  
+     the end of the batch. The client <emphasis>does not</emphasis> need to call  
+     <function>PQgetResult</function> immediately after dispatching each  
+     operation. <link linkend="libpq-batch-results">Result processing</link>  
+     is handled separately.  
+    </para>  
+      
+    <para>  
+     Batched operations will be executed by the server in the order the client  
+     sends them. The server will send the results in the order the statements  
+     executed. The server usually begins executing the batch before all commands  
+     in the batch are queued and the end of batch command is sent. If any  
+     statement encounters an error the server aborts the current transaction and  
+     skips processing the rest of the batch. Query processing resumes after the  
+     end of the failed batch.  
+    </para>  
+  
+    <para>  
+     It's fine for one operation to depend on the results of a  
+     prior one. One query may define a table that the next query in the same  
+     batch uses; similarly, an application may create a named prepared statement  
+     then execute it with later statements in the same batch.  
+    </para>  
+  
+   </sect3>  
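A dispatch sequence following the section above might look like this fragment (a sketch against the patch's proposed API; `mytable` is an invented example table, and `conn` is assumed to be a connection already switched into batch mode):

```c
/* Queue two parameterised inserts without waiting for their results. */
const char *ids[] = { "1", "2" };

for (int i = 0; i < 2; i++)
{
    const char *vals[1] = { ids[i] };

    if (!PQsendQueryParams(conn,
                           "INSERT INTO mytable (id) VALUES ($1)",
                           1,        /* number of parameters */
                           NULL,     /* let the server infer types */
                           vals, NULL, NULL,
                           0))       /* text-format results */
        fprintf(stderr, "dispatch failed: %s", PQerrorMessage(conn));
}

/* Mark the end of the batch; a PGRES_BATCH_END result will appear
 * after the last statement's results. */
if (!PQsendEndBatch(conn))
    fprintf(stderr, "could not end batch: %s", PQerrorMessage(conn));
```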
+  
+   <sect3 id="libpq-batch-results">  
+    <title>Processing results</title>  
+  
+    <para>  
+     The client <link linkend="libpq-batch-interleave">interleaves result  
+     processing with sending batch queries</link>, or for small batches may  
+     process all results after sending the whole batch.  
+    </para>  
+  
+    <para>  
+     To get the result of the first batch entry the client must call <link  
+     linkend="libpq-pqgetnextquery"><function>PQgetNextQuery</function></link>. It must then call  
+     <function>PQgetResult</function> and handle the results until  
+     <function>PQgetResult</function> returns null (or would return null if  
+     called). The result from the next batch entry may then be retrieved using  
+     <function>PQgetNextQuery</function> and the cycle repeated.  The  
+     application handles individual statement results as normal.  
+    </para>  
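The result-processing cycle described above could be sketched as follows (again against the patch's proposed API; `PQgetNextQuery` is assumed to return nonzero while queued queries remain, per the quoted docs):

```c
/* Process results for every queued query, in send order. */
while (PQgetNextQuery(conn))            /* advance to the next query's results */
{
    PGresult *res;

    while ((res = PQgetResult(conn)) != NULL)
    {
        switch (PQresultStatus(res))
        {
            case PGRES_COMMAND_OK:
            case PGRES_TUPLES_OK:
                /* handle this statement's result as usual */
                break;
            case PGRES_BATCH_END:
                /* end of one batch; further batches may follow */
                break;
            case PGRES_BATCH_ABORTED:
                /* an earlier statement failed; this one was skipped */
                break;
            default:
                fprintf(stderr, "error: %s", PQresultErrorMessage(res));
                break;
        }
        PQclear(res);
    }
}
```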
+  
+    <para>  
+     <function>PQgetResult</function> behaves the same as for normal asynchronous  
+     processing except that it may contain the new <type>PGresult</type> types  
+     <literal>PGRES_BATCH_END</literal> and <literal>PGRES_BATCH_ABORTED</literal>.  
+     <literal>PGRES_BATCH_END</literal> is reported exactly once for each  
+     <function>PQsendEndBatch</function> call at the corresponding point in  
+     the result stream and at no other time. <literal>PGRES_BATCH_ABORTED</literal>  
+     is emitted during error handling; see <link linkend="libpq-batch-errors">  
+     error handling</link>.  
+    </para>  
+  
+    <para>  
+     <function>PQisBusy</function>, <function>PQconsumeInput</function>, etc  
+     operate as normal when processing batch results.  
+    </para>  
+  
+    <para>  
+     <application>libpq</application> does not provide any information to the  
+     application about the query currently being processed. The application  
+     must keep track of the order in which it sent queries and the expected  
+     results. Applications will typically use a state machine or a FIFO queue  
+     for this.  
+    </para>  

The discussion of this patch can be found on the mailing list; see the URLs at the end of this article.

The PostgreSQL community works very rigorously: a patch may be discussed on the mailing lists for months or even years and revised repeatedly based on feedback, so by the time it is merged into master it is already very mature. This is why PostgreSQL is renowned for its stability.

References

https://commitfest.postgresql.org/13/1024/

https://www.postgresql.org/message-id/attachment/44303/0001-Pipelining-batch-support-for-libpq.patch

https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch

https://www.postgresql.org/message-id/flat/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com#CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com
