PolarDB 开源基础教程系列 6 开源插件扩展-阿里云开发者社区

安装与使用PostgreSQL开源插件/工具

实验环境:

mkdir ~/pb11_1  
cd ~/pb11_1  
docker run -d -it -P --shm-size=1g \
--cap-add=SYS_PTRACE --cap-add SYS_ADMIN \
--privileged=true --name pb11_1 \
-v ${PWD}:/var/polardb \
registry.cn-hangzhou.aliyuncs.com/polardb_pg/polardb_pg_local_instance:11 bash

本节阅读导航

1、当前环境已安装并支持哪些插件

2、AI外脑插件: vector

3、营销场景目标人群圈选插件: smlar

1、当前环境已安装并支持哪些插件

通过pg_available_extensions视图可以查看当前数据库支持哪些插件, 安装了哪些插件?

postgres=# select * from pg_available_extensions;  
             name             | default_version | installed_version |                                                       comment                                                         
------------------------------+-----------------+-------------------+---------------------------------------------------------------------------------------------------------------------  
 postgres_fdw                 | 1.0             |                   | foreign-data wrapper for remote PostgreSQL servers  
 postgis                      | 3.3.2           |                   | PostGIS geometry and geography spatial types and functions  
 polar_monitor_preload        | 1.1             |                   | examine the polardb information  
 jsonb_plpython3u             | 1.0             |                   | transform between jsonb and plpython3u  
 ltree                        | 1.1             |                   | data type for hierarchical tree-like structures  
 tablefunc                    | 1.0             |                   | functions that manipulate whole tables, including crosstab  
 polar_csn                    | 1.0             |                   | polar_csn  
 plperl                       | 1.0             |                   | PL/Perl procedural language  
 polar_worker                 | 1.0             |                   | polar_worker  
 btree_gin                    | 1.3             |                   | support for indexing common datatypes in GIN  
 polar_stat_sql               | 1.3             |                   | Kernel statistics gathering, and sql plan nodes information gathering  
 address_standardizer         | 3.3.2           |                   | Used to parse an address into constituent elements. Generally used to support geocoding address normalization step.  
 btree_gist                   | 1.5             |                   | support for indexing common datatypes in GiST  
 earthdistance                | 1.1             |                   | calculate great-circle distances on the surface of the Earth  
 pltclu                       | 1.0             |                   | PL/TclU untrusted procedural language  
 amcheck                      | 1.1             |                   | functions for verifying relation integrity  
 ltree_plpython3u             | 1.0             |                   | transform between ltree and plpython3u  
 isn                          | 1.2             |                   | data types for international product numbering standards  
 tcn                          | 1.0             |                   | Triggered change notifications  
 pg_visibility                | 1.2             |                   | examine the visibility map (VM) and page-level visibility info  
 address_standardizer_data_us | 3.3.2           |                   | Address Standardizer US dataset example  
 sslinfo                      | 1.2             |                   | information about SSL certificates  
 polar_parameter_check        | 1.0             |                   | kernel extension for parameter validation  
 intagg                       | 1.1             |                   | integer aggregator and enumerator (obsolete)  
 insert_username              | 1.0             |                   | functions for tracking who changed a table  
 moddatetime                  | 1.0             |                   | functions for tracking last modification time  
 pageinspect                  | 1.7             |                   | inspect the contents of database pages at a low level  
 pg_trgm                      | 1.4             |                   | text similarity measurement and index searching based on trigrams  
 pg_prewarm                   | 1.2             |                   | prewarm relation data  
 ltree_plpython2u             | 1.0             |                   | transform between ltree and plpython2u  
 refint                       | 1.0             |                   | functions for implementing referential integrity (obsolete)  
 hstore_plpython3u            | 1.0             |                   | transform between hstore and plpython3u  
 postgis_tiger_geocoder       | 3.3.2           |                   | PostGIS tiger geocoder and reverse geocoder  
 adminpack                    | 2.0             |                   | administrative functions for PostgreSQL  
 pg_freespacemap              | 1.2             |                   | examine the free space map (FSM)  
 fuzzystrmatch                | 1.1             |                   | determine similarities and distance between strings  
 polar_tde_utils              | 1.0             |                   | Internal extension for TDE  
 lo                           | 1.1             |                   | Large Object maintenance  
 citext                       | 1.5             |                   | data type for case-insensitive character strings  
 bloom                        | 1.0             |                   | bloom access method - signature file based index  
 polar_vfs                    | 1.0             |                   | polar_vfs  
 pgrowlocks                   | 1.2             |                   | show row-level locking information  
 polar_monitor                | 1.2             |                   | examine the polardb information  
 timescaledb                  | 2.3.1           |                   | Enables scalable inserts and complex queries for time-series data  
 pltcl                        | 1.0             |                   | PL/Tcl procedural language  
 hstore_plperlu               | 1.0             |                   | transform between hstore and plperlu  
 hstore                       | 1.5             |                   | data type for storing sets of (key, value) pairs  
 timetravel                   | 1.0             |                   | functions for implementing time travel  
 plperlu                      | 1.0             |                   | PL/PerlU untrusted procedural language  
 smlar                        | 1.0             |                   | compute similary of any one-dimensional arrays  
 hstore_plperl                | 1.0             |                   | transform between hstore and plperl  
 jsonb_plpythonu              | 1.0             |                   | transform between jsonb and plpythonu  
 dict_int                     | 1.0             |                   | text search dictionary template for integers  
 jsonb_plperlu                | 1.0             |                   | transform between jsonb and plperlu  
 uuid-ossp                    | 1.1             |                   | generate universally unique identifiers (UUIDs)  
 autoinc                      | 1.0             |                   | functions for autoincrementing fields  
 postgis_topology             | 3.3.2           |                   | PostGIS topology spatial types and functions  
 dict_xsyn                    | 1.0             |                   | text search dictionary template for extended synonym processing  
 pgstattuple                  | 1.5             |                   | show tuple-level statistics  
 tsm_system_rows              | 1.0             |                   | TABLESAMPLE method which accepts number of rows as a limit  
 file_fdw                     | 1.0             |                   | foreign-data wrapper for flat file access  
 jsonb_plperl                 | 1.0             |                   | transform between jsonb and plperl  
 vector                       | 0.5.1           |                   | vector data type and ivfflat and hnsw access methods  
 dblink                       | 1.2             |                   | connect to other PostgreSQL databases from within a database  
 seg                          | 1.3             |                   | data type for representing line segments or floating-point intervals  
 pgcrypto                     | 1.3             |                   | cryptographic functions  
 plpython3u                   | 1.0             |                   | PL/Python3U untrusted procedural language  
 jsonb_plpython2u             | 1.0             |                   | transform between jsonb and plpython2u  
 plpgsql                      | 1.0             | 1.0               | PL/pgSQL procedural language  
 hstore_plpythonu             | 1.0             |                   | transform between hstore and plpythonu  
 ltree_plpythonu              | 1.0             |                   | transform between ltree and plpythonu  
 unaccent                     | 1.1             |                   | text search dictionary that removes accents  
 cube                         | 1.4             |                   | data type for multidimensional cubes  
 polar_px                     | 1.0             |                   | Parallel Execution extension  
 tsm_system_time              | 1.0             |                   | TABLESAMPLE method which accepts time in milliseconds as a limit  
 intarray                     | 1.2             |                   | functions, operators, and index support for 1-D arrays of integers  
 postgis_raster               | 3.3.2           |                   | PostGIS raster types and functions  
 hstore_plpython2u            | 1.0             |                   | transform between hstore and plpython2u  
 pg_buffercache               | 1.3             |                   | examine the shared buffer cache  
 pg_stat_statements           | 1.6             |                   | track execution statistics of all SQL statements executed  
 xml2                         | 1.1             |                   | XPath querying and XSLT  
(81 rows)

2、AI外脑插件: vector

开源项目地址: https://github.com/pgvector/pgvector

向量索引的原理、召回与参数的关系、向量索引优化详情, 请参考如下:

vector插件顾名思义是一种支持向量数据类型的插件, 每一个向量由一组浮点数组成, 代表一个特征值, 通常用来进行基于特征相似性的检索, 例如相似图像, 相似文本, 相似音频, 相似视频等. 和大模型结合起来, 可以用于存储知识库/标准答案/优质答案的向量, 作为大模型的外脑, 在对话过程中, 从向量中提取与问题相关的文本, 发给大模型可以让大模型具备更好的解题能力. 例如

在私有知识库领域, 由于大模型未训练过私有知识素材, 所以大模型无法处理私有领域的问题, 有了向量数据库, 可以在问答过程中给大模型投喂相关知识, 让大模型可以具备回复未知领域知识的能力.
也可用来解决大模型幻觉问题, 通过提取问题的相似文本, 让大模型在限定的相似文本内容范围内进行回复, 从而解决大模型幻觉问题. 例如将史书的文本切分后录入向量数据库, 在回答问题时, 从向量数据库提取与问题相近的史书文本, 限定大模型在这些文本范围内回答问题, 从而得到与史书匹配的答案.

vector除了提供向量类型, 还支持了配套的索引, 操作符等来加速向量相似的排序操作.

用法举例.

创建插件

postgres=# create extension vector ;   
CREATE EXTENSION

创建向量表

-- 在实际使用过程中, 从非结构化数据转换为向量, 可以通过云厂商API, 或者使用开源工具  
create table tbl_vec (  
  id int primary key,  
  info text,  
  vec vector(512)  
);

生成512维度模拟向量10万条

-- 生成随机向量的函数  
create or replace function gen_rand_vec(int) returns vector as $$  
  select array_to_vector(array_agg((random()*100)::real), $1, true) from generate_series(1,$1);  
$$ language sql strict;   
  
insert into tbl_vec select generate_series(1,100000), md5(random()::text), gen_rand_vec(512);

实际使用中需要使用embedding模型来生成向量, 例如 https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings

curl http://localhost:11434/api/embed -d '{
  "model": "mxbai-embed-large:latest",
  "input": "Why is the sky blue?"
}'
# 表结构建议
create table t_vec (
  id int,
  model_name text,  -- embedding 模型名:size
  content text,    -- 原始内容(例如文本、视频、音频等)
  content_vec vector(N),  -- 经过embedding模型, 将原始内容转换为对应向量. 
  other ...
);
# 建议一张表内的数据采用同一个embedding模型进行转换, 保证所有原始数据都被映射到相同的向量空间中.

算子

postgres=# \do+  
                                                List of operators  
 Schema | Name | Left arg type | Right arg type |   Result type    |           Function            | Description   
--------+------+---------------+----------------+------------------+-------------------------------+-------------  
 public | *    | vector        | vector         | vector           | vector_mul                    |   
 public | +    | vector        | vector         | vector           | vector_add                    |   
 public | -    | vector        | vector         | vector           | vector_sub                    |   
 public | <    | vector        | vector         | boolean          | vector_lt                     |   
 public | <#>  | vector        | vector         | double precision | vector_negative_inner_product |   
 public | <->  | vector        | vector         | double precision | l2_distance                   |   
 public | <=   | vector        | vector         | boolean          | vector_le                     |   
 public | <=>  | vector        | vector         | double precision | cosine_distance               |   
 public | <>   | vector        | vector         | boolean          | vector_ne                     |   
 public | =    | vector        | vector         | boolean          | vector_eq                     |   
 public | >    | vector        | vector         | boolean          | vector_gt                     |   
 public | >=   | vector        | vector         | boolean          | vector_ge                     |   
(12 rows)

搜索某个向量的cosine_distance相似向量

postgres=# select ''''||gen_rand_vec(512)::text||'''' as v1 \gset  
  
postgres=# \echo :v1  
'[42.3123,49.094,36.027,97.4979,51.5076,51.8525,89.3701,98.1134,41.6966,88.2519,98.8663,38.4313,68.9565,82.9641,76.3079,32.8708,93.7287,13.7805,59.3289,89.8906,54.0369,39.8219,16.9884,94.0551,9.97198,95.7318,10.5824,25.3164,90.8695,49.5991,55.4999,33.1818,98.6931,91.5269,30.6798,50.2007,43.3794,20.0499,48.3141,85.0761,8.30179,47.1804,23.5074,77.2583,30.1445,99.8153,10.1291,23.8732,13.5958,69.458,13.7638,67.6327,9.27985,30.7522,61.6878,19.2518,26.4839,72.2703,44.5683,17.3534,21.8694,0.0681802,50.5353,20.5625,91.5951,81.215,70.7632,34.9745,1.26487,19.0773,20.0506,9.56666,66.2576,43.558,86.8249,96.4021,43.3733,96.954,20.2753,56.9691,66.412,34.0391,24.6018,75.6919,64.7913,86.2896,94.9437,91.2752,58.5599,39.512,8.62865,80.4293,39.5801,59.1639,0.991781,31.1752,40.3789,71.755,66.1498,41.6438,90.8322,86.2004,51.2104,57.0899,29.7584,38.0354,53.492,73.1317,34.9894,73.7673,30.1009,1.40146,7.80648,54.7027,77.0933,72.5978,40.9923,72.037,63.873,99.5522,11.549,72.5016,79.9815,51.1291,31.6656,80.9733,82.3044,72.0445,52.7283,48.4542,13.6883,43.5605,34.6546,64.8987,0.650352,64.413,2.9341,54.1424,37.5447,37.9235,27.9097,67.6456,39.325,35.7162,22.3482,16.4183,8.31396,63.3406,88.4553,72.187,62.8928,0.00434043,44.6886,42.8743,51.1335,76.3542,23.8476,33.4378,48.3986,76.5758,81.892,62.0869,20.1363,16.5466,26.9856,20.7867,80.9596,29.9197,74.929,18.5043,67.8432,2.83872,86.1499,7.16822,38.5549,8.49812,23.5865,46.8689,71.8387,12.0419,19.0558,34.7314,12.0462,63.7444,77.6057,63.1797,40.0986,1.45328,96.6175,88.4972,78.0291,78.5096,50.5841,98.1654,95.0561,77.5697,18.9521,76.0157,7.48943,93.8811,94.52,75.3327,96.7198,80.6698,82.5009,35.2747,89.168,6.08741,82.1436,61.0066,18.1293,1.19938,95.7381,30.1755,64.9438,73.3438,93.3552,5.04243,74.7971,89.9728,93.5397,52.8262,68.4823,44.1238,50.9916,63.5384,21.6935,69.9436,39.5541,29.183,63.8247,34.0741,4.51563,60.5445,14.7439,87.0165,95.8192,3.91189,93.1039,77.9628,64.9185,11.2332,79.1622,60.6566,41.4087,44.106,34.0004,34.7639,49.1484,8.79743,24.7367,42.6881,61.6236,93.219,86.8119,12.6152,56.7574,8.50542,82.5588,96.3116,37.6884,46.3836,30.3857,42.204,6.92808,45.1296,29.2205,2.74729,49.0415,22.3244,80.7101,13.96,33.5576,59.8722,74.6166,74.9664,3.97819,8.61693,9.73029,53.1266,17.4144,34.467,95.8147,79.038,27.686,82.6265,91.6531,84.4434,91.132,74.212,80.7549,28.8203,20.5955,11.1406,71.0244,27.5236,56.2702,0.244898,30.2709,5.31167,22.5693,10.981,19.2717,56.127,70.8532,93.8882,31.0934,74.8314,2.50517,40.8236,27.958,19.9195,75.2906,23.7726,98.9575,2.97656,6.39915,90.6106,87.4199,97.5311,64.8226,68.1749,26.3514,85.4181,79.3155,97.3758,12.9417,35.5857,97.6207,43.2126,40.8973,20.19,54.1935,60.169,76.317,25.0467,54.0572,7.41039,99.8781,56.5624,48.234,27.8361,76.4819,23.5246,51.6087,75.4394,26.5012,58.0078,66.05,13.9211,55.5389,30.8726,82.096,81.8904,16.2907,61.4114,79.2662,29.2324,96.9971,76.8869,72.445,37.8944,97.0769,26.6385,98.0634,73.3939,51.6852,52.1206,80.8043,51.5633,8.68304,29.0384,79.3994,85.165,52.563,31.0081,60.6044,79.0642,89.0159,26.6544,92.9853,44.5548,57.527,75.0813,26.4451,73.8177,36.4927,5.7113,3.05006,33.4898,82.5982,75.495,71.3843,79.6751,2.13354,69.4477,53.069,53.8188,21.5683,33.8733,5.38213,30.2514,62.9117,84.7815,15.4163,15.4747,15.7896,76.0207,94.5388,4.80547,2.67508,87.5241,49.3603,60.2021,62.6054,75.8054,34.0197,99.0982,81.5167,37.0698,32.588,64.1149,12.5648,3.97224,43.79,14.6984,73.4199,96.859,68.5172,94.9882,30.7323,73.8993,25.2396,93.644,58.6808,40.6559,9.11869,74.4704,16.6766,3.65755,79.2759,19.3517,91.1817,28.6361,79.5537,53.7871,4.44154,13.5735,52.8853,85.9583,50.6433,85.4733,50.0731,63.2081,89.4455,93.8631,77.9065,62.8654,90.7221,46.4236,57.8536,21.4544,20.3229,83.0932,15.0984,79.0037,23.7491,24.2171,53.4741,40.4257,27.8746,32.7499,59.7774,19.0563,61.3861,39.3311,72.8434,65.8276,52.9046,25.7287,51.7859,3.54789,11.202,1.85898,66.756,0.647476,95.7221,44.6624,63.5129,86.4441,91.086,21.3665,7.89851,11.4089,4.45974,22.9969,90.4126,28.2089,47.214,43.8867,68.6346,75.0886,76.6366,28.412,94.1449,38.0227,67.7431,66.9884,3.85029,20.6477,92.7171,55.6361]'  
  
postgres=# \timing on  
Timing is on.   
postgres=# select id,info, vec <=> :v1 from tbl_vec order by vec <=> :v1 limit 10;  
  id   |               info               |     ?column?        
-------+----------------------------------+-------------------  
 33990 | 73357616b73392961610fd4056df7872 | 0.200381241291853  
 65206 | f7126017dee115ba292f5fc1e6fe094c | 0.202510531635712  
 28629 | 47a751432609ac468a021c9af3666c8b | 0.202872040131985  
 78789 | 8ca283c6f2719e0ecfb5ed7f33285bfd |  0.20292308001207  
 25061 | d6d02501fef41890b6e999bd8ef45d4e | 0.203712549333763  
 65012 | 21dee0e74a8f547d4a8c4e9e9cea42b2 |  0.20454229165218  
 82237 | ad49dd1204e19e04da0428fb28ce5c64 | 0.204661291873772  
 27672 | 8aa2bea9608899ab8e13866dd2387f47 |  0.20499257967393  
 26030 | 1c94ae0fe923bd775278aff7dd7650d0 | 0.205011457373235  
  5666 | 237ae0154cf6ca0e87a90fe85c20765b | 0.205843957675826  
(10 rows)  
  
Time: 140.098 ms

创建向量索引

postgres=# set maintenance_work_mem ='512MB';  
  
postgres=# CREATE INDEX ON tbl_vec USING hnsw (vec vector_cosine_ops);  
CREATE INDEX  
Time: 126018.043 ms (02:06.018)

使用索引加速搜索某个向量的cosine_distance相似向量

postgres=# select id,info, vec <=> :v1 from tbl_vec order by vec <=> :v1 limit 10;  
  id   |               info               |     ?column?        
-------+----------------------------------+-------------------  
 28629 | 47a751432609ac468a021c9af3666c8b | 0.202872040131985  
  3843 | 16e453bdc6e7cce238b649d0d162c7c7 | 0.206543485852313  
 57218 | 0fa2b675124b3f1c3bd1fc26203c5eaf |  0.20931599952677  
 58681 | 562db2b3b7851727fa2b8f1ba15788fc | 0.211436702245081  
 84409 | d5432e2c5c010e49fbf8927f8d881365 | 0.211825938887412  
 67532 | 90a3511dc90684f22ef9756af142dc7a | 0.211948480104953  
 27195 | 6b6d2b3157eae64168b8d419da0fb32b | 0.212202785190134  
 49533 | 4b31ad2839436ce5bdc86358ec3bf2f2 | 0.212349440226144  
 62096 | d2adff2b3322fd98419e3c7cef737b05 | 0.212489664916925  
 47447 | 0df1dcdea170add8cd3f987e79638ae6 | 0.213359929199801  
(10 rows)  
  
Time: 9.736 ms

3、营销场景目标人群圈选插件: smlar

开源项目地址: https://github.com/jirutka/smlar

smlar插件是GIN索引的一种功能扩展, 可以通过索引快速判断数组的相似性(根据数组元素的相交个数, 或者相交个数在整体集合的占比). 如果数组的每个元素代表一个特征标签, 在电商、短视频、社交等业务场景可以给商品、视频、用户贴标签, 根据标签(数组)的相似性, 圈选目标人群、圈选感兴趣的内容等.

用法举例.

创建插件

postgres=# create extension smlar ;  
CREATE EXTENSION

创建用户标签表

create table users (  
  id int primary key,  
  nick text,  
  tags int[]  -- 假设使用int数组来表示用户拥有的标签  
);

生成10万用户和随机标签

-- 生成随机标签的函数  
create or replace function gen_rand_arr (tags int, tags_catacity int) returns int[] as $$  
  select array(select ceil(random() * tags_catacity)::int from generate_series(1, tags));  
$$ language sql strict;  
  
postgres=# select gen_rand_arr(10,1000);  
               gen_rand_arr                  
-------------------------------------------  
 {280,703,893,376,153,753,458,119,141,133}  
(1 row)  
  
-- 假设每人10-100个标签, 标签总数1000个.  
insert into users select generate_series(1,100000), md5(random()::text), gen_rand_arr(9 + ceil(random()*91)::int, 1000) ;

根据某个标签圈选10个目标用户

postgres=# select show_smlar_limit();  
 show_smlar_limit   
------------------  
              0.6  
(1 row)  
  
postgres=# show smlar.type;    
 smlar.type   
------------  
 cosine  
(1 row)  
  
postgres=# \do+  
                                   List of operators  
 Schema | Name | Left arg type | Right arg type | Result type | Function | Description   
--------+------+---------------+----------------+-------------+----------+-------------  
 public | %    | anyarray      | anyarray       | boolean     | smlar_op |   
(1 row)  
  
postgres=# select ''''||tags::text||'''' as v1 from users where id=1 \gset  
  
postgres=# \echo :v1  
'{74,668,277,6,673,238,400,611,573,737,559,615,266,502,114,401,223,393,104,116,768,257,869,226,375,9,358,752,413,166,481,486,833,758,491,506,996,891,116,568,628,675,183,893,177,296,294,400,688,398,516,456,654,384,681,29,392,39,780,805,204,261,290,37,19,781,542,14,672,658,581,299,332,763,192,508,59,485,908,747,883,423,202}'  
  
postgres=# select id, smlar(tags, :v1), tags % :v1 from users where tags % :v1 limit 10;  
 id | smlar | ?column?   
----+-------+----------  
  1 |     1 | t  
(1 row)  
  
Time: 454.579 ms  
  
-- 修改相似度阈值, 超过20% 相似的就返回  
postgres=# select set_smlar_limit(0.2);  
 set_smlar_limit   
-----------------  
             0.2  
(1 row)  
  
postgres=# select id, smlar(tags, :v1), tags % :v1 from users where tags % :v1 limit 10;  
  id   |  smlar   | ?column?   
-------+----------+----------  
     1 |        1 | t  
 13805 | 0.226335 | t  
 21515 | 0.209751 | t  
 32683 | 0.221305 | t  
 74386 | 0.208514 | t  
 81936 | 0.211684 | t  
 95575 | 0.200222 | t  
(7 rows)  
  
Time: 480.429 ms

创建标签字段索引

create index on users using gin (tags _int4_sml_ops);

使用索引加速根据某个标签圈选10个目标用户

postgres=# select set_smlar_limit(0.6);  
 set_smlar_limit   
-----------------  
             0.6  
(1 row)  
  
postgres=# select id, smlar(tags, :v1), tags % :v1 from users where tags % :v1 limit 10;  
 id | smlar | ?column?   
----+-------+----------  
  1 |     1 | t  
(1 row)
Time: 46.256 ms

4、地理信息搜索插件: PostGIS

开源项目地址: https://postgis.net/

PostGIS是地理信息、时空数据库, 包括空间类型、轨迹等类型, 基于这些类型的索引, 操作符等. 应用广泛, 例如地图、基于地理位置的社交、共享出行、外卖等应用.

用法举例.

创建插件

postgres=# create extension postgis;  
CREATE EXTENSION

创建POI表, 表示地图上的餐饮店位置

create table tbl_poi (  
  id int primary key,  
  info text,  
  pos geometry  
);

生成10万条记录, 经纬度圈定在一个范围内随机生成

insert into tbl_poi select generate_series(1,100000), md5(random()::text),   
  ST_SetSRID(ST_MakePoint(120+random(), 35+random()),4326);

输入一个经纬度, 由近到远排序输出10条记录

select id,info,pos <-> ST_SetSRID(ST_MakePoint(120, 35),4326) as distance   
  from tbl_poi order by pos <-> ST_SetSRID(ST_MakePoint(120, 35),4326) limit 10;  
  
  id   |               info               |      distance         
-------+----------------------------------+---------------------  
 87207 | 6c3be5ba80405c1bcbc41ddeabe4845e | 0.00380898314902217  
  2395 | bea776d9ea01fe71aa7217588ca0e340 | 0.00475504602701775  
 83860 | 3346ea9e673776bef89e8bfb90176e5f | 0.00476231353857793  
 20391 | 4b6188e5b45e96b299d2bbdf0dfccc88 |  0.0051285288529832  
 65722 | 75cb60337e72d23e27590ee36b6d8553 | 0.00679551058653407  
 70932 | e5203b78d0294740e036106f70b2e58b | 0.00869448513564811  
 42792 | 9aba360479ba28c8a98107af6f45cb62 | 0.00936264324089115  
 38434 | 3c2565401c1b4e1d010d9326c6913e07 | 0.00937400159588758  
 30599 | 2d5d0ce802bffb2a269312338532def2 | 0.00991881579948414  
 80021 | 018dbf83a82342e4fa02fa9b50301fca |  0.0110926771171926  
(10 rows)  
  
Time: 62.047 ms

创建索引

create index on tbl_poi using gist (pos);

使用索引加速, 输入一个经纬度, 由近到远排序输出10条记录

id   |               info               |      distance         
-------+----------------------------------+---------------------  
 87207 | 6c3be5ba80405c1bcbc41ddeabe4845e | 0.00380898314902217  
  2395 | bea776d9ea01fe71aa7217588ca0e340 | 0.00475504602701775  
 83860 | 3346ea9e673776bef89e8bfb90176e5f | 0.00476231353857793  
 20391 | 4b6188e5b45e96b299d2bbdf0dfccc88 |  0.0051285288529832  
 65722 | 75cb60337e72d23e27590ee36b6d8553 | 0.00679551058653407  
 70932 | e5203b78d0294740e036106f70b2e58b | 0.00869448513564811  
 42792 | 9aba360479ba28c8a98107af6f45cb62 | 0.00936264324089115  
 38434 | 3c2565401c1b4e1d010d9326c6913e07 | 0.00937400159588758  
 30599 | 2d5d0ce802bffb2a269312338532def2 | 0.00991881579948414  
 80021 | 018dbf83a82342e4fa02fa9b50301fca |  0.0110926771171926  
(10 rows)  
  
Time: 6.205 ms

5、中文分词插件: pg_jieba

开源项目地址: https://github.com/jaiminpan/pg_jieba

中文分词, 顾名思义是将中文文本切分为有意义的词语, 用户就可以通过词语矢量进行中文匹配检索.

用法举例.

编译pg_jieba

git clone --depth=1 https://github.com/jaiminpan/pg_jieba    
    
cd pg_jieba    
    
# initilized sub-project    
git submodule update --init --recursive    
    
mkdir build    
cd build    
    
cmake -DCMAKE_PREFIX_PATH=/home/postgres/tmp_basedir_polardb_pg_1100_bld ..    
    
make    
make install

安装pg_jieba插件

create extension pg_jieba ;

测试中文分词

select * from to_tsquery('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久，我就会升职加薪，当上CEO，走上人生巅峰。');    
select * from to_tsvector('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久，我就会升职加薪，当上CEO，走上人生巅峰。');    
select * from ts_token_type('jieba');    
select * from ts_debug('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久，我就会升职加薪，当上CEO，走上人生巅峰。');

postgres=# create extension pg_jieba ;    
CREATE EXTENSION    
postgres=# select * from to_tsquery('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久，我就会升职加薪，当上CEO，走上人生巅峰。');    
                                                            to_tsquery                                                                
----------------------------------------------------------------------------------------------------------------------------------    
 '拖拉机' & '学院' & '手扶拖拉机' & '专业' & '不用' & '多久' & '会' & '升职' & '加薪' & '当上' & 'ceo' & '走上' & '人生' & '巅峰'    
(1 row)    
    
postgres=# select * from to_tsvector('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久，我就会升职加薪，当上CEO，走上人生巅峰。');    
                                                                to_tsvector                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------    
 'ceo':18 '不用':8 '专业':5 '人生':21 '会':13 '加薪':15 '升职':14 '多久':9 '学院':3 '巅峰':22 '当上':17 '手扶拖拉机':4 '拖拉机':2 '走上':20    
(1 row)    
    
postgres=# select * from ts_token_type('jieba');    
 tokid | alias |         description             
-------+-------+-----------------------------    
     1 | eng   | letter    
     2 | nz    | other proper noun    
     3 | n     | noun    
     4 | m     | numeral    
     5 | i     | idiom    
     6 | l     | temporary idiom    
     7 | d     | adverb    
     8 | s     | space    
     9 | t     | time    
    10 | mq    | numeral-classifier compound    
    11 | nr    | person's name    
    12 | j     | abbreviate    
    13 | a     | adjective    
    14 | r     | pronoun    
    15 | b     | difference    
    16 | f     | direction noun    
    17 | nrt   | nrt    
    18 | v     | verb    
    19 | z     | z    
    20 | ns    | location    
    21 | q     | quantity    
    22 | vn    | vn    
    23 | c     | conjunction    
    24 | nt    | organization    
    25 | u     | auxiliary    
    26 | o     | onomatopoeia    
    27 | zg    | zg    
    28 | nrfg  | nrfg    
    29 | df    | df    
    30 | p     | prepositional    
    31 | g     | morpheme    
    32 | y     | modal verbs    
    33 | ad    | ad    
    34 | vg    | vg    
    35 | ng    | ng    
    36 | x     | unknown    
    37 | ul    | ul    
    38 | k     | k    
    39 | ag    | ag    
    40 | dg    | dg    
    41 | rr    | rr    
    42 | rg    | rg    
    43 | an    | an    
    44 | vq    | vq    
    45 | e     | exclamation    
    46 | uv    | uv    
    47 | tg    | tg    
    48 | mg    | mg    
    49 | ud    | ud    
    50 | vi    | vi    
    51 | vd    | vd    
    52 | uj    | uj    
    53 | uz    | uz    
    54 | h     | h    
    55 | ug    | ug    
    56 | rz    | rz    
(56 rows)    
    
postgres=#  select * from ts_debug('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久，我就会升职加薪，当上CEO，走上人生巅峰。');    
 alias |  description  |   token    | dictionaries | dictionary |   lexemes        
-------+---------------+------------+--------------+------------+--------------    
 v     | verb          | 是         | {jieba_stem} | jieba_stem | {}    
 n     | noun          | 拖拉机     | {jieba_stem} | jieba_stem | {拖拉机}    
 n     | noun          | 学院       | {jieba_stem} | jieba_stem | {学院}    
 n     | noun          | 手扶拖拉机 | {jieba_stem} | jieba_stem | {手扶拖拉机}    
 n     | noun          | 专业       | {jieba_stem} | jieba_stem | {专业}    
 uj    | uj            | 的         | {jieba_stem} | jieba_stem | {}    
 x     | unknown       | 。         | {jieba_stem} | jieba_stem | {}    
 v     | verb          | 不用       | {jieba_stem} | jieba_stem | {不用}    
 m     | numeral       | 多久       | {jieba_stem} | jieba_stem | {多久}    
 x     | unknown       | ，         | {jieba_stem} | jieba_stem | {}    
 r     | pronoun       | 我         | {jieba_stem} | jieba_stem | {}    
 d     | adverb        | 就         | {jieba_stem} | jieba_stem | {}    
 v     | verb          | 会         | {jieba_stem} | jieba_stem | {会}    
 v     | verb          | 升职       | {jieba_stem} | jieba_stem | {升职}    
 nr    | person's name | 加薪       | {jieba_stem} | jieba_stem | {加薪}    
 x     | unknown       | ，         | {jieba_stem} | jieba_stem | {}    
 t     | time          | 当上       | {jieba_stem} | jieba_stem | {当上}    
 eng   | letter        | CEO        | {jieba_stem} | jieba_stem | {ceo}    
 x     | unknown       | ，         | {jieba_stem} | jieba_stem | {}    
 v     | verb          | 走上       | {jieba_stem} | jieba_stem | {走上}    
 n     | noun          | 人生       | {jieba_stem} | jieba_stem | {人生}    
 n     | noun          | 巅峰       | {jieba_stem} | jieba_stem | {巅峰}    
 x     | unknown       | 。         | {jieba_stem} | jieba_stem | {}    
(23 rows)

生成10万条随机分词数据

create or replace function gen_rand_ts(tslen int) returns tsvector as $$        
  select array_to_tsvector(array_agg(substring(md5(random()::text),1,8))) from generate_series(1,$1);        
$$ language sql strict;      
    
create table test (id int, vec tsvector);    
    
insert into test select generate_series(1,100000), gen_rand_ts(24);

创建索引

create index on test using gin (vec);

查询测试

postgres=# select * from test limit 10;    
 id |                                                                                                                                   vec                                                                   
                                                                        
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
--------------------------------------------------------------------    
  1 | '0eed2ca6' '149ae774' '260d6cae' '2bde3230' '38ce089c' '3fdfb67c' '40bf233a' '41825567' '52de4ebb' '5708b49d' '63bdd9ea' '650f2dbf' '6d35d142' '7c711c0d' '7e4e028f' '913802bf' 'a8a14013' 'b6aa8ba4    
' 'd9f595e6' 'dc054607' 'dee1a2f7' 'e5b6d7e8' 'eacb6356' 'eee81aaf'    
  2 | '00bf8fbc' '117b7b5c' '1e4c8295' '2d379ff7' '2e263dcb' '48967fe5' '4f20db40' '5f7aefcd' '616cbb8e' '81d4e152' '876b2318' '8c18f4c3' '8e732b6f' '94f6b13b' '9c53cb8e' 'aedca11c' 'b56c7ed4' 'c5008853    
' 'cc407ea8' 'd4f3d5a1' 'd63ca731' 'd87514ec' 'f9626af4' 'fa5b7458'    
  3 | '0e3b6147' '13674c4d' '16463e9b' '32894aca' '3a15d964' '453c9a26' '54664d82' '5cb0e40d' '62c8ca30' '6d0ebc3a' '6ee0a517' '71ccfeb5' '7e75a9d5' '7f61f401' '87b5f2cb' '8f1c6274' '976dff7f' '9b7a6758    
' 'af9c624e' 'e5422d57' 'ed7bb9d4' 'edc039a2' 'efe1e5fa' 'f9db8132'    
  4 | '118bf21c' '2087d303' '2579c220' '3733357a' '503b50ec' '56104ea2' '573b9ea9' '58a665af' '59250bad' '86abf8a9' '8a3b5a72' '8d8bb478' 'a16b8bd8' 'ac966a06' 'af4eabd8' 'b09ccbb5' 'b2d7aac4' 'b5134f1b    
' 'b5228857' 'b6836add' 'bcafbce0' 'd1ca5a3a' 'e8588e37' 'f6ffe6b0'    
  5 | '01876ad5' '07a8a579' '0a33ce9e' '0b5bbdd4' '10b00efe' '118fae91' '1c12acee' '2d74f4eb' '2d99481c' '41483d1c' '6864b85e' '7ba1937f' '8a6ccb01' '9c1ae58b' 'a251fd3d' 'a936eecd' 'b560d231' 'baa6927f    
' 'd78f04c6' 'dabff656' 'e5d975c0' 'f0598071' 'f819b029' 'fb202c1a'    
  6 | '1c6eea85' '23f37dd9' '28151030' '319fa87f' '447ddc9d' '45dcc30a' '5269c7c2' '77184ff9' '792793c2' '81f63a78' '87b67199' '8ddc346f' '9dbc6f02' 'a4130ee7' 'a4b21300' 'a8ae9afe' 'ae54596a' 'b01e580a    
' 'c17caa99' 'c7784bd5' 'd27a19ce' 'df21c10f' 'e383a9d0' 'fde1f572'    
  7 | '1c6e6d6e' '209c45cf' '23415a93' '292ba393' '3d64d313' '49cf134a' '4a1a1f0d' '4c7e54a7' '4e74180a' '5054e77e' '5882f01f' '59c25e04' '69eb2f87' '6f2ed6bb' '7c830771' '81c415f5' '975f413a' 'a3dc8375    
' 'a5a38d13' 'b1f83c28' 'bb62f740' 'c8bab4d1' 'd947163c' 'f3a81f80'    
  8 | '0800c7b2' '0ffbe32e' '19f84945' '1c001bd3' '1f3f5826' '2e13cca1' '36ca5372' '3abc8149' '516878e9' '534357fc' '67cb7af9' '69a7849d' '8c134ad3' '8d87ed42' '96069ef5' '98bfcdbe' 'b4b0ffa1' 'bc61912a    
' 'ddf1d8e6' 'e07722ea' 'e68ffbbf' 'f0751b01' 'f12cb4b9' 'fe0a7c4c'    
  9 | '14588466' '1b16dfff' '25339aa7' '4874dc00' '4c6bb5bf' '510c8f7b' '59cbfb21' '70372c94' '7db5e3c2' '85f68385' '8b0e7746' '9596e2d0' '997ca4d3' '9f4df7dc' 'b1726109' 'c42ae6e4' 'dc759b2d' 'e378d2d5    
' 'e956bc2b' 'ea5c6ed2' 'f0e58f77' 'f24f74b1' 'fa6df884' 'fa8edffb'    
 10 | '0188c7ac' '09a75236' '15fb2eee' '1dc80e6e' '2f543594' '3559f46a' '4369adcd' '477410ed' '5df678d0' '799bc453' '80ad7901' '81871ec1' '92faa899' '94c9cf0f' '971d699f' 'a002241a' 'a1636465' 'aee34bbb    
' 'b08f2f0c' 'b697c161' 'b8f290b9' 'd0acf8b4' 'd3beb05b' 'f8ca2a66'    
(10 rows)    
    
postgres=# explain select * from test where vec @@ '52de4ebb & 41825567'::tsquery ;    
                                 QUERY PLAN                                     
----------------------------------------------------------------------------    
 Bitmap Heap Scan on test  (cost=36.02..43.91 rows=2 width=300)    
   Recheck Cond: (vec @@ '''52de4ebb'' & ''41825567'''::tsquery)    
   ->  Bitmap Index Scan on test_vec_idx  (cost=0.00..36.02 rows=2 width=0)    
         Index Cond: (vec @@ '''52de4ebb'' & ''41825567'''::tsquery)    
(4 rows)    
    
postgres=# select * from test where vec @@ '52de4ebb & 41825567'::tsquery ;    
 id |                                                                                                                                   vec                                                                   
                                                                        
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
--------------------------------------------------------------------    
  1 | '0eed2ca6' '149ae774' '260d6cae' '2bde3230' '38ce089c' '3fdfb67c' '40bf233a' '41825567' '52de4ebb' '5708b49d' '63bdd9ea' '650f2dbf' '6d35d142' '7c711c0d' '7e4e028f' '913802bf' 'a8a14013' 'b6aa8ba4    
' 'd9f595e6' 'dc054607' 'dee1a2f7' 'e5b6d7e8' 'eacb6356' 'eee81aaf'    
(1 row)    
    
Time: 0.768 ms

不使用索引测试

postgres=# set enable_bitmapscan =off;    
SET    
Time: 0.887 ms    
postgres=# explain select * from test where vec @@ '52de4ebb & 41825567'::tsquery ;    
                        QUERY PLAN                             
-----------------------------------------------------------    
 Seq Scan on test  (cost=0.00..5417.00 rows=2 width=300)    
   Filter: (vec @@ '''52de4ebb'' & ''41825567'''::tsquery)    
(2 rows)    
    
Time: 1.136 ms    
    
    
postgres=# select * from test where vec @@ '52de4ebb & 41825567'::tsquery ;    
 id |                                                                                                                                   vec                                                                   
                                                                        
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
--------------------------------------------------------------------    
  1 | '0eed2ca6' '149ae774' '260d6cae' '2bde3230' '38ce089c' '3fdfb67c' '40bf233a' '41825567' '52de4ebb' '5708b49d' '63bdd9ea' '650f2dbf' '6d35d142' '7c711c0d' '7e4e028f' '913802bf' 'a8a14013' 'b6aa8ba4    
' 'd9f595e6' 'dc054607' 'dee1a2f7' 'e5b6d7e8' 'eacb6356' 'eee81aaf'    
(1 row)    
    
Time: 51.815 ms

10万条文本向量, 搜索命中1条. 性能参考:

GIN索引 0.768 ms VS 全表扫描 49.055 ms

自定义词典? 参考 https://github.com/jaiminpan/pg_jieba

## USER DEFINED DICTIONARY
#### Dictionary Format
+ Words  weight  type
+ Words  type
+ Words
  云计算
  韩玉鉴赏
  蓝翔 nz
  区块链 10 nz
Reference jieba_user.dict
#### How to use your own dictionary
  cd /PATH/TO/POSTGRESQL_INSTALL/share/postgresql/tsearch_data
  OR
  cd /PATH/TO/POSTGRESQL_INSTALL/share/tsearch_data
  cp 'YOUR DICTIONARY' jieba_user.dict
#### Dictionary Sharing
+ [dict.367W.utf8] iLife

6、融合计算插件: duckdb_fdw

开源项目地址: https://github.com/alitrack/duckdb_fdw

duckdb是一款性能非常好的in-process OLAP数据库, 同时支持数据湖架构, 支持访问s3/oss/http等远端数据, 支持列存储/parquet等常见的存储结构. 通过duckdb_fdw, PolarDB也可以具备以上能力.

用法举例.

下载duckdb cli 根据你的环境选一种下载 (aarch64架构)

wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-aarch64.zip  
unzip duckdb_cli-linux-aarch64.zip

下载duckdb cli 根据你的环境选一种下载 (x86_64架构)

wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-amd64.zip  
unzip duckdb_cli-linux-amd64.zip

生成parquet数据文件

./duckdb /var/polardb/db    
  
COPY (select generate_series as id, md5(random()::text) as info,   
now()::timestamp+(generate_series||' second')::interval as crt_time   
from generate_series(1,100)) TO '/var/polardb/t1.parquet' (FORMAT 'PARQUET');    
    
COPY (select generate_series as cid, md5(random()::text) as info,   
now()::timestamp+(generate_series||' second')::interval as crt_time   
from generate_series(1,100)) TO '/var/polardb/t2.parquet' (FORMAT 'PARQUET');    
    
COPY (select (floor(random()*100)+1)::int as gid, (floor(random()*100)+1)::int as cid,   
(random()*10)::int as c1, (random()*100)::int as c2, (random()*1000)::int as c3,   
(random()*10000)::int as c4, (random()*100000)::int as c5 from generate_series(1,1000000))   
TO '/var/polardb/t3.parquet' (FORMAT 'PARQUET');    
    
create view t1 as select * from read_parquet('/var/polardb/t1.parquet');    
create view t2 as select * from read_parquet('/var/polardb/t2.parquet');    
create view t3 as select * from read_parquet('/var/polardb/t3.parquet');    
    
checkpoint;    
  
D DESCRIBE t1 ;  
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐  
│ column_name │ column_type │  null   │   key   │ default │ extra │  
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ int32 │  
├─────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤  
│ id          │ BIGINT      │ YES     │         │         │       │  
│ info        │ VARCHAR     │ YES     │         │         │       │  
│ crt_time    │ TIMESTAMP   │ YES     │         │         │       │  
└─────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘  
D DESCRIBE t2 ;  
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐  
│ column_name │ column_type │  null   │   key   │ default │ extra │  
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ int32 │  
├─────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤  
│ cid         │ BIGINT      │ YES     │         │         │       │  
│ info        │ VARCHAR     │ YES     │         │         │       │  
│ crt_time    │ TIMESTAMP   │ YES     │         │         │       │  
└─────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘  
D DESCRIBE t3 ;  
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐  
│ column_name │ column_type │  null   │   key   │ default │ extra │  
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ int32 │  
├─────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤  
│ gid         │ INTEGER     │ YES     │         │         │       │  
│ cid         │ INTEGER     │ YES     │         │         │       │  
│ c1          │ INTEGER     │ YES     │         │         │       │  
│ c2          │ INTEGER     │ YES     │         │         │       │  
│ c3          │ INTEGER     │ YES     │         │         │       │  
│ c4          │ INTEGER     │ YES     │         │         │       │  
│ c5          │ INTEGER     │ YES     │         │         │       │  
└─────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘  
  
.quit

编译duckdb_fdw v1.1.3插件

# 注释的不需要执行. 如果是以前的duckdb_fdw版本 可能需要按注释的命令来操作!!!!
cd /var/polardb
# git clone --depth 1 -b duckdb1.1.0 https://github.com/digoal/duckdb_fdw
# cd duckdb_fdw 
wget https://github.com/alitrack/duckdb_fdw/archive/refs/tags/v1.1.3.tar.gz
tar -zxvf v1.1.3.tar.gz
cd duckdb_fdw-1.1.3 
mkdir libduckdb     
cd libduckdb  
wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/libduckdb-src.zip  
sudo apt-get install -y unzip  
unzip libduckdb-src.zip    
unalias cp    
cp -f duckdb.h ../    
cp -f duckdb.hpp ../ 
# 需要给容器多分配一点内存/swap, 否则容器OOM
# 注意 clang++ -c 编译项 需要与 duckdb_fdw src 的 Makefile PG_CXXFLAGS 一致.
#   区分linux / macos , 区分 arm(aarch64) / x86架构, 请仔细判断. 
# macos 编译参数 -std=c++11 
# aarch64 编译参数 -std=c++11 
# x86 编译参数 -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 
clang++ -c -fPIC -std=c++11 duckdb.cpp -o duckdb.o     
clang++ -shared -o libduckdb.so *.o     
# cp -f libduckdb.so $(pg_config --libdir)     
# cp -f libduckdb.so ../
cp -f libduckdb.so ../libduckdb.1.1.3.so  
cd ../     
# USE_PGXS=1 make uninstall    
# USE_PGXS=1 make clean    
# USE_PGXS=1 make distclean       
# USE_PGXS=1 make install
DUCKDB_VERSION=1.1.3 USE_PGXS=1 make uninstall    
DUCKDB_VERSION=1.1.3 USE_PGXS=1 make clean    
DUCKDB_VERSION=1.1.3 USE_PGXS=1 make distclean    
DUCKDB_VERSION=1.1.3 USE_PGXS=1 make install

加载duckdb_fdw插件

create extension duckdb_fdw version "1.1.3";

创建foreign server

CREATE SERVER DuckDB_server FOREIGN DATA WRAPPER duckdb_fdw OPTIONS   
  (database '/var/polardb/db');

导入foreign table

-- 一次性导入所有duckdb的表到foreign table
IMPORT FOREIGN SCHEMA public FROM SERVER DuckDB_server INTO public;
-- 或 挨个创建foreign table
create foreign table t1 (id int8, info text, crt_time timestamp) 
  server duckdb_server OPTIONS (table 't1');
create foreign table t2 (cid int8, info text, crt_time timestamp) 
  server duckdb_server OPTIONS (table 't2');
create foreign table t3 (gid int, cid int, c1 int, c2 int, c3 int, c4 int, c5 int) 
  server duckdb_server OPTIONS (table 't3');  
    
postgres=# \det    
     List of foreign tables    
 Schema | Table |    Server    
--------+-------+---------------    
 public | t1    | duckdb_server    
 public | t2    | duckdb_server    
 public | t3    | duckdb_server    
(3 rows)

为duckdb_fdw安装parquet插件

postgres=# SELECT duckdb_execute('duckdb_server', 'install parquet');
 duckdb_execute 
----------------
 
(1 row)
postgres=# SELECT duckdb_execute('duckdb_server', 'load parquet');
 duckdb_execute 
----------------
 
(1 row)

查询parquet数据性能

postgres=# explain verbose select count(distinct gid) from t3;    
                         QUERY PLAN    
-------------------------------------------------------------    
 Foreign Scan  (cost=1.00..1.00 rows=1 width=8)    
   Output: (count(DISTINCT gid))    
   SQLite query: SELECT count(DISTINCT "gid") FROM main."t3"    
(3 rows)    
    
postgres=# \timing    
Timing is on.    
    
select count(distinct gid) from t3;    
 count    
-------    
   100    
(1 row)    
Time: 13.556 ms    
    
select count(distinct gid),count(*) from t3;    
 count |  count    
-------+---------    
   100 | 1000000    
(1 row)    
Time: 21.836 ms    
    
explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id;    
                                                            QUERY PLAN    
-----------------------------------------------------------------------------------------------------------------------------------    
 Foreign Scan  (cost=1.00..1.00 rows=1 width=16)    
   Output: (count(DISTINCT t3.gid)), (count(*))    
   SQLite query: SELECT count(DISTINCT r1."gid"), count(*) FROM (main."t3" r1 INNER JOIN main."t1" r2 ON (((r1."gid" = r2."id"))))    
(3 rows)    
    
select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id;    
 count |  count    
-------+---------    
   100 | 1000000    
(1 row)    
Time: 15.536 ms    
    
select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id join t2 on t3.cid=t2.cid;    
 count |  count    
-------+---------    
   100 | 1000000    
(1 row)    
Time: 18.570 ms    
    
explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id join t2 on t3.cid=t2.cid;    
 Foreign Scan  (cost=1.00..1.00 rows=1 width=16)    
   Output: (count(DISTINCT t3.gid)), (count(*))    
   SQLite query: SELECT count(DISTINCT r1."gid"), count(*) FROM ((main."t3" r1 INNER JOIN main."t1" r2 ON (((r1."gid" = r2."id")))) INNER JOIN main."t2" r4 ON (((r1."cid" = r4."cid"))))    
(3 rows)

将数据导入PolarDB本地, 做同样的查询看一下时间.

postgres=# create table lt1 as select * from t1;    
SELECT 100    
postgres=# create table lt2 as select * from t2;    
SELECT 100    
postgres=# create table lt3 as select * from t3;    
SELECT 10000000    
    
postgres=# \timing    
Timing is on.    
postgres=# select count(distinct gid) from lt3;    
 count    
-------    
   100    
(1 row)    
Time: 1142.800 ms (00:01.143)

duckdb_fdw+Parquet 比PolarDB本地行存储表快了近100倍.

更多用法请参考

如果遇到问题可以提issue给duckdb_fdw或PolarDB开源项目

7、读写分离工具: pgpool-II

开源项目地址: https://www.pgpool.net/

pgpool-II 是一款开源的支持读写分离的连接池, 通过pgpool-II, 应用程序访问PolarDB 时, 可以根据SQL的特征自动路由到RW, RO节点, 实现对应用透明的读写分离功能.

用法举例.

部署pgpool-II 软件

cd /var/polardb    
wget https://www.pgpool.net/mediawiki/download.php?f=pgpool-II-4.5.5.tar.gz -O pgpool-II-4.5.5.tar.gz    
    
tar -zxvf pgpool-II-4.5.5.tar.gz    
    
cd pgpool-II-4.5.5    
    
./configure --prefix=/usr/local/pgpool4.5.5 --with-openssl    
    
make -j 4   
sudo make install

配置动态库和默认路径

sudo vi /etc/ld.so.conf    
# addd    
/usr/local/pgpool4.5.5/lib    
    
# 更新动态链接库  
sudo ldconfig    
   
# 设置shell环境变量初始化文件  
echo "export PATH=/usr/local/pgpool4.5.5/bin:\$PATH" >>  ~/.bashrc   
    
# 设置当前环境变量  
. ~/.bashrc

配置pgpool-II

polardb 3节点配置如下, 注意rw节点和2个ro节点的port, 这里5433是rw:

[postgres@1373488a35ab ~]$ psql -p 5432 -c "select pg_is_in_recovery();"    
 pg_is_in_recovery     
-------------------    
 t    
(1 row)    
    
[postgres@1373488a35ab ~]$ psql -p 5433 -c "select pg_is_in_recovery();"    
 pg_is_in_recovery     
-------------------    
 f    
(1 row)    
    
[postgres@1373488a35ab ~]$ psql -p 5434 -c "select pg_is_in_recovery();"    
 pg_is_in_recovery     
-------------------    
 t    
(1 row)

polardb与aurora类似, 共享存储集群模式, 无需pgpool来管理HA.

https://www.pgpool.net/docs/latest/en/html/example-aurora.html

配置pgpool.conf

cd /usr/local/pgpool4.5.5/etc    
    
sudo vi pgpool.conf
# 注意rw节点和ro节点的port和前面查到的要对应上, 不要配错了. 
    
listen_addresses = '0.0.0.0'    
port = 9999    
unix_socket_directories = '/tmp'    
pcp_listen_addresses = 'localhost'    
pcp_port = 9898    
pcp_socket_dir = '/tmp'    
log_destination = 'stderr'    
logging_collector = on    
log_directory = '/tmp/pgpool_logs'    
pid_file_name = '/var/run/pgpool/pgpool.pid'    
logdir = '/tmp'    
    
backend_clustering_mode = 'streaming_replication'    
load_balance_mode = on    
sr_check_period = 0    
health_check_period = 0    
failover_on_backend_shutdown=off    
failover_on_backend_error=off    
enable_pool_hba = on    
    
backend_hostname0 = '127.0.0.1'    
backend_port0 = '5433'    
backend_weight0 = 1    
backend_application_name0 = 'polardb_primray'    
backend_flag0 = 'ALWAYS_PRIMARY|DISALLOW_TO_FAILOVER'    
    
backend_hostname1 = '127.0.0.1'    
backend_port1 = '5432'    
backend_weight1 = 2    
backend_application_name1 = 'polardb_reader1'    
backend_flag1 = 'DISALLOW_TO_FAILOVER'    
    
backend_hostname2 = '127.0.0.1'    
backend_port2 = '5434'    
backend_weight2 = 2    
backend_application_name2 = 'polardb_reader2'    
backend_flag2 = 'DISALLOW_TO_FAILOVER'

认证方法password,md5,scram请参考: 《PolarDB 100 问 | 密码明明正确怎么报密码错误?》

配置pool_hba.conf

sudo vi pool_hba.conf    
# add    
host all all 0.0.0.0/0 md5

配置pgpool数据库用户密码文件pool_passwd

[postgres@1373488a35ab etc]$ sudo $(which pg_md5) --md5auth --username=digoal pwd123    
    
[postgres@1373488a35ab etc]$ cat /usr/local/pgpool4.5.5/etc/pool_passwd     
digoal:md531a770cec82aa37e217bb6e46c3f9d55    
    
    
    
-- 实际上就是pwd+username的md5值    
postgres=# select md5('pwd123digoal');    
               md5                    
----------------------------------    
 31a770cec82aa37e217bb6e46c3f9d55    
(1 row)

在数据库中创建相应pgpool-II的映射用户

psql -p 5433  
  
postgres=# create user digoal superuser encrypted password 'pwd123' login;    
CREATE ROLE

配置pcp管理用户密码文件pcp.conf

postgres=# select md5('pwd123');    
               md5                    
----------------------------------    
 45cb41b32dcfb917ccd8614f1536d6da    
(1 row)    
    
    
    
cd /usr/local/pgpool4.5.5/etc    
sudo vi pcp.conf    
  
pcpadm:45cb41b32dcfb917ccd8614f1536d6da

准备pgpool运行时pid文件目录和日志目录

sudo mkdir /var/run/pgpool    
sudo mkdir /tmp/pgpool_logs

启动pgpool

sudo $(which pgpool)

使用pcp管理命令查看pgpool中间件状态

pcp_node_info -U pcpadm -p 9898 -W   
Password:    输入 pwd123  
127.0.0.1 5433 1 0.200000 waiting unknown primary unknown 0 none none 2024-01-26 19:40:20  
127.0.0.1 5432 1 0.400000 waiting unknown standby unknown 0 none none 2024-01-26 19:40:20  
127.0.0.1 5434 1 0.400000 waiting unknown standby unknown 0 none none 2024-01-26 19:40:20

[postgres@1373488a35ab etc]$ pcp_node_count -U pcpadm -p 9898 -W   
Password:     输入 pwd123  
3

pcp_pool_status  -U pcpadm -h localhost -p 9898 -W   
Password:    输入 pwd123  
...    
name : backend_data_directory2  
value:   
desc : data directory for backend #2  
  
name : backend_status2  
value: waiting  
desc : status of backend #2  
  
name : standby_delay2  
value: 0  
desc : standby delay of backend #2  
  
name : backend_flag2  
value: DISALLOW_TO_FAILOVER  
desc : backend #2 flag  
  
name : backend_application_name2  
value: polardb_reader2  
desc : application_name for backend #2  
...

使用pgpool代理链接polardb

export PGPASSWORD=pwd123    
export PGDATABASE=postgres    
psql -p 9999 -U digoal -c "select * from pg_stat_activity where pid=pg_backend_pid();"

测试pgpool读写分离

pgbench -i -s 1 -h 127.0.0.1 -p 9999 -U digoal postgres

pgbench -n -r -P 1 -c 8 -j 8 -T 120 -S -h 127.0.0.1 -p 9999 -U digoal postgres

通过另一会话观察压测中的连接:

[postgres@1373488a35ab ~]$ psql -p 5432 -c "select count(*) from pg_stat_activity where application_name='pgbench';"    
 count     
-------    
     8    
(1 row)    
    
[postgres@1373488a35ab ~]$ psql -p 5433 -c "select count(*) from pg_stat_activity where application_name='pgbench';"    
 count     
-------    
     8    
(1 row)    
    
[postgres@1373488a35ab ~]$ psql -p 5434 -c "select count(*) from pg_stat_activity where application_name='pgbench';"    
 count     
-------    
     8    
(1 row)

更多信息请参考《PolarDB 开源版使用pgpool-II实现透明读写分离》

更详细的用法请参考各个插件开源官网.

PolarDB 开源基础教程系列 6 开源插件扩展

安装与使用PostgreSQL开源插件/工具

本节阅读导航

1、当前环境已安装并支持哪些插件

2、AI外脑插件: vector

3、营销场景目标人群圈选插件: smlar

4、地理信息搜索插件: PostGIS

5、中文分词插件: pg_jieba

6、融合计算插件: duckdb_fdw

7、读写分离工具: pgpool-II

PolarDB PostgreSQL 版

热门文章

最新文章

相关课程

相关电子书

相关实验场景