PolarDB 开源基础教程系列 6 开源插件扩展

本文涉及的产品
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
云原生数据库 PolarDB MySQL 版,通用型 2核8GB 50GB
简介: 1、当前环境已安装并支持哪些插件2、AI外脑插件: vector3、营销场景目标人群圈选插件: smlar4、地理信息搜索插件: PostGIS5、中文分词插件: pg_jieba6、融合计算插件: duckdb_fdw7、读写分离工具: pgpool-II

安装与使用PostgreSQL开源插件/工具

b站视频链接

Youtube视频链接

实验环境:

mkdir ~/pb11_1  
cd ~/pb11_1  
docker run -d -it -P --shm-size=1g \
--cap-add=SYS_PTRACE --cap-add SYS_ADMIN \
--privileged=true --name pb11_1 \
-v ${PWD}:/var/polardb \
registry.cn-hangzhou.aliyuncs.com/polardb_pg/polardb_pg_local_instance:11 bash

本节阅读导航

1、当前环境已安装并支持哪些插件

2、AI外脑插件: vector

3、营销场景目标人群圈选插件: smlar

4、地理信息搜索插件: PostGIS

5、中文分词插件: pg_jieba

6、融合计算插件: duckdb_fdw

7、读写分离工具: pgpool-II

1、当前环境已安装并支持哪些插件

通过pg_available_extensions视图可以查看当前数据库支持哪些插件, 安装了哪些插件?

postgres=# select * from pg_available_extensions;  
             name             | default_version | installed_version |                                                       comment                                                         
------------------------------+-----------------+-------------------+---------------------------------------------------------------------------------------------------------------------  
 postgres_fdw                 | 1.0             |                   | foreign-data wrapper for remote PostgreSQL servers  
 postgis                      | 3.3.2           |                   | PostGIS geometry and geography spatial types and functions  
 polar_monitor_preload        | 1.1             |                   | examine the polardb information  
 jsonb_plpython3u             | 1.0             |                   | transform between jsonb and plpython3u  
 ltree                        | 1.1             |                   | data type for hierarchical tree-like structures  
 tablefunc                    | 1.0             |                   | functions that manipulate whole tables, including crosstab  
 polar_csn                    | 1.0             |                   | polar_csn  
 plperl                       | 1.0             |                   | PL/Perl procedural language  
 polar_worker                 | 1.0             |                   | polar_worker  
 btree_gin                    | 1.3             |                   | support for indexing common datatypes in GIN  
 polar_stat_sql               | 1.3             |                   | Kernel statistics gathering, and sql plan nodes information gathering  
 address_standardizer         | 3.3.2           |                   | Used to parse an address into constituent elements. Generally used to support geocoding address normalization step.  
 btree_gist                   | 1.5             |                   | support for indexing common datatypes in GiST  
 earthdistance                | 1.1             |                   | calculate great-circle distances on the surface of the Earth  
 pltclu                       | 1.0             |                   | PL/TclU untrusted procedural language  
 amcheck                      | 1.1             |                   | functions for verifying relation integrity  
 ltree_plpython3u             | 1.0             |                   | transform between ltree and plpython3u  
 isn                          | 1.2             |                   | data types for international product numbering standards  
 tcn                          | 1.0             |                   | Triggered change notifications  
 pg_visibility                | 1.2             |                   | examine the visibility map (VM) and page-level visibility info  
 address_standardizer_data_us | 3.3.2           |                   | Address Standardizer US dataset example  
 sslinfo                      | 1.2             |                   | information about SSL certificates  
 polar_parameter_check        | 1.0             |                   | kernel extension for parameter validation  
 intagg                       | 1.1             |                   | integer aggregator and enumerator (obsolete)  
 insert_username              | 1.0             |                   | functions for tracking who changed a table  
 moddatetime                  | 1.0             |                   | functions for tracking last modification time  
 pageinspect                  | 1.7             |                   | inspect the contents of database pages at a low level  
 pg_trgm                      | 1.4             |                   | text similarity measurement and index searching based on trigrams  
 pg_prewarm                   | 1.2             |                   | prewarm relation data  
 ltree_plpython2u             | 1.0             |                   | transform between ltree and plpython2u  
 refint                       | 1.0             |                   | functions for implementing referential integrity (obsolete)  
 hstore_plpython3u            | 1.0             |                   | transform between hstore and plpython3u  
 postgis_tiger_geocoder       | 3.3.2           |                   | PostGIS tiger geocoder and reverse geocoder  
 adminpack                    | 2.0             |                   | administrative functions for PostgreSQL  
 pg_freespacemap              | 1.2             |                   | examine the free space map (FSM)  
 fuzzystrmatch                | 1.1             |                   | determine similarities and distance between strings  
 polar_tde_utils              | 1.0             |                   | Internal extension for TDE  
 lo                           | 1.1             |                   | Large Object maintenance  
 citext                       | 1.5             |                   | data type for case-insensitive character strings  
 bloom                        | 1.0             |                   | bloom access method - signature file based index  
 polar_vfs                    | 1.0             |                   | polar_vfs  
 pgrowlocks                   | 1.2             |                   | show row-level locking information  
 polar_monitor                | 1.2             |                   | examine the polardb information  
 timescaledb                  | 2.3.1           |                   | Enables scalable inserts and complex queries for time-series data  
 pltcl                        | 1.0             |                   | PL/Tcl procedural language  
 hstore_plperlu               | 1.0             |                   | transform between hstore and plperlu  
 hstore                       | 1.5             |                   | data type for storing sets of (key, value) pairs  
 timetravel                   | 1.0             |                   | functions for implementing time travel  
 plperlu                      | 1.0             |                   | PL/PerlU untrusted procedural language  
 smlar                        | 1.0             |                   | compute similary of any one-dimensional arrays  
 hstore_plperl                | 1.0             |                   | transform between hstore and plperl  
 jsonb_plpythonu              | 1.0             |                   | transform between jsonb and plpythonu  
 dict_int                     | 1.0             |                   | text search dictionary template for integers  
 jsonb_plperlu                | 1.0             |                   | transform between jsonb and plperlu  
 uuid-ossp                    | 1.1             |                   | generate universally unique identifiers (UUIDs)  
 autoinc                      | 1.0             |                   | functions for autoincrementing fields  
 postgis_topology             | 3.3.2           |                   | PostGIS topology spatial types and functions  
 dict_xsyn                    | 1.0             |                   | text search dictionary template for extended synonym processing  
 pgstattuple                  | 1.5             |                   | show tuple-level statistics  
 tsm_system_rows              | 1.0             |                   | TABLESAMPLE method which accepts number of rows as a limit  
 file_fdw                     | 1.0             |                   | foreign-data wrapper for flat file access  
 jsonb_plperl                 | 1.0             |                   | transform between jsonb and plperl  
 vector                       | 0.5.1           |                   | vector data type and ivfflat and hnsw access methods  
 dblink                       | 1.2             |                   | connect to other PostgreSQL databases from within a database  
 seg                          | 1.3             |                   | data type for representing line segments or floating-point intervals  
 pgcrypto                     | 1.3             |                   | cryptographic functions  
 plpython3u                   | 1.0             |                   | PL/Python3U untrusted procedural language  
 jsonb_plpython2u             | 1.0             |                   | transform between jsonb and plpython2u  
 plpgsql                      | 1.0             | 1.0               | PL/pgSQL procedural language  
 hstore_plpythonu             | 1.0             |                   | transform between hstore and plpythonu  
 ltree_plpythonu              | 1.0             |                   | transform between ltree and plpythonu  
 unaccent                     | 1.1             |                   | text search dictionary that removes accents  
 cube                         | 1.4             |                   | data type for multidimensional cubes  
 polar_px                     | 1.0             |                   | Parallel Execution extension  
 tsm_system_time              | 1.0             |                   | TABLESAMPLE method which accepts time in milliseconds as a limit  
 intarray                     | 1.2             |                   | functions, operators, and index support for 1-D arrays of integers  
 postgis_raster               | 3.3.2           |                   | PostGIS raster types and functions  
 hstore_plpython2u            | 1.0             |                   | transform between hstore and plpython2u  
 pg_buffercache               | 1.3             |                   | examine the shared buffer cache  
 pg_stat_statements           | 1.6             |                   | track execution statistics of all SQL statements executed  
 xml2                         | 1.1             |                   | XPath querying and XSLT  
(81 rows)

2、AI外脑插件: vector

开源项目地址: https://github.com/pgvector/pgvector

向量索引的原理、召回与参数的关系、向量索引优化详情, 请参考如下:

vector插件顾名思义是一种支持向量数据类型的插件, 每一个向量由一组浮点数组成, 代表一个特征值, 通常用来进行基于特征相似性的检索, 例如相似图像, 相似文本, 相似音频, 相似视频等. 和大模型结合起来, 可以用于存储知识库/标准答案/优质答案的向量, 作为大模型的外脑, 在对话过程中, 从向量中提取与问题相关的文本, 发给大模型可以让大模型具备更好的解题能力. 例如

  • 在私有知识库领域, 由于大模型未训练过私有知识素材, 所以大模型无法处理私有领域的问题, 有了向量数据库, 可以在问答过程中给大模型投喂相关知识, 让大模型可以具备回复未知领域知识的能力.
  • 也可用来解决大模型幻觉问题, 通过提取问题的相似文本, 让大模型在限定的相似文本内容范围内进行回复, 从而解决大模型幻觉问题. 例如将史书的文本切分后录入向量数据库, 在回答问题时, 从向量数据库提取与问题相近的史书文本, 限定大模型在这些文本范围内回答问题, 从而得到与史书匹配的答案.

vector除了提供向量类型, 还支持了配套的索引, 操作符等来加速向量相似的排序操作.

用法举例.

创建插件

postgres=# create extension vector ;   
CREATE EXTENSION

创建向量表

-- 在实际使用过程中, 从非结构化数据转换为向量, 可以通过云厂商API, 或者使用开源工具  
create table tbl_vec (  
  id int primary key,  
  info text,  
  vec vector(512)  
);

生成512维度模拟向量10万条

-- 生成随机向量的函数  
create or replace function gen_rand_vec(int) returns vector as $$  
  select array_to_vector(array_agg((random()*100)::real), $1, true) from generate_series(1,$1);  
$$ language sql strict;   
  
insert into tbl_vec select generate_series(1,100000), md5(random()::text), gen_rand_vec(512);

实际使用中需要使用embedding模型来生成向量, 例如 https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings

curl http://localhost:11434/api/embed -d '{
  "model": "mxbai-embed-large:latest",
  "input": "Why is the sky blue?"
}'
# 表结构建议
create table t_vec (
  id int,
  model_name text,  -- embedding 模型名:size
  content text,    -- 原始内容(例如文本、视频、音频等)
  content_vec vector(N),  -- 经过embedding模型, 将原始内容转换为对应向量. 
  other ...
);
# 建议一张表内的数据采用同一个embedding模型进行转换, 保证所有原始数据都被映射到相同的向量空间中.

算子

postgres=# \do+  
                                                List of operators  
 Schema | Name | Left arg type | Right arg type |   Result type    |           Function            | Description   
--------+------+---------------+----------------+------------------+-------------------------------+-------------  
 public | *    | vector        | vector         | vector           | vector_mul                    |   
 public | +    | vector        | vector         | vector           | vector_add                    |   
 public | -    | vector        | vector         | vector           | vector_sub                    |   
 public | <    | vector        | vector         | boolean          | vector_lt                     |   
 public | <#>  | vector        | vector         | double precision | vector_negative_inner_product |   
 public | <->  | vector        | vector         | double precision | l2_distance                   |   
 public | <=   | vector        | vector         | boolean          | vector_le                     |   
 public | <=>  | vector        | vector         | double precision | cosine_distance               |   
 public | <>   | vector        | vector         | boolean          | vector_ne                     |   
 public | =    | vector        | vector         | boolean          | vector_eq                     |   
 public | >    | vector        | vector         | boolean          | vector_gt                     |   
 public | >=   | vector        | vector         | boolean          | vector_ge                     |   
(12 rows)

搜索某个向量的cosine_distance相似向量

postgres=# select ''''||gen_rand_vec(512)::text||'''' as v1 \gset  
  
postgres=# \echo :v1  
'[42.3123,49.094,36.027,97.4979,51.5076,51.8525,89.3701,98.1134,41.6966,88.2519,98.8663,38.4313,68.9565,82.9641,76.3079,32.8708,93.7287,13.7805,59.3289,89.8906,54.0369,39.8219,16.9884,94.0551,9.97198,95.7318,10.5824,25.3164,90.8695,49.5991,55.4999,33.1818,98.6931,91.5269,30.6798,50.2007,43.3794,20.0499,48.3141,85.0761,8.30179,47.1804,23.5074,77.2583,30.1445,99.8153,10.1291,23.8732,13.5958,69.458,13.7638,67.6327,9.27985,30.7522,61.6878,19.2518,26.4839,72.2703,44.5683,17.3534,21.8694,0.0681802,50.5353,20.5625,91.5951,81.215,70.7632,34.9745,1.26487,19.0773,20.0506,9.56666,66.2576,43.558,86.8249,96.4021,43.3733,96.954,20.2753,56.9691,66.412,34.0391,24.6018,75.6919,64.7913,86.2896,94.9437,91.2752,58.5599,39.512,8.62865,80.4293,39.5801,59.1639,0.991781,31.1752,40.3789,71.755,66.1498,41.6438,90.8322,86.2004,51.2104,57.0899,29.7584,38.0354,53.492,73.1317,34.9894,73.7673,30.1009,1.40146,7.80648,54.7027,77.0933,72.5978,40.9923,72.037,63.873,99.5522,11.549,72.5016,79.9815,51.1291,31.6656,80.9733,82.3044,72.0445,52.7283,48.4542,13.6883,43.5605,34.6546,64.8987,0.650352,64.413,2.9341,54.1424,37.5447,37.9235,27.9097,67.6456,39.325,35.7162,22.3482,16.4183,8.31396,63.3406,88.4553,72.187,62.8928,0.00434043,44.6886,42.8743,51.1335,76.3542,23.8476,33.4378,48.3986,76.5758,81.892,62.0869,20.1363,16.5466,26.9856,20.7867,80.9596,29.9197,74.929,18.5043,67.8432,2.83872,86.1499,7.16822,38.5549,8.49812,23.5865,46.8689,71.8387,12.0419,19.0558,34.7314,12.0462,63.7444,77.6057,63.1797,40.0986,1.45328,96.6175,88.4972,78.0291,78.5096,50.5841,98.1654,95.0561,77.5697,18.9521,76.0157,7.48943,93.8811,94.52,75.3327,96.7198,80.6698,82.5009,35.2747,89.168,6.08741,82.1436,61.0066,18.1293,1.19938,95.7381,30.1755,64.9438,73.3438,93.3552,5.04243,74.7971,89.9728,93.5397,52.8262,68.4823,44.1238,50.9916,63.5384,21.6935,69.9436,39.5541,29.183,63.8247,34.0741,4.51563,60.5445,14.7439,87.0165,95.8192,3.91189,93.1039,77.9628,64.9185,11.2332,79.1622,60.6566,41.4087,44.106,34.0004,34.7639,49.1484,8.79743,24.7367,42.6881,61.6236,93.219,86.8119,12.6152,56.7574,8.50542,82.5588,96.3116,37.6884,46.3836,30.3857,42.204,6.92808,45.1296,29.2205,2.74729,49.0415,22.3244,80.7101,13.96,33.5576,59.8722,74.6166,74.9664,3.97819,8.61693,9.73029,53.1266,17.4144,34.467,95.8147,79.038,27.686,82.6265,91.6531,84.4434,91.132,74.212,80.7549,28.8203,20.5955,11.1406,71.0244,27.5236,56.2702,0.244898,30.2709,5.31167,22.5693,10.981,19.2717,56.127,70.8532,93.8882,31.0934,74.8314,2.50517,40.8236,27.958,19.9195,75.2906,23.7726,98.9575,2.97656,6.39915,90.6106,87.4199,97.5311,64.8226,68.1749,26.3514,85.4181,79.3155,97.3758,12.9417,35.5857,97.6207,43.2126,40.8973,20.19,54.1935,60.169,76.317,25.0467,54.0572,7.41039,99.8781,56.5624,48.234,27.8361,76.4819,23.5246,51.6087,75.4394,26.5012,58.0078,66.05,13.9211,55.5389,30.8726,82.096,81.8904,16.2907,61.4114,79.2662,29.2324,96.9971,76.8869,72.445,37.8944,97.0769,26.6385,98.0634,73.3939,51.6852,52.1206,80.8043,51.5633,8.68304,29.0384,79.3994,85.165,52.563,31.0081,60.6044,79.0642,89.0159,26.6544,92.9853,44.5548,57.527,75.0813,26.4451,73.8177,36.4927,5.7113,3.05006,33.4898,82.5982,75.495,71.3843,79.6751,2.13354,69.4477,53.069,53.8188,21.5683,33.8733,5.38213,30.2514,62.9117,84.7815,15.4163,15.4747,15.7896,76.0207,94.5388,4.80547,2.67508,87.5241,49.3603,60.2021,62.6054,75.8054,34.0197,99.0982,81.5167,37.0698,32.588,64.1149,12.5648,3.97224,43.79,14.6984,73.4199,96.859,68.5172,94.9882,30.7323,73.8993,25.2396,93.644,58.6808,40.6559,9.11869,74.4704,16.6766,3.65755,79.2759,19.3517,91.1817,28.6361,79.5537,53.7871,4.44154,13.5735,52.8853,85.9583,50.6433,85.4733,50.0731,63.2081,89.4455,93.8631,77.9065,62.8654,90.7221,46.4236,57.8536,21.4544,20.3229,83.0932,15.0984,79.0037,23.7491,24.2171,53.4741,40.4257,27.8746,32.7499,59.7774,19.0563,61.3861,39.3311,72.8434,65.8276,52.9046,25.7287,51.7859,3.54789,11.202,1.85898,66.756,0.647476,95.7221,44.6624,63.5129,86.4441,91.086,21.3665,7.89851,11.4089,4.45974,22.9969,90.4126,28.2089,47.214,43.8867,68.6346,75.0886,76.6366,28.412,94.1449,38.0227,67.7431,66.9884,3.85029,20.6477,92.7171,55.6361]'  
  
postgres=# \timing on  
Timing is on.   
postgres=# select id,info, vec <=> :v1 from tbl_vec order by vec <=> :v1 limit 10;  
  id   |               info               |     ?column?        
-------+----------------------------------+-------------------  
 33990 | 73357616b73392961610fd4056df7872 | 0.200381241291853  
 65206 | f7126017dee115ba292f5fc1e6fe094c | 0.202510531635712  
 28629 | 47a751432609ac468a021c9af3666c8b | 0.202872040131985  
 78789 | 8ca283c6f2719e0ecfb5ed7f33285bfd |  0.20292308001207  
 25061 | d6d02501fef41890b6e999bd8ef45d4e | 0.203712549333763  
 65012 | 21dee0e74a8f547d4a8c4e9e9cea42b2 |  0.20454229165218  
 82237 | ad49dd1204e19e04da0428fb28ce5c64 | 0.204661291873772  
 27672 | 8aa2bea9608899ab8e13866dd2387f47 |  0.20499257967393  
 26030 | 1c94ae0fe923bd775278aff7dd7650d0 | 0.205011457373235  
  5666 | 237ae0154cf6ca0e87a90fe85c20765b | 0.205843957675826  
(10 rows)  
  
Time: 140.098 ms

创建向量索引

postgres=# set maintenance_work_mem ='512MB';  
  
postgres=# CREATE INDEX ON tbl_vec USING hnsw (vec vector_cosine_ops);  
CREATE INDEX  
Time: 126018.043 ms (02:06.018)

使用索引加速搜索某个向量的cosine_distance相似向量

postgres=# select id,info, vec <=> :v1 from tbl_vec order by vec <=> :v1 limit 10;  
  id   |               info               |     ?column?        
-------+----------------------------------+-------------------  
 28629 | 47a751432609ac468a021c9af3666c8b | 0.202872040131985  
  3843 | 16e453bdc6e7cce238b649d0d162c7c7 | 0.206543485852313  
 57218 | 0fa2b675124b3f1c3bd1fc26203c5eaf |  0.20931599952677  
 58681 | 562db2b3b7851727fa2b8f1ba15788fc | 0.211436702245081  
 84409 | d5432e2c5c010e49fbf8927f8d881365 | 0.211825938887412  
 67532 | 90a3511dc90684f22ef9756af142dc7a | 0.211948480104953  
 27195 | 6b6d2b3157eae64168b8d419da0fb32b | 0.212202785190134  
 49533 | 4b31ad2839436ce5bdc86358ec3bf2f2 | 0.212349440226144  
 62096 | d2adff2b3322fd98419e3c7cef737b05 | 0.212489664916925  
 47447 | 0df1dcdea170add8cd3f987e79638ae6 | 0.213359929199801  
(10 rows)  
  
Time: 9.736 ms

3、营销场景目标人群圈选插件: smlar

开源项目地址: https://github.com/jirutka/smlar

smlar插件是GIN索引的一种功能扩展, 可以通过索引快速判断数组的相似性(根据数组元素的相交个数, 或者相交个数在整体集合的占比). 如果数组的每个元素代表一个特征标签, 在电商、短视频、社交等业务场景可以给商品、视频、用户贴标签, 根据标签(数组)的相似性, 圈选目标人群、圈选感兴趣的内容等.

用法举例.

创建插件

postgres=# create extension smlar ;  
CREATE EXTENSION

创建用户标签表

create table users (  
  id int primary key,  
  nick text,  
  tags int[]  -- 假设使用int数组来表示用户拥有的标签  
);

生成10万用户和随机标签

-- 生成随机标签的函数  
create or replace function gen_rand_arr (tags int, tags_catacity int) returns int[] as $$  
  select array(select ceil(random() * tags_catacity)::int from generate_series(1, tags));  
$$ language sql strict;  
  
postgres=# select gen_rand_arr(10,1000);  
               gen_rand_arr                  
-------------------------------------------  
 {280,703,893,376,153,753,458,119,141,133}  
(1 row)  
  
-- 假设每人10-100个标签, 标签总数1000个.  
insert into users select generate_series(1,100000), md5(random()::text), gen_rand_arr(9 + ceil(random()*91)::int, 1000) ;

根据某个标签圈选10个目标用户

postgres=# select show_smlar_limit();  
 show_smlar_limit   
------------------  
              0.6  
(1 row)  
  
postgres=# show smlar.type;    
 smlar.type   
------------  
 cosine  
(1 row)  
  
postgres=# \do+  
                                   List of operators  
 Schema | Name | Left arg type | Right arg type | Result type | Function | Description   
--------+------+---------------+----------------+-------------+----------+-------------  
 public | %    | anyarray      | anyarray       | boolean     | smlar_op |   
(1 row)  
  
postgres=# select ''''||tags::text||'''' as v1 from users where id=1 \gset  
  
postgres=# \echo :v1  
'{74,668,277,6,673,238,400,611,573,737,559,615,266,502,114,401,223,393,104,116,768,257,869,226,375,9,358,752,413,166,481,486,833,758,491,506,996,891,116,568,628,675,183,893,177,296,294,400,688,398,516,456,654,384,681,29,392,39,780,805,204,261,290,37,19,781,542,14,672,658,581,299,332,763,192,508,59,485,908,747,883,423,202}'  
  
postgres=# select id, smlar(tags, :v1), tags % :v1 from users where tags % :v1 limit 10;  
 id | smlar | ?column?   
----+-------+----------  
  1 |     1 | t  
(1 row)  
  
Time: 454.579 ms  
  
-- 修改相似度阈值, 超过20% 相似的就返回  
postgres=# select set_smlar_limit(0.2);  
 set_smlar_limit   
-----------------  
             0.2  
(1 row)  
  
postgres=# select id, smlar(tags, :v1), tags % :v1 from users where tags % :v1 limit 10;  
  id   |  smlar   | ?column?   
-------+----------+----------  
     1 |        1 | t  
 13805 | 0.226335 | t  
 21515 | 0.209751 | t  
 32683 | 0.221305 | t  
 74386 | 0.208514 | t  
 81936 | 0.211684 | t  
 95575 | 0.200222 | t  
(7 rows)  
  
Time: 480.429 ms

创建标签字段索引

create index on users using gin (tags _int4_sml_ops);

使用索引加速根据某个标签圈选10个目标用户

postgres=# select set_smlar_limit(0.6);  
 set_smlar_limit   
-----------------  
             0.6  
(1 row)  
  
postgres=# select id, smlar(tags, :v1), tags % :v1 from users where tags % :v1 limit 10;  
 id | smlar | ?column?   
----+-------+----------  
  1 |     1 | t  
(1 row)
Time: 46.256 ms

4、地理信息搜索插件: PostGIS

开源项目地址: https://postgis.net/

PostGIS是地理信息、时空数据库, 包括空间类型、轨迹等类型, 基于这些类型的索引, 操作符等. 应用广泛, 例如地图、基于地理位置的社交、共享出行、外卖等应用.

用法举例.

创建插件

postgres=# create extension postgis;  
CREATE EXTENSION

创建POI表, 表示地图上的餐饮店位置

create table tbl_poi (  
  id int primary key,  
  info text,  
  pos geometry  
);

生成10万条记录, 经纬度圈定在一个范围内随机生成

insert into tbl_poi select generate_series(1,100000), md5(random()::text),   
  ST_SetSRID(ST_MakePoint(120+random(), 35+random()),4326);

输入一个经纬度, 由近到远排序输出10条记录

select id,info,pos <-> ST_SetSRID(ST_MakePoint(120, 35),4326) as distance   
  from tbl_poi order by pos <-> ST_SetSRID(ST_MakePoint(120, 35),4326) limit 10;  
  
  id   |               info               |      distance         
-------+----------------------------------+---------------------  
 87207 | 6c3be5ba80405c1bcbc41ddeabe4845e | 0.00380898314902217  
  2395 | bea776d9ea01fe71aa7217588ca0e340 | 0.00475504602701775  
 83860 | 3346ea9e673776bef89e8bfb90176e5f | 0.00476231353857793  
 20391 | 4b6188e5b45e96b299d2bbdf0dfccc88 |  0.0051285288529832  
 65722 | 75cb60337e72d23e27590ee36b6d8553 | 0.00679551058653407  
 70932 | e5203b78d0294740e036106f70b2e58b | 0.00869448513564811  
 42792 | 9aba360479ba28c8a98107af6f45cb62 | 0.00936264324089115  
 38434 | 3c2565401c1b4e1d010d9326c6913e07 | 0.00937400159588758  
 30599 | 2d5d0ce802bffb2a269312338532def2 | 0.00991881579948414  
 80021 | 018dbf83a82342e4fa02fa9b50301fca |  0.0110926771171926  
(10 rows)  
  
Time: 62.047 ms

创建索引

create index on tbl_poi using gist (pos);

使用索引加速, 输入一个经纬度, 由近到远排序输出10条记录

id   |               info               |      distance         
-------+----------------------------------+---------------------  
 87207 | 6c3be5ba80405c1bcbc41ddeabe4845e | 0.00380898314902217  
  2395 | bea776d9ea01fe71aa7217588ca0e340 | 0.00475504602701775  
 83860 | 3346ea9e673776bef89e8bfb90176e5f | 0.00476231353857793  
 20391 | 4b6188e5b45e96b299d2bbdf0dfccc88 |  0.0051285288529832  
 65722 | 75cb60337e72d23e27590ee36b6d8553 | 0.00679551058653407  
 70932 | e5203b78d0294740e036106f70b2e58b | 0.00869448513564811  
 42792 | 9aba360479ba28c8a98107af6f45cb62 | 0.00936264324089115  
 38434 | 3c2565401c1b4e1d010d9326c6913e07 | 0.00937400159588758  
 30599 | 2d5d0ce802bffb2a269312338532def2 | 0.00991881579948414  
 80021 | 018dbf83a82342e4fa02fa9b50301fca |  0.0110926771171926  
(10 rows)  
  
Time: 6.205 ms

5、中文分词插件: pg_jieba

开源项目地址: https://github.com/jaiminpan/pg_jieba

中文分词, 顾名思义是将中文文本切分为有意义的词语, 用户就可以通过词语矢量进行中文匹配检索.

用法举例.

编译pg_jieba

git clone --depth=1 https://github.com/jaiminpan/pg_jieba    
    
cd pg_jieba    
    
# initilized sub-project    
git submodule update --init --recursive    
    
mkdir build    
cd build    
    
cmake -DCMAKE_PREFIX_PATH=/home/postgres/tmp_basedir_polardb_pg_1100_bld ..    
    
make    
make install

安装pg_jieba插件

create extension pg_jieba ;

测试中文分词

select * from to_tsquery('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。');    
select * from to_tsvector('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。');    
select * from ts_token_type('jieba');    
select * from ts_debug('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。');
postgres=# create extension pg_jieba ;    
CREATE EXTENSION    
postgres=# select * from to_tsquery('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。');    
                                                            to_tsquery                                                                
----------------------------------------------------------------------------------------------------------------------------------    
 '拖拉机' & '学院' & '手扶拖拉机' & '专业' & '不用' & '多久' & '会' & '升职' & '加薪' & '当上' & 'ceo' & '走上' & '人生' & '巅峰'    
(1 row)    
    
postgres=# select * from to_tsvector('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。');    
                                                                to_tsvector                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------    
 'ceo':18 '不用':8 '专业':5 '人生':21 '会':13 '加薪':15 '升职':14 '多久':9 '学院':3 '巅峰':22 '当上':17 '手扶拖拉机':4 '拖拉机':2 '走上':20    
(1 row)    
    
postgres=# select * from ts_token_type('jieba');    
 tokid | alias |         description             
-------+-------+-----------------------------    
     1 | eng   | letter    
     2 | nz    | other proper noun    
     3 | n     | noun    
     4 | m     | numeral    
     5 | i     | idiom    
     6 | l     | temporary idiom    
     7 | d     | adverb    
     8 | s     | space    
     9 | t     | time    
    10 | mq    | numeral-classifier compound    
    11 | nr    | person's name    
    12 | j     | abbreviate    
    13 | a     | adjective    
    14 | r     | pronoun    
    15 | b     | difference    
    16 | f     | direction noun    
    17 | nrt   | nrt    
    18 | v     | verb    
    19 | z     | z    
    20 | ns    | location    
    21 | q     | quantity    
    22 | vn    | vn    
    23 | c     | conjunction    
    24 | nt    | organization    
    25 | u     | auxiliary    
    26 | o     | onomatopoeia    
    27 | zg    | zg    
    28 | nrfg  | nrfg    
    29 | df    | df    
    30 | p     | prepositional    
    31 | g     | morpheme    
    32 | y     | modal verbs    
    33 | ad    | ad    
    34 | vg    | vg    
    35 | ng    | ng    
    36 | x     | unknown    
    37 | ul    | ul    
    38 | k     | k    
    39 | ag    | ag    
    40 | dg    | dg    
    41 | rr    | rr    
    42 | rg    | rg    
    43 | an    | an    
    44 | vq    | vq    
    45 | e     | exclamation    
    46 | uv    | uv    
    47 | tg    | tg    
    48 | mg    | mg    
    49 | ud    | ud    
    50 | vi    | vi    
    51 | vd    | vd    
    52 | uj    | uj    
    53 | uz    | uz    
    54 | h     | h    
    55 | ug    | ug    
    56 | rz    | rz    
(56 rows)    
    
postgres=#  select * from ts_debug('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。');    
 alias |  description  |   token    | dictionaries | dictionary |   lexemes        
-------+---------------+------------+--------------+------------+--------------    
 v     | verb          | 是         | {jieba_stem} | jieba_stem | {}    
 n     | noun          | 拖拉机     | {jieba_stem} | jieba_stem | {拖拉机}    
 n     | noun          | 学院       | {jieba_stem} | jieba_stem | {学院}    
 n     | noun          | 手扶拖拉机 | {jieba_stem} | jieba_stem | {手扶拖拉机}    
 n     | noun          | 专业       | {jieba_stem} | jieba_stem | {专业}    
 uj    | uj            | 的         | {jieba_stem} | jieba_stem | {}    
 x     | unknown       | 。         | {jieba_stem} | jieba_stem | {}    
 v     | verb          | 不用       | {jieba_stem} | jieba_stem | {不用}    
 m     | numeral       | 多久       | {jieba_stem} | jieba_stem | {多久}    
 x     | unknown       | ,         | {jieba_stem} | jieba_stem | {}    
 r     | pronoun       | 我         | {jieba_stem} | jieba_stem | {}    
 d     | adverb        | 就         | {jieba_stem} | jieba_stem | {}    
 v     | verb          | 会         | {jieba_stem} | jieba_stem | {会}    
 v     | verb          | 升职       | {jieba_stem} | jieba_stem | {升职}    
 nr    | person's name | 加薪       | {jieba_stem} | jieba_stem | {加薪}    
 x     | unknown       | ,         | {jieba_stem} | jieba_stem | {}    
 t     | time          | 当上       | {jieba_stem} | jieba_stem | {当上}    
 eng   | letter        | CEO        | {jieba_stem} | jieba_stem | {ceo}    
 x     | unknown       | ,         | {jieba_stem} | jieba_stem | {}    
 v     | verb          | 走上       | {jieba_stem} | jieba_stem | {走上}    
 n     | noun          | 人生       | {jieba_stem} | jieba_stem | {人生}    
 n     | noun          | 巅峰       | {jieba_stem} | jieba_stem | {巅峰}    
 x     | unknown       | 。         | {jieba_stem} | jieba_stem | {}    
(23 rows)

生成10万条随机分词数据

create or replace function gen_rand_ts(tslen int) returns tsvector as $$        
  select array_to_tsvector(array_agg(substring(md5(random()::text),1,8))) from generate_series(1,$1);        
$$ language sql strict;      
    
create table test (id int, vec tsvector);    
    
insert into test select generate_series(1,100000), gen_rand_ts(24);

创建索引

create index on test using gin (vec);

查询测试

postgres=# select * from test limit 10;    
 id |                                                                                                                                   vec                                                                   
                                                                        
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
--------------------------------------------------------------------    
  1 | '0eed2ca6' '149ae774' '260d6cae' '2bde3230' '38ce089c' '3fdfb67c' '40bf233a' '41825567' '52de4ebb' '5708b49d' '63bdd9ea' '650f2dbf' '6d35d142' '7c711c0d' '7e4e028f' '913802bf' 'a8a14013' 'b6aa8ba4    
' 'd9f595e6' 'dc054607' 'dee1a2f7' 'e5b6d7e8' 'eacb6356' 'eee81aaf'    
  2 | '00bf8fbc' '117b7b5c' '1e4c8295' '2d379ff7' '2e263dcb' '48967fe5' '4f20db40' '5f7aefcd' '616cbb8e' '81d4e152' '876b2318' '8c18f4c3' '8e732b6f' '94f6b13b' '9c53cb8e' 'aedca11c' 'b56c7ed4' 'c5008853    
' 'cc407ea8' 'd4f3d5a1' 'd63ca731' 'd87514ec' 'f9626af4' 'fa5b7458'    
  3 | '0e3b6147' '13674c4d' '16463e9b' '32894aca' '3a15d964' '453c9a26' '54664d82' '5cb0e40d' '62c8ca30' '6d0ebc3a' '6ee0a517' '71ccfeb5' '7e75a9d5' '7f61f401' '87b5f2cb' '8f1c6274' '976dff7f' '9b7a6758    
' 'af9c624e' 'e5422d57' 'ed7bb9d4' 'edc039a2' 'efe1e5fa' 'f9db8132'    
  4 | '118bf21c' '2087d303' '2579c220' '3733357a' '503b50ec' '56104ea2' '573b9ea9' '58a665af' '59250bad' '86abf8a9' '8a3b5a72' '8d8bb478' 'a16b8bd8' 'ac966a06' 'af4eabd8' 'b09ccbb5' 'b2d7aac4' 'b5134f1b    
' 'b5228857' 'b6836add' 'bcafbce0' 'd1ca5a3a' 'e8588e37' 'f6ffe6b0'    
  5 | '01876ad5' '07a8a579' '0a33ce9e' '0b5bbdd4' '10b00efe' '118fae91' '1c12acee' '2d74f4eb' '2d99481c' '41483d1c' '6864b85e' '7ba1937f' '8a6ccb01' '9c1ae58b' 'a251fd3d' 'a936eecd' 'b560d231' 'baa6927f    
' 'd78f04c6' 'dabff656' 'e5d975c0' 'f0598071' 'f819b029' 'fb202c1a'    
  6 | '1c6eea85' '23f37dd9' '28151030' '319fa87f' '447ddc9d' '45dcc30a' '5269c7c2' '77184ff9' '792793c2' '81f63a78' '87b67199' '8ddc346f' '9dbc6f02' 'a4130ee7' 'a4b21300' 'a8ae9afe' 'ae54596a' 'b01e580a    
' 'c17caa99' 'c7784bd5' 'd27a19ce' 'df21c10f' 'e383a9d0' 'fde1f572'    
  7 | '1c6e6d6e' '209c45cf' '23415a93' '292ba393' '3d64d313' '49cf134a' '4a1a1f0d' '4c7e54a7' '4e74180a' '5054e77e' '5882f01f' '59c25e04' '69eb2f87' '6f2ed6bb' '7c830771' '81c415f5' '975f413a' 'a3dc8375    
' 'a5a38d13' 'b1f83c28' 'bb62f740' 'c8bab4d1' 'd947163c' 'f3a81f80'    
  8 | '0800c7b2' '0ffbe32e' '19f84945' '1c001bd3' '1f3f5826' '2e13cca1' '36ca5372' '3abc8149' '516878e9' '534357fc' '67cb7af9' '69a7849d' '8c134ad3' '8d87ed42' '96069ef5' '98bfcdbe' 'b4b0ffa1' 'bc61912a    
' 'ddf1d8e6' 'e07722ea' 'e68ffbbf' 'f0751b01' 'f12cb4b9' 'fe0a7c4c'    
  9 | '14588466' '1b16dfff' '25339aa7' '4874dc00' '4c6bb5bf' '510c8f7b' '59cbfb21' '70372c94' '7db5e3c2' '85f68385' '8b0e7746' '9596e2d0' '997ca4d3' '9f4df7dc' 'b1726109' 'c42ae6e4' 'dc759b2d' 'e378d2d5    
' 'e956bc2b' 'ea5c6ed2' 'f0e58f77' 'f24f74b1' 'fa6df884' 'fa8edffb'    
 10 | '0188c7ac' '09a75236' '15fb2eee' '1dc80e6e' '2f543594' '3559f46a' '4369adcd' '477410ed' '5df678d0' '799bc453' '80ad7901' '81871ec1' '92faa899' '94c9cf0f' '971d699f' 'a002241a' 'a1636465' 'aee34bbb    
' 'b08f2f0c' 'b697c161' 'b8f290b9' 'd0acf8b4' 'd3beb05b' 'f8ca2a66'    
(10 rows)    
    
postgres=# explain select * from test where vec @@ '52de4ebb & 41825567'::tsquery ;    
                                 QUERY PLAN                                     
----------------------------------------------------------------------------    
 Bitmap Heap Scan on test  (cost=36.02..43.91 rows=2 width=300)    
   Recheck Cond: (vec @@ '''52de4ebb'' & ''41825567'''::tsquery)    
   ->  Bitmap Index Scan on test_vec_idx  (cost=0.00..36.02 rows=2 width=0)    
         Index Cond: (vec @@ '''52de4ebb'' & ''41825567'''::tsquery)    
(4 rows)    
    
postgres=# select * from test where vec @@ '52de4ebb & 41825567'::tsquery ;    
 id |                                                                                                                                   vec                                                                   
                                                                        
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
--------------------------------------------------------------------    
  1 | '0eed2ca6' '149ae774' '260d6cae' '2bde3230' '38ce089c' '3fdfb67c' '40bf233a' '41825567' '52de4ebb' '5708b49d' '63bdd9ea' '650f2dbf' '6d35d142' '7c711c0d' '7e4e028f' '913802bf' 'a8a14013' 'b6aa8ba4    
' 'd9f595e6' 'dc054607' 'dee1a2f7' 'e5b6d7e8' 'eacb6356' 'eee81aaf'    
(1 row)    
    
Time: 0.768 ms

不使用索引测试

postgres=# set enable_bitmapscan =off;    
SET    
Time: 0.887 ms    
postgres=# explain select * from test where vec @@ '52de4ebb & 41825567'::tsquery ;    
                        QUERY PLAN                             
-----------------------------------------------------------    
 Seq Scan on test  (cost=0.00..5417.00 rows=2 width=300)    
   Filter: (vec @@ '''52de4ebb'' & ''41825567'''::tsquery)    
(2 rows)    
    
Time: 1.136 ms    
    
    
postgres=# select * from test where vec @@ '52de4ebb & 41825567'::tsquery ;    
 id |                                                                                                                                   vec                                                                   
                                                                        
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
--------------------------------------------------------------------    
  1 | '0eed2ca6' '149ae774' '260d6cae' '2bde3230' '38ce089c' '3fdfb67c' '40bf233a' '41825567' '52de4ebb' '5708b49d' '63bdd9ea' '650f2dbf' '6d35d142' '7c711c0d' '7e4e028f' '913802bf' 'a8a14013' 'b6aa8ba4    
' 'd9f595e6' 'dc054607' 'dee1a2f7' 'e5b6d7e8' 'eacb6356' 'eee81aaf'    
(1 row)    
    
Time: 51.815 ms

10万条文本向量, 搜索命中1条. 性能参考:

  • GIN索引 0.768 ms VS 全表扫描 49.055 ms

自定义词典? 参考 https://github.com/jaiminpan/pg_jieba

## USER DEFINED DICTIONARY
#### Dictionary Format
+ Words  weight  type
+ Words  type
+ Words
  云计算
  韩玉鉴赏
  蓝翔 nz
  区块链 10 nz
Reference jieba_user.dict
#### How to use your own dictionary
  cd /PATH/TO/POSTGRESQL_INSTALL/share/postgresql/tsearch_data
  OR
  cd /PATH/TO/POSTGRESQL_INSTALL/share/tsearch_data
  cp 'YOUR DICTIONARY' jieba_user.dict
#### Dictionary Sharing
+ [dict.367W.utf8] iLife

更多用法请参考 《配置 jieba结巴分词 for PolarDB 实现数据库高性能文本分词搜索》

6、融合计算插件: duckdb_fdw

开源项目地址: https://github.com/alitrack/duckdb_fdw

duckdb是一款性能非常好的in-process OLAP数据库, 同时支持数据湖架构, 支持访问s3/oss/http等远端数据, 支持列存储/parquet等常见的存储结构. 通过duckdb_fdw, PolarDB也可以具备以上能力.

用法举例.

下载duckdb cli 根据你的环境选一种下载 (aarch64架构)

wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-aarch64.zip  
unzip duckdb_cli-linux-aarch64.zip

下载duckdb cli 根据你的环境选一种下载 (x86_64架构)

wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-amd64.zip  
unzip duckdb_cli-linux-amd64.zip

生成parquet数据文件

./duckdb /var/polardb/db    
  
COPY (select generate_series as id, md5(random()::text) as info,   
now()::timestamp+(generate_series||' second')::interval as crt_time   
from generate_series(1,100)) TO '/var/polardb/t1.parquet' (FORMAT 'PARQUET');    
    
COPY (select generate_series as cid, md5(random()::text) as info,   
now()::timestamp+(generate_series||' second')::interval as crt_time   
from generate_series(1,100)) TO '/var/polardb/t2.parquet' (FORMAT 'PARQUET');    
    
COPY (select (floor(random()*100)+1)::int as gid, (floor(random()*100)+1)::int as cid,   
(random()*10)::int as c1, (random()*100)::int as c2, (random()*1000)::int as c3,   
(random()*10000)::int as c4, (random()*100000)::int as c5 from generate_series(1,1000000))   
TO '/var/polardb/t3.parquet' (FORMAT 'PARQUET');    
    
create view t1 as select * from read_parquet('/var/polardb/t1.parquet');    
create view t2 as select * from read_parquet('/var/polardb/t2.parquet');    
create view t3 as select * from read_parquet('/var/polardb/t3.parquet');    
    
checkpoint;    
  
D DESCRIBE t1 ;  
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐  
│ column_name │ column_type │  null   │   key   │ default │ extra │  
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ int32 │  
├─────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤  
│ id          │ BIGINT      │ YES     │         │         │       │  
│ info        │ VARCHAR     │ YES     │         │         │       │  
│ crt_time    │ TIMESTAMP   │ YES     │         │         │       │  
└─────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘  
D DESCRIBE t2 ;  
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐  
│ column_name │ column_type │  null   │   key   │ default │ extra │  
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ int32 │  
├─────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤  
│ cid         │ BIGINT      │ YES     │         │         │       │  
│ info        │ VARCHAR     │ YES     │         │         │       │  
│ crt_time    │ TIMESTAMP   │ YES     │         │         │       │  
└─────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘  
D DESCRIBE t3 ;  
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐  
│ column_name │ column_type │  null   │   key   │ default │ extra │  
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ int32 │  
├─────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤  
│ gid         │ INTEGER     │ YES     │         │         │       │  
│ cid         │ INTEGER     │ YES     │         │         │       │  
│ c1          │ INTEGER     │ YES     │         │         │       │  
│ c2          │ INTEGER     │ YES     │         │         │       │  
│ c3          │ INTEGER     │ YES     │         │         │       │  
│ c4          │ INTEGER     │ YES     │         │         │       │  
│ c5          │ INTEGER     │ YES     │         │         │       │  
└─────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘  
  
.quit

编译duckdb_fdw v1.1.3插件

# 注释的不需要执行. 如果是以前的duckdb_fdw版本 可能需要按注释的命令来操作!!!!
cd /var/polardb
# git clone --depth 1 -b duckdb1.1.0 https://github.com/digoal/duckdb_fdw
# cd duckdb_fdw 
wget https://github.com/alitrack/duckdb_fdw/archive/refs/tags/v1.1.3.tar.gz
tar -zxvf v1.1.3.tar.gz
cd duckdb_fdw-1.1.3 
mkdir libduckdb     
cd libduckdb  
wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/libduckdb-src.zip  
sudo apt-get install -y unzip  
unzip libduckdb-src.zip    
unalias cp    
cp -f duckdb.h ../    
cp -f duckdb.hpp ../ 
# 需要给容器多分配一点内存/swap, 否则容器OOM
# 注意 clang++ -c 编译项 需要与 duckdb_fdw src 的 Makefile PG_CXXFLAGS 一致.
#   区分linux / macos , 区分 arm(aarch64) / x86架构, 请仔细判断. 
# macos 编译参数 -std=c++11 
# aarch64 编译参数 -std=c++11 
# x86 编译参数 -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 
clang++ -c -fPIC -std=c++11 duckdb.cpp -o duckdb.o     
clang++ -shared -o libduckdb.so *.o     
# cp -f libduckdb.so $(pg_config --libdir)     
# cp -f libduckdb.so ../
cp -f libduckdb.so ../libduckdb.1.1.3.so  
cd ../     
# USE_PGXS=1 make uninstall    
# USE_PGXS=1 make clean    
# USE_PGXS=1 make distclean       
# USE_PGXS=1 make install
DUCKDB_VERSION=1.1.3 USE_PGXS=1 make uninstall    
DUCKDB_VERSION=1.1.3 USE_PGXS=1 make clean    
DUCKDB_VERSION=1.1.3 USE_PGXS=1 make distclean    
DUCKDB_VERSION=1.1.3 USE_PGXS=1 make install

加载duckdb_fdw插件

create extension duckdb_fdw version "1.1.3";

创建foreign server

CREATE SERVER DuckDB_server FOREIGN DATA WRAPPER duckdb_fdw OPTIONS   
  (database '/var/polardb/db');

导入foreign table

-- 一次性导入所有duckdb的表到foreign table
IMPORT FOREIGN SCHEMA public FROM SERVER DuckDB_server INTO public;
-- 或 挨个创建foreign table
create foreign table t1 (id int8, info text, crt_time timestamp) 
  server duckdb_server OPTIONS (table 't1');
create foreign table t2 (cid int8, info text, crt_time timestamp) 
  server duckdb_server OPTIONS (table 't2');
create foreign table t3 (gid int, cid int, c1 int, c2 int, c3 int, c4 int, c5 int) 
  server duckdb_server OPTIONS (table 't3');  
    
postgres=# \det    
     List of foreign tables    
 Schema | Table |    Server    
--------+-------+---------------    
 public | t1    | duckdb_server    
 public | t2    | duckdb_server    
 public | t3    | duckdb_server    
(3 rows)

为duckdb_fdw安装parquet插件

postgres=# SELECT duckdb_execute('duckdb_server', 'install parquet');
 duckdb_execute 
----------------
 
(1 row)
postgres=# SELECT duckdb_execute('duckdb_server', 'load parquet');
 duckdb_execute 
----------------
 
(1 row)

查询parquet数据性能

postgres=# explain verbose select count(distinct gid) from t3;    
                         QUERY PLAN    
-------------------------------------------------------------    
 Foreign Scan  (cost=1.00..1.00 rows=1 width=8)    
   Output: (count(DISTINCT gid))    
   SQLite query: SELECT count(DISTINCT "gid") FROM main."t3"    
(3 rows)    
    
postgres=# \timing    
Timing is on.    
    
select count(distinct gid) from t3;    
 count    
-------    
   100    
(1 row)    
Time: 13.556 ms    
    
select count(distinct gid),count(*) from t3;    
 count |  count    
-------+---------    
   100 | 1000000    
(1 row)    
Time: 21.836 ms    
    
explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id;    
                                                            QUERY PLAN    
-----------------------------------------------------------------------------------------------------------------------------------    
 Foreign Scan  (cost=1.00..1.00 rows=1 width=16)    
   Output: (count(DISTINCT t3.gid)), (count(*))    
   SQLite query: SELECT count(DISTINCT r1."gid"), count(*) FROM (main."t3" r1 INNER JOIN main."t1" r2 ON (((r1."gid" = r2."id"))))    
(3 rows)    
    
select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id;    
 count |  count    
-------+---------    
   100 | 1000000    
(1 row)    
Time: 15.536 ms    
    
select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id join t2 on t3.cid=t2.cid;    
 count |  count    
-------+---------    
   100 | 1000000    
(1 row)    
Time: 18.570 ms    
    
explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id join t2 on t3.cid=t2.cid;    
 Foreign Scan  (cost=1.00..1.00 rows=1 width=16)    
   Output: (count(DISTINCT t3.gid)), (count(*))    
   SQLite query: SELECT count(DISTINCT r1."gid"), count(*) FROM ((main."t3" r1 INNER JOIN main."t1" r2 ON (((r1."gid" = r2."id")))) INNER JOIN main."t2" r4 ON (((r1."cid" = r4."cid"))))    
(3 rows)

将数据导入PolarDB本地, 做同样的查询看一下时间.

postgres=# create table lt1 as select * from t1;    
SELECT 100    
postgres=# create table lt2 as select * from t2;    
SELECT 100    
postgres=# create table lt3 as select * from t3;    
SELECT 10000000    
    
postgres=# \timing    
Timing is on.    
postgres=# select count(distinct gid) from lt3;    
 count    
-------    
   100    
(1 row)    
Time: 1142.800 ms (00:01.143)

duckdb_fdw+Parquet 比PolarDB本地行存储表快了近100倍.

更多用法请参考

如果遇到问题可以提issue给duckdb_fdw或PolarDB开源项目

7、读写分离工具: pgpool-II

开源项目地址: https://www.pgpool.net/

pgpool-II 是一款开源的支持读写分离的连接池, 通过pgpool-II, 应用程序访问PolarDB 时, 可以根据SQL的特征自动路由到RW, RO节点, 实现对应用透明的读写分离功能.

用法举例.

部署pgpool-II 软件

cd /var/polardb    
wget https://www.pgpool.net/mediawiki/download.php?f=pgpool-II-4.5.5.tar.gz -O pgpool-II-4.5.5.tar.gz    
    
tar -zxvf pgpool-II-4.5.5.tar.gz    
    
cd pgpool-II-4.5.5    
    
./configure --prefix=/usr/local/pgpool4.5.5 --with-openssl    
    
make -j 4   
sudo make install

配置动态库和默认路径

sudo vi /etc/ld.so.conf    
# addd    
/usr/local/pgpool4.5.5/lib    
    
# 更新动态链接库  
sudo ldconfig    
   
# 设置shell环境变量初始化文件  
echo "export PATH=/usr/local/pgpool4.5.5/bin:\$PATH" >>  ~/.bashrc   
    
# 设置当前环境变量  
. ~/.bashrc

配置pgpool-II

polardb 3节点配置如下, 注意rw节点和2个ro节点的port, 这里5433是rw:

[postgres@1373488a35ab ~]$ psql -p 5432 -c "select pg_is_in_recovery();"    
 pg_is_in_recovery     
-------------------    
 t    
(1 row)    
    
[postgres@1373488a35ab ~]$ psql -p 5433 -c "select pg_is_in_recovery();"    
 pg_is_in_recovery     
-------------------    
 f    
(1 row)    
    
[postgres@1373488a35ab ~]$ psql -p 5434 -c "select pg_is_in_recovery();"    
 pg_is_in_recovery     
-------------------    
 t    
(1 row)

polardb与aurora类似, 共享存储集群模式, 无需pgpool来管理HA.

配置pgpool.conf

cd /usr/local/pgpool4.5.5/etc    
    
sudo vi pgpool.conf
# 注意rw节点和ro节点的port和前面查到的要对应上, 不要配错了. 
    
listen_addresses = '0.0.0.0'    
port = 9999    
unix_socket_directories = '/tmp'    
pcp_listen_addresses = 'localhost'    
pcp_port = 9898    
pcp_socket_dir = '/tmp'    
log_destination = 'stderr'    
logging_collector = on    
log_directory = '/tmp/pgpool_logs'    
pid_file_name = '/var/run/pgpool/pgpool.pid'    
logdir = '/tmp'    
    
backend_clustering_mode = 'streaming_replication'    
load_balance_mode = on    
sr_check_period = 0    
health_check_period = 0    
failover_on_backend_shutdown=off    
failover_on_backend_error=off    
enable_pool_hba = on    
    
backend_hostname0 = '127.0.0.1'    
backend_port0 = '5433'    
backend_weight0 = 1    
backend_application_name0 = 'polardb_primray'    
backend_flag0 = 'ALWAYS_PRIMARY|DISALLOW_TO_FAILOVER'    
    
backend_hostname1 = '127.0.0.1'    
backend_port1 = '5432'    
backend_weight1 = 2    
backend_application_name1 = 'polardb_reader1'    
backend_flag1 = 'DISALLOW_TO_FAILOVER'    
    
backend_hostname2 = '127.0.0.1'    
backend_port2 = '5434'    
backend_weight2 = 2    
backend_application_name2 = 'polardb_reader2'    
backend_flag2 = 'DISALLOW_TO_FAILOVER'

认证方法password,md5,scram请参考: 《PolarDB 100 问 | 密码明明正确怎么报密码错误?》

配置pool_hba.conf

sudo vi pool_hba.conf    
# add    
host all all 0.0.0.0/0 md5

配置pgpool数据库用户密码文件pool_passwd

[postgres@1373488a35ab etc]$ sudo $(which pg_md5) --md5auth --username=digoal pwd123    
    
[postgres@1373488a35ab etc]$ cat /usr/local/pgpool4.5.5/etc/pool_passwd     
digoal:md531a770cec82aa37e217bb6e46c3f9d55    
    
    
    
-- 实际上就是pwd+username的md5值    
postgres=# select md5('pwd123digoal');    
               md5                    
----------------------------------    
 31a770cec82aa37e217bb6e46c3f9d55    
(1 row)

在数据库中创建相应pgpool-II的映射用户

psql -p 5433  
  
postgres=# create user digoal superuser encrypted password 'pwd123' login;    
CREATE ROLE

配置pcp管理用户密码文件pcp.conf

postgres=# select md5('pwd123');    
               md5                    
----------------------------------    
 45cb41b32dcfb917ccd8614f1536d6da    
(1 row)    
    
    
    
cd /usr/local/pgpool4.5.5/etc    
sudo vi pcp.conf    
  
pcpadm:45cb41b32dcfb917ccd8614f1536d6da

准备pgpool运行时pid文件目录和日志目录

sudo mkdir /var/run/pgpool    
sudo mkdir /tmp/pgpool_logs

启动pgpool

sudo $(which pgpool)

使用pcp管理命令查看pgpool中间件状态

pcp_node_info -U pcpadm -p 9898 -W   
Password:    输入 pwd123  
127.0.0.1 5433 1 0.200000 waiting unknown primary unknown 0 none none 2024-01-26 19:40:20  
127.0.0.1 5432 1 0.400000 waiting unknown standby unknown 0 none none 2024-01-26 19:40:20  
127.0.0.1 5434 1 0.400000 waiting unknown standby unknown 0 none none 2024-01-26 19:40:20
[postgres@1373488a35ab etc]$ pcp_node_count -U pcpadm -p 9898 -W   
Password:     输入 pwd123  
3
pcp_pool_status  -U pcpadm -h localhost -p 9898 -W   
Password:    输入 pwd123  
...    
name : backend_data_directory2  
value:   
desc : data directory for backend #2  
  
name : backend_status2  
value: waiting  
desc : status of backend #2  
  
name : standby_delay2  
value: 0  
desc : standby delay of backend #2  
  
name : backend_flag2  
value: DISALLOW_TO_FAILOVER  
desc : backend #2 flag  
  
name : backend_application_name2  
value: polardb_reader2  
desc : application_name for backend #2  
...

使用pgpool代理链接polardb

export PGPASSWORD=pwd123    
export PGDATABASE=postgres    
psql -p 9999 -U digoal -c "select * from pg_stat_activity where pid=pg_backend_pid();"

测试pgpool读写分离

pgbench -i -s 1 -h 127.0.0.1 -p 9999 -U digoal postgres
pgbench -n -r -P 1 -c 8 -j 8 -T 120 -S -h 127.0.0.1 -p 9999 -U digoal postgres

通过另一会话观察压测中的连接:

[postgres@1373488a35ab ~]$ psql -p 5432 -c "select count(*) from pg_stat_activity where application_name='pgbench';"    
 count     
-------    
     8    
(1 row)    
    
[postgres@1373488a35ab ~]$ psql -p 5433 -c "select count(*) from pg_stat_activity where application_name='pgbench';"    
 count     
-------    
     8    
(1 row)    
    
[postgres@1373488a35ab ~]$ psql -p 5434 -c "select count(*) from pg_stat_activity where application_name='pgbench';"    
 count     
-------    
     8    
(1 row)

更多信息请参考 《PolarDB 开源版 使用pgpool-II实现透明读写分离》

更详细的用法请参考各个插件开源官网.

更多开源插件参考digoal github: PolarDB 开源生态插件心选 - 这些插件让业务战斗力提升100倍!!!

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
1天前
|
存储 关系型数据库 分布式数据库
PolarDB 开源基础教程系列 1 架构解读
PolarDB 是阿里云研发的云原生分布式数据库,基于 PostgreSQL 开源版本,旨在解决传统数据库在大规模数据和高并发场景下的性能和扩展性问题。其主要特点包括: 1. **存储计算分离架构**:通过将计算与存储分离,实现极致弹性、共享一份数据以降低成本、透明读写分离。 2. **HTAP 架构**:支持混合事务处理和分析处理(HTAP),能够在同一系统中高效执行 OLTP 和 OLAP 查询。 3. **优化的日志复制机制**:采用只复制元数据的方式减少网络传输量,优化页面回放和 DDL 锁回放过程。 4. **并行查询与索引创建**:引入 MPP 分布式执行引擎。
22 7
|
1天前
|
关系型数据库 分布式数据库 数据安全/隐私保护
PolarDB 开源基础教程系列 5 高级特性体验
PolarDB 特性解读与体验涵盖多项关键技术,包括预读/预扩展、Shared Server(建议使用连接池)、闪回表和闪回日志、弹性跨机并行查询(ePQ)及TDE透明数据加密。预读/预扩展通过批量I/O操作显著提升Vacuum、SeqScan等场景性能;Shared Server优化高并发短连接处理;闪回功能可恢复表至指定时间点;ePQ支持跨机并行查询以提高复杂查询效率;TDE确保数据存储层的安全加密。
12 1
|
1天前
|
弹性计算 关系型数据库 分布式数据库
PolarDB 开源基础教程系列 3 安装部署
PolarDB 安装部署指南涵盖了从单机版到集群版的详细步骤。内容包括: 1. **单机版安装**:通过Docker镜像`polardb_pg_local_instance`,提供快速搭建PolarDB学习或开发环境的方法,支持多种操作系统。 2. **集群版安装(ECS + ESSD)**:在阿里云ECS上部署两台计算节点,并使用ESSD作为共享存储,通过PFS(PolarDB File System)实现存算分离和读写分离的集群架构。
13 2
|
1天前
|
存储 关系型数据库 分布式数据库
PolarDB 开源基础教程系列 8 数据库生态
PolarDB是一款开源的云原生分布式数据库,源自阿里云商业产品。为降低使用门槛,PolarDB携手伙伴打造了完整的开源生态,涵盖操作系统、芯片、存储、集成管控、监控、审计、开发者工具、数据同步、超融合计算、ISV软件、开源插件、人才培养、社区合作及大型用户合作等领域。通过这些合作伙伴,PolarDB提供了丰富的功能和服务,支持多种硬件和软件环境,满足不同用户的需求。更多信息请访问[PolarDB开源官方网站](https://openpolardb.com/home)。
19 4
|
7月前
|
存储 关系型数据库 分布式数据库
开发者评测指南针对PolarDB开源版本的部署安装
开发者评测指南针对PolarDB开源版本的部署安装
188 2
|
7月前
|
监控 关系型数据库 分布式数据库
PolarDB开源项目成熟,强调插件开发与第三方工具集成,打造丰富生态。
【7月更文挑战第3天】PolarDB开源项目成熟,强调插件开发与第三方工具集成,打造丰富生态。插件开发涉及需求分析、接口设计、编码、测试和文档撰写。示例展示了性能监控插件的Go代码实现。此外,与DMS的数据迁移工具及Prometheus+Grafana监控系统的集成示例,展示了其易用性。PolarDB通过开放接口鼓励开发者参与生态建设,共同推动数据库技术进步。
93 1
|
9月前
|
关系型数据库 MySQL 分布式数据库
快速体验开源PolarDB -X 部署安装
在CentOS上部署PolarDB-X标准版集群的体验包括三步:安装python3和docker(如果未预装),然后使用venv创建环境,安装pxd并验证。接着,通过`pxd tryout -t standard`部署集群,该过程需拉取大量镜像,可能耗时且占用数GB空间,建议事先清理空间并了解资源需求。部署后,可查询集群状态和健康信息。最终,使用`pxd cleanup`清理。过程中因磁盘空间不足遇到问题,建议体验前提供系统配置需求,并允许用户自定义MySQL参数。
|
9月前
|
监控 关系型数据库 分布式数据库
【PolarDB开源】PolarDB开源生态构建:插件开发与第三方工具集成
【5月更文挑战第23天】PolarDB开源项目成熟,生态成为开发者关注点。其插件机制和接口设计允许添加自定义功能,无需修改核心代码,促进扩展建设。本文涵盖插件开发流程和第三方工具集成实践,如性能监控插件示例和数据迁移工具、监控系统集成。PolarDB通过开放生态与标准化接口,激发开发者潜力,共同推动数据库技术创新。
126 0
|
9月前
|
关系型数据库 MySQL 分布式数据库
【PolarDB开源】PolarDB开源社区动态:最新版本功能亮点与更新解读
【5月更文挑战第27天】PolarDB最新版发布,提升内核稳定性和性能,采用X-Paxos增强一致性与容灾能力。新版本支持分布式CDC升级,实现数据实时同步和离线分析。引入PITR功能,确保故障后快速恢复至指定时间点。提供MySQL一键导入,降低迁移成本。示例代码展示从MySQL到PolarDB-X的数据导入。随着技术发展,PolarDB将在企业数据处理中扮演更重要角色。
268 0
|
9月前
|
人工智能 NoSQL atlas
官宣!MongoDB Atlas再添新功能!进一步简化构建现代应用程序
利用数据简化高响应性应用程序的构建过程
3200 0