安装与使用PostgreSQL开源插件/工具
实验环境:
mkdir ~/pb11_1 cd ~/pb11_1 docker run -d -it -P --shm-size=1g \ --cap-add=SYS_PTRACE --cap-add SYS_ADMIN \ --privileged=true --name pb11_1 \ -v ${PWD}:/var/polardb \ registry.cn-hangzhou.aliyuncs.com/polardb_pg/polardb_pg_local_instance:11 bash
本节阅读导航
1、当前环境已安装并支持哪些插件
通过pg_available_extensions视图可以查看当前数据库支持哪些插件, 安装了哪些插件?
postgres=# select * from pg_available_extensions; name | default_version | installed_version | comment ------------------------------+-----------------+-------------------+--------------------------------------------------------------------------------------------------------------------- postgres_fdw | 1.0 | | foreign-data wrapper for remote PostgreSQL servers postgis | 3.3.2 | | PostGIS geometry and geography spatial types and functions polar_monitor_preload | 1.1 | | examine the polardb information jsonb_plpython3u | 1.0 | | transform between jsonb and plpython3u ltree | 1.1 | | data type for hierarchical tree-like structures tablefunc | 1.0 | | functions that manipulate whole tables, including crosstab polar_csn | 1.0 | | polar_csn plperl | 1.0 | | PL/Perl procedural language polar_worker | 1.0 | | polar_worker btree_gin | 1.3 | | support for indexing common datatypes in GIN polar_stat_sql | 1.3 | | Kernel statistics gathering, and sql plan nodes information gathering address_standardizer | 3.3.2 | | Used to parse an address into constituent elements. Generally used to support geocoding address normalization step. btree_gist | 1.5 | | support for indexing common datatypes in GiST earthdistance | 1.1 | | calculate great-circle distances on the surface of the Earth pltclu | 1.0 | | PL/TclU untrusted procedural language amcheck | 1.1 | | functions for verifying relation integrity ltree_plpython3u | 1.0 | | transform between ltree and plpython3u isn | 1.2 | | data types for international product numbering standards tcn | 1.0 | | Triggered change notifications pg_visibility | 1.2 | | examine the visibility map (VM) and page-level visibility info address_standardizer_data_us | 3.3.2 | | Address Standardizer US dataset example sslinfo | 1.2 | | information about SSL certificates polar_parameter_check | 1.0 | | kernel extension for parameter validation intagg | 1.1 | | integer aggregator and enumerator (obsolete) insert_username | 1.0 | | functions for tracking who changed a table moddatetime | 1.0 | | functions for tracking last modification time pageinspect | 1.7 | | inspect the contents of database pages at a low level pg_trgm | 1.4 | | text similarity measurement and index searching based on trigrams pg_prewarm | 1.2 | | prewarm relation data ltree_plpython2u | 1.0 | | transform between ltree and plpython2u refint | 1.0 | | functions for implementing referential integrity (obsolete) hstore_plpython3u | 1.0 | | transform between hstore and plpython3u postgis_tiger_geocoder | 3.3.2 | | PostGIS tiger geocoder and reverse geocoder adminpack | 2.0 | | administrative functions for PostgreSQL pg_freespacemap | 1.2 | | examine the free space map (FSM) fuzzystrmatch | 1.1 | | determine similarities and distance between strings polar_tde_utils | 1.0 | | Internal extension for TDE lo | 1.1 | | Large Object maintenance citext | 1.5 | | data type for case-insensitive character strings bloom | 1.0 | | bloom access method - signature file based index polar_vfs | 1.0 | | polar_vfs pgrowlocks | 1.2 | | show row-level locking information polar_monitor | 1.2 | | examine the polardb information timescaledb | 2.3.1 | | Enables scalable inserts and complex queries for time-series data pltcl | 1.0 | | PL/Tcl procedural language hstore_plperlu | 1.0 | | transform between hstore and plperlu hstore | 1.5 | | data type for storing sets of (key, value) pairs timetravel | 1.0 | | functions for implementing time travel plperlu | 1.0 | | PL/PerlU untrusted procedural language smlar | 1.0 | | compute similary of any one-dimensional arrays hstore_plperl | 1.0 | | transform between hstore and plperl jsonb_plpythonu | 1.0 | | transform between jsonb and plpythonu dict_int | 1.0 | | text search dictionary template for integers jsonb_plperlu | 1.0 | | transform between jsonb and plperlu uuid-ossp | 1.1 | | generate universally unique identifiers (UUIDs) autoinc | 1.0 | | functions for autoincrementing fields postgis_topology | 3.3.2 | | PostGIS topology spatial types and functions dict_xsyn | 1.0 | | text search dictionary template for extended synonym processing pgstattuple | 1.5 | | show tuple-level statistics tsm_system_rows | 1.0 | | TABLESAMPLE method which accepts number of rows as a limit file_fdw | 1.0 | | foreign-data wrapper for flat file access jsonb_plperl | 1.0 | | transform between jsonb and plperl vector | 0.5.1 | | vector data type and ivfflat and hnsw access methods dblink | 1.2 | | connect to other PostgreSQL databases from within a database seg | 1.3 | | data type for representing line segments or floating-point intervals pgcrypto | 1.3 | | cryptographic functions plpython3u | 1.0 | | PL/Python3U untrusted procedural language jsonb_plpython2u | 1.0 | | transform between jsonb and plpython2u plpgsql | 1.0 | 1.0 | PL/pgSQL procedural language hstore_plpythonu | 1.0 | | transform between hstore and plpythonu ltree_plpythonu | 1.0 | | transform between ltree and plpythonu unaccent | 1.1 | | text search dictionary that removes accents cube | 1.4 | | data type for multidimensional cubes polar_px | 1.0 | | Parallel Execution extension tsm_system_time | 1.0 | | TABLESAMPLE method which accepts time in milliseconds as a limit intarray | 1.2 | | functions, operators, and index support for 1-D arrays of integers postgis_raster | 3.3.2 | | PostGIS raster types and functions hstore_plpython2u | 1.0 | | transform between hstore and plpython2u pg_buffercache | 1.3 | | examine the shared buffer cache pg_stat_statements | 1.6 | | track execution statistics of all SQL statements executed xml2 | 1.1 | | XPath querying and XSLT (81 rows)
2、AI外脑插件: vector
开源项目地址: https://github.com/pgvector/pgvector
向量索引的原理、召回与参数的关系、向量索引优化详情, 请参考如下:
- 《数据库筑基课 - 向量类型》
- 《PostgreSQL 阿里云rds pg发布高维向量索引,支持图像识别、人脸识别 - pase 插件, 以及ivfflat,hnsw搜索算法说明》
- 《头大! 索引扫描和全表扫描结果不一样, 这向量数据库还能用? 教你一招大幅提升召回率(recall)》
- 《向量搜索优化3板斧: 空间、性能、召回(recall)》
vector插件顾名思义是一种支持向量数据类型的插件, 每一个向量由一组浮点数组成, 代表一个特征值, 通常用来进行基于特征相似性的检索, 例如相似图像, 相似文本, 相似音频, 相似视频等. 和大模型结合起来, 可以用于存储知识库/标准答案/优质答案的向量, 作为大模型的外脑, 在对话过程中, 从向量中提取与问题相关的文本, 发给大模型可以让大模型具备更好的解题能力. 例如
- 在私有知识库领域, 由于大模型未训练过私有知识素材, 所以大模型无法处理私有领域的问题, 有了向量数据库, 可以在问答过程中给大模型投喂相关知识, 让大模型可以具备回复未知领域知识的能力.
- 也可用来解决大模型幻觉问题, 通过提取问题的相似文本, 让大模型在限定的相似文本内容范围内进行回复, 从而解决大模型幻觉问题. 例如将史书的文本切分后录入向量数据库, 在回答问题时, 从向量数据库提取与问题相近的史书文本, 限定大模型在这些文本范围内回答问题, 从而得到与史书匹配的答案.
vector除了提供向量类型, 还支持了配套的索引, 操作符等来加速向量相似的排序操作.
用法举例.
创建插件
postgres=# create extension vector ; CREATE EXTENSION
创建向量表
-- 在实际使用过程中, 从非结构化数据转换为向量, 可以通过云厂商API, 或者使用开源工具 create table tbl_vec ( id int primary key, info text, vec vector(512) );
生成512维度模拟向量10万条
-- 生成随机向量的函数 create or replace function gen_rand_vec(int) returns vector as $$ select array_to_vector(array_agg((random()*100)::real), $1, true) from generate_series(1,$1); $$ language sql strict; insert into tbl_vec select generate_series(1,100000), md5(random()::text), gen_rand_vec(512);
实际使用中需要使用embedding模型来生成向量, 例如 https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings
curl http://localhost:11434/api/embed -d '{ "model": "mxbai-embed-large:latest", "input": "Why is the sky blue?" }' # 表结构建议 create table t_vec ( id int, model_name text, -- embedding 模型名:size content text, -- 原始内容(例如文本、视频、音频等) content_vec vector(N), -- 经过embedding模型, 将原始内容转换为对应向量. other ... ); # 建议一张表内的数据采用同一个embedding模型进行转换, 保证所有原始数据都被映射到相同的向量空间中.
算子
postgres=# \do+ List of operators Schema | Name | Left arg type | Right arg type | Result type | Function | Description --------+------+---------------+----------------+------------------+-------------------------------+------------- public | * | vector | vector | vector | vector_mul | public | + | vector | vector | vector | vector_add | public | - | vector | vector | vector | vector_sub | public | < | vector | vector | boolean | vector_lt | public | <#> | vector | vector | double precision | vector_negative_inner_product | public | <-> | vector | vector | double precision | l2_distance | public | <= | vector | vector | boolean | vector_le | public | <=> | vector | vector | double precision | cosine_distance | public | <> | vector | vector | boolean | vector_ne | public | = | vector | vector | boolean | vector_eq | public | > | vector | vector | boolean | vector_gt | public | >= | vector | vector | boolean | vector_ge | (12 rows)
搜索某个向量的cosine_distance相似向量
postgres=# select ''''||gen_rand_vec(512)::text||'''' as v1 \gset postgres=# \echo :v1 '[42.3123,49.094,36.027,97.4979,51.5076,51.8525,89.3701,98.1134,41.6966,88.2519,98.8663,38.4313,68.9565,82.9641,76.3079,32.8708,93.7287,13.7805,59.3289,89.8906,54.0369,39.8219,16.9884,94.0551,9.97198,95.7318,10.5824,25.3164,90.8695,49.5991,55.4999,33.1818,98.6931,91.5269,30.6798,50.2007,43.3794,20.0499,48.3141,85.0761,8.30179,47.1804,23.5074,77.2583,30.1445,99.8153,10.1291,23.8732,13.5958,69.458,13.7638,67.6327,9.27985,30.7522,61.6878,19.2518,26.4839,72.2703,44.5683,17.3534,21.8694,0.0681802,50.5353,20.5625,91.5951,81.215,70.7632,34.9745,1.26487,19.0773,20.0506,9.56666,66.2576,43.558,86.8249,96.4021,43.3733,96.954,20.2753,56.9691,66.412,34.0391,24.6018,75.6919,64.7913,86.2896,94.9437,91.2752,58.5599,39.512,8.62865,80.4293,39.5801,59.1639,0.991781,31.1752,40.3789,71.755,66.1498,41.6438,90.8322,86.2004,51.2104,57.0899,29.7584,38.0354,53.492,73.1317,34.9894,73.7673,30.1009,1.40146,7.80648,54.7027,77.0933,72.5978,40.9923,72.037,63.873,99.5522,11.549,72.5016,79.9815,51.1291,31.6656,80.9733,82.3044,72.0445,52.7283,48.4542,13.6883,43.5605,34.6546,64.8987,0.650352,64.413,2.9341,54.1424,37.5447,37.9235,27.9097,67.6456,39.325,35.7162,22.3482,16.4183,8.31396,63.3406,88.4553,72.187,62.8928,0.00434043,44.6886,42.8743,51.1335,76.3542,23.8476,33.4378,48.3986,76.5758,81.892,62.0869,20.1363,16.5466,26.9856,20.7867,80.9596,29.9197,74.929,18.5043,67.8432,2.83872,86.1499,7.16822,38.5549,8.49812,23.5865,46.8689,71.8387,12.0419,19.0558,34.7314,12.0462,63.7444,77.6057,63.1797,40.0986,1.45328,96.6175,88.4972,78.0291,78.5096,50.5841,98.1654,95.0561,77.5697,18.9521,76.0157,7.48943,93.8811,94.52,75.3327,96.7198,80.6698,82.5009,35.2747,89.168,6.08741,82.1436,61.0066,18.1293,1.19938,95.7381,30.1755,64.9438,73.3438,93.3552,5.04243,74.7971,89.9728,93.5397,52.8262,68.4823,44.1238,50.9916,63.5384,21.6935,69.9436,39.5541,29.183,63.8247,34.0741,4.51563,60.5445,14.7439,87.0165,95.8192,3.91189,93.1039,77.9628,64.9185,11.2332,79.1622,60.6566,41.4087,44.106,34.0004,34.7639,49.1484,8.79743,24.7367,42.6881,61.6236,93.219,86.8119,12.6152,56.7574,8.50542,82.5588,96.3116,37.6884,46.3836,30.3857,42.204,6.92808,45.1296,29.2205,2.74729,49.0415,22.3244,80.7101,13.96,33.5576,59.8722,74.6166,74.9664,3.97819,8.61693,9.73029,53.1266,17.4144,34.467,95.8147,79.038,27.686,82.6265,91.6531,84.4434,91.132,74.212,80.7549,28.8203,20.5955,11.1406,71.0244,27.5236,56.2702,0.244898,30.2709,5.31167,22.5693,10.981,19.2717,56.127,70.8532,93.8882,31.0934,74.8314,2.50517,40.8236,27.958,19.9195,75.2906,23.7726,98.9575,2.97656,6.39915,90.6106,87.4199,97.5311,64.8226,68.1749,26.3514,85.4181,79.3155,97.3758,12.9417,35.5857,97.6207,43.2126,40.8973,20.19,54.1935,60.169,76.317,25.0467,54.0572,7.41039,99.8781,56.5624,48.234,27.8361,76.4819,23.5246,51.6087,75.4394,26.5012,58.0078,66.05,13.9211,55.5389,30.8726,82.096,81.8904,16.2907,61.4114,79.2662,29.2324,96.9971,76.8869,72.445,37.8944,97.0769,26.6385,98.0634,73.3939,51.6852,52.1206,80.8043,51.5633,8.68304,29.0384,79.3994,85.165,52.563,31.0081,60.6044,79.0642,89.0159,26.6544,92.9853,44.5548,57.527,75.0813,26.4451,73.8177,36.4927,5.7113,3.05006,33.4898,82.5982,75.495,71.3843,79.6751,2.13354,69.4477,53.069,53.8188,21.5683,33.8733,5.38213,30.2514,62.9117,84.7815,15.4163,15.4747,15.7896,76.0207,94.5388,4.80547,2.67508,87.5241,49.3603,60.2021,62.6054,75.8054,34.0197,99.0982,81.5167,37.0698,32.588,64.1149,12.5648,3.97224,43.79,14.6984,73.4199,96.859,68.5172,94.9882,30.7323,73.8993,25.2396,93.644,58.6808,40.6559,9.11869,74.4704,16.6766,3.65755,79.2759,19.3517,91.1817,28.6361,79.5537,53.7871,4.44154,13.5735,52.8853,85.9583,50.6433,85.4733,50.0731,63.2081,89.4455,93.8631,77.9065,62.8654,90.7221,46.4236,57.8536,21.4544,20.3229,83.0932,15.0984,79.0037,23.7491,24.2171,53.4741,40.4257,27.8746,32.7499,59.7774,19.0563,61.3861,39.3311,72.8434,65.8276,52.9046,25.7287,51.7859,3.54789,11.202,1.85898,66.756,0.647476,95.7221,44.6624,63.5129,86.4441,91.086,21.3665,7.89851,11.4089,4.45974,22.9969,90.4126,28.2089,47.214,43.8867,68.6346,75.0886,76.6366,28.412,94.1449,38.0227,67.7431,66.9884,3.85029,20.6477,92.7171,55.6361]' postgres=# \timing on Timing is on. postgres=# select id,info, vec <=> :v1 from tbl_vec order by vec <=> :v1 limit 10; id | info | ?column? -------+----------------------------------+------------------- 33990 | 73357616b73392961610fd4056df7872 | 0.200381241291853 65206 | f7126017dee115ba292f5fc1e6fe094c | 0.202510531635712 28629 | 47a751432609ac468a021c9af3666c8b | 0.202872040131985 78789 | 8ca283c6f2719e0ecfb5ed7f33285bfd | 0.20292308001207 25061 | d6d02501fef41890b6e999bd8ef45d4e | 0.203712549333763 65012 | 21dee0e74a8f547d4a8c4e9e9cea42b2 | 0.20454229165218 82237 | ad49dd1204e19e04da0428fb28ce5c64 | 0.204661291873772 27672 | 8aa2bea9608899ab8e13866dd2387f47 | 0.20499257967393 26030 | 1c94ae0fe923bd775278aff7dd7650d0 | 0.205011457373235 5666 | 237ae0154cf6ca0e87a90fe85c20765b | 0.205843957675826 (10 rows) Time: 140.098 ms
创建向量索引
postgres=# set maintenance_work_mem ='512MB'; postgres=# CREATE INDEX ON tbl_vec USING hnsw (vec vector_cosine_ops); CREATE INDEX Time: 126018.043 ms (02:06.018)
使用索引加速搜索某个向量的cosine_distance相似向量
postgres=# select id,info, vec <=> :v1 from tbl_vec order by vec <=> :v1 limit 10; id | info | ?column? -------+----------------------------------+------------------- 28629 | 47a751432609ac468a021c9af3666c8b | 0.202872040131985 3843 | 16e453bdc6e7cce238b649d0d162c7c7 | 0.206543485852313 57218 | 0fa2b675124b3f1c3bd1fc26203c5eaf | 0.20931599952677 58681 | 562db2b3b7851727fa2b8f1ba15788fc | 0.211436702245081 84409 | d5432e2c5c010e49fbf8927f8d881365 | 0.211825938887412 67532 | 90a3511dc90684f22ef9756af142dc7a | 0.211948480104953 27195 | 6b6d2b3157eae64168b8d419da0fb32b | 0.212202785190134 49533 | 4b31ad2839436ce5bdc86358ec3bf2f2 | 0.212349440226144 62096 | d2adff2b3322fd98419e3c7cef737b05 | 0.212489664916925 47447 | 0df1dcdea170add8cd3f987e79638ae6 | 0.213359929199801 (10 rows) Time: 9.736 ms
3、营销场景目标人群圈选插件: smlar
开源项目地址: https://github.com/jirutka/smlar
smlar插件是GIN索引的一种功能扩展, 可以通过索引快速判断数组的相似性(根据数组元素的相交个数, 或者相交个数在整体集合的占比). 如果数组的每个元素代表一个特征标签, 在电商、短视频、社交等业务场景可以给商品、视频、用户贴标签, 根据标签(数组)的相似性, 圈选目标人群、圈选感兴趣的内容等.
用法举例.
创建插件
postgres=# create extension smlar ; CREATE EXTENSION
创建用户标签表
create table users ( id int primary key, nick text, tags int[] -- 假设使用int数组来表示用户拥有的标签 );
生成10万用户和随机标签
-- 生成随机标签的函数 create or replace function gen_rand_arr (tags int, tags_catacity int) returns int[] as $$ select array(select ceil(random() * tags_catacity)::int from generate_series(1, tags)); $$ language sql strict; postgres=# select gen_rand_arr(10,1000); gen_rand_arr ------------------------------------------- {280,703,893,376,153,753,458,119,141,133} (1 row) -- 假设每人10-100个标签, 标签总数1000个. insert into users select generate_series(1,100000), md5(random()::text), gen_rand_arr(9 + ceil(random()*91)::int, 1000) ;
根据某个标签圈选10个目标用户
postgres=# select show_smlar_limit(); show_smlar_limit ------------------ 0.6 (1 row) postgres=# show smlar.type; smlar.type ------------ cosine (1 row) postgres=# \do+ List of operators Schema | Name | Left arg type | Right arg type | Result type | Function | Description --------+------+---------------+----------------+-------------+----------+------------- public | % | anyarray | anyarray | boolean | smlar_op | (1 row) postgres=# select ''''||tags::text||'''' as v1 from users where id=1 \gset postgres=# \echo :v1 '{74,668,277,6,673,238,400,611,573,737,559,615,266,502,114,401,223,393,104,116,768,257,869,226,375,9,358,752,413,166,481,486,833,758,491,506,996,891,116,568,628,675,183,893,177,296,294,400,688,398,516,456,654,384,681,29,392,39,780,805,204,261,290,37,19,781,542,14,672,658,581,299,332,763,192,508,59,485,908,747,883,423,202}' postgres=# select id, smlar(tags, :v1), tags % :v1 from users where tags % :v1 limit 10; id | smlar | ?column? ----+-------+---------- 1 | 1 | t (1 row) Time: 454.579 ms -- 修改相似度阈值, 超过20% 相似的就返回 postgres=# select set_smlar_limit(0.2); set_smlar_limit ----------------- 0.2 (1 row) postgres=# select id, smlar(tags, :v1), tags % :v1 from users where tags % :v1 limit 10; id | smlar | ?column? -------+----------+---------- 1 | 1 | t 13805 | 0.226335 | t 21515 | 0.209751 | t 32683 | 0.221305 | t 74386 | 0.208514 | t 81936 | 0.211684 | t 95575 | 0.200222 | t (7 rows) Time: 480.429 ms
创建标签字段索引
create index on users using gin (tags _int4_sml_ops);
使用索引加速根据某个标签圈选10个目标用户
postgres=# select set_smlar_limit(0.6); set_smlar_limit ----------------- 0.6 (1 row) postgres=# select id, smlar(tags, :v1), tags % :v1 from users where tags % :v1 limit 10; id | smlar | ?column? ----+-------+---------- 1 | 1 | t (1 row) Time: 46.256 ms
4、地理信息搜索插件: PostGIS
开源项目地址: https://postgis.net/
PostGIS是地理信息、时空数据库, 包括空间类型、轨迹等类型, 基于这些类型的索引, 操作符等. 应用广泛, 例如地图、基于地理位置的社交、共享出行、外卖等应用.
用法举例.
创建插件
postgres=# create extension postgis; CREATE EXTENSION
创建POI表, 表示地图上的餐饮店位置
create table tbl_poi ( id int primary key, info text, pos geometry );
生成10万条记录, 经纬度圈定在一个范围内随机生成
insert into tbl_poi select generate_series(1,100000), md5(random()::text), ST_SetSRID(ST_MakePoint(120+random(), 35+random()),4326);
输入一个经纬度, 由近到远排序输出10条记录
select id,info,pos <-> ST_SetSRID(ST_MakePoint(120, 35),4326) as distance from tbl_poi order by pos <-> ST_SetSRID(ST_MakePoint(120, 35),4326) limit 10; id | info | distance -------+----------------------------------+--------------------- 87207 | 6c3be5ba80405c1bcbc41ddeabe4845e | 0.00380898314902217 2395 | bea776d9ea01fe71aa7217588ca0e340 | 0.00475504602701775 83860 | 3346ea9e673776bef89e8bfb90176e5f | 0.00476231353857793 20391 | 4b6188e5b45e96b299d2bbdf0dfccc88 | 0.0051285288529832 65722 | 75cb60337e72d23e27590ee36b6d8553 | 0.00679551058653407 70932 | e5203b78d0294740e036106f70b2e58b | 0.00869448513564811 42792 | 9aba360479ba28c8a98107af6f45cb62 | 0.00936264324089115 38434 | 3c2565401c1b4e1d010d9326c6913e07 | 0.00937400159588758 30599 | 2d5d0ce802bffb2a269312338532def2 | 0.00991881579948414 80021 | 018dbf83a82342e4fa02fa9b50301fca | 0.0110926771171926 (10 rows) Time: 62.047 ms
创建索引
create index on tbl_poi using gist (pos);
使用索引加速, 输入一个经纬度, 由近到远排序输出10条记录
id | info | distance -------+----------------------------------+--------------------- 87207 | 6c3be5ba80405c1bcbc41ddeabe4845e | 0.00380898314902217 2395 | bea776d9ea01fe71aa7217588ca0e340 | 0.00475504602701775 83860 | 3346ea9e673776bef89e8bfb90176e5f | 0.00476231353857793 20391 | 4b6188e5b45e96b299d2bbdf0dfccc88 | 0.0051285288529832 65722 | 75cb60337e72d23e27590ee36b6d8553 | 0.00679551058653407 70932 | e5203b78d0294740e036106f70b2e58b | 0.00869448513564811 42792 | 9aba360479ba28c8a98107af6f45cb62 | 0.00936264324089115 38434 | 3c2565401c1b4e1d010d9326c6913e07 | 0.00937400159588758 30599 | 2d5d0ce802bffb2a269312338532def2 | 0.00991881579948414 80021 | 018dbf83a82342e4fa02fa9b50301fca | 0.0110926771171926 (10 rows) Time: 6.205 ms
5、中文分词插件: pg_jieba
开源项目地址: https://github.com/jaiminpan/pg_jieba
中文分词, 顾名思义是将中文文本切分为有意义的词语, 用户就可以通过词语矢量进行中文匹配检索.
用法举例.
编译pg_jieba
git clone --depth=1 https://github.com/jaiminpan/pg_jieba cd pg_jieba # initilized sub-project git submodule update --init --recursive mkdir build cd build cmake -DCMAKE_PREFIX_PATH=/home/postgres/tmp_basedir_polardb_pg_1100_bld .. make make install
安装pg_jieba插件
create extension pg_jieba ;
测试中文分词
select * from to_tsquery('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。'); select * from to_tsvector('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。'); select * from ts_token_type('jieba'); select * from ts_debug('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。');
postgres=# create extension pg_jieba ; CREATE EXTENSION postgres=# select * from to_tsquery('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。'); to_tsquery ---------------------------------------------------------------------------------------------------------------------------------- '拖拉机' & '学院' & '手扶拖拉机' & '专业' & '不用' & '多久' & '会' & '升职' & '加薪' & '当上' & 'ceo' & '走上' & '人生' & '巅峰' (1 row) postgres=# select * from to_tsvector('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。'); to_tsvector -------------------------------------------------------------------------------------------------------------------------------------------- 'ceo':18 '不用':8 '专业':5 '人生':21 '会':13 '加薪':15 '升职':14 '多久':9 '学院':3 '巅峰':22 '当上':17 '手扶拖拉机':4 '拖拉机':2 '走上':20 (1 row) postgres=# select * from ts_token_type('jieba'); tokid | alias | description -------+-------+----------------------------- 1 | eng | letter 2 | nz | other proper noun 3 | n | noun 4 | m | numeral 5 | i | idiom 6 | l | temporary idiom 7 | d | adverb 8 | s | space 9 | t | time 10 | mq | numeral-classifier compound 11 | nr | person's name 12 | j | abbreviate 13 | a | adjective 14 | r | pronoun 15 | b | difference 16 | f | direction noun 17 | nrt | nrt 18 | v | verb 19 | z | z 20 | ns | location 21 | q | quantity 22 | vn | vn 23 | c | conjunction 24 | nt | organization 25 | u | auxiliary 26 | o | onomatopoeia 27 | zg | zg 28 | nrfg | nrfg 29 | df | df 30 | p | prepositional 31 | g | morpheme 32 | y | modal verbs 33 | ad | ad 34 | vg | vg 35 | ng | ng 36 | x | unknown 37 | ul | ul 38 | k | k 39 | ag | ag 40 | dg | dg 41 | rr | rr 42 | rg | rg 43 | an | an 44 | vq | vq 45 | e | exclamation 46 | uv | uv 47 | tg | tg 48 | mg | mg 49 | ud | ud 50 | vi | vi 51 | vd | vd 52 | uj | uj 53 | uz | uz 54 | h | h 55 | ug | ug 56 | rz | rz (56 rows) postgres=# select * from ts_debug('jiebacfg', '是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。'); alias | description | token | dictionaries | dictionary | lexemes -------+---------------+------------+--------------+------------+-------------- v | verb | 是 | {jieba_stem} | jieba_stem | {} n | noun | 拖拉机 | {jieba_stem} | jieba_stem | {拖拉机} n | noun | 学院 | {jieba_stem} | jieba_stem | {学院} n | noun | 手扶拖拉机 | {jieba_stem} | jieba_stem | {手扶拖拉机} n | noun | 专业 | {jieba_stem} | jieba_stem | {专业} uj | uj | 的 | {jieba_stem} | jieba_stem | {} x | unknown | 。 | {jieba_stem} | jieba_stem | {} v | verb | 不用 | {jieba_stem} | jieba_stem | {不用} m | numeral | 多久 | {jieba_stem} | jieba_stem | {多久} x | unknown | , | {jieba_stem} | jieba_stem | {} r | pronoun | 我 | {jieba_stem} | jieba_stem | {} d | adverb | 就 | {jieba_stem} | jieba_stem | {} v | verb | 会 | {jieba_stem} | jieba_stem | {会} v | verb | 升职 | {jieba_stem} | jieba_stem | {升职} nr | person's name | 加薪 | {jieba_stem} | jieba_stem | {加薪} x | unknown | , | {jieba_stem} | jieba_stem | {} t | time | 当上 | {jieba_stem} | jieba_stem | {当上} eng | letter | CEO | {jieba_stem} | jieba_stem | {ceo} x | unknown | , | {jieba_stem} | jieba_stem | {} v | verb | 走上 | {jieba_stem} | jieba_stem | {走上} n | noun | 人生 | {jieba_stem} | jieba_stem | {人生} n | noun | 巅峰 | {jieba_stem} | jieba_stem | {巅峰} x | unknown | 。 | {jieba_stem} | jieba_stem | {} (23 rows)
生成10万条随机分词数据
create or replace function gen_rand_ts(tslen int) returns tsvector as $$ select array_to_tsvector(array_agg(substring(md5(random()::text),1,8))) from generate_series(1,$1); $$ language sql strict; create table test (id int, vec tsvector); insert into test select generate_series(1,100000), gen_rand_ts(24);
创建索引
create index on test using gin (vec);
查询测试
postgres=# select * from test limit 10; id | vec ----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------- 1 | '0eed2ca6' '149ae774' '260d6cae' '2bde3230' '38ce089c' '3fdfb67c' '40bf233a' '41825567' '52de4ebb' '5708b49d' '63bdd9ea' '650f2dbf' '6d35d142' '7c711c0d' '7e4e028f' '913802bf' 'a8a14013' 'b6aa8ba4 ' 'd9f595e6' 'dc054607' 'dee1a2f7' 'e5b6d7e8' 'eacb6356' 'eee81aaf' 2 | '00bf8fbc' '117b7b5c' '1e4c8295' '2d379ff7' '2e263dcb' '48967fe5' '4f20db40' '5f7aefcd' '616cbb8e' '81d4e152' '876b2318' '8c18f4c3' '8e732b6f' '94f6b13b' '9c53cb8e' 'aedca11c' 'b56c7ed4' 'c5008853 ' 'cc407ea8' 'd4f3d5a1' 'd63ca731' 'd87514ec' 'f9626af4' 'fa5b7458' 3 | '0e3b6147' '13674c4d' '16463e9b' '32894aca' '3a15d964' '453c9a26' '54664d82' '5cb0e40d' '62c8ca30' '6d0ebc3a' '6ee0a517' '71ccfeb5' '7e75a9d5' '7f61f401' '87b5f2cb' '8f1c6274' '976dff7f' '9b7a6758 ' 'af9c624e' 'e5422d57' 'ed7bb9d4' 'edc039a2' 'efe1e5fa' 'f9db8132' 4 | '118bf21c' '2087d303' '2579c220' '3733357a' '503b50ec' '56104ea2' '573b9ea9' '58a665af' '59250bad' '86abf8a9' '8a3b5a72' '8d8bb478' 'a16b8bd8' 'ac966a06' 'af4eabd8' 'b09ccbb5' 'b2d7aac4' 'b5134f1b ' 'b5228857' 'b6836add' 'bcafbce0' 'd1ca5a3a' 'e8588e37' 'f6ffe6b0' 5 | '01876ad5' '07a8a579' '0a33ce9e' '0b5bbdd4' '10b00efe' '118fae91' '1c12acee' '2d74f4eb' '2d99481c' '41483d1c' '6864b85e' '7ba1937f' '8a6ccb01' '9c1ae58b' 'a251fd3d' 'a936eecd' 'b560d231' 'baa6927f ' 'd78f04c6' 'dabff656' 'e5d975c0' 'f0598071' 'f819b029' 'fb202c1a' 6 | '1c6eea85' '23f37dd9' '28151030' '319fa87f' '447ddc9d' '45dcc30a' '5269c7c2' '77184ff9' '792793c2' '81f63a78' '87b67199' '8ddc346f' '9dbc6f02' 'a4130ee7' 'a4b21300' 'a8ae9afe' 'ae54596a' 'b01e580a ' 'c17caa99' 'c7784bd5' 'd27a19ce' 'df21c10f' 'e383a9d0' 'fde1f572' 7 | '1c6e6d6e' '209c45cf' '23415a93' '292ba393' '3d64d313' '49cf134a' '4a1a1f0d' '4c7e54a7' '4e74180a' '5054e77e' '5882f01f' '59c25e04' '69eb2f87' '6f2ed6bb' '7c830771' '81c415f5' '975f413a' 'a3dc8375 ' 'a5a38d13' 'b1f83c28' 'bb62f740' 'c8bab4d1' 'd947163c' 'f3a81f80' 8 | '0800c7b2' '0ffbe32e' '19f84945' '1c001bd3' '1f3f5826' '2e13cca1' '36ca5372' '3abc8149' '516878e9' '534357fc' '67cb7af9' '69a7849d' '8c134ad3' '8d87ed42' '96069ef5' '98bfcdbe' 'b4b0ffa1' 'bc61912a ' 'ddf1d8e6' 'e07722ea' 'e68ffbbf' 'f0751b01' 'f12cb4b9' 'fe0a7c4c' 9 | '14588466' '1b16dfff' '25339aa7' '4874dc00' '4c6bb5bf' '510c8f7b' '59cbfb21' '70372c94' '7db5e3c2' '85f68385' '8b0e7746' '9596e2d0' '997ca4d3' '9f4df7dc' 'b1726109' 'c42ae6e4' 'dc759b2d' 'e378d2d5 ' 'e956bc2b' 'ea5c6ed2' 'f0e58f77' 'f24f74b1' 'fa6df884' 'fa8edffb' 10 | '0188c7ac' '09a75236' '15fb2eee' '1dc80e6e' '2f543594' '3559f46a' '4369adcd' '477410ed' '5df678d0' '799bc453' '80ad7901' '81871ec1' '92faa899' '94c9cf0f' '971d699f' 'a002241a' 'a1636465' 'aee34bbb ' 'b08f2f0c' 'b697c161' 'b8f290b9' 'd0acf8b4' 'd3beb05b' 'f8ca2a66' (10 rows) postgres=# explain select * from test where vec @@ '52de4ebb & 41825567'::tsquery ; QUERY PLAN ---------------------------------------------------------------------------- Bitmap Heap Scan on test (cost=36.02..43.91 rows=2 width=300) Recheck Cond: (vec @@ '''52de4ebb'' & ''41825567'''::tsquery) -> Bitmap Index Scan on test_vec_idx (cost=0.00..36.02 rows=2 width=0) Index Cond: (vec @@ '''52de4ebb'' & ''41825567'''::tsquery) (4 rows) postgres=# select * from test where vec @@ '52de4ebb & 41825567'::tsquery ; id | vec ----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------- 1 | '0eed2ca6' '149ae774' '260d6cae' '2bde3230' '38ce089c' '3fdfb67c' '40bf233a' '41825567' '52de4ebb' '5708b49d' '63bdd9ea' '650f2dbf' '6d35d142' '7c711c0d' '7e4e028f' '913802bf' 'a8a14013' 'b6aa8ba4 ' 'd9f595e6' 'dc054607' 'dee1a2f7' 'e5b6d7e8' 'eacb6356' 'eee81aaf' (1 row) Time: 0.768 ms
不使用索引测试
postgres=# set enable_bitmapscan =off; SET Time: 0.887 ms postgres=# explain select * from test where vec @@ '52de4ebb & 41825567'::tsquery ; QUERY PLAN ----------------------------------------------------------- Seq Scan on test (cost=0.00..5417.00 rows=2 width=300) Filter: (vec @@ '''52de4ebb'' & ''41825567'''::tsquery) (2 rows) Time: 1.136 ms postgres=# select * from test where vec @@ '52de4ebb & 41825567'::tsquery ; id | vec ----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------- 1 | '0eed2ca6' '149ae774' '260d6cae' '2bde3230' '38ce089c' '3fdfb67c' '40bf233a' '41825567' '52de4ebb' '5708b49d' '63bdd9ea' '650f2dbf' '6d35d142' '7c711c0d' '7e4e028f' '913802bf' 'a8a14013' 'b6aa8ba4 ' 'd9f595e6' 'dc054607' 'dee1a2f7' 'e5b6d7e8' 'eacb6356' 'eee81aaf' (1 row) Time: 51.815 ms
10万条文本向量, 搜索命中1条. 性能参考:
- GIN索引 0.768 ms VS 全表扫描 49.055 ms
自定义词典? 参考 https://github.com/jaiminpan/pg_jieba
## USER DEFINED DICTIONARY #### Dictionary Format + Words weight type + Words type + Words 云计算 韩玉鉴赏 蓝翔 nz 区块链 10 nz Reference jieba_user.dict #### How to use your own dictionary cd /PATH/TO/POSTGRESQL_INSTALL/share/postgresql/tsearch_data OR cd /PATH/TO/POSTGRESQL_INSTALL/share/tsearch_data cp 'YOUR DICTIONARY' jieba_user.dict #### Dictionary Sharing + [dict.367W.utf8] iLife
更多用法请参考 《配置 jieba结巴分词 for PolarDB 实现数据库高性能文本分词搜索》
6、融合计算插件: duckdb_fdw
开源项目地址: https://github.com/alitrack/duckdb_fdw
duckdb是一款性能非常好的in-process OLAP数据库, 同时支持数据湖架构, 支持访问s3/oss/http等远端数据, 支持列存储/parquet等常见的存储结构. 通过duckdb_fdw, PolarDB也可以具备以上能力.
用法举例.
下载duckdb cli 根据你的环境选一种下载 (aarch64架构)
wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-aarch64.zip unzip duckdb_cli-linux-aarch64.zip
下载duckdb cli 根据你的环境选一种下载 (x86_64架构)
wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-amd64.zip unzip duckdb_cli-linux-amd64.zip
生成parquet数据文件
./duckdb /var/polardb/db COPY (select generate_series as id, md5(random()::text) as info, now()::timestamp+(generate_series||' second')::interval as crt_time from generate_series(1,100)) TO '/var/polardb/t1.parquet' (FORMAT 'PARQUET'); COPY (select generate_series as cid, md5(random()::text) as info, now()::timestamp+(generate_series||' second')::interval as crt_time from generate_series(1,100)) TO '/var/polardb/t2.parquet' (FORMAT 'PARQUET'); COPY (select (floor(random()*100)+1)::int as gid, (floor(random()*100)+1)::int as cid, (random()*10)::int as c1, (random()*100)::int as c2, (random()*1000)::int as c3, (random()*10000)::int as c4, (random()*100000)::int as c5 from generate_series(1,1000000)) TO '/var/polardb/t3.parquet' (FORMAT 'PARQUET'); create view t1 as select * from read_parquet('/var/polardb/t1.parquet'); create view t2 as select * from read_parquet('/var/polardb/t2.parquet'); create view t3 as select * from read_parquet('/var/polardb/t3.parquet'); checkpoint; D DESCRIBE t1 ; ┌─────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐ │ column_name │ column_type │ null │ key │ default │ extra │ │ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │ ├─────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤ │ id │ BIGINT │ YES │ │ │ │ │ info │ VARCHAR │ YES │ │ │ │ │ crt_time │ TIMESTAMP │ YES │ │ │ │ └─────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘ D DESCRIBE t2 ; ┌─────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐ │ column_name │ column_type │ null │ key │ default │ extra │ │ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │ ├─────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤ │ cid │ BIGINT │ YES │ │ │ │ │ info │ VARCHAR │ YES │ │ │ │ │ crt_time │ TIMESTAMP │ YES │ │ │ │ └─────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘ D DESCRIBE t3 ; ┌─────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐ │ column_name │ column_type │ null │ key │ default │ extra │ │ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │ ├─────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤ │ gid │ INTEGER │ YES │ │ │ │ │ cid │ INTEGER │ YES │ │ │ │ │ c1 │ INTEGER │ YES │ │ │ │ │ c2 │ INTEGER │ YES │ │ │ │ │ c3 │ INTEGER │ YES │ │ │ │ │ c4 │ INTEGER │ YES │ │ │ │ │ c5 │ INTEGER │ YES │ │ │ │ └─────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘ .quit
编译duckdb_fdw v1.1.3插件
# 注释的不需要执行. 如果是以前的duckdb_fdw版本 可能需要按注释的命令来操作!!!! cd /var/polardb # git clone --depth 1 -b duckdb1.1.0 https://github.com/digoal/duckdb_fdw # cd duckdb_fdw wget https://github.com/alitrack/duckdb_fdw/archive/refs/tags/v1.1.3.tar.gz tar -zxvf v1.1.3.tar.gz cd duckdb_fdw-1.1.3 mkdir libduckdb cd libduckdb wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/libduckdb-src.zip sudo apt-get install -y unzip unzip libduckdb-src.zip unalias cp cp -f duckdb.h ../ cp -f duckdb.hpp ../ # 需要给容器多分配一点内存/swap, 否则容器OOM # 注意 clang++ -c 编译项 需要与 duckdb_fdw src 的 Makefile PG_CXXFLAGS 一致. # 区分linux / macos , 区分 arm(aarch64) / x86架构, 请仔细判断. # macos 编译参数 -std=c++11 # aarch64 编译参数 -std=c++11 # x86 编译参数 -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 clang++ -c -fPIC -std=c++11 duckdb.cpp -o duckdb.o clang++ -shared -o libduckdb.so *.o # cp -f libduckdb.so $(pg_config --libdir) # cp -f libduckdb.so ../ cp -f libduckdb.so ../libduckdb.1.1.3.so cd ../ # USE_PGXS=1 make uninstall # USE_PGXS=1 make clean # USE_PGXS=1 make distclean # USE_PGXS=1 make install DUCKDB_VERSION=1.1.3 USE_PGXS=1 make uninstall DUCKDB_VERSION=1.1.3 USE_PGXS=1 make clean DUCKDB_VERSION=1.1.3 USE_PGXS=1 make distclean DUCKDB_VERSION=1.1.3 USE_PGXS=1 make install
加载duckdb_fdw插件
create extension duckdb_fdw version "1.1.3";
创建foreign server
CREATE SERVER DuckDB_server FOREIGN DATA WRAPPER duckdb_fdw OPTIONS (database '/var/polardb/db');
导入foreign table
-- 一次性导入所有duckdb的表到foreign table IMPORT FOREIGN SCHEMA public FROM SERVER DuckDB_server INTO public; -- 或 挨个创建foreign table create foreign table t1 (id int8, info text, crt_time timestamp) server duckdb_server OPTIONS (table 't1'); create foreign table t2 (cid int8, info text, crt_time timestamp) server duckdb_server OPTIONS (table 't2'); create foreign table t3 (gid int, cid int, c1 int, c2 int, c3 int, c4 int, c5 int) server duckdb_server OPTIONS (table 't3'); postgres=# \det List of foreign tables Schema | Table | Server --------+-------+--------------- public | t1 | duckdb_server public | t2 | duckdb_server public | t3 | duckdb_server (3 rows)
为duckdb_fdw安装parquet插件
postgres=# SELECT duckdb_execute('duckdb_server', 'install parquet'); duckdb_execute ---------------- (1 row) postgres=# SELECT duckdb_execute('duckdb_server', 'load parquet'); duckdb_execute ---------------- (1 row)
查询parquet数据性能
postgres=# explain verbose select count(distinct gid) from t3; QUERY PLAN ------------------------------------------------------------- Foreign Scan (cost=1.00..1.00 rows=1 width=8) Output: (count(DISTINCT gid)) SQLite query: SELECT count(DISTINCT "gid") FROM main."t3" (3 rows) postgres=# \timing Timing is on. select count(distinct gid) from t3; count ------- 100 (1 row) Time: 13.556 ms select count(distinct gid),count(*) from t3; count | count -------+--------- 100 | 1000000 (1 row) Time: 21.836 ms explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Foreign Scan (cost=1.00..1.00 rows=1 width=16) Output: (count(DISTINCT t3.gid)), (count(*)) SQLite query: SELECT count(DISTINCT r1."gid"), count(*) FROM (main."t3" r1 INNER JOIN main."t1" r2 ON (((r1."gid" = r2."id")))) (3 rows) select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id; count | count -------+--------- 100 | 1000000 (1 row) Time: 15.536 ms select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id join t2 on t3.cid=t2.cid; count | count -------+--------- 100 | 1000000 (1 row) Time: 18.570 ms explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.id join t2 on t3.cid=t2.cid; Foreign Scan (cost=1.00..1.00 rows=1 width=16) Output: (count(DISTINCT t3.gid)), (count(*)) SQLite query: SELECT count(DISTINCT r1."gid"), count(*) FROM ((main."t3" r1 INNER JOIN main."t1" r2 ON (((r1."gid" = r2."id")))) INNER JOIN main."t2" r4 ON (((r1."cid" = r4."cid")))) (3 rows)
将数据导入PolarDB本地, 做同样的查询看一下时间.
postgres=# create table lt1 as select * from t1; SELECT 100 postgres=# create table lt2 as select * from t2; SELECT 100 postgres=# create table lt3 as select * from t3; SELECT 10000000 postgres=# \timing Timing is on. postgres=# select count(distinct gid) from lt3; count ------- 100 (1 row) Time: 1142.800 ms (00:01.143)
duckdb_fdw+Parquet 比PolarDB本地行存储表快了近100倍.
更多用法请参考
- 《DuckDB 对比 PolarDB for PostgreSQL 全文检索功能》
- 《PolarDB-PG | PostgreSQL + duckdb_fdw + 阿里云OSS 实现高效低价的海量数据冷热存储分离》
- 《PolarDB 开源版通过 duckdb_fdw 支持 parquet 列存数据文件以及高效OLAP》
如果遇到问题可以提issue给duckdb_fdw或PolarDB开源项目
7、读写分离工具: pgpool-II
开源项目地址: https://www.pgpool.net/
pgpool-II 是一款开源的支持读写分离的连接池, 通过pgpool-II, 应用程序访问PolarDB 时, 可以根据SQL的特征自动路由到RW, RO节点, 实现对应用透明的读写分离功能.
用法举例.
部署pgpool-II 软件
cd /var/polardb wget https://www.pgpool.net/mediawiki/download.php?f=pgpool-II-4.5.5.tar.gz -O pgpool-II-4.5.5.tar.gz tar -zxvf pgpool-II-4.5.5.tar.gz cd pgpool-II-4.5.5 ./configure --prefix=/usr/local/pgpool4.5.5 --with-openssl make -j 4 sudo make install
配置动态库和默认路径
sudo vi /etc/ld.so.conf # addd /usr/local/pgpool4.5.5/lib # 更新动态链接库 sudo ldconfig # 设置shell环境变量初始化文件 echo "export PATH=/usr/local/pgpool4.5.5/bin:\$PATH" >> ~/.bashrc # 设置当前环境变量 . ~/.bashrc
配置pgpool-II
polardb 3节点配置如下, 注意rw节点和2个ro节点的port, 这里5433是rw:
[postgres@1373488a35ab ~]$ psql -p 5432 -c "select pg_is_in_recovery();" pg_is_in_recovery ------------------- t (1 row) [postgres@1373488a35ab ~]$ psql -p 5433 -c "select pg_is_in_recovery();" pg_is_in_recovery ------------------- f (1 row) [postgres@1373488a35ab ~]$ psql -p 5434 -c "select pg_is_in_recovery();" pg_is_in_recovery ------------------- t (1 row)
polardb与aurora类似, 共享存储集群模式, 无需pgpool来管理HA.
配置pgpool.conf
cd /usr/local/pgpool4.5.5/etc sudo vi pgpool.conf # 注意rw节点和ro节点的port和前面查到的要对应上, 不要配错了. listen_addresses = '0.0.0.0' port = 9999 unix_socket_directories = '/tmp' pcp_listen_addresses = 'localhost' pcp_port = 9898 pcp_socket_dir = '/tmp' log_destination = 'stderr' logging_collector = on log_directory = '/tmp/pgpool_logs' pid_file_name = '/var/run/pgpool/pgpool.pid' logdir = '/tmp' backend_clustering_mode = 'streaming_replication' load_balance_mode = on sr_check_period = 0 health_check_period = 0 failover_on_backend_shutdown=off failover_on_backend_error=off enable_pool_hba = on backend_hostname0 = '127.0.0.1' backend_port0 = '5433' backend_weight0 = 1 backend_application_name0 = 'polardb_primray' backend_flag0 = 'ALWAYS_PRIMARY|DISALLOW_TO_FAILOVER' backend_hostname1 = '127.0.0.1' backend_port1 = '5432' backend_weight1 = 2 backend_application_name1 = 'polardb_reader1' backend_flag1 = 'DISALLOW_TO_FAILOVER' backend_hostname2 = '127.0.0.1' backend_port2 = '5434' backend_weight2 = 2 backend_application_name2 = 'polardb_reader2' backend_flag2 = 'DISALLOW_TO_FAILOVER'
认证方法password,md5,scram请参考: 《PolarDB 100 问 | 密码明明正确怎么报密码错误?》
配置pool_hba.conf
sudo vi pool_hba.conf # add host all all 0.0.0.0/0 md5
配置pgpool数据库用户密码文件pool_passwd
[postgres@1373488a35ab etc]$ sudo $(which pg_md5) --md5auth --username=digoal pwd123 [postgres@1373488a35ab etc]$ cat /usr/local/pgpool4.5.5/etc/pool_passwd digoal:md531a770cec82aa37e217bb6e46c3f9d55 -- 实际上就是pwd+username的md5值 postgres=# select md5('pwd123digoal'); md5 ---------------------------------- 31a770cec82aa37e217bb6e46c3f9d55 (1 row)
在数据库中创建相应pgpool-II的映射用户
psql -p 5433 postgres=# create user digoal superuser encrypted password 'pwd123' login; CREATE ROLE
配置pcp管理用户密码文件pcp.conf
postgres=# select md5('pwd123'); md5 ---------------------------------- 45cb41b32dcfb917ccd8614f1536d6da (1 row) cd /usr/local/pgpool4.5.5/etc sudo vi pcp.conf pcpadm:45cb41b32dcfb917ccd8614f1536d6da
准备pgpool运行时pid文件目录和日志目录
sudo mkdir /var/run/pgpool sudo mkdir /tmp/pgpool_logs
启动pgpool
sudo $(which pgpool)
使用pcp管理命令查看pgpool中间件状态
pcp_node_info -U pcpadm -p 9898 -W Password: 输入 pwd123 127.0.0.1 5433 1 0.200000 waiting unknown primary unknown 0 none none 2024-01-26 19:40:20 127.0.0.1 5432 1 0.400000 waiting unknown standby unknown 0 none none 2024-01-26 19:40:20 127.0.0.1 5434 1 0.400000 waiting unknown standby unknown 0 none none 2024-01-26 19:40:20
[postgres@1373488a35ab etc]$ pcp_node_count -U pcpadm -p 9898 -W Password: 输入 pwd123 3
pcp_pool_status -U pcpadm -h localhost -p 9898 -W Password: 输入 pwd123 ... name : backend_data_directory2 value: desc : data directory for backend #2 name : backend_status2 value: waiting desc : status of backend #2 name : standby_delay2 value: 0 desc : standby delay of backend #2 name : backend_flag2 value: DISALLOW_TO_FAILOVER desc : backend #2 flag name : backend_application_name2 value: polardb_reader2 desc : application_name for backend #2 ...
使用pgpool代理链接polardb
export PGPASSWORD=pwd123 export PGDATABASE=postgres psql -p 9999 -U digoal -c "select * from pg_stat_activity where pid=pg_backend_pid();"
测试pgpool读写分离
pgbench -i -s 1 -h 127.0.0.1 -p 9999 -U digoal postgres
pgbench -n -r -P 1 -c 8 -j 8 -T 120 -S -h 127.0.0.1 -p 9999 -U digoal postgres
通过另一会话观察压测中的连接:
[postgres@1373488a35ab ~]$ psql -p 5432 -c "select count(*) from pg_stat_activity where application_name='pgbench';" count ------- 8 (1 row) [postgres@1373488a35ab ~]$ psql -p 5433 -c "select count(*) from pg_stat_activity where application_name='pgbench';" count ------- 8 (1 row) [postgres@1373488a35ab ~]$ psql -p 5434 -c "select count(*) from pg_stat_activity where application_name='pgbench';" count ------- 8 (1 row)
更多信息请参考 《PolarDB 开源版 使用pgpool-II实现透明读写分离》
更详细的用法请参考各个插件开源官网.
更多开源插件参考digoal github: PolarDB 开源生态插件心选 - 这些插件让业务战斗力提升100倍!!!