PolarDB 开源基础教程系列 7.3 应用实践之 精准营销场景

本文涉及的产品
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
云原生数据库 PolarDB MySQL 版,通用型 2核8GB 50GB
简介: 本文介绍了基于用户画像的精准营销技术,重点探讨了如何通过标签组合快速圈选目标人群。实验分为三部分:1. **传统方法**:使用字符串存储标签并进行模糊查询,但性能较差,每次请求都需要扫描全表。2. **实验1**:引入`pg_trgm`插件和GIN索引,显著提升了单个模糊查询条件的性能。3. **实验2**:改用数组类型存储标签,并结合GIN索引加速包含查询,性能进一步提升。4. **实验3**:利用`smlar`插件实现近似度过滤,支持按标签重合数量或比例筛选。

3、营销场景, 根据用户画像的相似度进行目标人群圈选, 实现精准营销

b站视频链接

Youtube视频链接


3、营销场景, 根据用户画像的相似度进行目标人群圈选, 实现精准营销

在营销场景中, 通常会对用户的属性、行为等数据进行统计分析, 生成用户的标签, 也就是常说的用户画像.

标签举例: 男性、女性、年轻人、大学生、90后、司机、白领、健身达人、博士、技术达人、科技产品爱好者、2胎妈妈、老师、浙江省、15天内逛过手机电商店铺、... ...

有了用户画像, 在营销场景中一个重要的营销手段是根据条件选中目标人群, 进行精准营销.

例如圈选出包含这些标签的人群: 白领、科技产品爱好者、浙江省、技术达人、15天内逛过手机电商店铺 .

这个实验的目的是在有画像的基础上, 如何快速根据标签组合进行人群圈选 .

一、准备数据

设计1张标签元数据表, 后面的用户画像表从这张标签表随机抽取标签. 业务查询时也从这里搜索存在的标签并进行圈选条件的组合, 得到对应的标签ID组合.

drop table if exists tbl_tag;    
    
create table tbl_tag (    
  tid int primary key,  -- 标签id    
  tag text,  -- 标签名    
  info text  -- 标签描述    
);

假设有1万个标签, 写入标签元数据表.

insert into tbl_tag select id, md5(id::text), md5(random()::text) from generate_series(1, 10000) id;

创建2个函数, 产生若干的标签. 用来模拟产生每个用户对应的标签数据. 分别返回字符串和数组类型.

第一个函数, 随机提取若干个标签, 始终包含1-100的热门标签8个, 返回用户标签字符串:

create or replace function get_tags_text(int) returns text as $$    
  with a as (select string_agg(tid::text, ',') s from tbl_tag where tid = any (array(select ceil(random()*100)::int from generate_series(1,8) group by 1)))    
  , b as (select string_agg(tid::text, ',') s from tbl_tag where tid = any (array(select ceil(100+random()*9900)::int from generate_series(1,$1) group by 1)))    
  select ','||a.s||','||b.s||',' from a,b;    
$$ language sql strict;

得到类似这样的结果:

postgres=# select get_tags_text(10);    
                            get_tags_text    
----------------------------------------------------------------------    
 ,11,12,39,44,45,59,272,1001,1322,1402,2514,6888,7404,8922,9200,9409,    
(1 row)    
    
postgres=# select get_tags_text(10);    
                             get_tags_text    
------------------------------------------------------------------------    
 ,12,34,52,55,71,79,88,302,582,1847,3056,5156,8231,8542,8572,8747,9727,    
(1 row)

第二个函数, 随机提取若干个标签, 始终包含1-100的热门标签8个, 返回用户标签数组:

create or replace function get_tags_arr(int) returns int[] as $$    
  with a as (select array_agg(tid) s from tbl_tag where tid = any (array(select ceil(random()*100)::int from generate_series(1,8) group by 1)))    
  , b as (select array_agg(tid) s from tbl_tag where tid = any (array(select ceil(100+random()*9900)::int from generate_series(1,$1) group by 1)))    
  select a.s||b.s from a,b;    
$$ language sql strict;

得到类似这样的结果:

postgres=# select * from get_tags_arr(10);    
                                get_tags_arr    
----------------------------------------------------------------------------    
 {13,35,42,61,67,69,76,78,396,2696,3906,4356,5064,5711,7363,9417,9444,9892}    
(1 row)    
    
postgres=# select * from get_tags_arr(10);    
                              get_tags_arr    
-------------------------------------------------------------------------    
 {2,10,20,80,84,85,89,3410,3515,4159,4182,5217,6549,6775,7289,9141,9431}    
(1 row)

二、传统方法设计和实验

传统数据库没有数组类型, 所以需要用字符串存储标签.

创建用户画像表

drop table if exists tbl_users;    
    
create unlogged table tbl_users (  -- 为便于加速生成测试数据, 使用unlogged table    
  uid int primary key,   -- 用户id    
  tags text   -- 该用户拥有的标签 , 使用字符串类型    
);

创建100万个用户, 用户被贴的标签数从32到256个, 随机产生, 其中8个为热门标签(例如性别、年龄段等都属于热门标签).

insert into tbl_users select id, get_tags_text(ceil(24+random()*224)::int) from generate_series(1,1000000) id;

测试如下, 分别搜索包含如下标签组合的用户:

  • 2
  • 2,8
  • 2,2696
  • 2,4356,5064,5711,7363,9417,9444
  • 4356,5064,5711,7363,9417,9444

使用如下SQL:

select uid from tbl_users where tags like '%,2,%';    
    
select uid from tbl_users where tags like '%,2,%' or tags like '%,8,%';    
    
select uid from tbl_users where tags like '%,2,%' or tags like '%,2696,%';    
    
select uid from tbl_users where tags like '%,2,%' or tags like '%,4356,%' or tags like '%,5064,%' or tags like '%,5711,%' or tags like '%,7363,%' or tags like '%,9417,%' or tags like '%,9444,%' ;    
    
select uid from tbl_users where tags like '%,4356,%' or tags like '%,5064,%' or tags like '%,5711,%' or tags like '%,7363,%' or tags like '%,9417,%' or tags like '%,9444,%' ;

查看以上SQL运行的执行计划和耗时如下:

postgres=# explain analyze select uid from tbl_users where tags like '%,2,%';    
                                                    QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------    
 Seq Scan on tbl_users  (cost=0.00..103268.00 rows=80808 width=4) (actual time=0.018..1108.805 rows=77454 loops=1)    
   Filter: (tags ~~ '%,2,%'::text)    
   Rows Removed by Filter: 922546    
 Planning Time: 1.095 ms    
 Execution Time: 1110.267 ms    
(5 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags like '%,2,%' or tags like '%,8,%';    
                                                     QUERY PLAN    
---------------------------------------------------------------------------------------------------------------------    
 Seq Scan on tbl_users  (cost=0.00..105768.00 rows=127232 width=4) (actual time=0.029..2001.379 rows=149132 loops=1)    
   Filter: ((tags ~~ '%,2,%'::text) OR (tags ~~ '%,8,%'::text))    
   Rows Removed by Filter: 850868    
 Planning Time: 1.209 ms    
 Execution Time: 2004.062 ms    
(5 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags like '%,2,%' or tags like '%,2696,%';    
                                                    QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------    
 Seq Scan on tbl_users  (cost=0.00..105768.00 rows=90093 width=4) (actual time=0.035..2058.797 rows=90084 loops=1)    
   Filter: ((tags ~~ '%,2,%'::text) OR (tags ~~ '%,2696,%'::text))    
   Rows Removed by Filter: 909916    
 Planning Time: 1.190 ms    
 Execution Time: 2060.434 ms    
(5 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags like '%,2,%' or tags like '%,4356,%' or tags like '%,5064,%' or tags like '%,5711,%' or tags like '%,7363,%' or tags like '%,9417,%' or tags like '%,9444,%' ;    
                                                                                                       QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
 Seq Scan on tbl_users  (cost=0.00..118268.00 rows=135482 width=4) (actual time=0.024..6765.315 rows=150218 loops=1)    
   Filter: ((tags ~~ '%,2,%'::text) OR (tags ~~ '%,4356,%'::text) OR (tags ~~ '%,5064,%'::text) OR (tags ~~ '%,5711,%'::text) OR (tags ~~ '%,7363,%'::text) OR (tags ~~ '%,9417,%'::text) OR (tags ~~ '%,9444,%'::text))    
   Rows Removed by Filter: 849782    
 Planning Time: 4.344 ms    
 Execution Time: 6767.990 ms    
(5 rows)    
    
postgres=#  explain analyze select uid from tbl_users where tags like '%,4356,%' or tags like '%,5064,%' or tags like '%,5711,%' or tags like '%,7363,%' or tags like '%,9417,%' or tags like '%,9444,%' ;    
                                                                                          QUERY PLAN    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
 Seq Scan on tbl_users  (cost=0.00..115768.00 rows=59480 width=4) (actual time=0.112..6206.775 rows=78827 loops=1)    
   Filter: ((tags ~~ '%,4356,%'::text) OR (tags ~~ '%,5064,%'::text) OR (tags ~~ '%,5711,%'::text) OR (tags ~~ '%,7363,%'::text) OR (tags ~~ '%,9417,%'::text) OR (tags ~~ '%,9444,%'::text))    
   Rows Removed by Filter: 921173    
 Planning Time: 4.223 ms    
 Execution Time: 6208.191 ms    
(5 rows)

三、使用PolarDB|PostgreSQL 特性设计和实验1

传统方法没有用到任何的索引, 每次请求都要扫描用户画像表的所有记录, 计算每一个LIKE的算子, 性能比较差.

为了提升查询性能, 我们可以使用gin索引和pg_trgm插件, 支持字符串内的模糊查询索引加速.

复用传统方法的数据, 创建gin索引, 支持索引加速模糊查询.

create extension pg_trgm;    
    
create index on tbl_users using gin (tags gin_trgm_ops);

使用索引后, 查看执行计划和耗时如下:

postgres=# explain analyze select uid from tbl_users where tags like '%,2,%';    
                                                              QUERY PLAN    
---------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=555.93..58686.88 rows=80808 width=4) (actual time=30.315..76.314 rows=77454 loops=1)    
   Recheck Cond: (tags ~~ '%,2,%'::text)    
   Heap Blocks: exact=53210    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..535.73 rows=80808 width=0) (actual time=22.967..22.967 rows=77454 loops=1)    
         Index Cond: (tags ~~ '%,2,%'::text)    
 Planning Time: 0.991 ms    
 Execution Time: 78.163 ms    
(7 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags like '%,2,%' or tags like '%,8,%';    
                                                                 QUERY PLAN    
---------------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=983.56..87215.27 rows=127232 width=4) (actual time=48.651..811.842 rows=149132 loops=1)    
   Recheck Cond: ((tags ~~ '%,2,%'::text) OR (tags ~~ '%,8,%'::text))    
   Rows Removed by Index Recheck: 299658    
   Heap Blocks: exact=41915 lossy=33158    
   ->  BitmapOr  (cost=983.56..983.56 rows=131313 width=0) (actual time=43.554..43.554 rows=0 loops=1)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..535.73 rows=80808 width=0) (actual time=24.923..24.923 rows=77454 loops=1)    
               Index Cond: (tags ~~ '%,2,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..384.22 rows=50505 width=0) (actual time=18.629..18.629 rows=77054 loops=1)    
               Index Cond: (tags ~~ '%,8,%'::text)    
 Planning Time: 1.496 ms    
 Execution Time: 814.748 ms    
(11 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags like '%,2,%' or tags like '%,2696,%';    
                                                                 QUERY PLAN    
---------------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=958.67..64006.30 rows=90093 width=4) (actual time=75.859..900.779 rows=90084 loops=1)    
   Recheck Cond: ((tags ~~ '%,2,%'::text) OR (tags ~~ '%,2696,%'::text))    
   Rows Removed by Index Recheck: 348263    
   Heap Blocks: exact=39411 lossy=33155    
   ->  BitmapOr  (cost=958.67..958.67 rows=90909 width=0) (actual time=71.980..71.981 rows=0 loops=1)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..535.73 rows=80808 width=0) (actual time=26.486..26.487 rows=77454 loops=1)    
               Index Cond: (tags ~~ '%,2,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..377.89 rows=10101 width=0) (actual time=45.492..45.492 rows=62326 loops=1)    
               Index Cond: (tags ~~ '%,2696,%'::text)    
 Planning Time: 1.479 ms    
 Execution Time: 902.637 ms    
(11 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags like '%,2,%' or tags like '%,4356,%' or tags like '%,5064,%' or tags like '%,5711,%' or tags like '%,7363,%' or tags like '%,9417,%' or tags like '%,9444,%' ;    
                                                                                                          QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=3041.18..100880.75 rows=135482 width=4) (actual time=210.772..4047.148 rows=150218 loops=1)    
   Recheck Cond: ((tags ~~ '%,2,%'::text) OR (tags ~~ '%,4356,%'::text) OR (tags ~~ '%,5064,%'::text) OR (tags ~~ '%,5711,%'::text) OR (tags ~~ '%,7363,%'::text) OR (tags ~~ '%,9417,%'::text) OR (tags ~~ '%,9444,%'::text))    
   Rows Removed by Index Recheck: 422706    
   Heap Blocks: exact=56868 lossy=33226    
   ->  BitmapOr  (cost=3041.18..3041.18 rows=141614 width=0) (actual time=205.898..205.899 rows=0 loops=1)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..535.73 rows=80808 width=0) (actual time=24.656..24.656 rows=77454 loops=1)    
               Index Cond: (tags ~~ '%,2,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..428.40 rows=20202 width=0) (actual time=45.014..45.014 rows=62615 loops=1)    
               Index Cond: (tags ~~ '%,4356,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..377.89 rows=10101 width=0) (actual time=22.680..22.680 rows=39025 loops=1)    
               Index Cond: (tags ~~ '%,5064,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..428.40 rows=20202 width=0) (actual time=28.809..28.809 rows=62697 loops=1)    
               Index Cond: (tags ~~ '%,5711,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..377.89 rows=10101 width=0) (actual time=28.646..28.646 rows=62647 loops=1)    
               Index Cond: (tags ~~ '%,7363,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..327.89 rows=100 width=0) (actual time=28.361..28.361 rows=62172 loops=1)    
               Index Cond: (tags ~~ '%,9417,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..327.89 rows=100 width=0) (actual time=27.729..27.730 rows=62821 loops=1)    
               Index Cond: (tags ~~ '%,9444,%'::text)    
 Planning Time: 4.517 ms    
 Execution Time: 4050.040 ms    
(21 rows)    
    
postgres=#  explain analyze select uid from tbl_users where tags like '%,4356,%' or tags like '%,5064,%' or tags like '%,5711,%' or tags like '%,7363,%' or tags like '%,9417,%' or tags like '%,9444,%' ;    
                                                                                             QUERY PLAN    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=2357.58..50755.97 rows=59480 width=4) (actual time=209.115..3689.534 rows=78827 loops=1)    
   Recheck Cond: ((tags ~~ '%,4356,%'::text) OR (tags ~~ '%,5064,%'::text) OR (tags ~~ '%,5711,%'::text) OR (tags ~~ '%,7363,%'::text) OR (tags ~~ '%,9417,%'::text) OR (tags ~~ '%,9444,%'::text))    
   Rows Removed by Index Recheck: 455241    
   Heap Blocks: exact=55903 lossy=33218    
   ->  BitmapOr  (cost=2357.58..2357.58 rows=60806 width=0) (actual time=204.235..204.236 rows=0 loops=1)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..428.40 rows=20202 width=0) (actual time=57.485..57.485 rows=62615 loops=1)    
               Index Cond: (tags ~~ '%,4356,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..377.89 rows=10101 width=0) (actual time=26.156..26.157 rows=39025 loops=1)    
               Index Cond: (tags ~~ '%,5064,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..428.40 rows=20202 width=0) (actual time=33.539..33.539 rows=62697 loops=1)    
               Index Cond: (tags ~~ '%,5711,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..377.89 rows=10101 width=0) (actual time=30.136..30.136 rows=62647 loops=1)    
               Index Cond: (tags ~~ '%,7363,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..327.89 rows=100 width=0) (actual time=28.794..28.794 rows=62172 loops=1)    
               Index Cond: (tags ~~ '%,9417,%'::text)    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..327.89 rows=100 width=0) (actual time=28.122..28.122 rows=62821 loops=1)    
               Index Cond: (tags ~~ '%,9444,%'::text)    
 Planning Time: 3.860 ms    
 Execution Time: 3691.329 ms    
(19 rows)

四、使用PolarDB|PostgreSQL 特性设计和实验2

很显然你不能满足于前面的模糊查询索引带来的性能提升, 特别是当and条件非常多时, 模糊查询的索引也要被多次扫描并使用bitmap进行合并, 性能不好. (以上方法对于一个模糊查询条件性能提升是非常明显的.)

PolarDB和PostgreSQL都支持数组类型, 用数组存储标签, 支持gin索引可以加速数组的包含查询.

创建用户画像表, 使用数组存储标签字段.

drop table if exists tbl_users;    
    
create unlogged table tbl_users (  -- 为便于加速生成测试数据, 使用unlogged table    
  uid int primary key,   -- 用户id    
  tags int[]   -- 该用户拥有的标签 , 使用数组类型    
);

创建100万个用户, 用户被贴的标签数从32到256个, 随机产生, 其中8个为热门标签(例如性别、年龄段等都属于热门标签).

insert into tbl_users select id, get_tags_arr(ceil(24+random()*224)::int) from generate_series(1,1000000) id;    
    
create index on tbl_users using gin (tags);

搜索包含如下标签组合的用户:

  • 2
  • 2,8
  • 2,2696
  • 2,4356,5064,5711,7363,9417,9444
  • 4356,5064,5711,7363,9417,9444

数组包含(相当于and)匹配的 SQL 语句如下:

select uid from tbl_users where tags @> array[2];    
    
select uid from tbl_users where tags @> array[2,8];    
    
select uid from tbl_users where tags @> array[2,2696];    
    
select uid from tbl_users where tags @> array[2,4356,5064,5711,7363,9417,9444];    
    
select uid from tbl_users where tags @> array[4356,5064,5711,7363,9417,9444];

注意, 如果要和前面or的语义相同, 需要使用数组相交(&&)操作符.

使用数组类型和gin索引后, 查看执行计划和耗时如下:

postgres=# explain analyze select uid from tbl_users where tags @> array[2];    
                                                              QUERY PLAN    
---------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=437.95..53717.07 rows=76333 width=4) (actual time=24.031..69.706 rows=77641 loops=1)    
   Recheck Cond: (tags @> '{2}'::integer[])    
   Heap Blocks: exact=50231    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..418.86 rows=76333 width=0) (actual time=15.026..15.026 rows=77641 loops=1)    
         Index Cond: (tags @> '{2}'::integer[])    
 Planning Time: 1.137 ms    
 Execution Time: 74.015 ms    
(7 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags @> array[2,8];    
                                                             QUERY PLAN    
------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=49.97..6172.63 rows=5847 width=4) (actual time=10.745..18.272 rows=5303 loops=1)    
   Recheck Cond: (tags @> '{2,8}'::integer[])    
   Heap Blocks: exact=5133    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..48.51 rows=5847 width=0) (actual time=10.081..10.081 rows=5303 loops=1)    
         Index Cond: (tags @> '{2,8}'::integer[])    
 Planning Time: 0.256 ms    
 Execution Time: 18.561 ms    
(7 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags @> array[2,2696];    
                                                           QUERY PLAN    
---------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=21.27..443.58 rows=382 width=4) (actual time=2.872..4.662 rows=1003 loops=1)    
   Recheck Cond: (tags @> '{2,2696}'::integer[])    
   Heap Blocks: exact=999    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..21.18 rows=382 width=0) (actual time=2.729..2.729 rows=1003 loops=1)    
         Index Cond: (tags @> '{2,2696}'::integer[])    
 Planning Time: 0.246 ms    
 Execution Time: 4.750 ms    
(7 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags @> array[2,4356,5064,5711,7363,9417,9444];    
                                                         QUERY PLAN    
----------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=64.38..65.50 rows=1 width=4) (actual time=5.476..5.478 rows=0 loops=1)    
   Recheck Cond: (tags @> '{2,4356,5064,5711,7363,9417,9444}'::integer[])    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..64.38 rows=1 width=0) (actual time=5.471..5.472 rows=0 loops=1)    
         Index Cond: (tags @> '{2,4356,5064,5711,7363,9417,9444}'::integer[])    
 Planning Time: 0.223 ms    
 Execution Time: 5.523 ms    
(6 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags @> array[4356,5064,5711,7363,9417,9444];    
                                                         QUERY PLAN    
----------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=55.36..56.47 rows=1 width=4) (actual time=4.476..4.477 rows=0 loops=1)    
   Recheck Cond: (tags @> '{4356,5064,5711,7363,9417,9444}'::integer[])    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..55.36 rows=1 width=0) (actual time=4.471..4.472 rows=0 loops=1)    
         Index Cond: (tags @> '{4356,5064,5711,7363,9417,9444}'::integer[])    
 Planning Time: 0.275 ms    
 Execution Time: 4.528 ms    
(6 rows)

相交查询

postgres=# explain analyze select uid from tbl_users where tags && array[2]; 
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_users  (cost=58.75..15961.52 rows=5000 width=4) (actual time=25.543..92.855 rows=77389 loops=1)
   Recheck Cond: (tags && '{2}'::integer[])
   Heap Blocks: exact=50691
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..57.50 rows=5000 width=0) (actual time=15.432..15.433 rows=77389 loops=1)
         Index Cond: (tags && '{2}'::integer[])
 Planning Time: 0.111 ms
 Execution Time: 96.430 ms
(7 rows)
Time: 97.390 ms
postgres=# explain analyze select uid from tbl_users where tags && array[2,8]; 
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_users  (cost=117.31..28219.34 rows=9975 width=4) (actual time=36.031..611.148 rows=149164 loops=1)
   Recheck Cond: (tags && '{2,8}'::integer[])
   Rows Removed by Index Recheck: 341976
   Heap Blocks: exact=36324 lossy=33066
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..114.81 rows=9975 width=0) (actual time=29.971..29.972 rows=149164 loops=1)
         Index Cond: (tags && '{2,8}'::integer[])
 Planning Time: 0.134 ms
 Execution Time: 616.695 ms
(8 rows)
Time: 617.333 ms
postgres=# 
postgres=# explain analyze select uid from tbl_users where tags && array[2,2696]; 
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_users  (cost=117.31..28219.34 rows=9975 width=4) (actual time=32.342..87.446 rows=90172 loops=1)
   Recheck Cond: (tags && '{2,2696}'::integer[])
   Heap Blocks: exact=55308
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..114.81 rows=9975 width=0) (actual time=21.062..21.062 rows=90172 loops=1)
         Index Cond: (tags && '{2,2696}'::integer[])
 Planning Time: 0.238 ms
 Execution Time: 90.860 ms
(7 rows)
Time: 91.766 ms
postgres=# explain analyze select uid from tbl_users where tags && array[2,4356,5064,5711,7363,9417,9444]; 
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_users  (cost=403.21..63983.02 rows=34479 width=4) (actual time=41.533..1046.600 rows=149680 loops=1)
   Recheck Cond: (tags && '{2,4356,5064,5711,7363,9417,9444}'::integer[])
   Rows Removed by Index Recheck: 341017
   Heap Blocks: exact=36722 lossy=33065
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..394.60 rows=34479 width=0) (actual time=36.262..36.263 rows=149680 loops=1)
         Index Cond: (tags && '{2,4356,5064,5711,7363,9417,9444}'::integer[])
 Planning Time: 0.145 ms
 Execution Time: 1052.072 ms
(8 rows)
Time: 1052.957 ms (00:01.053)
postgres=# explain analyze select uid from tbl_users where tags && array[4356,5064,5711,7363,9417,9444]; 
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_users  (cost=345.61..59085.40 rows=29627 width=4) (actual time=29.986..94.553 rows=78383 loops=1)
   Recheck Cond: (tags && '{4356,5064,5711,7363,9417,9444}'::integer[])
   Heap Blocks: exact=51427
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..338.21 rows=29627 width=0) (actual time=21.055..21.055 rows=78383 loops=1)
         Index Cond: (tags && '{4356,5064,5711,7363,9417,9444}'::integer[])
 Planning Time: 0.191 ms
 Execution Time: 97.450 ms
(7 rows)
Time: 98.213 ms

五、使用PolarDB|PostgreSQL 特性设计和实验3

当我们输入一组标签, 如果想放宽圈选条件, 而不仅仅是以上精确包含, 怎么实现? 例如:

  • 包含多少个以上的标签
  • 有百分之多少以上的标签重合

复用上面的数据, 换上smlar插件和索引来实现以上功能.

创建smlar插件

postgres=# create extension smlar ;    
CREATE EXTENSION

换上smlar索引

drop index tbl_users_tags_idx;    
    
create index on tbl_users using gin (tags _int4_sml_ops);

smlar插件的%操作符用来表达数组近似度过滤.

postgres=# explain analyze select count(*) from tbl_users where tags % array[1,2,3];    
                                                                QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------------------------------    
 Aggregate  (cost=1132.46..1132.47 rows=1 width=8) (actual time=75.613..75.614 rows=1 loops=1)    
   ->  Bitmap Heap Scan on tbl_users  (cost=35.25..1129.96 rows=1000 width=0) (actual time=75.609..75.610 rows=0 loops=1)    
         Recheck Cond: (tags % '{1,2,3}'::integer[])    
         Rows Removed by Index Recheck: 15059    
         Heap Blocks: exact=13734    
         ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..35.00 rows=1000 width=0) (actual time=31.466..31.466 rows=15059 loops=1)    
               Index Cond: (tags % '{1,2,3}'::integer[])    
 Planning Time: 0.408 ms    
 Execution Time: 75.687 ms    
(9 rows)

smlar插件支持的参数配置如下, 通过配置这些参数, 我们可以控制按什么算法来计算相似度, 相似度的过滤阈值是多少?

postgres=# select name,setting,enumvals,extra_desc from pg_settings where name ~ 'smlar';    
          name          | setting |        enumvals        |                                 extra_desc    
------------------------+---------+------------------------+-----------------------------------------------------------------------------    
 smlar.idf_plus_one     | off     |                        | Calculate idf by log(1+d/df)    
 smlar.persistent_cache | off     |                        | Cache of global stat is stored in transaction-independent memory    
 smlar.stattable        |         |                        | Named table stores global frequencies of array's elements    
 smlar.tf_method        | n       | {n,log,const}          | TF method: n => number of entries, log => 1+log(n), const => constant value    
 smlar.threshold        | 0.6     |                        | Array's with similarity lower than threshold are not similar by % operation    
 smlar.type             | cosine  | {cosine,tfidf,overlap} | Type of similarity formula: cosine(default), tfidf, overlap    
(6 rows)

接下来我们来实现上述两种近似搜索:

  • 包含多少个以上的标签
  • 有百分之多少以上的标签重合

包含多少个以上的标签, smlar.type = overlap  ,  smlar.threshold = INT

set smlar.type = overlap;    
set smlar.threshold = 1;  -- 精确匹配    
select uid from tbl_users where tags % array[2];    
    
set smlar.type = overlap;    
set smlar.threshold = 1;  -- 匹配到1个以上标签    
select uid from tbl_users where tags % array[2,8];    
    
set smlar.type = overlap;    
set smlar.threshold = 2;  -- 精确匹配    
select uid from tbl_users where tags % array[2,2696];    
    
set smlar.type = overlap;    
set smlar.threshold = 5;  -- 匹配到5个以上标签    
select uid from tbl_users where tags % array[2,4356,5064,5711,7363,9417,9444];    
    
set smlar.type = overlap;    
set smlar.threshold = 6;  -- 精确匹配    
select uid from tbl_users where tags % array[4356,5064,5711,7363,9417,9444];

使用smlar插件, 数组类型和gin索引后, 查看执行计划和耗时如下:

-- 为了实现语义上和&&一样, 那么应该设置: 
set smlar.type = overlap;     
set smlar.threshold = 1;  -- 包含1个元素即可
postgres=# explain analyze select uid from tbl_users where tags % array[2];    
                                                             QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=17.65..1112.36 rows=1000 width=4) (actual time=38.272..306.985 rows=77129 loops=1)    
   Recheck Cond: (tags % '{2}'::integer[])    
   Heap Blocks: exact=50082    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..17.40 rows=1000 width=0) (actual time=26.498..26.498 rows=77129 loops=1)    
         Index Cond: (tags % '{2}'::integer[])    
 Planning Time: 0.414 ms    
 Execution Time: 309.182 ms    
(7 rows)    
    
    
postgres=# explain analyze select uid from tbl_users where tags % array[2,8];    
                                                              QUERY PLAN    
--------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=26.45..1121.16 rows=1000 width=4) (actual time=33.378..790.183 rows=149118 loops=1)    
   Recheck Cond: (tags % '{2,8}'::integer[])    
   Rows Removed by Index Recheck: 351146    
   Heap Blocks: exact=35117 lossy=33064    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..26.20 rows=1000 width=0) (actual time=29.934..29.934 rows=149118 loops=1)    
         Index Cond: (tags % '{2,8}'::integer[])    
 Planning Time: 0.924 ms    
 Execution Time: 794.029 ms    
(8 rows)    
    
    
postgres=# explain analyze select uid from tbl_users where tags % array[2,2696];    
                                                            QUERY PLAN    
----------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=26.45..1121.16 rows=1000 width=4) (actual time=6.287..26.042 rows=1028 loops=1)    
   Recheck Cond: (tags % '{2,2696}'::integer[])    
   Heap Blocks: exact=1019    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..26.20 rows=1000 width=0) (actual time=5.956..5.956 rows=1028 loops=1)    
         Index Cond: (tags % '{2,2696}'::integer[])    
 Planning Time: 0.439 ms    
 Execution Time: 26.218 ms    
(7 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags % array[2,4356,5064,5711,7363,9417,9444];    
                                                           QUERY PLAN    
---------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=70.45..1165.16 rows=1000 width=4) (actual time=13.211..13.212 rows=0 loops=1)    
   Recheck Cond: (tags % '{2,4356,5064,5711,7363,9417,9444}'::integer[])    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..70.20 rows=1000 width=0) (actual time=13.204..13.205 rows=0 loops=1)    
         Index Cond: (tags % '{2,4356,5064,5711,7363,9417,9444}'::integer[])    
 Planning Time: 0.204 ms    
 Execution Time: 13.264 ms    
(6 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags % array[4356,5064,5711,7363,9417,9444];    
                                                           QUERY PLAN    
---------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=61.65..1156.36 rows=1000 width=4) (actual time=11.364..11.366 rows=0 loops=1)    
   Recheck Cond: (tags % '{4356,5064,5711,7363,9417,9444}'::integer[])    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..61.40 rows=1000 width=0) (actual time=11.357..11.358 rows=0 loops=1)    
         Index Cond: (tags % '{4356,5064,5711,7363,9417,9444}'::integer[])    
 Planning Time: 0.264 ms    
 Execution Time: 11.447 ms    
(6 rows)

思考: 必须包含xx标签, 并且yy标签里有n%相交. 这个需求怎么高效解决?

有百分之多少以上的标签重合, smlar.type = cosine    ,  smlar.threshold = FLOAT

set smlar.type = cosine;    
set smlar.threshold = 1;  -- 精确匹配, 目标也必须只包含2, 相当于相等    
select uid from tbl_users where tags % array[2];    
    
set smlar.type = cosine;    
set smlar.threshold = 0.5;  -- 两组标签的交集(重叠标签)占两组标签叠加(并集)后的50%以上    
select uid from tbl_users where tags % array[2,8];    
    
set smlar.type = cosine;    
set smlar.threshold = 1;  -- 精确匹配, 两组标签的交集(重叠标签)占两组标签叠加(并集)后的100%以上    
select uid from tbl_users where tags % array[2,2696];    
    
set smlar.type = cosine;    
set smlar.threshold = 0.7;  -- 两组标签的交集(重叠标签)占两组标签叠加(并集)后的70%以上    
select uid from tbl_users where tags % array[2,4356,5064,5711,7363,9417,9444];    
    
set smlar.type = cosine;    
set smlar.threshold = 0.9;  -- 两组标签的交集(重叠标签)占两组标签叠加(并集)后的90%以上    
select uid from tbl_users where tags % array[4356,5064,5711,7363,9417,9444];

使用smlar插件, 数组类型和gin索引后, 查看执行计划和耗时如下:

postgres=# explain analyze select uid from tbl_users where tags % array[2];    
                                                             QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=17.65..1112.36 rows=1000 width=4) (actual time=301.094..301.094 rows=0 loops=1)    
   Recheck Cond: (tags % '{2}'::integer[])    
   Rows Removed by Index Recheck: 77129    
   Heap Blocks: exact=50082    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..17.40 rows=1000 width=0) (actual time=25.659..25.659 rows=77129 loops=1)    
         Index Cond: (tags % '{2}'::integer[])    
 Planning Time: 0.252 ms    
 Execution Time: 301.135 ms    
(8 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags % array[2,8];    
                                                              QUERY PLAN    
--------------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=26.45..1121.16 rows=1000 width=4) (actual time=799.554..799.554 rows=0 loops=1)    
   Recheck Cond: (tags % '{2,8}'::integer[])    
   Rows Removed by Index Recheck: 500264    
   Heap Blocks: exact=35117 lossy=33064    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..26.20 rows=1000 width=0) (actual time=43.356..43.356 rows=149118 loops=1)    
         Index Cond: (tags % '{2,8}'::integer[])    
 Planning Time: 0.379 ms    
 Execution Time: 799.611 ms    
(8 rows)    
    
postgres=# explain analyze select uid from tbl_users where tags % array[2,2696];    
                                                            QUERY PLAN    
----------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=26.45..1121.16 rows=1000 width=4) (actual time=26.476..26.478 rows=0 loops=1)    
   Recheck Cond: (tags % '{2,2696}'::integer[])    
   Rows Removed by Index Recheck: 1028    
   Heap Blocks: exact=1019    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..26.20 rows=1000 width=0) (actual time=5.242..5.242 rows=1028 loops=1)    
         Index Cond: (tags % '{2,2696}'::integer[])    
 Planning Time: 0.570 ms    
 Execution Time: 26.570 ms    
(8 rows)    
    
    
postgres=# explain analyze select uid from tbl_users where tags % array[2,4356,5064,5711,7363,9417,9444];    
                                                           QUERY PLAN    
---------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=70.45..1165.16 rows=1000 width=4) (actual time=16.722..16.723 rows=0 loops=1)    
   Recheck Cond: (tags % '{2,4356,5064,5711,7363,9417,9444}'::integer[])    
   Rows Removed by Index Recheck: 8    
   Heap Blocks: exact=8    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..70.20 rows=1000 width=0) (actual time=16.586..16.587 rows=8 loops=1)    
         Index Cond: (tags % '{2,4356,5064,5711,7363,9417,9444}'::integer[])    
 Planning Time: 0.276 ms    
 Execution Time: 16.795 ms    
(8 rows)    
    
    
postgres=# explain analyze select uid from tbl_users where tags % array[4356,5064,5711,7363,9417,9444];    
                                                          QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on tbl_users  (cost=61.65..1156.36 rows=1000 width=4) (actual time=9.755..9.757 rows=0 loops=1)    
   Recheck Cond: (tags % '{4356,5064,5711,7363,9417,9444}'::integer[])    
   ->  Bitmap Index Scan on tbl_users_tags_idx  (cost=0.00..61.40 rows=1000 width=0) (actual time=9.748..9.749 rows=0 loops=1)    
         Index Cond: (tags % '{4356,5064,5711,7363,9417,9444}'::integer[])    
 Planning Time: 0.294 ms    
 Execution Time: 9.811 ms    
(6 rows)

六、传统方法与PolarDB|PostgreSQL的对照

方法 SQL1 耗时 ms SQL2 耗时 ms SQL3 耗时 ms SQL4 耗时 ms SQL5 耗时 ms
传统字符串 + 全表扫描 1110.267 2004.062 2060.434 6767.990 6208.191
PolarDB 传统字符串 + 模糊搜索 + gin索引加速 78.163 814.748 902.637 4050.040 3691.329
PolarDB 数组 + gin索引加速 74.015 18.561 4.750 5.523 4.528
PolarDB 数组(重叠个数)相似度搜索 + gin索引加速 309.182 794.029 26.218 13.264 11.447
PolarDB 数组(重叠占比)相似度搜索 + gin索引加速 301.135 799.611 26.570 16.795 9.811

七、知识点

0、pg_trgm prefix和suffix的添加有点细节要说明一下.

https://www.postgresql.org/docs/current/pgtrgm.html

pg_trgm ignores non-word characters (non-alphanumerics) when extracting trigrams from a string.

也就是忽略了数字、字符、字以外的其他无意义字符, 如果有这样的字符把有意义的字符隔开了, 那么将重新使用prefix"  "" "suffix处理.

a-b--c:  a b c被无意义字符隔开, 变成了"a" , "b" , "c", 最后将得到:   
"  a" , " a " , "  b" , " b " , "  c" , " c "  
  
postgres=# select show_trgm('a|b|-c');  
               show_trgm                 
---------------------------------------  
 {"  a","  b","  c"," a "," b "," c "}  
(1 row)  
  
postgres=# select show_trgm('as|b|-c');  
                  show_trgm                    
---------------------------------------------  
 {"  a","  b","  c"," as"," b "," c ","as "}  
(1 row)  
  
相当于单独处理:   
as  
b  
c

1、数组类型

2、gin索引

3、smlar 插件

更多算法参考: https://github.com/jirutka/smlar

《PostgreSQL结合余弦、线性相关算法 在文本、图片、数组相似 等领域的应用 - 2 smlar插件详解》

4、pg_trgm 插件

八、思考

pg_trgm插件对字符串做了什么处理, 可以利用gin索引加速模糊查询加速?

smlar插件是如何通过索引快速判断两个数组的相似性达到阈值的?

为什么多个模糊匹配条件使用and条件后, 性能下降严重?

为什么使用数组类型后, 标签条件越多性能越好?

如果多个模糊匹配条件是or 条件呢? 性能会下降还是提升?

还有什么业务场景会用到数组?

还有哪些业务场景会用到字符串模糊匹配?

还有什么业务场景非常适合使用数组相似的功能?

除了使用标签匹配来圈选相似目标人群, 还可不可以使用其他方式圈选? 例如向量距离?

使用标签匹配时, 如果我们要排除某些标签, 而不是包含某些标签, 应该如何写sql, 性能又会怎么样呢?

为什么使用smlar进行相似度过滤时, 相似度越高性能越好?

SQL圈选性能和返回符合条件的用户记录数有没有关系? 是什么关系?

当使用pg_trgm进行模糊搜索加速时, 如果字符串中包含wchar(例如中文)时性能如果很差要怎么办? 如果需要模糊搜索的字符只有1个或2个字符时性能如果很差要怎么办?

如果要包含某些标签, 并且包含某些标签中的若干个以上.

下面查询, 能实现的效果是: 一定包含2,5. 并且在4356,5064,5711,7363,9417,9444这些标签中包含任意4个或以上.
set smlar.type = overlap;    
set smlar.threshold = 4; 
select uid from tbl_users where tags @> array[2,5] and tags % array[4356,5064,5711,7363,9417,9444];
-- 用到了2个索引opc, 一个是gin array默认的, 一个是smlar插件提供的opc
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
7月前
|
SQL 关系型数据库 Docker
开源PolarDB-X部署体验:挑战与反思
本文记录了部署开源PolarDB-X的体验及挑战。PolarDB-X是一款云原生分布式SQL数据库,具备高性能和可扩展性。部署中遇到的问题包括依赖冲突、文档不清晰及官方文档无法访问等。建议改进文档准确性、增强错误提示和支持、简化部署流程,如采用Docker简化快速入门。尽管部署未成功,但仍对PolarDB-X未来发展持乐观态度。
|
1天前
|
关系型数据库 分布式数据库 PolarDB
PolarDB 开源基础教程系列 7.2 应用实践之 跨境电商场景
本文介绍了如何在跨境电商场景中快速判断商标或品牌侵权,避免因侵权带来的法律纠纷。通过创建品牌表并使用PostgreSQL的pg_trgm插件和GIN索引,实现了高性能的字符串相似匹配功能。与传统方法相比,PolarDB|PostgreSQL的方法不仅提升了上万倍的查询速度,还解决了传统方法难以处理的相似问题检索。具体实现步骤包括创建品牌表、插入随机品牌名、配置pg_trgm插件及索引,并设置相似度阈值进行高效查询。此外,文章还探讨了字符串相似度计算的原理及应用场景,提供了进一步优化和扩展的方向。
25 11
|
9月前
|
关系型数据库 分布式数据库 数据处理
报名预约|PolarDB产品易用性创新与最佳实践在线直播
在线体验PolarDB产品易用性创新,练习阿里云数据库NL2SQL、无感切换实操技能,探索数据处理提速与学习成本降低实践
|
SQL 数据安全/隐私保护 数据库管理
快速提升NineData实战能力:新手任务详解
NineData控制台的新手任务旨在帮助新用户熟悉和掌握其各项功能,包括数据源创建、SQL开发、分钟级数据恢复、不停机数据迁移和企业协同开发等。通过任务引导和实践,可以显著降低使用门槛,提升使用效率,保证数据安全。对于新用户,玖章算术提供了完整的NineData新手任务实战指南,覆盖四大模块的基础操作和企业协同开发。完成任务后可掌握关键概念、功能和应用能力,提升工作效率和数据安全性。
94 0
快速提升NineData实战能力:新手任务详解
|
关系型数据库 分布式数据库 PolarDB
零基础PolarDB开源贡献新手教学 更有机会赢取PolarDB定制咖啡杯!
欢迎大家参与到我们的开源文档贡献,无论是文档捉虫、文档优化还是进阶的案例、代码贡献,我们都非常欢迎,快来开启你的第一次开源贡献之旅吧,前10名参与文档贡献并正式合入PR,可获得PolarDB定制咖啡杯一个!
零基础PolarDB开源贡献新手教学  更有机会赢取PolarDB定制咖啡杯!
|
运维 分布式数据库 云栖大会
PolarDB-X 开源分布式数据库进阶营玩法公告
为了帮助大家能够快速体验 PolarDB-X 新版本特性,阿里云 PolarDB-X 技术专家团队联合开发者学堂,共同打造 PolarDB-X 开源分布式数据库进阶训练营,通过本次课程您将学习到 PolarDB-X 新版本诸多重点特性,包括更高效的部署与运维方法(一键创建、审计日志、节点重搭、备份恢复、参数模版、只读实例)、数据快速导入导出、TP负载测试、分区管理、数据TTL过期自动删除、冷热数据归档、读写分离与HTAP等众多新特性,更有免费线上实验环境,理论学习与场景实操相结合,完成七天打卡任务将获得训练营结营证书,还有精美定制礼品等着你!
PolarDB-X 开源分布式数据库进阶营玩法公告
|
存储 运维 架构师
直播预告 | PolarDB 开源版在可计算存储上的降本增效原理和实践
通过 PolarDB 开源数据库和 ScaleFlux 可计算存储产品 CSD2000/3000,也可以降低数据库存储成本并且进一步提升性能。本次分享主要介绍 CSD 降本增效原理以及 PolarDB-X 和 PolarDB-PG 开源版在 CSD 上的部署实践。
直播预告 | PolarDB 开源版在可计算存储上的降本增效原理和实践
|
存储 算法 Cloud Native
PolarDB 开源生态插件心选 - 这些插件让业务战斗力提升100倍!!!
PolarDB 的云原生存算分离架构, 具备低廉的数据存储、高效扩展弹性、高速多机并行计算能力、高速数据搜索和处理; PolarDB与计算算法结合, 将实现双剑合璧, 推动业务数据的价值产出, 将数据变成生产力. 本文将介绍PolarDB 开源版插件生态, 通过插件给数据库加装新的算法和索引|存储结构, 结合PolarDB的大规模存储管理能力, 实现算法和存储双剑合璧, 是企业在数据驱动时代的决胜利器.
445 0
|
数据采集 人工智能 运维