Greenplum 类型一致性使用规范 - 索引条件、JOIN的类型一致性限制

简介:

标签

PostgreSQL , Greenplum , 类型一致 , 索引 , inner转换 , join


背景

在查询时,有很多用户会犯浑,发现建立了索引,但是查询偏偏不走索引。

怎么不走索引啊?

这里做容易混淆的是类型的差异,例如字段类型为字符串,但是输入的是INT类型,这样就可能不走索引。(除非创建了自动的CAST,自动进行类型转换)

查询的输入类型与索引的类型一致是非常有必要的。

例子

1、建表(使用变长、定长字符串类型),写入测试数据

create table tbl3(id int, c1 text, c2 varchar(64), c3 char(64));  
  
postgres=# insert into tbl3 select id, md5, md5, md5 from (select id, md5(random()::text) as md5 from generate_series(1,1000000) t(id)) t;  
INSERT 0 1000000  

2、创建索引

postgres=# create index idx_tbl3_1 on tbl3 (c1);  
CREATE INDEX  
postgres=# create index idx_tbl3_2 on tbl3 (c2);  
CREATE INDEX  
postgres=# create index idx_tbl3_3 on tbl3 (c3);  
CREATE INDEX  

3、结构如下

postgres=# \d+ tbl3  
                         Table "public.tbl3"  
 Column |         Type          | Modifiers | Storage  | Description   
--------+-----------------------+-----------+----------+-------------  
 id     | integer               |           | plain    |   
 c1     | text                  |           | extended |   
 c2     | character varying(64) |           | extended |   
 c3     | character(64)         |           | extended |   
Indexes:  
    "idx_tbl3_1" btree (c1)  
    "idx_tbl3_2" btree (c2)  
    "idx_tbl3_3" btree (c3)  
Has OIDs: no  
Distributed by: (id)  

4、走索引,强制转换为bpchar。

postgres=# explain analyze select * from tbl3 where c3='abc';  
                                     QUERY PLAN                                       
------------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..201.23 rows=1 width=135)  
   Rows out:  0 rows at destination with 3.081 ms to end, start offset by 0.294 ms.  
   ->  Index Scan using idx_tbl3_3 on tbl3  (cost=0.00..201.23 rows=1 width=135)  
         Index Cond: c3 = 'abc'::bpchar  
         Rows out:  0 rows (seg0) with 0.138 ms to end, start offset by 3.211 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 147K bytes.  
   (slice1)    Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 3.484 ms  
(11 rows)  

5、不走索引,因为输入的为text变长,与char定长不一致。

postgres=# explain analyze select * from tbl3 where c3='abc'::text;  
                                     QUERY PLAN                                        
-------------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..20032.60 rows=1001 width=135)  
   Rows out:  0 rows at destination with 305 ms to end, start offset by 0.335 ms.  
   ->  Seq Scan on tbl3  (cost=0.00..20032.60 rows=334 width=135)  
         Filter: c3::text = 'abc'::text  
         Rows out:  0 rows (seg0) with 302 ms to end, start offset by 4.023 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 147K bytes.  
   (slice1)    Executor memory: 195K bytes avg x 3 workers, 195K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 305.985 ms  
(11 rows)  

6、走索引,输入的类型为变长类型,字段也是变长类型,与TEXT兼容。

postgres=# explain analyze select * from tbl3 where c2='abc'::text;  
                                     QUERY PLAN                                       
------------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..200.86 rows=1 width=135)  
   Rows out:  0 rows at destination with 2.744 ms to end, start offset by 0.275 ms.  
   ->  Index Scan using idx_tbl3_2 on tbl3  (cost=0.00..200.86 rows=1 width=135)  
         Index Cond: c2::text = 'abc'::text  
         Rows out:  0 rows (seg0) with 0.031 ms to end, start offset by 2.868 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 147K bytes.  
   (slice1)    Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 3.120 ms  
(11 rows)  

7、不走索引,输入的类型为定长类型,但是字段为变长类型。

postgres=# explain analyze select * from tbl3 where c2='abc'::bpchar;  
                                    QUERY PLAN                                       
-----------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..17531.66 rows=1 width=135)  
   Rows out:  0 rows at destination with 60 ms to end, start offset by 0.276 ms.  
   ->  Seq Scan on tbl3  (cost=0.00..17531.66 rows=1 width=135)  
         Filter: c2::bpchar = 'abc'::bpchar  
         Rows out:  0 rows (seg0) with 57 ms to end, start offset by 2.864 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 147K bytes.  
   (slice1)    Executor memory: 131K bytes avg x 3 workers, 131K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 60.320 ms  
(11 rows)  

8、内表不走索引,因为JOIN字段类型不匹配。

postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c3) and t1.c1='abc';  
                                                QUERY PLAN                                                   
-----------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=25235.81..25235.82 rows=1 width=8)  
   ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=25235.74..25235.79 rows=1 width=8)  
         ->  Aggregate  (cost=25235.74..25235.75 rows=1 width=8)  
               ->  Nested Loop  (cost=0.00..25235.66 rows=11 width=0)  
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..200.88 rows=1 width=33)  
                           ->  Index Scan using idx_tbl3_1 on tbl3 t1  (cost=0.00..200.86 rows=1 width=33)  
                                 Index Cond: c1 = 'abc'::text AND 'abc'::text = c1  
                     ->  Seq Scan on tbl3 t2  (cost=0.00..25034.46 rows=11 width=65)  
                           Filter: 'abc'::text = c3::text AND c3::text = 'abc'::text  
(9 rows)  

9、内表走索引,因为JOIN字段类型都是变长,匹配。

postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c2) and t1.c1='abc';  
                                                   QUERY PLAN                                                      
-----------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=401.86..401.87 rows=1 width=8)  
   ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=401.80..401.85 rows=1 width=8)  
         ->  Aggregate  (cost=401.80..401.81 rows=1 width=8)  
               ->  Nested Loop  (cost=200.88..401.79 rows=2 width=0)  
                     ->  Index Scan using idx_tbl3_1 on tbl3 t1  (cost=0.00..200.86 rows=1 width=33)  
                           Index Cond: c1 = 'abc'::text AND 'abc'::text = c1  
                     ->  Materialize  (cost=200.88..200.89 rows=1 width=33)  
                           ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..200.88 rows=1 width=33)  
                                 ->  Index Scan using idx_tbl3_2 on tbl3 t2  (cost=0.00..200.86 rows=1 width=33)  
                                       Index Cond: 'abc'::text = c2::text AND c2::text = 'abc'::text  
(10 rows)  

分析

实际上就是要求索引的表达式与条件的表达式一致。

例子,使用表达式,强制转换为变长类型。

postgres=# create index idx_tbl3_4 on tbl3 ((c3::text));  
CREATE INDEX  

查询时,只要表达式的类型与条件类型匹配即可,走索引,同时会使用Implicit转换。

postgres=# explain analyze select * from tbl3 where (c3::text)='abc';  
                                                   QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=108.60..52793.25 rows=1001 width=135)  
   Rows out:  0 rows at destination with 3.796 ms to end, start offset by 0.309 ms.  
   ->  Bitmap Heap Scan on tbl3  (cost=108.60..52793.25 rows=334 width=135)  
         Recheck Cond: c3::text = 'abc'::text  
         Rows out:  0 rows (seg0) with 0.163 ms to end, start offset by 3.095 ms.  
         ->  Bitmap Index Scan on idx_tbl3_4  (cost=0.00..108.35 rows=334 width=0)  
               Index Cond: c3::text = 'abc'::text  
               Bitmaps out:  Avg 1.0 x 3 workers.  Max 1 (seg0) with 0.135 ms to end, start offset by 3.119 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 155K bytes.  
   (slice1)    Executor memory: 321K bytes avg x 3 workers, 321K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_seqscan=off  
 Total runtime: 4.219 ms  
(15 rows)  
postgres=# explain analyze select * from tbl3 where (c3::text) = '123'::int;  
                                                   QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------  
 Gather Motion 3:1  (slice1; segments: 3)  (cost=108.60..52793.25 rows=1001 width=135)  
   Rows out:  0 rows at destination with 2.926 ms to end, start offset by 0.298 ms.  
   ->  Bitmap Heap Scan on tbl3  (cost=108.60..52793.25 rows=334 width=135)  
         Recheck Cond: c3::text = '123'::text  
         Rows out:  0 rows (seg0) with 0.110 ms to end, start offset by 3.057 ms.  
         ->  Bitmap Index Scan on idx_tbl3_4  (cost=0.00..108.35 rows=334 width=0)  
               Index Cond: c3::text = '123'::text  
               Bitmaps out:  Avg 1.0 x 3 workers.  Max 1 (seg0) with 0.064 ms to end, start offset by 3.095 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 155K bytes.  
   (slice1)    Executor memory: 289K bytes avg x 3 workers, 289K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Settings:  enable_seqscan=off  
 Total runtime: 3.327 ms  
(15 rows)  
postgres=# \dC  
                                         List of casts  
         Source type         |         Target type         |      Function      |   Implicit?     
-----------------------------+-----------------------------+--------------------+---------------  
 abstime                     | date                        | date               | in assignment  
 abstime                     | integer                     | (binary coercible) | no  
 abstime                     | timestamp without time zone | timestamp          | yes  
 abstime                     | timestamp with time zone    | timestamptz        | yes  
 abstime                     | time without time zone      | time               | in assignment  
 bigint                      | bit                         | bit                | no  
 bigint                      | character                   | text               | in assignment  
 bigint                      | character varying           | text               | in assignment  
 bigint                      | double precision            | float8             | yes  
 bigint                      | integer                     | int4               | in assignment  
 bigint                      | numeric                     | numeric            | yes  
 bigint                      | oid                         | oid                | yes  
 bigint                      | real                        | float4             | yes  
 ...............  

小结

要让索引起作用,最起码应该关注一下几点。

1、注意变长字符串和定长字符串,属于两种数据类型,不要混淆。

2、JOIN时操作符两边的字段类型一致。

3、查询时,WHERE条件与索引表达式一致。

4、查询时,WHERE条件与索引字段的类型一致。

5、查询时,WHERE条件的collate与索引的collate一致。(本地化相关)

6、操作符本身支持对应的索引访问方法。例如大多数的=操作符,支持b-tree的索引访问方法。

7、PG是CBO优化,因此索引访问的成本更低时,才会选择索引扫描。(设置了特定的优化器开关,或者使用了HINT除外)。

参考

《PostgreSQL 自定义自动类型转换(CAST)》

目录
相关文章
|
5月前
|
数据采集 缓存 监控
唯品会 API 开发痛点解析:从权限申请到数据清洗的实战经验
本文深入解析唯品会开放平台(VOP)API开发全流程,涵盖权限申请、签名机制、数据清洗、性能优化等核心挑战与实战解决方案,助力开发者高效构建稳定可靠的电商数据整合系统。
|
域名解析 运维 关系型数据库
3步!完成WordPress博客迁移与重新部署
服务器到期后,该如何迁移、重新部署WordPress博客呢?
3步!完成WordPress博客迁移与重新部署
|
1月前
|
人工智能 运维 算法
AR眼镜基于上下文智能识别:电力运维高效规范操作应用方案|阿法龙XR云平台
基于AR与AI融合技术,打造智能电力运维解决方案。通过轻量化AR眼镜,实现设备自动识别、状态智能研判、操作规范引导与异常实时预警,结合语音交互与全息显示,提升巡检效率,降低误操作风险,推动电力运维向智能化、可视化、标准化升级。(238字)
|
2月前
|
缓存 自然语言处理 API
阿里巴巴国际站关键字搜索 API 实战:3 步搞定多语言适配 + 限流破局,询盘量提升 40%
跨境电商API开发常陷合规、多语言、限流等坑。本文详解从国际合规(GDPR/CCPA)到参数优化、数据结构化及区域化搜索的全链路方案,附Python代码模板与缓存重试架构,助力提升调用成功率至99%+,精准询盘增长42%。
|
传感器 数据采集 移动开发
基于STM32的智能手环wifi连接手机APP(下)
基于STM32的智能手环wifi连接手机APP(下)
919 0
|
边缘计算 Cloud Native 数据管理
【阿里云云原生专栏】云原生背景下的AIoT布局:阿里云Link平台解析
【5月更文挑战第29天】阿里云Link平台,作为阿里云在AIoT领域的核心战略,借助云原生技术,为开发者打造一站式物联网服务平台。平台支持多协议设备接入与标准化管理,提供高效数据存储、分析及可视化,集成边缘计算实现低延时智能分析。通过实例代码展示,平台简化设备接入,助力智能家居等领域的创新应用,赋能开发者构建智能生态系统。
459 3
|
存储 Linux API
深入探索Android系统架构:从内核到应用层的全面解析
本文旨在为读者提供一份详尽的Android系统架构分析,从底层的Linux内核到顶层的应用程序框架。我们将探讨Android系统的模块化设计、各层之间的交互机制以及它们如何共同协作以支持丰富多样的应用生态。通过本篇文章,开发者和爱好者可以更深入理解Android平台的工作原理,从而优化开发流程和提升应用性能。
【亲测有效】解决url中&times会被转成×的问题 &timestamp=打印出来是 ×tamp=
【亲测有效】解决url中&times会被转成×的问题 &timestamp=打印出来是 ×tamp=
364 0
|
机器学习/深度学习 供应链 算法
量子计算:从理论到实践的跨越
量子计算基于量子力学原理,利用量子比特的叠加态和纠缠特性,展现出远超经典计算机的计算能力。本文从基本概念、发展历程、应用场景及未来挑战四个方面,全面介绍量子计算从理论到实践的跨越,展望其在优化问题、量子化学、机器学习等领域的广泛应用前景。
|
运维 安全 虚拟化