PostgreSQL 优化器案例之 - order by limit 索引选择问题

本文涉及的产品
RDS PostgreSQL Serverless,0.5-4RCU 50GB 3个月
推荐场景:
对影评进行热评分析
RDS SQL Server Serverless,2-4RCU 50GB 3个月
推荐场景:
云原生数据库 PolarDB 分布式版,标准版 2核8GB
简介:

标签

PostgreSQL , limit , order by , 优化器 , 选择性 , 相关性 , 数据存储顺序 , 目标数据存储顺序


背景

当我们在执行一个这样的SQL时,假如有这样几个索引(c1,c2) (id),数据库到底该用哪个索引呢?

explain select * from tbl where c1=200 and c2=200 order by id limit 10;  
或  
explain select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  

实际上PG会通过计算成本得到应该使用哪个索引。

但是实际上计算公式本身的通用性,使得当C1,C2数据分布倾斜时,可能导致某些C1,C2输入值的执行计划不准确。

走c1,c2索引,当满足c1,c2条件的记录很少时。如果数据比较多,则SORT耗费的成本就较大,导致整个耗时变大。

走id索引,当满足c1,c2条件的记录很多时,(并且c1,c2满足条件的数据分布在id偏小的端),如果分布在大端,则需要扫描更多的记录才能找到满足条件的记录。

如果数据库统计信息中可以评估出来满足条件的c1,c2的记录与ID顺序分布的关系,也可以作为优化器计算成本的一种参考输入(当然要做到这个可能比较复杂,同时也会增加优化器计算成本的开销)。

下面通过实例来说明。

例子

1、建测试表

postgres=# create table tbl (id int, c1 int, c2 int, c3 int, c4 int);  
CREATE TABLE  

2、写入一批随机数据,ID从1到1000万。

postgres=# insert into tbl select generate_series(1,10000000), random()*100, random()*100, random()*100, random()*100;  
INSERT 0 10000000  

3、写入另一批100万条数据,c1,c2 与前面1000万的值不一样。

postgres=# insert into tbl select generate_series(10000001,11000000), 200,200,200,200;  
INSERT 0 1000000  

4、创建两个索引,也就是本文需要重点关注的,到底走哪个索引更划算

postgres=# create index idx_tbl_1 on tbl(id);  
CREATE INDEX  
postgres=# create index idx_tbl_2 on tbl(c1,c2,c3,c4);  
CREATE INDEX  

5、收集统计信息

postgres=# vacuum analyze tbl;  
VACUUM  

6、查看下面SQL的执行计划,走了id索引

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 10;  
                                      QUERY PLAN                                        
--------------------------------------------------------------------------------------  
 Limit  (cost=0.43..32.59 rows=10 width=20)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20)  
         Filter: ((c1 = 200) AND (c2 = 200))  
(3 rows)  

还是走了id索引

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 1000;  
                                      QUERY PLAN                                        
--------------------------------------------------------------------------------------  
 Limit  (cost=0.43..3215.74 rows=1000 width=20)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20)  
         Filter: ((c1 = 200) AND (c2 = 200))  
(3 rows)  

当LIMIT达到50000时,走了c1,c2的索引。为什么呢?

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 50000;  
                                         QUERY PLAN                                           
--------------------------------------------------------------------------------------------  
 Limit  (cost=70355.06..70480.06 rows=50000 width=20)  
   ->  Sort  (cost=70355.06..70606.39 rows=100533 width=20)  
         Sort Key: id  
         ->  Bitmap Heap Scan on tbl  (cost=1457.82..62005.97 rows=100533 width=20)  
               Recheck Cond: ((c1 = 200) AND (c2 = 200))  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0)  
                     Index Cond: ((c1 = 200) AND (c2 = 200))  
(7 rows)  

7、分析以上执行计划的含义

首先,表的记录数(1100万)除以"满足c1=200 and c2=200 条件的记录数"(100533),得到平均需要扫描多少条记录,可以得到一条满足c1=200 and c2=200条件的记录.

postgres=# select 11000000/100533.0;  
       ?column?         
----------------------  
 109.4168084111684720  
(1 row)  

也就是说每扫描109.4条记录,可以得到一条满足条件的记录。(优化器这么算,是认为数据分布是均匀的。)

但是,实际上,数据分布是不均匀的,c1=200 and c2=200的记录在表的末端(1000万条记录后面),也就是说需要扫描1000万条记录后,才能得到1条满足c1=200 and c2=200的记录。

并不是估算的每扫描109.4条记录,可以得到一条满足条件的记录。

问题就出在这里。

8、我们再来分析一下为什么limit 50000时,选择了c1,c2的索引。而不是id的索引

使用ID索引时,需要扫描100533条记录,同时需要排序,直到排序完成,总成约70606.39。然后就是GET HEAP TUPLE的成本。

当使用id的索引扫描时,返回多少条记录能达到70606.39的成本呢?

以limit 1000的3215.74成本为例  
  
postgres=# select 70606.39/3215.74;  
      ?column?         
---------------------  
 21.9564983487471002  
(1 row)  
  
postgres=# select 21.956*1000;  
 ?column?    
-----------  
 21956.000  
(1 row)  

分水岭

经过以上分析,也就是说,LIMIT 21956时,走ID索引扫描的执行计划,成本可达到70606.39。

所以limit 21956是一个分水岭,大于这个值时,可能使用c1,c2的索引扫描,而小于它,则会使用ID索引扫描.

如下

postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 22000;  
                                         QUERY PLAN                                           
--------------------------------------------------------------------------------------------  
 Limit  (cost=69759.69..69814.69 rows=22000 width=20)  
   ->  Sort  (cost=69759.69..70011.02 rows=100533 width=20)  
         Sort Key: id  
         ->  Bitmap Heap Scan on tbl  (cost=1457.82..62005.97 rows=100533 width=20)  
               Recheck Cond: ((c1 = 200) AND (c2 = 200))  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0)  
                     Index Cond: ((c1 = 200) AND (c2 = 200))  
(7 rows)  
postgres=# explain select * from tbl where c1=200 and c2=200 order by id limit 21000;  
                                      QUERY PLAN                                        
--------------------------------------------------------------------------------------  
 Limit  (cost=0.43..67521.75 rows=21000 width=20)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20)  
         Filter: ((c1 = 200) AND (c2 = 200))  
(3 rows)  

真实的执行耗时

很显然,使用id扫描,一定会慢,因为满足条件的数据都分布在1000万行后面。

1、c1,c2索引扫描,直接命中数据,加排序(100万条),略快。

postgres=# explain  analyze select * from tbl where c1=200 and c2=200 order by id limit 22000;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=69759.69..69814.69 rows=22000 width=20) (actual time=293.961..299.054 rows=22000 loops=1)  
   ->  Sort  (cost=69759.69..70011.02 rows=100533 width=20) (actual time=293.960..296.006 rows=22000 loops=1)  
         Sort Key: id  
         Sort Method: top-N heapsort  Memory: 3255kB  
         ->  Bitmap Heap Scan on tbl  (cost=1457.82..62005.97 rows=100533 width=20) (actual time=47.919..175.698 rows=1000000 loops=1)  
               Recheck Cond: ((c1 = 200) AND (c2 = 200))  
               Heap Blocks: exact=6370  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0) (actual time=47.160..47.160 rows=1000000 loops=1)  
                     Index Cond: ((c1 = 200) AND (c2 = 200))  
 Planning time: 0.152 ms  
 Execution time: 300.664 ms  
(11 rows)  

2、id 索引扫描,慢。

postgres=# explain  analyze select * from tbl where c1=200 and c2=200 order by id limit 21000;  
                                                                QUERY PLAN                                                                  
------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..67521.75 rows=21000 width=20) (actual time=1404.932..1412.594 rows=21000 loops=1)  
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323244.26 rows=100533 width=20) (actual time=1404.930..1409.639 rows=21000 loops=1)  
         Filter: ((c1 = 200) AND (c2 = 200))  
         Rows Removed by Filter: 10000000  
 Planning time: 0.139 ms  
 Execution time: 1414.142 ms  
(6 rows)  

3、limit 10同样,id 索引扫描,慢。

postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2=200 order by id limit 10;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..32.59 rows=10 width=20) (actual time=1403.861..1403.865 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=91020  
   ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.43..323244.26 rows=100533 width=20) (actual time=1403.859..1403.861 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Filter: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
         Rows Removed by Filter: 10000000  
         Buffers: shared hit=91020  
 Planning time: 0.127 ms  
 Execution time: 1403.893 ms  
(10 rows)  

优化方法

1、使用HINT或者改SQL,强制不走ID扫描。

并不适用于所有场景,比如数据分布均匀时,那么PG的这周成本计算方法就对口,那么什么时候使用ID,什么时候使用C1,C2索引扫描就合乎常理了。

postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2=200 order by id+0 limit 10;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=64429.79..64429.81 rows=10 width=24) (actual time=409.622..409.626 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4, ((id + 0))  
   Buffers: shared hit=10205  
   ->  Sort  (cost=64429.79..64681.12 rows=100533 width=24) (actual time=409.620..409.621 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4, ((id + 0))  
         Sort Key: ((tbl.id + 0))  
         Sort Method: top-N heapsort  Memory: 25kB  
         Buffers: shared hit=10205  
         ->  Bitmap Heap Scan on public.tbl  (cost=1457.82..62257.30 rows=100533 width=24) (actual time=47.347..237.455 rows=1000000 loops=1)  
               Output: id, c1, c2, c3, c4, (id + 0)  
               Recheck Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
               Heap Blocks: exact=6370  
               Buffers: shared hit=10205  
               ->  Bitmap Index Scan on idx_tbl_2  (cost=0.00..1432.69 rows=100533 width=0) (actual time=46.577..46.577 rows=1000000 loops=1)  
                     Index Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
                     Buffers: shared hit=3835  
 Planning time: 0.133 ms  
 Execution time: 409.670 ms  
(18 rows)  

2、如果c1,c2是等值查询的话,可以用以下索引,那么效率是最高的。

postgres=# create index idx_tbl_3 on tbl(c1,c2,id);  
CREATE INDEX  
  
  
postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 =200 order by id limit 10;  
                                                              QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.56..6.93 rows=10 width=20) (actual time=0.102..0.106 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=1 read=4  
   I/O Timings: read=0.047  
   ->  Index Scan using idx_tbl_3 on public.tbl  (cost=0.56..64086.79 rows=100533 width=20) (actual time=0.101..0.103 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Index Cond: ((tbl.c1 = 200) AND (tbl.c2 = 200))  
         Buffers: shared hit=1 read=4  
         I/O Timings: read=0.047  
 Planning time: 0.142 ms  
 Execution time: 0.131 ms  
(11 rows)  

3、建议方法:

注意方法2 不适合非等值查询,

postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  
                                                                  QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..35.32 rows=10 width=20) (actual time=1371.094..1371.099 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=91020  
   ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.43..350743.84 rows=100533 width=20) (actual time=1371.092..1371.095 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Filter: ((tbl.c2 >= 100) AND (tbl.c2 <= 300) AND (tbl.c1 = 200))  
         Rows Removed by Filter: 10000000  
         Buffers: shared hit=91020  
 Planning time: 0.278 ms  
 Execution time: 1371.128 ms  
(10 rows)  

但是不用担心,我们依旧可以使用其他等值查询列,加上排序列组成复合索引,在INDEX SCAN中使用FILTER来加速。

新增如下索引,加速如下

postgres=# create index idx_tbl_4 on tbl(c1,id);  
CREATE INDEX  
  
postgres=# explain ( analyze,verbose,timing,costs,buffers) select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  
                                                               QUERY PLAN                                                                 
----------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..10.47 rows=10 width=20) (actual time=0.105..0.110 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=1 read=3  
   I/O Timings: read=0.051  
   ->  Index Scan using idx_tbl_4 on public.tbl  (cost=0.43..100877.50 rows=100533 width=20) (actual time=0.104..0.107 rows=10 loops=1)  
         Output: id, c1, c2, c3, c4  
         Index Cond: (tbl.c1 = 200)  
         Filter: ((tbl.c2 >= 100) AND (tbl.c2 <= 300))  
         Buffers: shared hit=1 read=3  
         I/O Timings: read=0.051  
 Planning time: 0.172 ms  
 Execution time: 0.134 ms  
(12 rows)  

附Oracle

《Oracle migration to Greenplum - (含 Ora2pg)》

http://www.dba-oracle.com/t_OracleAutotrace.htm

SQL> create table tbl(id int, c1 int, c2 int, c3 int, c4 int);

Table created.

SQL> insert into tbl select rownum,trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)),trunc(dbms_random.value(0, 100)) from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> insert into tbl select rownum+10000000, 200,200,200,200 from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index idx_tbl_1 on tbl(id);

Index created.

SQL> create index idx_tbl_2 on tbl(c1,c2,c3,c4);

Index created.

SQL> set linesize 512
SQL> set pagesize 50000

SQL> set autotrace on;

SQL> exec DBMS_STATS.GATHER_TABLE_STATS('JIUDU','TBL'); 

PL/SQL procedure successfully completed.

SQL> select * from (select * from tbl where c1=200 and c2=200 order by id) t where rownum<10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
  10000001        200        200        200        200
  10000002        200        200        200        200
  10000003        200        200        200        200
  10000004        200        200        200        200
  10000005        200        200        200        200
  10000006        200        200        200        200
  10000007        200        200        200        200
  10000008        200        200        200        200
  10000009        200        200        200        200

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 745043579

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     9 |   585 |       | 10253   (2)| 00:02:04 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 84875 |  5387K|       | 10253   (2)| 00:02:04 |
|*  3 |    SORT ORDER BY STOPKEY|      | 84875 |  1491K|  2672K| 10253   (2)| 00:02:04 |
|*  4 |     TABLE ACCESS FULL   | TBL  | 84875 |  1491K|       |  9767   (2)| 00:01:58 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   4 - filter("C1"=200 AND "C2"=200)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      34868  consistent gets
          0  physical reads
          0  redo size
        937  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed

SQL> select * from (select * from tbl where c1=1 and c2=1 order by id) t where rownum<10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
      9697          1          1         78         39
     20586          1          1         81         71
     27820          1          1         33         64
     44324          1          1         26         27
     47079          1          1          3          5
     64669          1          1         13         49
     73715          1          1         20         74
     80903          1          1         96         25
     98368          1          1         59          9

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 447312937

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |     9 |   585 |   641   (1)| 00:00:08 |
|*  1 |  COUNT STOPKEY                 |           |       |       |            |          |
|   2 |   VIEW                         |           |   704 | 45760 |   641   (1)| 00:00:08 |
|*  3 |    SORT ORDER BY STOPKEY       |           |   704 | 12672 |   641   (1)| 00:00:08 |
|   4 |     TABLE ACCESS BY INDEX ROWID| TBL       |   704 | 12672 |   640   (0)| 00:00:08 |
|*  5 |      INDEX RANGE SCAN          | IDX_TBL_2 |   704 |       |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   5 - access("C1"=1 AND "C2"=1)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1072  consistent gets
         11  physical reads
          0  redo size
        969  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed


SQL> create index idx_tbl_3 on tbl(c1,c2,id);
Index created.

SQL> select * from (select * from tbl where c1=200 and c2 between 100 and 300 order by id) t where rownum < 10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
  10000001        200        200        200        200
  10000002        200        200        200        200
  10000003        200        200        200        200
  10000004        200        200        200        200
  10000005        200        200        200        200
  10000006        200        200        200        200
  10000007        200        200        200        200
  10000008        200        200        200        200
  10000009        200        200        200        200

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 745043579

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     9 |   585 |       | 10253   (2)| 00:02:04 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 84875 |  5387K|       | 10253   (2)| 00:02:04 |
|*  3 |    SORT ORDER BY STOPKEY|      | 84875 |  1491K|  2672K| 10253   (2)| 00:02:04 |
|*  4 |     TABLE ACCESS FULL   | TBL  | 84875 |  1491K|       |  9767   (2)| 00:01:58 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   4 - filter("C1"=200 AND "C2">=100 AND "C2"<=300)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      34868  consistent gets
          0  physical reads
          0  redo size
        937  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          9  rows processed

SQL> select * from (select * from tbl where c1=200 and c2 =200 order by id) t where rownum < 10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
  10000001        200        200        200        200
  10000002        200        200        200        200
  10000003        200        200        200        200
  10000004        200        200        200        200
  10000005        200        200        200        200
  10000006        200        200        200        200
  10000007        200        200        200        200
  10000008        200        200        200        200
  10000009        200        200        200        200

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1825274432

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     9 |   585 |    12   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |           |       |       |            |          |
|   2 |   VIEW                        |           |    10 |   650 |    12   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TBL       |    10 |   180 |    12   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IDX_TBL_3 |       |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   4 - access("C1"=200 AND "C2"=200)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          2  physical reads
          0  redo size
        937  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          9  rows processed

同时也发现一个问题,Oracle可能无法使用index filter来优化,例如将index2,index3删除后,留下ID索引,Oracle无法走索引,而PG可以。

SQL> select * from (select * from tbl where c1=1 and c2 =1 order by id) t where rownum < 10;

        ID         C1         C2         C3         C4
---------- ---------- ---------- ---------- ----------
      9697          1          1         78         39
     20586          1          1         81         71
     27820          1          1         33         64
     44324          1          1         26         27
     47079          1          1          3          5
     64669          1          1         13         49
     73715          1          1         20         74
     80903          1          1         96         25
     98368          1          1         59          9

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 745043579

--------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     9 |   585 |  9766   (2)| 00:01:58 |
|*  1 |  COUNT STOPKEY          |      |       |       |            |          |
|   2 |   VIEW                  |      |   704 | 45760 |  9766   (2)| 00:01:58 |
|*  3 |    SORT ORDER BY STOPKEY|      |   704 | 12672 |  9766   (2)| 00:01:58 |
|*  4 |     TABLE ACCESS FULL   | TBL  |   704 | 12672 |  9765   (2)| 00:01:58 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<10)
   3 - filter(ROWNUM<10)
   4 - filter("C1"=1 AND "C2"=1)


Statistics
----------------------------------------------------------
        186  recursive calls
          0  db block gets
      34893  consistent gets
          0  physical reads
          0  redo size
        969  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
          9  rows processed
  

PG


postgres=# explain analyze select * from tbl where c1=1 and c2 =1 order by id limit 10;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..3703.11 rows=10 width=20) (actual time=7.199..23.926 rows=10 loops=1)
   ->  Index Scan using idx_tbl_1 on tbl  (cost=0.43..323243.84 rows=873 width=20) (actual time=7.198..23.921 rows=10 loops=1)
         Filter: ((c1 = 1) AND (c2 = 1))
         Rows Removed by Filter: 142814
 Planning time: 0.119 ms
 Execution time: 23.950 ms
(6 rows)

小结

当SQL查询中包括排序,以及其他字段的过滤条件,并使用LIMIT快速返回少量数据时,如果满足条件的数据分布在排序键的末端,那么优化器给出的执行计划可能是不好的,导致通过排序索引扫描更多的数据后才能命中需要的记录。

然而,数据库目前使用的评估走排序键时,LIMIT需要扫描多少条记录,使用了数据均匀分布的假设,所以在数据(满足条件的数据与排序键本身的相关性不均匀)分布不均匀时,导致成本估算不准(oracle干脆走全表扫描)。

建议优化方法:

增加索引,创建等值查询条件列(s)加排序列(s)组成的复合索引,降低扫描量。

例子

select * from tbl where c1=200 and c2 between 100 and 300 order by id limit 10;  
  
增加索引  
  
(c1,id)  -- 索引扫描, filter c2  
  
已有  
(c1,c2)  -- 索引扫描, sort id  
(id)     -- 索引扫描, filter c1,c2  
select * from tbl where c1=200 and c2 =200 order by id limit 10;  
  
增加索引  
  
(c1,c2,id)  -- 索引扫描  
  
已有  
(c1,c2)  -- 索引扫描, sort id  
(id)     -- 索引扫描, filter c1,c2  

参考

《PostgreSQL 10 黑科技 - 自定义统计信息》

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
5天前
|
JSON 关系型数据库 PostgreSQL
PostgreSQL 9种索引的原理和应用场景
PostgreSQL 支持九种主要索引类型,包括 B-Tree、Hash、GiST、SP-GiST、GIN、BRIN、Bitmap、Partial 和 Unique 索引。每种索引适用于不同场景,如 B-Tree 适合范围查询和排序,Hash 仅用于等值查询,GiST 支持全文搜索和几何数据查询,GIN 适用于多值列和 JSON 数据,BRIN 适合非常大的表,Bitmap 适用于低基数列,Partial 只对部分数据创建索引,Unique 确保列值唯一。
|
6月前
|
监控 关系型数据库 数据库
PostgreSQL的索引优化策略?
【8月更文挑战第26天】PostgreSQL的索引优化策略?
144 1
|
3月前
|
关系型数据库 MySQL PostgreSQL
postgresql和mysql中的limit使用方法
postgresql和mysql中的limit使用方法
105 1
|
6月前
|
SQL 关系型数据库 MySQL
SQL Server、MySQL、PostgreSQL:主流数据库SQL语法异同比较——深入探讨数据类型、分页查询、表创建与数据插入、函数和索引等关键语法差异,为跨数据库开发提供实用指导
【8月更文挑战第31天】SQL Server、MySQL和PostgreSQL是当今最流行的关系型数据库管理系统,均使用SQL作为查询语言,但在语法和功能实现上存在差异。本文将比较它们在数据类型、分页查询、创建和插入数据以及函数和索引等方面的异同,帮助开发者更好地理解和使用这些数据库。尽管它们共用SQL语言,但每个系统都有独特的语法规则,了解这些差异有助于提升开发效率和项目成功率。
657 0
|
6月前
|
关系型数据库 数据库 PostgreSQL
PostgreSQL索引维护看完这篇就够了
PostgreSQL索引维护看完这篇就够了
426 0
|
8月前
|
SQL 关系型数据库 PostgreSQL
【sql】PostgreSQL物化视图表使用案例
【sql】PostgreSQL物化视图表使用案例
75 0
|
8月前
|
关系型数据库 分布式数据库 数据库
PolarDB产品使用合集之优化器对索引的阈值一般是多少
PolarDB是阿里云推出的一种云原生数据库服务,专为云设计,提供兼容MySQL、PostgreSQL的高性能、低成本、弹性可扩展的数据库解决方案,可以有效地管理和优化PolarDB实例,确保数据库服务的稳定、高效运行。以下是使用PolarDB产品的一些建议和最佳实践合集。
|
SQL Cloud Native 关系型数据库
ADBPG(AnalyticDB for PostgreSQL)是阿里云提供的一种云原生的大数据分析型数据库
ADBPG(AnalyticDB for PostgreSQL)是阿里云提供的一种云原生的大数据分析型数据库
1352 1
|
数据可视化 关系型数据库 MySQL
将 PostgreSQL 迁移到 MySQL 数据库
将 PostgreSQL 迁移到 MySQL 数据库
1814 2
|
SQL 存储 自然语言处理
玩转阿里云RDS PostgreSQL数据库通过pg_jieba插件进行分词
在当今社交媒体的时代,人们通过各种平台分享自己的生活、观点和情感。然而,对于平台管理员和品牌经营者来说,了解用户的情感和意见变得至关重要。为了帮助他们更好地了解用户的情感倾向,我们可以使用PostgreSQL中的pg_jieba插件对这些发帖进行分词和情感分析,来构建一个社交媒体情感分析系统,系统将根据用户的发帖内容,自动判断其情感倾向是积极、消极还是中性,并将结果存储在数据库中。
玩转阿里云RDS PostgreSQL数据库通过pg_jieba插件进行分词

热门文章

最新文章

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版