通过 PolarDB for PostgreSQL 实现一体化的 HTAP 能力

本文涉及的产品
RDS PostgreSQL Serverless,0.5-4RCU 50GB 3个月
推荐场景:
对影评进行热评分析
RDS SQL Server Serverless,2-4RCU 50GB 3个月
推荐场景:
云原生数据库 PolarDB 分布式版,标准版 2核8GB
简介: 阿里云 PolarDB for PostgreSQL作为一款领先的云原生关系型数据库,利用向量化引擎+列存索引等技术实现了 OLTP 和 OLAP 的一体化。本方案为您展示如何通过 PolarDB for PostgreSQL 来实现一体化的 HTAP 能力。

背景

在当今快速发展的数字化时代,企业对实时数据分析和高效事务处理的需求日益增长。传统的数据库架构往往将在线交易处理(OLTP)与在线分析处理(OLAP)分离,导致系统复杂性增加、维护成本高昂,并且难以满足实时决策支持的要求。随着技术的进步,混合事务与分析处理(HTAP)逐渐成为主流趋势,它允许在同一平台上同时进行高效的交易处理和复杂的数据分析。

阿里云 PolarDB for PostgreSQL作为一款领先的云原生关系型数据库,利用向量化引擎+列存索引等技术实现了 OLTP 和 OLAP 的一体化。用户不仅能够享受到高并发事务处理的卓越性能,还能即时获取最新业务数据的深度洞察,大大缩短了从数据到价值的时间周期。

本方案将为您展示如何通过 PolarDB for PostgreSQL 来实现一体化的 HTAP 能力。

TPC-H 性能测试

TPC-H是业界常用的性能标准测试,由TPC委员会制定发布,用于评测数据库的分析能力。包含8张数据表、22条复杂的SQL查询,包含单表统计、多表 Join、子查询、聚合、排序等。

本次测试数据量:10GB。

测试环境:

  • ECS:规格为 2C2GB(ecs.e-c1m1.large),挂载的磁盘 40GB
  • PolarDB PostgreSQL:
  • 计费类型:按量付费
  • 引擎版本:PostgreSQL 14
  • 规格为 4C 16GB(标准版,通用规格,X86)
  • 节点个数:1 个读写节点,0 个只读节点

注:ECS实例和PolarDB集群需保证在同一个VPC中。

步骤

  • 登录到 ECS 服务器,下载 TPC-H 工具:dbgen.tar.gz,解压后进入到目录编译 dbgen,并生成 10GB 的测试数据。约 5 分钟。
mkdir /data
cd /data
wget https://ganos-hz.oss-cn-hangzhou.aliyuncs.com/csi/dbgen.tar.gz
tar -zxvf dbgen.tar.gz
cd dbgen
make -f makefile.suite
./dbgen -s 10 -f


  • 在 ECS 上通过 psql 连接到 PolarDB PostgreSQL 实例。
## 下载polardb tools工具箱,里面包含psql命令
sudo yum install https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241113/phrrjc/PolarDB-Tools-2.0.14.26.0-20241023104506.al8.x86_64.rpm
## 执行psql命令,连接到数据库
/u01/polardb_pg_tools/bin/psql -h xxx -p xxx -Uxxx postgres


  • 在 PolarDB PostgreSQL 实例中导入生成的 TPC-H 数据。 约 5 分钟
--创建tpchdb
CREATE DATABASE tpchdb
WITH 
    ENCODING = 'UTF8'
    LC_COLLATE = 'C'
    LC_CTYPE = 'C'
    TEMPLATE template0;
--切换到tpchdb
\c tpchdb
--在 tpchdb 中创建 polar_csi 插件
CREATE EXTENSION polar_csi;
--执行dbgen目录中的dss.ddl文件,创建tpch 8张表结构
\i ./dss.ddl
--将dbgen目录中生成的8张表的数据导入到tpchdb中
\copy part from ./part.tbl with delimiter as '|' NULL '';
\copy region from ./region.tbl with delimiter as '|' NULL '';
\copy nation from ./nation.tbl with delimiter as '|' NULL '';
\copy orders from ./orders.tbl with delimiter as '|' NULL '';
\copy customer from ./customer.tbl with delimiter as '|' NULL '';
\copy lineitem from ./lineitem.tbl with delimiter as '|' NULL '';
\copy partsupp from ./partsupp.tbl with delimiter as '|' NULL '';
\copy supplier from ./supplier.tbl with delimiter as '|' NULL '';


  • 设置向量化引擎和列存索引相关的参数,用于后续查询
SET polar_csi.enable_pk to on;
set polar_csi.enable_query to on;
set polar_csi.exec_parallel to 4;
set polar_csi.cost_threshold = 0;
set polar_csi.memory_limit = 2048;


  • 数据导入后,为每张表创建列存索引。约 5 分钟
--partsupp表
ALTER TABLE PARTSUPP ADD CONSTRAINT partsupp_pkey PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY);
CREATE INDEX imps ON partsupp USING csi(ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, ps_comment);
--part表
ALTER TABLE PART ADD CONSTRAINT part_kpey PRIMARY KEY (P_PARTKEY);
CREATE INDEX im_p ON part     USING csi(p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment);
--supplier表
ALTER TABLE SUPPLIER ADD CONSTRAINT supplier_pkey PRIMARY KEY (S_SUPPKEY);
CREATE INDEX im_s ON supplier USING csi(s_suppkey, s_name, s_address, s_nationkey, s_phone, s_acctbal, s_comment);
--customer表
ALTER TABLE CUSTOMER ADD CONSTRAINT customer_pkey PRIMARY KEY (C_CUSTKEY);
CREATE INDEX im_c ON customer USING csi(c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment);
--orders表
ALTER TABLE ORDERS ADD CONSTRAINT orders_pkey PRIMARY KEY (O_ORDERKEY);
CREATE INDEX im_o ON orders   USING csi(o_orderkey, o_custkey, o_orderstatus, o_totalprice, o_orderdate, o_orderpriority, o_clerk, o_shippriority, o_comment);
--lineitem表
ALTER TABLE LINEITEM ADD CONSTRAINT lineitem_pkey PRIMARY KEY (L_ORDERKEY, L_LINENUMBER);
CREATE INDEX im_l ON lineitem USING csi(l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment);
--nation表
ALTER TABLE NATION ADD CONSTRAINT nation_pkey PRIMARY KEY (N_NATIONKEY);
CREATE INDEX im_n ON nation   USING csi(n_nationkey, n_name, n_regionkey, n_comment);
--region表
ALTER TABLE REGION ADD CONSTRAINT region_pkey PRIMARY KEY (R_REGIONKEY);
CREATE INDEX im_r ON region   USING csi(r_regionkey, r_name, r_comment);


  • 开启向量化引擎,执行 22 条 SQL 语句,统计耗时
\timing
--Q1
explain analyze
select
  l_returnflag,
  l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count(*) as count_order
from
  lineitem
where l_shipdate <= date '1998-12-01' - '60 day'::interval
group by
  l_returnflag,
  l_linestatus
order by
  l_returnflag,
  l_linestatus;
--Q2
explain analyze
select
  s_acctbal,
  s_name,
  n_name,
  p_partkey,
  p_mfgr,
  s_address,
  s_phone,
  s_comment
from
  part,
  supplier,
  partsupp,
  nation,
  region
where
  p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p_size = 43
  and p_type like '%NICKEL'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'MIDDLE EAST'
  and ps_supplycost = (
    select
      min(ps_supplycost)
    from
      partsupp,
      supplier,
      nation,
      region
    where p_partkey = ps_partkey
      and s_suppkey = ps_suppkey
      and s_nationkey = n_nationkey
      and n_regionkey = r_regionkey
      and r_name = 'MIDDLE EAST'
  )
order by
  s_acctbal desc,
  n_name,
  s_name,
  p_partkey limit 100;
--Q3
explain analyze
select l_orderkey,
  sum(l_extendedprice * (1 - l_discount)) as revenue,
  o_orderdate,
  o_shippriority
from
  customer,
  orders,
  lineitem
where c_mktsegment = 'FURNITURE'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < date '1995-03-05'
  and l_shipdate > date '1995-03-05'
group by
  l_orderkey,
  o_orderdate,
  o_shippriority
order by
  revenue desc,
  o_orderdate limit 100;
--Q4
explain analyze
select
  o_orderpriority,
  count(*) as order_count
from
  orders
where
  o_orderdate >= date '1993-05-01'
  and o_orderdate < date '1993-05-01' + interval '3 month'::interval
  and exists (
    select
      *
    from
      lineitem
    where
      l_orderkey = o_orderkey
      and l_commitdate < l_receiptdate
  )
group by
  o_orderpriority
order by
  o_orderpriority;
--Q5
explain analyze
select
        n_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue
from
        customer,
        orders,
        lineitem,
        supplier,
        nation,
        region
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and l_suppkey = s_suppkey
        and c_nationkey = s_nationkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'MIDDLE EAST'
        and o_orderdate >= date '1995-01-01'
        and o_orderdate < date '1995-01-01' + '1 year'::interval
group by
        n_name
order by
        revenue desc;
--Q6
explain analyze
select
        sum(l_extendedprice * l_discount) as revenue
from
        lineitem
where
        l_shipdate >= date '1993-01-01'
        and l_shipdate < date '1993-01-01' + '1 year'::interval
        and l_discount between 0.03 - 0.01 and 0.03 + 0.01
        and l_quantity < 24;
--Q7
explain analyze 
  SELECT
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) AS revenue
FROM (
    SELECT
        n1.n_name AS supp_nation,
        n2.n_name AS cust_nation,
        extract(year FROM l_shipdate) AS l_year,
        l_extendedprice * (1 - l_discount) AS volume
    FROM
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2
    WHERE
        s_suppkey = l_suppkey
        AND o_orderkey = l_orderkey
        AND c_custkey = o_custkey
        AND s_nationkey = n1.n_nationkey
        AND c_nationkey = n2.n_nationkey
        AND ((n1.n_name = 'FRANCE'
                AND n2.n_name = 'GERMANY')
            OR (n1.n_name = 'GERMANY'
                AND n2.n_name = 'FRANCE'))
        AND l_shipdate BETWEEN CAST('1995-01-01' AS date)
        AND CAST('1996-12-31' AS date)) AS shipping
GROUP BY
    supp_nation,
    cust_nation,
    l_year
ORDER BY
    supp_nation,
    cust_nation,
    l_year;
--Q8
explain analyze
select
        o_year,
        sum(case
                when nation = 'INDONESIA' then volume
                else 0
        end) / sum(volume) as mkt_share
from
        (
                select
                        extract(year from o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) as volume,
                        n2.n_name as nation
                from
                        part,
                        supplier,
                        lineitem,
                        orders,
                        customer,
                        nation n1,
                        nation n2,
                        region
                where
                        p_partkey = l_partkey
                        and s_suppkey = l_suppkey
                        and l_orderkey = o_orderkey
                        and o_custkey = c_custkey
                        and c_nationkey = n1.n_nationkey
                        and n1.n_regionkey = r_regionkey
                        and r_name = 'ASIA'
                        and s_nationkey = n2.n_nationkey
                        and o_orderdate between '1995-01-01'::date and '1996-12-31'::date
                        and p_type = 'PROMO POLISHED NICKEL'
        ) as all_nations
group by
        o_year
order by
        o_year;
--Q9
explain analyze
select
        nation,
        o_year,
        sum(amount) as sum_profit
from
        (
                select
                        n_name as nation,
                        extract(year from o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
                from
                        part,
                        supplier,
                        lineitem,
                        partsupp,
                        orders,
                        nation
                where
                        s_suppkey = l_suppkey
                        and ps_suppkey = l_suppkey
                        and ps_partkey = l_partkey
                        and p_partkey = l_partkey
                        and o_orderkey = l_orderkey
                        and s_nationkey = n_nationkey
                        and p_name like '%navajo%'
        ) as profit
group by
        nation,
        o_year
order by
        nation,
        o_year desc limit 100;
--Q10
explain analyze
select
        c_custkey,
        c_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        c_acctbal,
        n_name,
        c_address,
        c_phone,
        c_comment
from
        customer,
        orders,
        lineitem,
        nation
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate >= date '1993-08-01'
        and o_orderdate < date '1993-08-01' + '3 month'::interval
        and l_returnflag = 'R'
        and c_nationkey = n_nationkey
group by
        c_custkey,
        c_name,
        c_acctbal,
        c_phone,
        n_name,
        c_address,
        c_comment
order by
        revenue desc
        LIMIT 20;
--Q11
explain analyze
select
        ps_partkey,
        sum(ps_supplycost * ps_availqty) as value
from
        partsupp,
        supplier,
        nation
where
        ps_suppkey = s_suppkey
        and s_nationkey = n_nationkey
        and n_name = 'ALGERIA'
group by
        ps_partkey having
                sum(ps_supplycost * ps_availqty) > (
                        select
                                sum(ps_supplycost * ps_availqty) * 0.0001000000
                        from
                                partsupp,
                                supplier,
                                nation
                        where
                                ps_suppkey = s_suppkey
                                and s_nationkey = n_nationkey
                                and n_name = 'ALGERIA'
                )
order by
        value desc;
--Q12
explain analyze
select
        l_shipmode,
        sum(case
                when o_orderpriority = '1-URGENT'
                        or o_orderpriority = '2-HIGH'
                        then 1
                else 0
        end) as high_line_count,
        sum(case
                when o_orderpriority <> '1-URGENT'
                        and o_orderpriority <> '2-HIGH'
                        then 1
                else 0
        end) as low_line_count
from
        orders,
        lineitem
where
        o_orderkey = l_orderkey
        and l_shipmode in ('AIR', 'FOB')
        and l_commitdate < l_receiptdate
        and l_shipdate < l_commitdate
        and l_receiptdate >= date '1996-01-01'
        and l_receiptdate < date '1996-01-01' + '1 year'::interval
group by
        l_shipmode
order by
        l_shipmode;
--Q13
explain analyze
select
        c_count,
        count(*) as custdist
from
        (
                select
                        c_custkey,
                        count(o_orderkey)
                from
                        customer left outer join orders on
                                c_custkey = o_custkey
                                and o_comment not like '%pending%requests%'
                group by
                        c_custkey
        ) as c_orders (c_custkey, c_count)
group by
        c_count
order by
        custdist desc,
        c_count desc;
--Q14
explain analyze
select
        100.00 * sum(case
                when p_type like 'PROMO%'
                        then l_extendedprice * (1 - l_discount)
                else 0
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
        lineitem,
        part
where
        l_partkey = p_partkey
        and l_shipdate >= date '1995-02-01'
        and l_shipdate < date '1995-02-01' + interval '1 month'::interval;
--Q15
explain analyze
with revenue0 as (
  select
    l_suppkey as supplier_no,
    sum(l_extendedprice * (1 - l_discount)) as total_revenue
  from
    lineitem
  where l_shipdate >= date '1995-08-01'
    and l_shipdate < date '1995-08-01' + '3 month'::interval
  group by
    l_suppkey)
select
  s_suppkey,
  s_name,
  s_address,
  s_phone,
  total_revenue
from
  supplier,
  revenue0
where
  s_suppkey = supplier_no
  and total_revenue = (
    select
      max(total_revenue)
    from
      revenue0
  )
order by
  s_suppkey;
--Q16
explain analyze
select
        p_brand,
        p_type,
        p_size,
        count(distinct ps_suppkey) as supplier_cnt
from
        partsupp,
        part
where
        p_partkey = ps_partkey
        and p_brand <> 'Brand#13'
        and p_type not like 'ECONOMY BRUSHED%'
        and p_size in (11, 8, 10, 31, 21, 13, 32, 28)
        and ps_suppkey not in (
                select
                        s_suppkey
                from
                        supplier
                where
                        s_comment like '%Customer%Complaints%'
        )
group by
        p_brand,
        p_type,
        p_size
order by
        supplier_cnt desc,
        p_brand,
        p_type,
        p_size limit 100;
--Q17
explain analyze
select
        sum(l_extendedprice) / 7.0 as avg_yearly
from
        lineitem,
        part
where
        p_partkey = l_partkey
        and p_brand = 'Brand#44'
        and p_container = 'MED PKG'
        and l_quantity < (
                select
                        0.2 * avg(l_quantity)
                from
                        lineitem
                where
                        l_partkey = p_partkey
        );
--Q18
explain analyze
select
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice,
        sum(l_quantity)
from
        customer,
        orders,
        lineitem
where
        o_orderkey in (
                select
                        l_orderkey
                from
                        lineitem
                group by
                        l_orderkey having
                                sum(l_quantity) > 313
        )
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
group by
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice
order by
        o_totalprice desc,
        o_orderdate limit 100;
--LIMIT 100
--Q19
explain analyze
select
        sum(l_extendedprice* (1 - l_discount)) as revenue
from
        lineitem,
        part
where
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#15'
                and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                and l_quantity >= 10 and l_quantity <= 10 + 10
                and p_size between 1 and 5
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#45'
                and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                and l_quantity >= 18 and l_quantity <= 18 + 10
                and p_size between 1 and 10
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#21'
                and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
                and l_quantity >= 30 and l_quantity <= 30 + 10
                and p_size between 1 and 15
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        );
--LIMIT -1
--Q20
explain analyze
select
        s_name,
        s_address
from
        supplier,
        nation
where
        s_suppkey in (
                select
                        ps_suppkey
                from
                        partsupp
                where
                        ps_partkey in (
                                select
                                        p_partkey
                                from
                                        part
                                where
                                        p_name like 'lemon%'
                        )
                        and ps_availqty > (
                                select
                                        0.5 * sum(l_quantity)
                                from
                                        lineitem
                                where
                                        l_partkey = ps_partkey
                                        and l_suppkey = ps_suppkey
                                        and l_shipdate >= date '1997-01-01'
                                        and l_shipdate < date '1997-01-01' + '1 year'::interval
                        )
        )
        and s_nationkey = n_nationkey
        and n_name = 'INDONESIA'
order by
        s_name limit 100;
--Q21
explain analyze
select
        s_name,
        count(*) as numwait
from
        supplier,
        lineitem l1,
        orders,
        nation
where
        s_suppkey = l1.l_suppkey
        and o_orderkey = l1.l_orderkey
        and o_orderstatus = 'F'
        and l1.l_receiptdate > l1.l_commitdate
        and exists (
                select
                        *
                from
                        lineitem l2
                where
                        l2.l_orderkey = l1.l_orderkey
                        and l2.l_suppkey <> l1.l_suppkey
        )
        and not exists (
                select
                        *
                from
                        lineitem l3
                where
                        l3.l_orderkey = l1.l_orderkey
                        and l3.l_suppkey <> l1.l_suppkey
                        and l3.l_receiptdate > l3.l_commitdate
        )
        and s_nationkey = n_nationkey
        and n_name = 'INDIA'
group by
        s_name
order by
        numwait desc,
        s_name limit 100;
--LIMIT 100
--Q22
explain analyze
select
        cntrycode,
        count(*) as numcust,
        sum(c_acctbal) as totacctbal
from
        (
                select
                        substring(c_phone from 1 for 2) as cntrycode,
                        c_acctbal
                from
                        customer
                where
                        substring(c_phone from 1 for 2) in
                                ('16', '17', '24', '21', '19', '22', '15')
                        and c_acctbal > (
                                select
                                        avg(c_acctbal)
                                from
                                        customer
                                where
                                        c_acctbal > 0.00
                                        and substring(c_phone from 1 for 2) in
                                                ('16', '17', '24', '21', '19', '22', '15')
                        )
                        and not exists (
                                select
                                        *
                                from
                                        orders
                                where
                                        o_custkey = c_custkey
                        )
        ) as custsale
group by
        cntrycode
order by
        cntrycode;
--LIMIT -1


  • 关闭向量化引擎,执行 22 条语句,统计耗时
\timing
set polar_csi.enable_query to off;
--Q1
explain analyze
select
  l_returnflag,
  l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count(*) as count_order
from
  lineitem
where l_shipdate <= date '1998-12-01' - '60 day'::interval
group by
  l_returnflag,
  l_linestatus
order by
  l_returnflag,
  l_linestatus;
--Q2
explain analyze
select
  s_acctbal,
  s_name,
  n_name,
  p_partkey,
  p_mfgr,
  s_address,
  s_phone,
  s_comment
from
  part,
  supplier,
  partsupp,
  nation,
  region
where
  p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p_size = 43
  and p_type like '%NICKEL'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'MIDDLE EAST'
  and ps_supplycost = (
    select
      min(ps_supplycost)
    from
      partsupp,
      supplier,
      nation,
      region
    where p_partkey = ps_partkey
      and s_suppkey = ps_suppkey
      and s_nationkey = n_nationkey
      and n_regionkey = r_regionkey
      and r_name = 'MIDDLE EAST'
  )
order by
  s_acctbal desc,
  n_name,
  s_name,
  p_partkey limit 100;
--Q3
explain analyze
select l_orderkey,
  sum(l_extendedprice * (1 - l_discount)) as revenue,
  o_orderdate,
  o_shippriority
from
  customer,
  orders,
  lineitem
where c_mktsegment = 'FURNITURE'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < date '1995-03-05'
  and l_shipdate > date '1995-03-05'
group by
  l_orderkey,
  o_orderdate,
  o_shippriority
order by
  revenue desc,
  o_orderdate limit 100;
--Q4
explain analyze
select
  o_orderpriority,
  count(*) as order_count
from
  orders
where
  o_orderdate >= date '1993-05-01'
  and o_orderdate < date '1993-05-01' + interval '3 month'::interval
  and exists (
    select
      *
    from
      lineitem
    where
      l_orderkey = o_orderkey
      and l_commitdate < l_receiptdate
  )
group by
  o_orderpriority
order by
  o_orderpriority;
--Q5
explain analyze
select
        n_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue
from
        customer,
        orders,
        lineitem,
        supplier,
        nation,
        region
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and l_suppkey = s_suppkey
        and c_nationkey = s_nationkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'MIDDLE EAST'
        and o_orderdate >= date '1995-01-01'
        and o_orderdate < date '1995-01-01' + '1 year'::interval
group by
        n_name
order by
        revenue desc;
--Q6
explain analyze
select
        sum(l_extendedprice * l_discount) as revenue
from
        lineitem
where
        l_shipdate >= date '1993-01-01'
        and l_shipdate < date '1993-01-01' + '1 year'::interval
        and l_discount between 0.03 - 0.01 and 0.03 + 0.01
        and l_quantity < 24;
--Q7
explain analyze 
  SELECT
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) AS revenue
FROM (
    SELECT
        n1.n_name AS supp_nation,
        n2.n_name AS cust_nation,
        extract(year FROM l_shipdate) AS l_year,
        l_extendedprice * (1 - l_discount) AS volume
    FROM
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2
    WHERE
        s_suppkey = l_suppkey
        AND o_orderkey = l_orderkey
        AND c_custkey = o_custkey
        AND s_nationkey = n1.n_nationkey
        AND c_nationkey = n2.n_nationkey
        AND ((n1.n_name = 'FRANCE'
                AND n2.n_name = 'GERMANY')
            OR (n1.n_name = 'GERMANY'
                AND n2.n_name = 'FRANCE'))
        AND l_shipdate BETWEEN CAST('1995-01-01' AS date)
        AND CAST('1996-12-31' AS date)) AS shipping
GROUP BY
    supp_nation,
    cust_nation,
    l_year
ORDER BY
    supp_nation,
    cust_nation,
    l_year;
--Q8
explain analyze
select
        o_year,
        sum(case
                when nation = 'INDONESIA' then volume
                else 0
        end) / sum(volume) as mkt_share
from
        (
                select
                        extract(year from o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) as volume,
                        n2.n_name as nation
                from
                        part,
                        supplier,
                        lineitem,
                        orders,
                        customer,
                        nation n1,
                        nation n2,
                        region
                where
                        p_partkey = l_partkey
                        and s_suppkey = l_suppkey
                        and l_orderkey = o_orderkey
                        and o_custkey = c_custkey
                        and c_nationkey = n1.n_nationkey
                        and n1.n_regionkey = r_regionkey
                        and r_name = 'ASIA'
                        and s_nationkey = n2.n_nationkey
                        and o_orderdate between '1995-01-01'::date and '1996-12-31'::date
                        and p_type = 'PROMO POLISHED NICKEL'
        ) as all_nations
group by
        o_year
order by
        o_year;
--Q9
explain analyze
select
        nation,
        o_year,
        sum(amount) as sum_profit
from
        (
                select
                        n_name as nation,
                        extract(year from o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
                from
                        part,
                        supplier,
                        lineitem,
                        partsupp,
                        orders,
                        nation
                where
                        s_suppkey = l_suppkey
                        and ps_suppkey = l_suppkey
                        and ps_partkey = l_partkey
                        and p_partkey = l_partkey
                        and o_orderkey = l_orderkey
                        and s_nationkey = n_nationkey
                        and p_name like '%navajo%'
        ) as profit
group by
        nation,
        o_year
order by
        nation,
        o_year desc limit 100;
--Q10
explain analyze
select
        c_custkey,
        c_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        c_acctbal,
        n_name,
        c_address,
        c_phone,
        c_comment
from
        customer,
        orders,
        lineitem,
        nation
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate >= date '1993-08-01'
        and o_orderdate < date '1993-08-01' + '3 month'::interval
        and l_returnflag = 'R'
        and c_nationkey = n_nationkey
group by
        c_custkey,
        c_name,
        c_acctbal,
        c_phone,
        n_name,
        c_address,
        c_comment
order by
        revenue desc
        LIMIT 20;
--Q11
explain analyze
select
        ps_partkey,
        sum(ps_supplycost * ps_availqty) as value
from
        partsupp,
        supplier,
        nation
where
        ps_suppkey = s_suppkey
        and s_nationkey = n_nationkey
        and n_name = 'ALGERIA'
group by
        ps_partkey having
                sum(ps_supplycost * ps_availqty) > (
                        select
                                sum(ps_supplycost * ps_availqty) * 0.0001000000
                        from
                                partsupp,
                                supplier,
                                nation
                        where
                                ps_suppkey = s_suppkey
                                and s_nationkey = n_nationkey
                                and n_name = 'ALGERIA'
                )
order by
        value desc;
--Q12
explain analyze
select
        l_shipmode,
        sum(case
                when o_orderpriority = '1-URGENT'
                        or o_orderpriority = '2-HIGH'
                        then 1
                else 0
        end) as high_line_count,
        sum(case
                when o_orderpriority <> '1-URGENT'
                        and o_orderpriority <> '2-HIGH'
                        then 1
                else 0
        end) as low_line_count
from
        orders,
        lineitem
where
        o_orderkey = l_orderkey
        and l_shipmode in ('AIR', 'FOB')
        and l_commitdate < l_receiptdate
        and l_shipdate < l_commitdate
        and l_receiptdate >= date '1996-01-01'
        and l_receiptdate < date '1996-01-01' + '1 year'::interval
group by
        l_shipmode
order by
        l_shipmode;
--Q13
explain analyze
select
        c_count,
        count(*) as custdist
from
        (
                select
                        c_custkey,
                        count(o_orderkey)
                from
                        customer left outer join orders on
                                c_custkey = o_custkey
                                and o_comment not like '%pending%requests%'
                group by
                        c_custkey
        ) as c_orders (c_custkey, c_count)
group by
        c_count
order by
        custdist desc,
        c_count desc;
--Q14
explain analyze
select
        100.00 * sum(case
                when p_type like 'PROMO%'
                        then l_extendedprice * (1 - l_discount)
                else 0
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
        lineitem,
        part
where
        l_partkey = p_partkey
        and l_shipdate >= date '1995-02-01'
        and l_shipdate < date '1995-02-01' + interval '1 month'::interval;
--Q15
explain analyze
with revenue0 as (
  select
    l_suppkey as supplier_no,
    sum(l_extendedprice * (1 - l_discount)) as total_revenue
  from
    lineitem
  where l_shipdate >= date '1995-08-01'
    and l_shipdate < date '1995-08-01' + '3 month'::interval
  group by
    l_suppkey)
select
  s_suppkey,
  s_name,
  s_address,
  s_phone,
  total_revenue
from
  supplier,
  revenue0
where
  s_suppkey = supplier_no
  and total_revenue = (
    select
      max(total_revenue)
    from
      revenue0
  )
order by
  s_suppkey;
--Q16
explain analyze
select
        p_brand,
        p_type,
        p_size,
        count(distinct ps_suppkey) as supplier_cnt
from
        partsupp,
        part
where
        p_partkey = ps_partkey
        and p_brand <> 'Brand#13'
        and p_type not like 'ECONOMY BRUSHED%'
        and p_size in (11, 8, 10, 31, 21, 13, 32, 28)
        and ps_suppkey not in (
                select
                        s_suppkey
                from
                        supplier
                where
                        s_comment like '%Customer%Complaints%'
        )
group by
        p_brand,
        p_type,
        p_size
order by
        supplier_cnt desc,
        p_brand,
        p_type,
        p_size limit 100;
--Q17
explain analyze
select
        sum(l_extendedprice) / 7.0 as avg_yearly
from
        lineitem,
        part
where
        p_partkey = l_partkey
        and p_brand = 'Brand#44'
        and p_container = 'MED PKG'
        and l_quantity < (
                select
                        0.2 * avg(l_quantity)
                from
                        lineitem
                where
                        l_partkey = p_partkey
        );
--Q18
explain analyze
select
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice,
        sum(l_quantity)
from
        customer,
        orders,
        lineitem
where
        o_orderkey in (
                select
                        l_orderkey
                from
                        lineitem
                group by
                        l_orderkey having
                                sum(l_quantity) > 313
        )
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
group by
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice
order by
        o_totalprice desc,
        o_orderdate limit 100;
--LIMIT 100
--Q19
explain analyze
select
        sum(l_extendedprice* (1 - l_discount)) as revenue
from
        lineitem,
        part
where
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#15'
                and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                and l_quantity >= 10 and l_quantity <= 10 + 10
                and p_size between 1 and 5
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#45'
                and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                and l_quantity >= 18 and l_quantity <= 18 + 10
                and p_size between 1 and 10
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#21'
                and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
                and l_quantity >= 30 and l_quantity <= 30 + 10
                and p_size between 1 and 15
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        );
--LIMIT -1
--Q20
explain analyze
select
        s_name,
        s_address
from
        supplier,
        nation
where
        s_suppkey in (
                select
                        ps_suppkey
                from
                        partsupp
                where
                        ps_partkey in (
                                select
                                        p_partkey
                                from
                                        part
                                where
                                        p_name like 'lemon%'
                        )
                        and ps_availqty > (
                                select
                                        0.5 * sum(l_quantity)
                                from
                                        lineitem
                                where
                                        l_partkey = ps_partkey
                                        and l_suppkey = ps_suppkey
                                        and l_shipdate >= date '1997-01-01'
                                        and l_shipdate < date '1997-01-01' + '1 year'::interval
                        )
        )
        and s_nationkey = n_nationkey
        and n_name = 'INDONESIA'
order by
        s_name limit 100;
--Q21
explain analyze
select
        s_name,
        count(*) as numwait
from
        supplier,
        lineitem l1,
        orders,
        nation
where
        s_suppkey = l1.l_suppkey
        and o_orderkey = l1.l_orderkey
        and o_orderstatus = 'F'
        and l1.l_receiptdate > l1.l_commitdate
        and exists (
                select
                        *
                from
                        lineitem l2
                where
                        l2.l_orderkey = l1.l_orderkey
                        and l2.l_suppkey <> l1.l_suppkey
        )
        and not exists (
                select
                        *
                from
                        lineitem l3
                where
                        l3.l_orderkey = l1.l_orderkey
                        and l3.l_suppkey <> l1.l_suppkey
                        and l3.l_receiptdate > l3.l_commitdate
        )
        and s_nationkey = n_nationkey
        and n_name = 'INDIA'
group by
        s_name
order by
        numwait desc,
        s_name limit 100;
--LIMIT 100
--Q22
explain analyze
select
        cntrycode,
        count(*) as numcust,
        sum(c_acctbal) as totacctbal
from
        (
                select
                        substring(c_phone from 1 for 2) as cntrycode,
                        c_acctbal
                from
                        customer
                where
                        substring(c_phone from 1 for 2) in
                                ('16', '17', '24', '21', '19', '22', '15')
                        and c_acctbal > (
                                select
                                        avg(c_acctbal)
                                from
                                        customer
                                where
                                        c_acctbal > 0.00
                                        and substring(c_phone from 1 for 2) in
                                                ('16', '17', '24', '21', '19', '22', '15')
                        )
                        and not exists (
                                select
                                        *
                                from
                                        orders
                                where
                                        o_custkey = c_custkey
                        )
        ) as custsale
group by
        cntrycode
order by
        cntrycode;
--LIMIT -1


结果

列存索引占用空间

对比向量化引擎列存索引和行存引擎 Heap 表的空间占用情况。

测试结论:

  • 行存引擎 Heap 表占用空间为 12.7GB。
  • 列存索引占用空间为 5.5GB

表名称

表中包含的数据行数

PostgreSQL 行存

向量化引擎的列存索引

(包含主键)

LINEITEM

599,860,52

8792 MB

3678 MB

ORDERS

150,000,00

2044 MB

914 MB

PARTSUPP

80,000,00

1372 MB

787 MB

PART

20,000,00

324 MB

66 MB

CUSTOMER

15,000,00

284 MB

118 MB

SUPPLIER

1,000,00

20 MB

8 MB

NATION

25

8 KB

528 KB

REGION

5

8 KB

528 KB

合计

12.7GB

5.5GB


查询性能

对比 PostgreSQL 行存引擎与向量化引擎查询效率。

测试结果:向量化引擎性能是 PG 行存引擎性能的近 60 倍。


注:请将下方Q1-Q22任务全部完成,填写表后,并截图

查询语句

PolarDB PG 向量化引擎耗时

(单位:秒)

PostgreSQL 行存引擎耗时

(单位:秒)

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Q9

Q10

Q11

Q12

Q13

Q14

Q15

Q16

Q17

Q18

Q19

Q20

Q21

Q22

总耗时:


相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
23天前
|
关系型数据库 分布式数据库 数据库
瑶池数据库大讲堂|PolarDB HTAP:为在线业务插上实时分析的翅膀
瑶池数据库大讲堂介绍PolarDB HTAP,为在线业务提供实时分析能力。内容涵盖MySQL在线业务的分析需求与现有解决方案、PolarDB HTAP架构优化、针对分析型负载的优化(如向量化执行、多核并行处理)及近期性能改进和用户体验提升。通过这些优化,PolarDB HTAP实现了高效的数据处理和查询加速,帮助用户更好地应对复杂业务场景。
|
3月前
|
数据库
|
5月前
|
关系型数据库 分布式数据库 数据库
开源云原生数据库PolarDB PostgreSQL 15兼容版本正式发布
PolarDB进行了深度的内核优化,从而实现以更低的成本提供商业数据库的性能。
|
7月前
|
关系型数据库 分布式数据库 数据库
PolarDB产品使用问题之如何进行PostgreSQL(简称PG)的全量和增量备份管理
PolarDB产品使用合集涵盖了从创建与管理、数据管理、性能优化与诊断、安全与合规到生态与集成、运维与支持等全方位的功能和服务,旨在帮助企业轻松构建高可用、高性能且易于管理的数据库环境,满足不同业务场景的需求。用户可以通过阿里云控制台、API、SDK等方式便捷地使用这些功能,实现数据库的高效运维与持续优化。
|
7月前
|
存储 关系型数据库 分布式数据库
PolarDB产品使用问题之如何查看PolarDB for PostgreSQL的备份信息
PolarDB产品使用合集涵盖了从创建与管理、数据管理、性能优化与诊断、安全与合规到生态与集成、运维与支持等全方位的功能和服务,旨在帮助企业轻松构建高可用、高性能且易于管理的数据库环境,满足不同业务场景的需求。用户可以通过阿里云控制台、API、SDK等方式便捷地使用这些功能,实现数据库的高效运维与持续优化。
|
7月前
|
SQL 存储 关系型数据库
关系型数据库PostgreSQL学习
【7月更文挑战第4天】
465 2
|
25天前
|
Cloud Native 关系型数据库 分布式数据库
让PolarDB更了解您--PolarDB云原生数据库核心功能体验馆
让PolarDB更了解您——PolarDB云原生数据库核心功能体验馆,由阿里云数据库产品事业部负责人宋震分享。内容涵盖PolarDB技术布局、开源进展及体验馆三大部分。技术布局包括云计算加速数据库演进、数据处理需求带来的变革、软硬协同优化等;开源部分介绍了兼容MySQL和PostgreSQL的两款产品;体验馆则通过实际操作让用户直观感受Serverless、无感切换、SQL2Map等功能。
|
4月前
|
关系型数据库 MySQL 分布式数据库
零基础教你用云数据库PolarDB搭建企业网站,完成就送桌面收纳桶!
零基础教你用云数据库PolarDB搭建企业网站,完成就送桌面收纳桶,邀请好友完成更有机会获得​小米Watch S3、小米体重称​等诸多好礼!
零基础教你用云数据库PolarDB搭建企业网站,完成就送桌面收纳桶!
|
21天前
|
运维 关系型数据库 分布式数据库
阿里云PolarDB:引领云原生数据库创新发展
阿里云PolarDB引领云原生数据库创新,2024云栖大会将分享其最新发展及在游戏行业的应用。PolarDB凭借弹性、高可用性、多写技术等优势,支持全球80多个站点,服务1万多家企业。特别是针对游戏行业,PolarDB助力Funplus等公司实现高效运维、成本优化和业务扩展。通过云原生能力,PolarDB推动游戏业务的全球化部署与快速响应,提升用户体验并保障数据安全。未来,PolarDB将继续探索AI、多云管理等前沿技术,为用户提供更智能的数据基础设施。
|
24天前
|
关系型数据库 Serverless 分布式数据库
瑶池数据库微课堂 | PolarDB Serverless弹性&价格力观测
瑶池数据库微课堂介绍阿里云PolarDB Serverless的弹性与性价比优势。通过瑶池解决方案体验馆,用户可免费实操,直观感受Serverless的秒级弹性及超高性价比。内容涵盖Serverless概念、操作步骤、压测演示及性能曲线分析,展示PolarDB在不同负载下的自动扩展能力。适合希望了解云数据库弹性和成本效益的技术人员。

相关产品

  • 云原生数据库 PolarDB