背景
PolarDB 的云原生存算分离架构, 具备低廉的数据存储、高效扩展弹性、高速多机并行计算能力、高速数据搜索和处理; PolarDB与计算算法结合, 将实现双剑合璧, 推动业务数据的价值产出, 将数据变成生产力.
本文将介绍PolarDB 开源版通过duckdb_fdw 支持 parquet 列存数据文件以及高效OLAP.
测试环境为macos+docker, polardb部署请参考:
duckdb_fdw for PolarDB
参考
《用duckdb_fdw加速PostgreSQL分析计算, 提速40倍, 真香.》
1、部署
需要一个 高版本 cmake .
https://cmake.org/download
wget https://github.com/Kitware/CMake/releases/download/v3.25.1/cmake-3.25.1.tar.gz
tar -zxvf cmake-3.25.1.tar.gz
cd cmake-3.25.1
./configure
make -j 4
make install
[postgres@67e1eed1b4b6 duckdb]$ export PATH=/usr/local/bin:$PATH
[postgres@67e1eed1b4b6 duckdb]$ which cmake
/usr/local/bin/cmake
git clone --depth 1 https://github.com/cwida/duckdb
cd duckdb
make -j 4
cd ~/duckdb
cp build/release/tools/sqlite3_api_wrapper/libsqlite3_api_wrapper.so ~/tmp_basedir_polardb_pg_1100_bld/lib/
cp tools/sqlite3_api_wrapper/include/sqlite3.h ~/tmp_basedir_polardb_pg_1100_bld/include/server/
cp build/release/src/libduckdb.so ~/tmp_basedir_polardb_pg_1100_bld/lib/
cd ~
git clone --depth 1 https://github.com/alitrack/duckdb_fdw
cd duckdb_fdw
USE_PGXS=1 make
USE_PGXS=1 make install
[postgres@67e1eed1b4b6 duckdb_fdw]$ USE_PGXS=1 make
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o connection.o connection.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o option.o option.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o deparse.o deparse.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o sqlite_query.o sqlite_query.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o duckdb_fdw.o duckdb_fdw.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -pipe -Wall -grecord-gcc-switches -I/usr/include/et -O3 -Wp,-D_FORTIFY_SOURCE=2 -fPIC -shared -o duckdb_fdw.so connection.o option.o deparse.o sqlite_query.o duckdb_fdw.o -L/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib -Wl,-rpath,'$ORIGIN/../lib' -L/opt/rh/llvm-toolset-7.0/root/usr/lib64 -Wl,--as-needed -Wl,-rpath,'/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib',--enable-new-dtags -lsqlite3_api_wrapper
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o connection.bc connection.c
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o option.bc option.c
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o deparse.bc deparse.c
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o sqlite_query.bc sqlite_query.c
/opt/rh/llvm-toolset-7.0/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/server -I/home/postgres/tmp_basedir_polardb_pg_1100_bld/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o duckdb_fdw.bc duckdb_fdw.c
[postgres@67e1eed1b4b6 duckdb_fdw]$ USE_PGXS=1 make install
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib'
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension'
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension'
/usr/bin/install -c -m 755 duckdb_fdw.so '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/duckdb_fdw.so'
/usr/bin/install -c -m 644 .//duckdb_fdw.control '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension/'
/usr/bin/install -c -m 644 .//duckdb_fdw--1.0.sql .//duckdb_fdw--1.0--1.1.sql '/home/postgres/tmp_basedir_polardb_pg_1100_bld/share/extension/'
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode/duckdb_fdw'
/usr/bin/mkdir -p '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/
/usr/bin/install -c -m 644 connection.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./
/usr/bin/install -c -m 644 option.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./
/usr/bin/install -c -m 644 deparse.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./
/usr/bin/install -c -m 644 sqlite_query.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./
/usr/bin/install -c -m 644 duckdb_fdw.bc '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode'/duckdb_fdw/./
cd '/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/bitcode' && /opt/rh/llvm-toolset-7.0/root/usr/bin/llvm-lto -thinlto -thinlto-action=thinlink -o duckdb_fdw.index.bc duckdb_fdw/connection.bc duckdb_fdw/option.bc duckdb_fdw/deparse.bc duckdb_fdw/sqlite_query.bc duckdb_fdw/duckdb_fdw.bc
2、加载插件
create extension duckdb_fdw;
3、测试使用
cd ~/duckdb/build/release
./duckdb /home/postgres/db
COPY (select generate_series as id, md5(random()::text) as info, now()::timestamp+(generate_series||' second')::interval as crt_time from generate_series(1,100)) TO '/home/postgres/t1.parquet' (FORMAT 'PARQUET');
COPY (select generate_series as cid, md5(random()::text) as info, now()::timestamp+(generate_series||' second')::interval as crt_time from generate_series(1,100)) TO '/home/postgres/t2.parquet' (FORMAT 'PARQUET');
COPY (select (floor(random()*100)+1)::int as gid, (floor(random()*100)+1)::int as cid, (random()*10)::int as c1, (random()*100)::int as c2, (random()*1000)::int as c3, (random()*10000)::int as c4, (random()*100000)::int as c5 from generate_series(1,1000000)) TO '/home/postgres/t3.parquet' (FORMAT 'PARQUET');
create view t1 as select * from read_parquet('/home/postgres/t1.parquet');
create view t2 as select * from read_parquet('/home/postgres/t2.parquet');
create view t3 as select * from read_parquet('/home/postgres/t3.parquet');
checkpoint;
4、性能怎么样
psql
CREATE SERVER DuckDB_server FOREIGN DATA WRAPPER duckdb_fdw OPTIONS (database '/home/postgres/db');
IMPORT FOREIGN SCHEMA public FROM SERVER DuckDB_server INTO public;
postgres=# \det
List of foreign tables
Schema | Table | Server
--------+-------+---------------
public | t1 | duckdb_server
public | t2 | duckdb_server
public | t3 | duckdb_server
(3 rows)
postgres=# explain verbose select count(distinct gid) from t3;
QUERY PLAN
-------------------------------------------------------------
Foreign Scan (cost=1.00..1.00 rows=1 width=8)
Output: (count(DISTINCT gid))
SQLite query: SELECT count(DISTINCT "gid") FROM main."t3"
(3 rows)
postgres=# \timing
Timing is on.
select count(distinct gid) from t3;
select count(distinct gid),count(*) from t3;
explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid;
select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid;
select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid join t2 on t3.cid=t2.cid;
explain verbose select count(distinct t3.gid),count(*) from t3 join t1 on t3.gid=t1.gid join t2 on t3.cid=t2.cid;
5、将数据导入PolarDB本地, 做同样的查询看一下时间.
postgresql 本地计算
postgres=# create unlogged table lt1 as select * from t1;
SELECT 100
postgres=# create unlogged table lt2 as select * from t2;
SELECT 100
postgres=# create unlogged table lt3 as select * from t3;
SELECT 10000000
postgres=# \timing
Timing is on.
postgres=# select count(distinct gid) from lt3;
count
-------
100
(1 row)
Time: 14891.356 ms (00:14.891)
参考
https://github.com/alitrack/duckdb_fdw