使用HybridDB PG的外部表输出数据到OSS时,一般会输出成多个文件(文件个数一般与节点数个数一致)。如何输出为一个文件呢?步骤如下:
1)创建示例表:
create table test(
A1 char(30),
A2 char(30),
A3 varchar(255),
A4 varchar(255)
) with (appendonly=true, orientation=column, compresstype=zlib, checksum=false, fillfactor=100) distributed randomly;
2)插入数据:
insert into test values('xxxxxxx','yyyyy','zzzzz','wwwww');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
3)创建外部表写表,按源数据表的结构创建外部表。注意:
- 相对于本地源表的表结构,需要添加一个临时字段,例如dummy_col,最好作为第一个字段。
- 添加output_generator_col=false
file_name_generator_col=dummy_col这两个选项。file_name_generator_col这个参数指定a.中添加的字段。
create writable external table test_oss_write(
dummy_col text,
A1 text,
A2 text,
A3 text,
A4 text
)
location('oss://oss-cn-shanghai-internal.aliyuncs.com
dir=oncall_test/
id=xx
key=xx
bucket=osshuadong2
oss_connect_timeout=60
oss_dns_cache_timeout=300
output_generator_col=false
file_name_generator_col=dummy_col
') FORMAT 'csv' ( DELIMITER ',')
distributed by (dummy_col)
;
4) 将test数据写入到oss中。这里为了避开我们的一些limitation,写成了如下形式,将A1-A4和表test换成用户的字段和表名即可。
set rds_write_oss_file_by_distribution_column=on;
explain insert into test_oss_write select x, A1, A2, A3, A4 from (select row_number() over (partition by dummy_col||dummy_col) as RN, dummy_col||dummy_col as x, A1, A2, A3, A4 from (select y as dummy_col, A1, A2, A3, A4 from test, (values(1)) as q1(y)) as q2 ) as q3;