
Replicated table REPLACE PARTITION: DB::Exception (state Committed)

The schema of the replicated table is as follows:

```sql
CREATE TABLE pub_report_query_keyword_creative_day_local
(
    chan_pub_id Int64,
    channel_id Int64,
    publisher_id Int64,
    pub_account_id Int64,
    pub_campaign_id Int64,
    pub_adgroup_id Int64,
    data_date Date
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/xx/pub_report_query_keyword_creative_day_local_v1', '{replica}')
PARTITION BY (toYYYYMMDD(data_date))
PRIMARY KEY (chan_pub_id, pub_campaign_id, pub_adgroup_id)
ORDER BY (chan_pub_id, pub_campaign_id, pub_adgroup_id)
SETTINGS min_bytes_for_wide_part = '100M', index_granularity = 8192
```


Since we sometimes need to rebuild part of the data for a particular day or time range, the rebuild workflow is as follows:
1. Create a temporary local table:

```sql
CREATE TABLE temp_0_5_20200619_7ec34a581b1d11eb80c102ed675fb066_1604473288
(
    chan_pub_id Int64,
    channel_id Int64,
    publisher_id Int64,
    pub_account_id Int64,
    pub_campaign_id Int64,
    pub_adgroup_id Int64,
    device Int8,
    data_date Date
)
ENGINE = MergeTree()
PARTITION BY (toYYYYMMDD(data_date))
PRIMARY KEY (chan_pub_id, pub_campaign_id, pub_adgroup_id)
ORDER BY (chan_pub_id, pub_campaign_id, pub_adgroup_id)
SETTINGS index_granularity = 8192
```



2. Write the updated (recomputed) portion of the data into the temporary table; a sketch of this step is shown below.
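A minimal sketch of this step (pub_report_rebuild_source is only a hypothetical name standing in for wherever the recomputed rows come from):

```sql
-- Hypothetical example: load the recomputed rows for the day being rebuilt.
-- pub_report_rebuild_source is an assumed name, not a table from this post.
INSERT INTO temp_0_5_20200619_7ec34a581b1d11eb80c102ed675fb066_1604473288
SELECT chan_pub_id, channel_id, publisher_id, pub_account_id,
       pub_campaign_id, pub_adgroup_id, device, data_date
FROM pub_report_rebuild_source
WHERE data_date = '2020-10-13'
```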
3. Write the rows that exist in the original table but not in the temporary table into the temporary table, so that the temporary table ends up with the latest, complete data. Example SQL:

```sql
INSERT INTO temp_0_5_20200619_7ec34a581b1d11eb80c102ed675fb066_1604473288
SELECT *
FROM pub_report_query_keyword_creative_day_local
WHERE data_date = xxx
  AND chan_pub_id NOT IN
  (
      SELECT chan_pub_id
      FROM temp_0_5_20200619_7ec34a581b1d11eb80c102ed675fb066_1604473288
      GROUP BY chan_pub_id
      ORDER BY NULL
  )
```

4. Replace the partition from the temporary table into the original table. Example SQL:

```sql
ALTER TABLE pub_report_query_keyword_creative_day_local
    REPLACE PARTITION tuple(20201013)
    FROM temp_0_5_20200619_7ec34a581b1d11eb80c102ed675fb066_1604473288
```


5. The client sees no error and the SQL is reported as successful, but the ClickHouse error log contains errors like the one below; the more concurrent the operations (different partitions of the same table replaced in parallel), the more of these errors appear (a replication-queue query sketch follows step 6):

```
2020.11.11 09:06:39.641967 [ 24917 ] {} pub_report_query_keyword_creative_day_local: DB::StorageReplicatedMergeTree::queueTask()::<lambda(DB::StorageReplicatedMergeTree::LogEntryPtr&)>: Code: 235, e.displayText() = DB::Exception: Part 20201110-1-3_80_80_1 (state Committed) already exists, Stack trace (when copying this message, always include the lines below):

  1. /build/obj-x86_64-linux-gnu/../contrib/poco/Foundation/src/Exception.cpp:27: Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits , std::__1::allocator > const&, int) @ 0x18e2f1a0 in /usr/lib/debug/usr/bin/clickhouse
  2. /build/obj-x86_64-linux-gnu/../src/Common/Exception.cpp:37: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits , std::__1::allocator > const&, int) @ 0xe73a32d in /usr/lib/debug/usr/bin/clickhouse
  3. /build/obj-x86_64-linux-gnu/../src/Storages/MergeTree/MergeTreeData.cpp:1927: DB::MergeTreeData::renameTempPartAndReplace(std::__1::shared_ptrDB::IMergeTreeDataPart&, SimpleIncrement*, DB::MergeTreeData::Transaction*, std::__1::unique_lockstd::__1::mutex&, std::__1::vector<std::__1::shared_ptr<DB::IMergeTreeDataPart const>, std::__1::allocator<std::__1::shared_ptr<DB::IMergeTreeDataPart const> > >*) (.cold) @ 0x1626022a in /usr/lib/debug/usr/bin/clickhouse
  4. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/__mutex_base:140: DB::MergeTreeData::renameTempPartAndReplace(std::__1::shared_ptrDB::IMergeTreeDataPart&, SimpleIncrement*, DB::MergeTreeData::Transaction*) @ 0x1624c74c in /usr/lib/debug/usr/bin/clickhouse
  5. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/vector:555: DB::StorageReplicatedMergeTree::executeReplaceRange(DB::ReplicatedMergeTreeLogEntry const&) @ 0x16080e04 in /usr/lib/debug/usr/bin/clickhouse
  6. /build/obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:1259: DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) @ 0x160abd95 in /usr/lib/debug/usr/bin/clickhouse
  7. /build/obj-x86_64-linux-gnu/../src/Storages/StorageReplicatedMergeTree.cpp:2575: DB::StorageReplicatedMergeTree::queueTask()::'lambda0'(std::__1::shared_ptrDB::ReplicatedMergeTreeLogEntry&)::operator()(std::__1::shared_ptrDB::ReplicatedMergeTreeLogEntry&) const (.isra.0) @ 0x160ac15d in /usr/lib/debug/usr/bin/clickhouse
  8. /build/obj-x86_64-linux-gnu/../src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp:1280: DB::ReplicatedMergeTreeQueue::processEntry(std::__1::function<std::__1::shared_ptrzkutil::ZooKeeper ()>, std::__1::shared_ptrDB::ReplicatedMergeTreeLogEntry&, std::__1::function<bool (std::__1::shared_ptrDB::ReplicatedMergeTreeLogEntry&)>) @ 0x16403572 in /usr/lib/debug/usr/bin/clickhouse
  9. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:1825: DB::StorageReplicatedMergeTree::queueTask() @ 0x1605ca9e in /usr/lib/debug/usr/bin/clickhouse
  10. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:1867: DB::BackgroundProcessingPool::workLoopFunc() @ 0x161e0723 in /usr/lib/debug/usr/bin/clickhouse
  11. /build/obj-x86_64-linux-gnu/../src/Common/ThreadPool.h:170: ThreadFromGlobalPool::ThreadFromGlobalPool<DB::BackgroundProcessingPool::BackgroundProcessingPool(int, DB::BackgroundProcessingPool::PoolSettings const&, char const*, char const*)::'lambda'()>(DB::BackgroundProcessingPool::BackgroundProcessingPool(int, DB::BackgroundProcessingPool::PoolSettings const&, char const*, char const*)::'lambda'()&&)::'lambda'()::operator()() const @ 0x161e1062 in /usr/lib/debug/usr/bin/clickhouse
  12. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/atomic:856: ThreadPoolImplstd::__1::thread::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xe767b47 in /usr/lib/debug/usr/bin/clickhouse
  13. /build/obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:2615: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_deletestd::__1::__thread_struct >, void ThreadPoolImplstd::__1::thread::scheduleImpl (std::__1::function<void ()>, int, std::__1::optional )::'lambda1'()> >(void*) @ 0xe766093 in /usr/lib/debug/usr/bin/clickhouse
  14. start_thread @ 0x740b in /usr/lib64/libpthread-2.26.so
  15. __GI___clone @ 0xeceef in /usr/lib64/libc-2.26.so (version 20.9.4.76 (official build))
```
6. We also tried the approach from https://github.com/ClickHouse/ClickHouse/issues/14582, but it did not help.
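For reference, the queue entries that keep retrying on the affected replica can be listed with a query like the one below. This is only a minimal sketch against the standard system.replication_queue table; the database name 'default' is an assumption:

```sql
-- Sketch: list replication-queue entries that keep failing for this table.
SELECT
    type,            -- REPLACE_RANGE entries are produced by REPLACE PARTITION
    new_part_name,
    num_tries,
    last_exception
FROM system.replication_queue
WHERE database = 'default'   -- assumed database name
  AND table = 'pub_report_query_keyword_creative_day_local'
  AND last_exception != ''
ORDER BY create_time
```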
