FLUSH_SLAVE_TIMEOUT returned even though the message was sent successfully

Config:
    broker: master
    broker role: SYNC_MASTER
    sendMessageThreadPoolNums=4
    topics:
        Topic_A: qps: 3w (about 30,000), WaitStoreMsgOK: false
        Topic_B: qps: 100, WaitStoreMsgOK: true

When messages are sent to the two topics at the same time, the producer of Topic_B receives the code FLUSH_SLAVE_TIMEOUT. The time cost does not exceed the timeout, and there is no problem with synchronization between master and slave. The cause is the way the GroupCommitRequest is woken up in org.apache.rocketmq.store.ha.HAService.GroupTransferService#doWaitTransfer:

    for (int i = 0; !transferOK && i < 5; i++) {
        this.notifyTransferObject.waitForRunning(1000);
        transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
    }

Because the qps of Topic_A is much higher than that of Topic_B and sendMessageThreadPoolNums is set to 4, traffic from Topic_A makes waitForRunning return 5 times before the synchronization for Topic_B has completed, so the loop gives up early. In org.apache.rocketmq.store.CommitLog#handleHA, the GroupCommitRequest is added to the HA service and the caller waits until the timeout. In doWaitTransfer, how about checking whether the request has expired instead of counting iterations? Add an expire timestamp to GroupCommitRequest and check it, for example:

    while (!transferOK && defaultMessageStore.getSystemClock().now() < req.getExpireTimestamp()) {
        this.notifyTransferObject.waitForRunning(1000);
        transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
    }
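The difference between the two loops can be illustrated with a small simulation. This is only a sketch under stated assumptions, not RocketMQ code: Waiter, FakeClock and simulate are hypothetical names, and it assumes that every waitForRunning call is cut short after about 1 ms by an unrelated wakeup while the slave acknowledges the offset at 20 ms. Those figures are illustrative, not measured.

```java
import java.util.concurrent.atomic.AtomicLong;

final class WaitTransferSketch {
    /** Hypothetical stand-in for notifyTransferObject. */
    interface Waiter {
        void waitForRunning(long maxMillis);
    }

    /** Manually advanced clock so the simulation is deterministic. */
    static final class FakeClock {
        private long nowMillis;
        long now() { return nowMillis; }
        void advance(long millis) { nowMillis += millis; }
    }

    /** Iteration-bounded wait, modelled on the loop quoted in the question. */
    static boolean waitByCount(AtomicLong slaveOffset, long nextOffset, Waiter waiter) {
        boolean transferOK = slaveOffset.get() >= nextOffset;
        for (int i = 0; !transferOK && i < 5; i++) {
            waiter.waitForRunning(1000);
            transferOK = slaveOffset.get() >= nextOffset;
        }
        return transferOK;
    }

    /** Deadline-bounded wait, modelled on the proposed loop. */
    static boolean waitByDeadline(AtomicLong slaveOffset, long nextOffset,
                                  Waiter waiter, FakeClock clock, long expireTimestamp) {
        boolean transferOK = slaveOffset.get() >= nextOffset;
        while (!transferOK && clock.now() < expireTimestamp) {
            waiter.waitForRunning(1000);
            transferOK = slaveOffset.get() >= nextOffset;
        }
        return transferOK;
    }

    /** Runs one of the two loops against the same simulated workload. */
    static boolean simulate(boolean useDeadline) {
        FakeClock clock = new FakeClock();
        AtomicLong slaveOffset = new AtomicLong(0);
        long nextOffset = 100;
        // Assumption: each wait returns after 1 ms because another request
        // woke the notify object; the slave catches up at t = 20 ms.
        Waiter busyWaiter = maxMillis -> {
            clock.advance(1);
            if (clock.now() >= 20) {
                slaveOffset.set(nextOffset);
            }
        };
        return useDeadline
            ? waitByDeadline(slaveOffset, nextOffset, busyWaiter, clock, 3000)
            : waitByCount(slaveOffset, nextOffset, busyWaiter);
    }

    public static void main(String[] args) {
        System.out.println("count-bounded loop succeeded:    " + simulate(false));
        System.out.println("deadline-bounded loop succeeded: " + simulate(true));
    }
}
```

In this simulation the iteration-bounded loop uses up its five attempts after 5 ms of simulated time and reports failure, while the deadline-bounded loop keeps waiting and sees the slave catch up well before the 3000 ms expiry.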

Original asker: GitHub user suiyuzeng

Posted by 芬奇福贵, 2023-05-26 16:01:25
1 answer
  • When messages are relatively large, setting haTransferBatchSize too small makes this error likely to occur.

    Original answerer: GitHub user makabakaboom

    2023-05-26 17:57:25