Circuit breaking intermittently cuts off all requests even when nothing times out

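While verifying the circuit-breaking feature, I found that requests get cut off even though none of them time out. Both sentinel-dubbo-adapter and sentinel-core are at version 0.1.0. The degrade rule on the provider side sets Count to 1, Grade to DEGRADE_GRADE_RT, and TimeWindow to 10.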

The Dubbo provider's default timeout is 500 ms, and a client sends requests to it (none of them time out, so none should trip the breaker). The client sends roughly one request per second. The first few requests succeed, then every request for about 10 s is rejected, then requests succeed again for a few seconds, and then the requests for the next 10 s or so are rejected again:

    PersonId:1, PersonName:张三, PersonAge: null, Time: 2018-08-06 02:25:09
    PersonId:1, PersonName:张三, PersonAge: null, Time: 2018-08-06 02:25:11
    ....
    PersonId:1, PersonName:张三, PersonAge: null, Time: 2018-08-06 02:25:13
    PersonId:1, PersonName:张三, PersonAge: null, Time: 2018-08-06 02:25:14
    ......
    com.alibaba.dubbo.rpc.RpcException: Failed to invoke the method get in the service com.ctrip.framework.cdubbo.demo2.api.HelloBOMService. Tried 1 times of the providers [10.32.21.132:20880] (1/3) from the registry 0.0.0.0:9090 on the consumer 10.32.21.115 using the dubbo version 2.0.1. Last error is: com.alibaba.csp.sentinel.slots.block.degrade.DegradeException
        at com.alibaba.dubbo.rpc.cluster.support.FailoverClusterInvoker.doInvoke(FailoverClusterInvoker.java:101)
        at com.alibaba.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:232)
        at com.ctrip.framework.cdubbo.internal.delegate.client.CDubboClusterInvokerDelegate.invoke(CDubboClusterInvokerDelegate.java:40)
        at com.alibaba.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:70)
        at com.ctrip.framework.cdubbo.internal.delegate.callback.StreamIdAttachInvoker.invoke(StreamIdAttachInvoker.java:39)
        at com.alibaba.dubbo.rpc.proxy.InvokerInvocationHandler.invoke(InvokerInvocationHandler.java:51)
        at com.alibaba.dubbo.common.bytecode.proxy0.get(proxy0.java)
        at com.ctrip.framework.cdubbo.demo2.xml.client.DubboConsumer.invoke(DubboConsumer.java:61)
        at com.ctrip.framework.cdubbo.demo2.xml.client.DubboConsumer.main(DubboConsumer.java:50)
    Caused by: com.alibaba.csp.sentinel.slots.block.SentinelRpcException: com.alibaba.csp.sentinel.slots.block.degrade.DegradeException
    Caused by: com.alibaba.csp.sentinel.slots.block.degrade.DegradeException
    PersonId:1, PersonName:张三, PersonAge: null, Time: 2018-08-06 02:25:25
    ......
    PersonId:1, PersonName:张三, PersonAge: null, Time: 2018-08-06 02:25:31
    com.alibaba.dubbo.rpc.RpcException: Failed to invoke the method get in the service com.ctrip.framework.cdubbo.demo2.api.HelloBOMService. Tried 1 times of the providers [10.32.21.115:20880] (1/3) from the registry 0.0.0.0:9090 on the consumer 10.32.21.115 using the dubbo version 2.0.1. Last error is: com.alibaba.csp.sentinel.slots.block.degrade.DegradeException
        at com.alibaba.dubbo.rpc.cluster.support.FailoverClusterInvoker.doInvoke(FailoverClusterInvoker.java:101)
        at com.alibaba.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:232)
        at com.ctrip.framework.cdubbo.internal.delegate.client.CDubboClusterInvokerDelegate.invoke(CDubboClusterInvokerDelegate.java:40)
        at com.alibaba.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:70)
        at com.ctrip.framework.cdubbo.internal.delegate.callback.StreamIdAttachInvoker.invoke(StreamIdAttachInvoker.java:39)
        at com.alibaba.dubbo.rpc.proxy.InvokerInvocationHandler.invoke(InvokerInvocationHandler.java:51)
        at com.alibaba.dubbo.common.bytecode.proxy0.get(proxy0.java)
        at com.ctrip.framework.cdubbo.demo2.xml.client.DubboConsumer.invoke(DubboConsumer.java:61)
        at com.ctrip.framework.cdubbo.demo2.xml.client.DubboConsumer.main(DubboConsumer.java:50)
    Caused by: com.alibaba.csp.sentinel.slots.block.SentinelRpcException: com.alibaba.csp.sentinel.slots.block.degrade.DegradeException
    Caused by: com.alibaba.csp.sentinel.slots.block.degrade.DegradeException

The provider neither timed out nor threw an exception, so nothing should have been cut; every request should have returned normally.

Steps to reproduce:

1. Add sentinel-core:0.1.0 and sentinel-dubbo-adapter:0.1.0 as dependencies.
2. At application startup, load the following degrade rule:

    List<DegradeRule> rules = new ArrayList<>();
    DegradeRule rule = new DegradeRule();
    rule.setResource("xxx");
    // set threshold rt, 1 ms
    rule.setCount(1);
    rule.setGrade(RuleConstant.DEGRADE_GRADE_RT);
    rule.setTimeWindow(10);
    rules.add(rule);
    DegradeRuleManager.loadRules(rules);

3. Set the provider's default timeout to 500 ms.

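I took a rough look at what causes this. First, the local request volume is very small, yet clusterNode.avgRt() in DegradeRule's passCheck returns a very large RT, sometimes 3600, sometimes 4900, and so on. Because that exceeds the threshold, passCount starts counting; RT_MAX_EXCEED_N is the constant 5, so it is easily reached. Once it is, cut is set to true and a ResetTask is scheduled, so every request in the following 10 s fails. Finally, after 10 s the ResetTask runs and sets cut back to false, and requests get through again.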

    public boolean passCheck(Context context, DefaultNode node, int acquireCount, Object... args) {

        if (cut) {
            return false;
        }

        if (grade == RuleConstant.DEGRADE_GRADE_RT) {
            double rt = clusterNode.avgRt();
            if (rt < this.count) {
                return true;
            }

            // Sentinel will degrade the service only if count exceeds.
            if (passCount.incrementAndGet() < RT_MAX_EXCEED_N) {
                return true;
            }
        } else {
            double exception = clusterNode.exceptionQps();
            double success = clusterNode.successQps();
            if (success == 0) {
                return true;
            }

            if (exception / success < count) {
                return true;
            }
        }

        synchronized (lock) {
            if (!cut) {
                // Automatically degrade.
                cut = true;
                ResetTask resetTask = new ResetTask(this);
                pool.schedule(resetTask, timeWindow, TimeUnit.SECONDS);
            }

            return false;
        }
    }
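The on/off pattern seen in the client output can be reproduced with a small, self-contained model of the RT branch of passCheck. This is only a sketch under stated assumptions, not Sentinel's code: the class DegradeSimulation and its timeline helper are hypothetical, a plain integer clock stands in for the scheduled ResetTask, the reset is assumed to clear passCount as well as cut, and the average RT is held constant at a value above the threshold.

```java
/** Simplified, single-threaded model of the RT-based degrade check (not Sentinel's code). */
final class DegradeSimulation {
    static final int RT_MAX_EXCEED_N = 5; // constant value mentioned in the post

    private final double count;       // RT threshold in ms
    private final int timeWindowSec;  // how long the resource stays cut
    private boolean cut = false;
    private int passCount = 0;
    private long resetAtSec = -1;     // stands in for the scheduled ResetTask

    DegradeSimulation(double count, int timeWindowSec) {
        this.count = count;
        this.timeWindowSec = timeWindowSec;
    }

    /** Returns true if a request arriving at nowSec with the given average RT passes. */
    boolean passCheck(long nowSec, double avgRt) {
        // Assumption: the reset clears both the cut flag and the exceed counter.
        if (cut && nowSec >= resetAtSec) {
            cut = false;
            passCount = 0;
        }
        if (cut) {
            return false;
        }
        if (avgRt < count) {
            return true;
        }
        // Tolerate a few exceeding checks before cutting.
        if (++passCount < RT_MAX_EXCEED_N) {
            return true;
        }
        cut = true;
        resetAtSec = nowSec + timeWindowSec;
        return false;
    }

    /** One request per second; 'P' = passed, 'B' = blocked. */
    static String timeline(int seconds, double avgRt) {
        DegradeSimulation rule = new DegradeSimulation(1, 10);
        StringBuilder sb = new StringBuilder();
        for (int t = 0; t < seconds; t++) {
            sb.append(rule.passCheck(t, avgRt) ? 'P' : 'B');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints PPPPBBBBBBBBBBPPPPBBBBBBBBBBPP
        System.out.println(timeline(30, 4900));
    }
}
```

With Count = 1 and TimeWindow = 10, the model lets four requests through, blocks for ten seconds, and then repeats, which matches the shape of the reported output as long as the average RT stays at or above the threshold.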

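I then looked into why clusterNode.avgRt() returns such a large number.

The value is returned by StatisticNode's avgRt method, which gets its final value from ArrayMetric's rt(), and rt() in turn is aggregated from each window's rt.sum.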
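To make that aggregation concrete, here is a minimal sketch of an average RT computed over time buckets. It is not Sentinel's implementation: the AvgRtSketch class and its Window type are hypothetical, and dividing the summed rt by the summed success count (and returning 0 when nothing completed) is an assumption about how such an average is typically formed. It mainly illustrates why, at very low request volume, a single slow call can dominate the average.

```java
/** Hypothetical, simplified model of per-bucket metrics aggregated into an average RT. */
final class AvgRtSketch {

    /** One time bucket; the field names mirror counters seen in the debugger dump. */
    static final class Window {
        final long rt;      // sum of response times (ms) recorded in this bucket
        final long success; // number of completed calls recorded in this bucket

        Window(long rt, long success) {
            this.rt = rt;
            this.success = success;
        }
    }

    /** Total rt across all buckets divided by total successes; 0 when nothing completed. */
    static double avgRt(Window[] windows) {
        long rtSum = 0;
        long successSum = 0;
        for (Window w : windows) {
            rtSum += w.rt;
            successSum += w.success;
        }
        return successSum == 0 ? 0.0 : (double) rtSum / successSum;
    }

    public static void main(String[] args) {
        // A single slow call in an otherwise idle interval sets the whole average.
        Window[] idle = { new Window(4900, 1), new Window(0, 0) };
        System.out.println(avgRt(idle));
    }
}
```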

Below is the in-memory state of a windowWrap that had been updated (4900 is clearly greater than the 1 I configured, so the counter increments, cut gets set, and then everything is rejected).

    window = {WindowWrap@134592}
      windowLength = 1000
      windowStart = 1533537600000
      value = {Window@135195}
        pass = {LongAdder@135439} "0"
        block = {LongAdder@135440} "0"
        exception = {LongAdder@135441} "0"
        rt = {LongAdder@135437} "0"
        success = {LongAdder@135442} "0"
        minRt = {LongAdder@135443} "4900"

Original asker: GitHub user haiyang1985
