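On the SLS platform, the built-in machine learning functions can be used for time-series anomaly detection, which makes inspection and analysis more efficient. The full function list is documented at: https://help.aliyun.com/document_detail/93210.html
By combining these functions we can build the inspection chart shown below. We will break down, step by step, how to arrive at this result:
- The most complex inspection SQL looks like this: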
* |
SELECT res.name AS INSTANCE
FROM
(SELECT ts_anomaly_filter(INSTANCE, ts, ds, preds, probs, cast(5 AS bigint), cast(1 AS bigint)) AS res
FROM
(SELECT INSTANCE,
res[1] AS ts,
res[2] AS ds,
res[3] AS preds,
res[4] AS uppers,
res[5] AS lowers,
res[6] AS probs
FROM
(SELECT INSTANCE,
array_transpose(ts_predicate_arma(TIME, value, 5, 1, 1, 1.0, 1.0, TRUE)) AS res
FROM
(SELECT (TIME/1000) AS TIME,
labels['instance'] AS INSTANCE,
value
FROM
(SELECT promql_query_range('1 - avg(irate(node_cpu_seconds_total{instance=~".*",mode="idle"}[10m])) by (instance) ', '10m') AS t
FROM metrics)
ORDER BY TIME ASC)
GROUP BY INSTANCE)))
Let's break the SQL above apart and see how the result is obtained one step at a time.
- First, fetch the objects to be inspected:
* |
SELECT (TIME/1000) AS TIME,
labels['instance'] AS INSTANCE,
value
FROM
(SELECT promql_query_range('1 - avg(irate(node_cpu_seconds_total{instance=~".*",mode="idle"}[10m])) by (instance) ', '10m') AS t
FROM metrics)
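Here we use PromQL in SLS to fetch, for each of the N monitored instances, the CPU utilization (1 minus the average idle rate) sampled at a 10-minute step. To show the result more vividly, the series can be visualized with a flow chart.
- Next, run anomaly detection on the N lines obtained. SLS provides anomaly detection functions that also work with GROUP BY, so all lines can be inspected conveniently in one query: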
* |
SELECT INSTANCE,
ts_predicate_arma(TIME, value, 5, 1, 1, 1.0, 1.0, TRUE)
FROM
(SELECT (TIME/1000) AS TIME,
labels['instance'] AS INSTANCE,
value
FROM
(SELECT promql_query_range('1 - avg(irate(node_cpu_seconds_total{instance=~".*",mode="idle"}[10m])) by (instance) ', '10m') AS t
FROM metrics))
GROUP BY INSTANCE
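To picture what GROUP BY INSTANCE contributes here, the minimal Python sketch below collects flat (instance, time, value) rows into one time-ordered series per instance, which is the shape a per-line detector consumes. The sample rows and the `group_series` helper are hypothetical and only illustrate the grouping; this is not how SLS implements it.

```python
from collections import defaultdict

def group_series(rows):
    """Group flat (instance, time, value) rows into one sorted series per instance."""
    series = defaultdict(list)
    for instance, ts, value in rows:
        series[instance].append((ts, value))
    # Sort each series by timestamp so a detector would see points in time order.
    return {name: sorted(points) for name, points in series.items()}

# Hypothetical sample rows: two instances, timestamps in seconds.
rows = [
    ("host-a", 1200, 0.31),
    ("host-b", 600, 0.72),
    ("host-a", 600, 0.30),
    ("host-b", 1200, 0.95),
]

grouped = group_series(rows)
for name, points in grouped.items():
    print(name, points)
```

Each key of `grouped` corresponds to one row of the grouped query result, and its list of points stands in for the per-line input of the detection function.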
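With the SQL above we can easily run anomaly detection on all N lines. In the resulting table, the first column is the instance and the second column holds the detection result for that line. But how do we work with such a complex result?
For ts_predicate_arma, SLS provides functions to parse and convert the model output. We first transpose the array in the detection result: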
* |
SELECT INSTANCE,
array_transpose(ts_predicate_arma(TIME, value, 5, 1, 1, 1.0, 1.0, TRUE)) AS res
FROM
(SELECT (TIME/1000) AS TIME,
labels['instance'] AS INSTANCE,
value
FROM
(SELECT promql_query_range('1 - avg(irate(node_cpu_seconds_total{instance=~".*",mode="idle"}[10m])) by (instance) ', '10m') AS t
FROM metrics))
GROUP BY INSTANCE
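array_transpose has now converted the function result. We then split the converted result into separate named columns by index (an unnest-style step), which gives a result that is ready for further processing: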
* |
SELECT INSTANCE,
res[1] AS ts,
res[2] AS ds,
res[3] AS preds,
res[4] AS uppers,
res[5] AS lowers,
res[6] AS probs
FROM
(SELECT INSTANCE,
array_transpose(ts_predicate_arma(TIME, value, 5, 1, 1, 1.0, 1.0, TRUE)) AS res
FROM
(SELECT (TIME/1000) AS TIME,
labels['instance'] AS INSTANCE,
value
FROM
(SELECT promql_query_range('1 - avg(irate(node_cpu_seconds_total{instance=~".*",mode="idle"}[10m])) by (instance) ', '10m') AS t
FROM metrics))
GROUP BY INSTANCE)
The result is shown in the figure below:
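The reshaping done by array_transpose and the indexed columns can be sketched in plain Python. This sketch assumes, as the column aliases in the query suggest, that each row of the detector output holds six fields in the order timestamp, observed value, prediction, upper bound, lower bound, anomaly probability. The numbers are made up for illustration.

```python
# Hypothetical detector output for one instance: one row per time point.
# Assumed field order: ts, observed, predicted, upper, lower, anomaly probability.
rows = [
    [600, 0.30, 0.31, 0.40, 0.22, 0.0],
    [1200, 0.31, 0.30, 0.39, 0.21, 0.0],
    [1800, 0.95, 0.32, 0.41, 0.23, 0.9],
]

# Transpose: a list of per-point rows becomes a list of per-field columns.
res = [list(column) for column in zip(*rows)]

# SQL arrays are 1-based (res[1]); Python lists are 0-based (res[0]).
ts, ds, preds, uppers, lowers, probs = res

print(ts)
print(probs)
```

After the transpose, each field is one array per instance, which is why the query can name them with simple index expressions.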
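From this result we want to keep only the anomalies we care about. The ts_anomaly_filter function does this, as in the outermost layer of the full query at the top. See the documentation for details: https://help.aliyun.com/document_detail/93210.html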
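As a rough illustration of this filtering step, the Python sketch below keeps only the lines that show an upward anomaly within their most recent points. The rule used here (anomaly probability above zero and observed value above the prediction, checked over the last `n_watch` points) is an assumption chosen for illustration and loosely mirrors the arguments 5 and 1 in the query. It is not the SLS implementation of ts_anomaly_filter, whose exact semantics are described in the documentation.

```python
def has_recent_upward_anomaly(ds, preds, probs, n_watch=5):
    """Return True if any of the last n_watch points looks like an upward anomaly.

    Assumed rule (illustrative only): anomaly probability > 0 and the
    observed value is above the predicted value.
    """
    recent = list(zip(ds, preds, probs))[-n_watch:]
    return any(prob > 0 and observed > predicted
               for observed, predicted, prob in recent)

# Hypothetical per-instance columns: (ds, preds, probs).
lines = {
    "host-a": ([0.30, 0.31, 0.30], [0.31, 0.30, 0.31], [0.0, 0.0, 0.0]),
    "host-b": ([0.72, 0.70, 0.95], [0.71, 0.71, 0.72], [0.0, 0.0, 0.9]),
}

# Keep only the instance names whose recent points contain an anomaly.
anomalous = [name for name, (ds, preds, probs) in lines.items()
             if has_recent_upward_anomaly(ds, preds, probs)]
print(anomalous)
```

The list of names plays the role of the `res.name AS INSTANCE` column selected by the full query.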
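That covers every piece of the complex SQL we started with. Once we have the table result, we can use the jump configuration in SLS to carry out the follow-up analysis. It can be configured as follows:
Configure a DrillDown action to explore the data visually.
This enables jumping from a selected row to the corresponding view.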