开发者社区> 问答> 正文

MPP计算引擎执行计划分析



语法

<PRE prettyprinted? linenums>

  1. /*+engine=mpp*/ EXPLAIN [ ( option [, ...] ) ] statement
  2. 其中 option 可以是以下之一:
  3.     FORMAT { TEXT | GRAPHVIZ }
  4.     TYPE { LOGICAL | DISTRIBUTED | VALIDATE }


描述


通过EXPLAIN可以查看一个语句的逻辑或分布式执行计划,或者对一个语句进行逻辑验证。 通过 TYPE DISTRIBUTED 选项可以查看分布式的分片计划(fragmented plan)。每个分片计划由一个或者多个COMPUTENODE Worker节点执行。 多个分片在COMPUTENODE Worker节点之间通过Data Exchange交换数据。分片类型指定了分片如何在COMPUTENODE Worker节点之间交换数据,以及数据在多个分片中如何分布:
SINGLE
  • 分片在单个COMPUTENODE Worker节点执行。

HASH
  • 分片在固定个数的COMPUTENODE Worker节点执行,每个节点的输入数据通过Hash函数进行分桶输入。

ROUND_ROBIN
  • 分片在固定个数的COMPUTENODE Worker节点执行,每个节点的输入数据通过轮询的方式进行输入。

BROADCAST
  • 分片在固定个数的COMPUTENODE Worker节点执行,输入数据通过广播的方式广播到所有节点。

SOURCE
  • 分片在输入数据源的访问节点上执行。


例子


逻辑执行计划:<PRE prettyprinted? linenums>
  1. tiny> /*+engine=mpp*/ EXPLAIN SELECT regionkey, count(*) FROM nation GROUP BY 1;
  2.                                                 Query Plan
  3. ----------------------------------------------------------------------------------------------------------
  4. - Output[regionkey, _col1] => [regionkey:bigint, count:bigint]
  5.          _col1 := count
  6.      - RemoteExchange[GATHER] => regionkey:bigint, count:bigint
  7.          - Aggregate(FINAL)[regionkey] => [regionkey:bigint, count:bigint]
  8.                 count := "count"("count_8")
  9.              - LocalExchange[HASH][$hashvalue] ("regionkey") => regionkey:bigint, count_8:bigint, $hashvalue:bigint
  10.                  - RemoteExchange[REPARTITION][$hashvalue_9] => regionkey:bigint, count_8:bigint, $hashvalue_9:bigint
  11.                      - Project[] => [regionkey:bigint, count_8:bigint, $hashvalue_10:bigint]
  12.                              $hashvalue_10 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("regionkey"), 0))
  13.                          - Aggregate(PARTIAL)[regionkey] => [regionkey:bigint, count_8:bigint]
  14.                                  count_8 := "count"(*)
  15.                              - TableScan[tpch:tpch:nation:sf0.1, originalConstraint = true] => [regionkey:bigint]
  16.                                      regionkey := tpch:regionkey

分布式执行计划:<PRE prettyprinted? linenums>
  1. tiny> /*+engine=mpp*/ EXPLAIN (TYPE DISTRIBUTED) SELECT regionkey, count(*) FROM nation GROUP BY 1;
  2.                                           Query Plan
  3. ----------------------------------------------------------------------------------------------
  4. Fragment 0 [SINGLE]
  5.      Output layout: [regionkey, count]
  6.      Output partitioning: SINGLE []
  7.      - Output[regionkey, _col1] => [regionkey:bigint, count:bigint]
  8.              _col1 := count
  9.          - RemoteSource[1] => [regionkey:bigint, count:bigint]
  10. Fragment 1 [HASH]
  11.      Output layout: [regionkey, count]
  12.      Output partitioning: SINGLE []
  13.      - Aggregate(FINAL)[regionkey] => [regionkey:bigint, count:bigint]
  14.              count := "count"("count_8")
  15.          - LocalExchange[HASH][$hashvalue] ("regionkey") => regionkey:bigint, count_8:bigint, $hashvalue:bigint
  16.              - RemoteSource[2] => [regionkey:bigint, count_8:bigint, $hashvalue_9:bigint]
  17. Fragment 2 [SOURCE]
  18.      Output layout: [regionkey, count_8, $hashvalue_10]
  19.      Output partitioning: HASH [regionkey][$hashvalue_10]
  20.      - Project[] => [regionkey:bigint, count_8:bigint, $hashvalue_10:bigint]
  21.              $hashvalue_10 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("regionkey"), 0))
  22.          - Aggregate(PARTIAL)[regionkey] => [regionkey:bigint, count_8:bigint]
  23.                  count_8 := "count"(*)
  24.              - TableScan[tpch:tpch:nation:sf0.1, originalConstraint = true] => [regionkey:bigint]
  25.                      regionkey := tpch:regionkey

验证查询语句:<PRE prettyprinted? linenums>
  1. tiny> /*+engine=mpp*/ EXPLAIN (TYPE VALIDATE) SELECT regionkey, count(*) FROM nation GROUP BY 1;
  2. Valid
  3. -------
  4. true


MPP引擎执行计划运行时分析



语法


/*+engine=mpp*/ EXPLAIN ANALYZE statement

描述


执行语句,并展示语句对应的分布式执行计划和每个操作的详细开销信息。

Note
相关开销的统计信息可能不是完全准确,特别是对于执行非常快的查询。


例子


在以下例子中,你可以查看每个stage的CPU开销,以及在stage中每个Plan Node的相对开销。注意,每个Plan Node的相对开销是基于Wall Time的,Wall Time与CPU Time可能相关,也可能不相关。 针对每个Plan Node,你还能看到一些额外的统计信息内容(比如:每个节点实例的平均输入,相关Plan Node的平均Hash冲突数)。 这些统计信息有助于用户发现查询的执行时的数据异常情况(数据倾斜、异常Hash冲突等)。<PRE prettyprinted? linenums>
  1. sf1> /*+engine=mpp*/ EXPLAIN ANALYZE SELECT count(*), clerk FROM orders WHERE orderdate > date '1995-01-01' GROUP BY clerk;
  2.                                           Query Plan
  3. -----------------------------------------------------------------------------------------------
  4. Fragment 1 [HASH]
  5.     Cost: CPU 88.57ms, Input: 4000 rows (148.44kB), Output: 1000 rows (28.32kB)
  6.     Output layout: [count, clerk]
  7.     Output partitioning: SINGLE []
  8.     - Project[] => [count:bigint, clerk:varchar(15)]
  9.             Cost: 26.24%, Input: 1000 rows (37.11kB), Output: 1000 rows (28.32kB), Filtered: 0.00%
  10.             Input avg.: 62.50 lines, Input std.dev.: 14.77%
  11.         - Aggregate(FINAL)[clerk][$hashvalue] => [clerk:varchar(15), $hashvalue:bigint, count:bigint]
  12.                 Cost: 16.83%, Output: 1000 rows (37.11kB)
  13.                 Input avg.: 250.00 lines, Input std.dev.: 14.77%
  14.                 count := "count"("count_8")
  15.             - LocalExchange[HASH][$hashvalue] ("clerk") => clerk:varchar(15), count_8:bigint, $hashvalue:bigint
  16.                     Cost: 47.28%, Output: 4000 rows (148.44kB)
  17.                     Input avg.: 4000.00 lines, Input std.dev.: 0.00%
  18.                 - RemoteSource[2] => [clerk:varchar(15), count_8:bigint, $hashvalue_9:bigint]
  19.                         Cost: 9.65%, Output: 4000 rows (148.44kB)
  20.                         Input avg.: 4000.00 lines, Input std.dev.: 0.00%
  21. Fragment 2 [tpch:orders:1500000]
  22.     Cost: CPU 14.00s, Input: 818058 rows (22.62MB), Output: 4000 rows (148.44kB)
  23.     Output layout: [clerk, count_8, $hashvalue_10]
  24.     Output partitioning: HASH [clerk][$hashvalue_10]
  25.     - Aggregate(PARTIAL)[clerk][$hashvalue_10] => [clerk:varchar(15), $hashvalue_10:bigint, count_8:bigint]
  26.             Cost: 4.47%, Output: 4000 rows (148.44kB)
  27.             Input avg.: 204514.50 lines, Input std.dev.: 0.05%
  28.             Collisions avg.: 5701.28 (17569.93% est.), Collisions std.dev.: 1.12%
  29.             count_8 := "count"(*)
  30.         - ScanFilterProject[table = tpch:tpch:orders:sf1.0, originalConstraint = ("orderdate" > "$literal$date"(BIGINT '9131')), filterPredicate = ("orderdate" > "$literal$date"(BIGINT '9131'))] => [cler
  31.                 Cost: 95.53%, Input: 1500000 rows (0B), Output: 818058 rows (22.62MB), Filtered: 45.46%
  32.                 Input avg.: 375000.00 lines, Input std.dev.: 0.00%
  33.                 $hashvalue_10 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("clerk"), 0))
  34.                 orderdate := tpch:orderdate
  35.                 clerk := tpch:clerk

展开
收起
nicenelly 2017-10-26 15:46:41 3094 0
0 条回答
写回答
取消 提交回答
问答排行榜
最热
最新

相关电子书

更多
时序数据库TSDB的SQL与流计算 立即下载
阿里云流计算 Flink SQL 核心功能解密 立即下载
阿里云流计算Flink SQL核心功能解密 立即下载