MySQL在8.0.16版本之前,对 IN和 EXISTS处理是不一样的,EXISTS只能采用子查询方式,所以执行计划中能看到DEPENDENT SUBQUERY。但可以把IN优化成semi join,优化器开关(optimizer_switch)中有几个相关的开关
loosescan=on firstmatch=on duplicateweedout=on materialization=on
MySQL从8.0.16开始,增加对EXISTS的优化,和IN一样也支持自动转换成semi join。
从8.0.18开始,又增加了对NOT EXISTS转变成anti join的优化。
我们来看下同一个SQL在5.7版本和8.0.18版本中的不同表现
1. 测试环境
两个测试表的DDL
# t1表共有30万条记录
[root@yejr.run]> show create table t1\G1. row **
Table: t1
Create Table: CREATE TABLE t1 (
id int unsigned NOT NULL AUTO_INCREMENT,
seq int unsigned NOT NULL DEFAULT '0',
name varchar(20) NOT NULL DEFAULT '',
x int DEFAULT NULL,
PRIMARY KEY (id),
KEY k1 (seq)
) ENGINE=InnoDB AUTO_INCREMENT=300001;
# t2表共有19条记录
[root@yejr.run]> show create table t2\G
1. row **
Table: t2
Create Table: CREATE TABLE t2 (
id int NOT NULL,
nu int DEFAULT NULL,
name varchar(20) NOT NULL DEFAULT '',
PRIMARY KEY (id)
) ENGINE=InnoDB;
数据随机填充。
2. MySQL 5.7下的执行计划及成本
5.7还不支持anti-join优化,只能是用子查询。
[root@yejr.run]> explain select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu);
+----+--------------------+-------+------+------+---------+------+--------+----------+-------------+
| id | select_type | table | type | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------+------+---------+------+--------+----------+-------------+
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | 376310 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | t2 | ALL | NULL | NULL | NULL | 19 | 10.00 | Using where |
+----+--------------------+-------+------+------+---------+------+--------+----------+-------------+
[root@yejr.run]> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1276 | Field or reference 'yejr.t1.x' of SELECT #2 was resolved in SELECT #1 |
| Note | 1003 | / select#1 / select yejr.t1.id AS id,yejr.t1.seq AS seq,yejr.t1.name AS name,yejr.t1.x AS x from yejr.t1 where exists(/ select#2 / select 1 from yejr.t2 where (yejr.t1.x = yejr.t2.nu)) is false |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
#该SQL耗时 1.34秒
[root@yejr.run]> select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu);
299994 rows in set (1.34 sec)
#从统计结果中的 Handler_read_rnd_next 值来看,应该是做了一次笛卡尔积扫描
[root@yejr.run]> show status like 'handl%read%';
+-----------------------+---------+
| Variable_name | Value |
+-----------------------+---------+
| Handler_read_first | 300001 |
| Handler_read_key | 300001 |
...
| Handler_read_rnd_next | 6299939 |
+-----------------------+---------+
这里要纠个偏,不少人说MySQL对子查询支持不好,实际上是因为优化器无法将这个子查询改写优化成JOIN查询。
像下面这个子查询SQL,就可以被自动优化了
[root@yejr.run]> explain select * from t1 where exists ( select nu from t2 where t1.seq=t2.nu and nu >= 10);
+----+-------------+-------+-------+---------------+------+---------+------------+------+----------+----------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+------+---------+------------+------+----------+----------------------------------------+
| 1 | SIMPLE | t2 | range | nu | nu | 4 | NULL | 17 | 111.76 | Using where; Using index; LooseScan |
| 1 | SIMPLE | t1 | ref | k1 | k1 | 4 | yejr.t2.nu | 1 | 100.00 | Using join buffer (Batched Key Access) |
+----+-------------+-------+-------+---------------+------+---------+------------+------+----------+----------------------------------------+
2 rows in set, 2 warnings (0.00 sec)
[root@yejr.run]> show warnings;
| Level | Code | Message |
| Note | 1276 | Field or reference 'yejr.t1.seq' of SELECT #2 was resolved in SELECT #1 |
| Note | 1003 | / select#1 / select `yejr`.`t1`.`id` AS `id`,`yejr`.`t1`.`seq` AS `seq`,`yejr`.`t1`.`name` AS `name`,`yejr`.`t1`.`x` AS `x` from `yejr`.`t1` semi join (`yejr`.`t2`) where ((`yejr`.`t1`.`seq` = `yejr`.`t2`.`nu`) and (`yejr`.`t2`.`nu` >= 10)) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3. MySQL 8.0.19下的执行计划及成本
这时支持对anti-join的优化,优化器会对SQL进行改写优化。
[root@yejr.run]> explain select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu);
+----+--------------+-------------+--------+---------------------+---------------------+---------+-----------------+--------+----------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+--------+---------------------+---------------------+---------+-----------------+--------+----------+-------------------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 376310 | 100.00 | NULL |
| 1 | SIMPLE | | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 5 | zhishutang.t1.x | 1 | 100.00 | Using where; Not exists |
| 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 19 | 100.00 | NULL |
+----+--------------+-------------+--------+---------------------+---------------------+---------+-----------------+--------+----------+-------------------------+
3 rows in set, 2 warnings (0.00 sec)
#直接优化成anti join了
[root@yejr.run]> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1276 | Field or reference 'yejr.t1.x' of SELECT #2 was resolved in SELECT #1 |
| Note | 1003 | / select#1 / select yejr.t1.id AS id,yejr.t1.seq AS seq,yejr.t1.name AS name,yejr.t1.x AS x from yejr.t1 anti join (yejr.t2) on((.nu = yejr.t1.x)) where true |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# explain analyze结果
[root@yejr.run]> explain analyze select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu);
| -> Nested loop anti-join (actual time=0.058..248.704 rows=299994 loops=1)
-> Table scan on t1 (cost=37975.50 rows=376310) (actual time=0.035..82.504 rows=300000 loops=1)
-> Single-row index lookup on using <auto_distinct_key> (nu=t1.x) (actual time=0.000..0.000 rows=0 loops=300000)
-> Materialize with deduplication (actual time=0.000..0.000 rows=0 loops=300000)
-> Filter: (t2.nu is not null) (cost=2.15 rows=19) (actual time=0.006..0.011 rows=19 loops=1)
-> Table scan on t2 (cost=2.15 rows=19) (actual time=0.005..0.009 rows=19 loops=1)
#该SQL耗时 0.15 秒
[root@yejr.run]> select * from t1 where not exists ( select 1 from t2 where t1.x = t2.nu);
299994 rows in set (0.15 sec)
[root@yejr.run]> show status like 'handl%read%';
+-----------------------+--------+
| Variable_name | Value |
+-----------------------+--------+
| Handler_read_first | 2 |
| Handler_read_key | 200574 |
...
| Handler_read_rnd_next | 300021 |
相对于5.7的性能有了很大提升。
4. 一个小小的脑洞
测试过程中我突发奇想,在MySQL 8.0.19版本下,如果把oiptimizer_switch里的semijoin关闭后,应该也相当于关闭anti join优化。
用EXPLAIN ANALYZE再查看这个SQL的执行计划及耗时,结果是这样的
看起来的确如此。在8.0.19中,也修复了松华老师之前在8.0.18版本遇到的小bug。
最后提醒大家不要轻易关闭 semijoin 开关,以防连 anti-join 优化也跟着消失,这就不划算了。
MySQL优化器的确在不断进步中,欣慰,也建议大家适时更新到高版本。
参考:
- Antijoin in MySQL 8( )
- MySQL 8 Anti-Join 几点总结
全文完。
</div>