美文网首页数据库
MySql语法(4)—exists和in的区别(explain分

MySql语法(4)—exists和in的区别(explain分

作者: 小胖学编程 | 来源:发表于2020-02-28 21:39 被阅读0次

首先在mysql数据库中新建两张表,并且插入数据。

MySql(准备)—mysql使用存储过程快速插入百万条数据

1. exists的原理

exists[ɪɡˈzɪsts]返回的结果是boolean类型,只有true和false。

语法:

select 字段1,字段2 from 表1 
where exists (select 字段1,字段2 from 表2 where 表1.字段2=表2.字段2);

测试结果:

mysql> explain select * from t1 where exists (select 1 from t4 where t1.t1_col =t4.t4_col);

+----+--------------------+-------+------------+------+---------------+--------------+---------+----------------+--------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys | key          | key_len | ref            | rows   | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------+--------------+---------+----------------+--------+----------+-------------+
|  1 | PRIMARY            | t1    | NULL       | ALL  | NULL          | NULL         | NULL    | NULL           | 998026 |      100 | Using where |
|  2 | DEPENDENT SUBQUERY | t4    | NULL       | ref  | t4_col_index  | t4_col_index | 303     | mydb.t1.t1_col |     95 |      100 | Using index |
+----+--------------------+-------+------------+------+---------------+--------------+---------+----------------+--------+----------+-------------+
2 rows in set

mysql> explain select * from t4 where exists (select 1 from t1 where t1.t1_col =t4.t4_col);

+----+--------------------+-------+------------+-------+---------------+--------------+---------+----------------+--------+----------+--------------------------+
| id | select_type        | table | partitions | type  | possible_keys | key          | key_len | ref            | rows   | filtered | Extra                    |
+----+--------------------+-------+------------+-------+---------------+--------------+---------+----------------+--------+----------+--------------------------+
|  1 | PRIMARY            | t4    | NULL       | index | NULL          | t4_col_index | 303     | NULL           | 100409 |      100 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | t1    | NULL       | ref   | t1_col_index  | t1_col_index | 303     | mydb.t4.t4_col |   1001 |      100 | Using index              |
+----+--------------------+-------+------------+-------+---------------+--------------+---------+----------------+--------+----------+--------------------------+
2 rows in set

由上图可以看出,外表先执行,而后执行的内表。并且是对外表进行了全表扫描。内表使用了索引。

结论:对外表做loop循环,每次循环对内表(子查询)进行查询。内表的查询可以使用索引,而外表需要全表扫描。

2. in的原理

场景1:外表为百万数据,内表为10w数据

-- t1表为百万级数据,t4表为十万级数据
mysql> EXPLAIN SELECT * FROM t1 WHERE id IN (SELECT id FROM t4);
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref        | rows   | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
|  1 | SIMPLE      | t4    | NULL       | index  | PRIMARY       | PRIMARY | 4       | NULL       | 100409 |      100 | Using index |
|  1 | SIMPLE      | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | mydb.t4.id |      1 |      100 | NULL        |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
2 rows in set

mysql> SHOW WARNINGS;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                          |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `mydb`.`t1`.`id` AS `id`,`mydb`.`t1`.`t1_col` AS `t1_col`,`mydb`.`t1`.`t1_col2` AS `t1_col2` from `mydb`.`t4` join `mydb`.`t1` where (`mydb`.`t1`.`id` = `mydb`.`t4`.`id`) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

结论:在执行计划中,看到并未使用到中间表,在,SHOW WARNINGS;中发现mysql优化器直接将语句优化为join查询,并且大表使用到了索引查询,而小表是根据索引的全表扫描。

SELECT
    `mydb`.`t1`.`id` AS `id`,
    `mydb`.`t1`.`t1_col` AS `t1_col`,
    `mydb`.`t1`.`t1_col2` AS `t1_col2`
FROM
    `mydb`.`t4`
JOIN `mydb`.`t1`
WHERE
    (
        `mydb`.`t1`.`id` = `mydb`.`t4`.`id`
    )

场景2:外表为10w数据,内表为百万数据

-- t1表为百万级数据,t4表为十万级数据
mysql> EXPLAIN select * from t4 where id in(select id from t1);
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref        | rows   | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
|  1 | SIMPLE      | t4    | NULL       | ALL    | PRIMARY       | NULL    | NULL    | NULL       | 100409 |      100 | NULL        |
|  1 | SIMPLE      | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | mydb.t4.id |      1 |      100 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
2 rows in set

mysql> SHOW WARNINGS;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                       |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `mydb`.`t4`.`id` AS `id`,`mydb`.`t4`.`t4_col` AS `t4_col` from `mydb`.`t1` join `mydb`.`t4` where (`mydb`.`t1`.`id` = `mydb`.`t4`.`id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

由上可知,mysql优化器并未使用中间表,而是两个表直接关联查询。依旧大表使用索引,小表全表扫描。

SELECT
    `mydb`.`t4`.`id` AS `id`,
    `mydb`.`t4`.`t4_col` AS `t4_col`
FROM
    `mydb`.`t1`
JOIN `mydb`.`t4`
WHERE
    (
        `mydb`.`t1`.`id` = `mydb`.`t4`.`id`
    )

场景3:外表使用主键索引,内表不使用索引查询

mysql> EXPLAIN SELECT * FROM t4 WHERE id IN (SELECT id FROM t1 where t1_col2='t1_83');

+----+-------------+-------+------------+--------+---------------+--------------+---------+------------+--------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key          | key_len | ref        | rows   | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+--------------+---------+------------+--------+----------+-------------+
|  1 | SIMPLE      | t4    | NULL       | index  | PRIMARY       | t4_col_index | 303     | NULL       | 100409 |      100 | Using index |
|  1 | SIMPLE      | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY      | 4       | mydb.t4.id |      1 |       10 | Using where |
+----+-------------+-------+------------+--------+---------------+--------------+---------+------------+--------+----------+-------------+
2 rows in set

mysql> SHOW WARNINGS;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                               |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `mydb`.`t4`.`id` AS `id`,`mydb`.`t4`.`t4_col` AS `t4_col` from `mydb`.`t1` join `mydb`.`t4` where ((`mydb`.`t1`.`id` = `mydb`.`t4`.`id`) and (`mydb`.`t1`.`t1_col2` = 't1_83')) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

mysql优化后的sql:

SELECT
    `mydb`.`t4`.`id` AS `id`,
    `mydb`.`t4`.`t4_col` AS `t4_col`
FROM
    `mydb`.`t1`
JOIN `mydb`.`t4`
WHERE
    (
        (
            `mydb`.`t1`.`id` = `mydb`.`t4`.`id`
        )
        AND (
            `mydb`.`t1`.`t1_col2` = 't1_83'
        )
    )

上图可以看到,并未出现中间表,而是直接进行关联查询。mysql优化的依旧是大表使用索引小表全表扫描。

场景4:外表主键索引,内表查询值非主键

mysql> EXPLAIN select * from t4 where id in(select t1_col from t1);
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+------------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref            | rows   | filtered | Extra                        |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+------------------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL    | NULL          | NULL    | NULL    | NULL           | 998026 |      100 | Using where; Start temporary |
|  1 | SIMPLE      | t4    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | mydb.t1.t1_col |      1 |      100 | Using where; End temporary   |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+------------------------------+
2 rows in set

mysql> SHOW WARNINGS;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                  |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `mydb`.`t4`.`id` AS `id`,`mydb`.`t4`.`t4_col` AS `t4_col` from `mydb`.`t4` semi join (`mydb`.`t1`) where (`mydb`.`t4`.`id` = `mydb`.`t1`.`t1_col`) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

mysql优化后的sql

SELECT
    `mydb`.`t4`.`id` AS `id`,
    `mydb`.`t4`.`t4_col` AS `t4_col`
FROM
    `mydb`.`t4` semi
JOIN (`mydb`.`t1`)
WHERE
    (
        `mydb`.`t4`.`id` = `mydb`.`t1`.`t1_col`
    )

由于内查询并未返回主键值,mysql依旧是之间关联查询(不存在中间表)。此时外表使用的是索引查询,而内表进行了全表扫描。

场景5:外表非索引列,内表返回主键

mysql> EXPLAIN SELECT * FROM t1 WHERE t1_col IN (SELECT id FROM t4);

+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+--------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref            | rows   | filtered | Extra                    |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+--------------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL    | NULL          | NULL    | NULL    | NULL           | 998026 |      100 | Using where              |
|  1 | SIMPLE      | t4    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | mydb.t1.t1_col |      1 |      100 | Using where; Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+--------------------------+
2 rows in set

mysql> SHOW WARNINGS;
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                              |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `mydb`.`t1`.`id` AS `id`,`mydb`.`t1`.`t1_col` AS `t1_col`,`mydb`.`t1`.`t1_col2` AS `t1_col2` from `mydb`.`t4` join `mydb`.`t1` where (`mydb`.`t1`.`t1_col` = `mydb`.`t4`.`id`) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

Mysql优化后的sql:

SELECT
    `mydb`.`t1`.`id` AS `id`,
    `mydb`.`t1`.`t1_col` AS `t1_col`,
    `mydb`.`t1`.`t1_col2` AS `t1_col2`
FROM
    `mydb`.`t4`
JOIN `mydb`.`t1`
WHERE
    (
        `mydb`.`t1`.`t1_col` = `mydb`.`t4`.`id`
    )

外表的t1_col是非索引列,而in()返回的是主键,mysql并未产生中间表,而是直接进行join关联。最终优化结果是内表走索引,而外表进行全表扫描。

场景6:外表in不是主键列,内表不返回主键。

t1_col不是索引列且t4_col也不是索引列。

mysql> EXPLAIN SELECT * FROM t1 WHERE t1_col IN (SELECT t4_col FROM t4);

+----+--------------+-------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| id | select_type  | table       | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                                              |
+----+--------------+-------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | NULL   |      100 | NULL                                               |
|  1 | SIMPLE       | t1          | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 998026 |       10 | Using where; Using join buffer (Block Nested Loop) |
|  2 | MATERIALIZED | t4          | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 100409 |      100 | NULL                                               |
+----+--------------+-------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
3 rows in set

mysql> SHOW WARNINGS;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                           |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `mydb`.`t1`.`id` AS `id`,`mydb`.`t1`.`t1_col` AS `t1_col`,`mydb`.`t1`.`t1_col2` AS `t1_col2` from `mydb`.`t1` semi join (`mydb`.`t4`) where (`mydb`.`t1`.`t1_col` = `<subquery2>`.`t4_col`) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

此时mysql优化的结果是产生的中间表。即先执行in()查询,得到的值生成中间表后,在与外表进行join关联。

上面的执行计划顺序是:先执行in()中的子查询,可以看到是没有走索引的,得到中间表<subquery2>,然后和t1表进行关联查询(join),因为in不是索引列,且中间表也无索引列,所以均不会走索引。

场景7:外表列上添加索引

ALTER TABLE `t1` ADD INDEX t1_col_index ( `t1_col` ) ;  --创建普通索引
mysql> EXPLAIN SELECT * FROM t1 WHERE t1_col IN (SELECT t4_col FROM t4);

+----+--------------+-------------+------------+------+---------------+--------------+---------+--------------------+--------+----------+-------------+
| id | select_type  | table       | partitions | type | possible_keys | key          | key_len | ref                | rows   | filtered | Extra       |
+----+--------------+-------------+------------+------+---------------+--------------+---------+--------------------+--------+----------+-------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL  | NULL          | NULL         | NULL    | NULL               | NULL   |      100 | Using where |
|  1 | SIMPLE       | t1          | NULL       | ref  | t1_col_index  | t1_col_index | 303     | <subquery2>.t4_col |   1001 |      100 | NULL        |
|  2 | MATERIALIZED | t4          | NULL       | ALL  | NULL          | NULL         | NULL    | NULL               | 100409 |      100 | NULL        |
+----+--------------+-------------+------------+------+---------------+--------------+---------+--------------------+--------+----------+-------------+
3 rows in set

先执行id=2,即全表扫描t4表,获取到中间表,中间表和t1表做关联查询(join),因为t1表中有普通索引,最终外表会执行普通索引,内表不会走索引。

场景8:外表列添加索引,内表依旧二级索引列覆盖

在t4上建立索引,使其执行覆盖索引

ALTER TABLE `t4` ADD INDEX t4_col_index ( `t4_col` ) ;
mysql> EXPLAIN SELECT * FROM t1 WHERE t1_col IN (SELECT t4_col FROM t4);

+----+--------------+-------------+------------+--------+---------------+--------------+---------+----------------+--------+----------+-------------+
| id | select_type  | table       | partitions | type   | possible_keys | key          | key_len | ref            | rows   | filtered | Extra       |
+----+--------------+-------------+------------+--------+---------------+--------------+---------+----------------+--------+----------+-------------+
|  1 | SIMPLE       | t1          | NULL       | ALL    | t1_col_index  | NULL         | NULL    | NULL           | 998026 |      100 | Using where |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_key>    | <auto_key>   | 303     | mydb.t1.t1_col |      1 |      100 | NULL        |
|  2 | MATERIALIZED | t4          | NULL       | index  | t4_col_index  | t4_col_index | 303     | NULL           | 100409 |      100 | Using index |
+----+--------------+-------------+------------+--------+---------------+--------------+---------+----------------+--------+----------+-------------+
3 rows in set

mysql> SHOW WARNINGS;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                           |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `mydb`.`t1`.`id` AS `id`,`mydb`.`t1`.`t1_col` AS `t1_col`,`mydb`.`t1`.`t1_col2` AS `t1_col2` from `mydb`.`t1` semi join (`mydb`.`t4`) where (`<subquery2>`.`t4_col` = `mydb`.`t1`.`t1_col`) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

依旧会存在子查询的问题,先按照索引顺序全表扫描t4表,获取到中间表。t1表和中间表关联查询。

结论:

in是否走索引,依赖的是mysql的优化器,mysql优化器会将in优化为join语句。可能是单独的join语句;也可能是先执行in()语句,获取中间表,中间表在于外表进行join连接。

3. exists和in的使用场景

  1. exists子查询返回的是boolean类型,由执行计划可知,会对外表进行全表扫描,处理内表时会使用索引。适用于内表大,而外表小的情况
  2. in是外表和内表做join连接,根据mysql优化器生成sql,但是一般情况下,in中的条件不能超过1000个。适合in()数量少(内部小),而外表大的情况

4. not in和not exists的区别

  1. 执行速度
使用not in的场景.png 使用not exists的场景.png

结论:由上图可知,通常情况下,not exists的执行效率要高于not in。

  1. 条件中存在null
image.png

对于not in,若条件中存在null值,那么最终结果会直接停止执行并返回null结果。

推荐阅读

SQL IN 一定走索引吗?

历史文章

mybatis&&数据库优化&&缓存目录
JAVA && Spring && SpringBoot2.x — 学习目录

【mysql知识点整理】 --- 准确理解 in 和 exists

相关文章

网友评论

    本文标题:MySql语法(4)—exists和in的区别(explain分

    本文链接:https://www.haomeiwen.com/subject/cwmlhhtx.html