首先在mysql数据库中新建两张表,并且插入数据。
MySql(准备)—mysql使用存储过程快速插入百万条数据
1. exists的原理
exists[ɪɡˈzɪsts]
返回的结果是boolean类型,只有true和false。
语法:
select 字段1,字段2 from 表1
where exists (select 字段1,字段2 from 表2 where 表1.字段2=表2.字段2);
测试结果:
mysql> explain select * from t1 where exists (select 1 from t4 where t1.t1_col =t4.t4_col);
+----+--------------------+-------+------------+------+---------------+--------------+---------+----------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------+--------------+---------+----------------+--------+----------+-------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 998026 | 100 | Using where |
| 2 | DEPENDENT SUBQUERY | t4 | NULL | ref | t4_col_index | t4_col_index | 303 | mydb.t1.t1_col | 95 | 100 | Using index |
+----+--------------------+-------+------------+------+---------------+--------------+---------+----------------+--------+----------+-------------+
2 rows in set
mysql> explain select * from t4 where exists (select 1 from t1 where t1.t1_col =t4.t4_col);
+----+--------------------+-------+------------+-------+---------------+--------------+---------+----------------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+-------+---------------+--------------+---------+----------------+--------+----------+--------------------------+
| 1 | PRIMARY | t4 | NULL | index | NULL | t4_col_index | 303 | NULL | 100409 | 100 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | t1 | NULL | ref | t1_col_index | t1_col_index | 303 | mydb.t4.t4_col | 1001 | 100 | Using index |
+----+--------------------+-------+------------+-------+---------------+--------------+---------+----------------+--------+----------+--------------------------+
2 rows in set
由上图可以看出,外表先执行,而后执行的内表。并且是对外表进行了全表扫描。内表使用了索引。
结论:对外表做loop循环,每次循环对内表(子查询)进行查询。内表的查询可以使用索引,而外表需要全表扫描。
2. in的原理
场景1:外表为百万数据,内表为10w数据
-- t1表为百万级数据,t4表为十万级数据
mysql> EXPLAIN SELECT * FROM t1 WHERE id IN (SELECT id FROM t4);
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
| 1 | SIMPLE | t4 | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 100409 | 100 | Using index |
| 1 | SIMPLE | t1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | mydb.t4.id | 1 | 100 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
2 rows in set
mysql> SHOW WARNINGS;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `mydb`.`t1`.`id` AS `id`,`mydb`.`t1`.`t1_col` AS `t1_col`,`mydb`.`t1`.`t1_col2` AS `t1_col2` from `mydb`.`t4` join `mydb`.`t1` where (`mydb`.`t1`.`id` = `mydb`.`t4`.`id`) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
结论:在执行计划中,看到并未使用到中间表,在,SHOW WARNINGS;
中发现mysql优化器直接将语句优化为join查询,并且大表使用到了索引查询,而小表是根据索引的全表扫描。
SELECT
`mydb`.`t1`.`id` AS `id`,
`mydb`.`t1`.`t1_col` AS `t1_col`,
`mydb`.`t1`.`t1_col2` AS `t1_col2`
FROM
`mydb`.`t4`
JOIN `mydb`.`t1`
WHERE
(
`mydb`.`t1`.`id` = `mydb`.`t4`.`id`
)
场景2:外表为10w数据,内表为百万数据
-- t1表为百万级数据,t4表为十万级数据
mysql> EXPLAIN select * from t4 where id in(select id from t1);
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
| 1 | SIMPLE | t4 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 100409 | 100 | NULL |
| 1 | SIMPLE | t1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | mydb.t4.id | 1 | 100 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+
2 rows in set
mysql> SHOW WARNINGS;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `mydb`.`t4`.`id` AS `id`,`mydb`.`t4`.`t4_col` AS `t4_col` from `mydb`.`t1` join `mydb`.`t4` where (`mydb`.`t1`.`id` = `mydb`.`t4`.`id`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
由上可知,mysql优化器并未使用中间表,而是两个表直接关联查询。依旧大表使用索引,小表全表扫描。
SELECT
`mydb`.`t4`.`id` AS `id`,
`mydb`.`t4`.`t4_col` AS `t4_col`
FROM
`mydb`.`t1`
JOIN `mydb`.`t4`
WHERE
(
`mydb`.`t1`.`id` = `mydb`.`t4`.`id`
)
场景3:外表使用主键索引,内表不使用索引查询
mysql> EXPLAIN SELECT * FROM t4 WHERE id IN (SELECT id FROM t1 where t1_col2='t1_83');
+----+-------------+-------+------------+--------+---------------+--------------+---------+------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+--------------+---------+------------+--------+----------+-------------+
| 1 | SIMPLE | t4 | NULL | index | PRIMARY | t4_col_index | 303 | NULL | 100409 | 100 | Using index |
| 1 | SIMPLE | t1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | mydb.t4.id | 1 | 10 | Using where |
+----+-------------+-------+------------+--------+---------------+--------------+---------+------------+--------+----------+-------------+
2 rows in set
mysql> SHOW WARNINGS;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `mydb`.`t4`.`id` AS `id`,`mydb`.`t4`.`t4_col` AS `t4_col` from `mydb`.`t1` join `mydb`.`t4` where ((`mydb`.`t1`.`id` = `mydb`.`t4`.`id`) and (`mydb`.`t1`.`t1_col2` = 't1_83')) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
mysql优化后的sql:
SELECT
`mydb`.`t4`.`id` AS `id`,
`mydb`.`t4`.`t4_col` AS `t4_col`
FROM
`mydb`.`t1`
JOIN `mydb`.`t4`
WHERE
(
(
`mydb`.`t1`.`id` = `mydb`.`t4`.`id`
)
AND (
`mydb`.`t1`.`t1_col2` = 't1_83'
)
)
上图可以看到,并未出现中间表,而是直接进行关联查询。mysql优化的依旧是大表使用索引小表全表扫描。
场景4:外表主键索引,内表查询值非主键
mysql> EXPLAIN select * from t4 where id in(select t1_col from t1);
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 998026 | 100 | Using where; Start temporary |
| 1 | SIMPLE | t4 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | mydb.t1.t1_col | 1 | 100 | Using where; End temporary |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+------------------------------+
2 rows in set
mysql> SHOW WARNINGS;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `mydb`.`t4`.`id` AS `id`,`mydb`.`t4`.`t4_col` AS `t4_col` from `mydb`.`t4` semi join (`mydb`.`t1`) where (`mydb`.`t4`.`id` = `mydb`.`t1`.`t1_col`) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
mysql优化后的sql
SELECT
`mydb`.`t4`.`id` AS `id`,
`mydb`.`t4`.`t4_col` AS `t4_col`
FROM
`mydb`.`t4` semi
JOIN (`mydb`.`t1`)
WHERE
(
`mydb`.`t4`.`id` = `mydb`.`t1`.`t1_col`
)
由于内查询并未返回主键值,mysql依旧是之间关联查询(不存在中间表)。此时外表使用的是索引查询,而内表进行了全表扫描。
场景5:外表非索引列,内表返回主键
mysql> EXPLAIN SELECT * FROM t1 WHERE t1_col IN (SELECT id FROM t4);
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+--------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 998026 | 100 | Using where |
| 1 | SIMPLE | t4 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | mydb.t1.t1_col | 1 | 100 | Using where; Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------+--------+----------+--------------------------+
2 rows in set
mysql> SHOW WARNINGS;
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `mydb`.`t1`.`id` AS `id`,`mydb`.`t1`.`t1_col` AS `t1_col`,`mydb`.`t1`.`t1_col2` AS `t1_col2` from `mydb`.`t4` join `mydb`.`t1` where (`mydb`.`t1`.`t1_col` = `mydb`.`t4`.`id`) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
Mysql优化后的sql:
SELECT
`mydb`.`t1`.`id` AS `id`,
`mydb`.`t1`.`t1_col` AS `t1_col`,
`mydb`.`t1`.`t1_col2` AS `t1_col2`
FROM
`mydb`.`t4`
JOIN `mydb`.`t1`
WHERE
(
`mydb`.`t1`.`t1_col` = `mydb`.`t4`.`id`
)
外表的t1_col是非索引列,而in()返回的是主键,mysql并未产生中间表,而是直接进行join关联。最终优化结果是内表走索引,而外表进行全表扫描。
场景6:外表in不是主键列,内表不返回主键。
t1_col不是索引列且t4_col也不是索引列。
mysql> EXPLAIN SELECT * FROM t1 WHERE t1_col IN (SELECT t4_col FROM t4);
+----+--------------+-------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100 | NULL |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 998026 | 10 | Using where; Using join buffer (Block Nested Loop) |
| 2 | MATERIALIZED | t4 | NULL | ALL | NULL | NULL | NULL | NULL | 100409 | 100 | NULL |
+----+--------------+-------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
3 rows in set
mysql> SHOW WARNINGS;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `mydb`.`t1`.`id` AS `id`,`mydb`.`t1`.`t1_col` AS `t1_col`,`mydb`.`t1`.`t1_col2` AS `t1_col2` from `mydb`.`t1` semi join (`mydb`.`t4`) where (`mydb`.`t1`.`t1_col` = `<subquery2>`.`t4_col`) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
此时mysql优化的结果是产生的中间表。即先执行in()查询,得到的值生成中间表后,在与外表进行join关联。
上面的执行计划顺序是:先执行in()中的子查询,可以看到是没有走索引的,得到中间表<subquery2>,然后和t1表进行关联查询(join),因为in不是索引列,且中间表也无索引列,所以均不会走索引。
场景7:外表列上添加索引
ALTER TABLE `t1` ADD INDEX t1_col_index ( `t1_col` ) ; --创建普通索引
mysql> EXPLAIN SELECT * FROM t1 WHERE t1_col IN (SELECT t4_col FROM t4);
+----+--------------+-------------+------------+------+---------------+--------------+---------+--------------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+------+---------------+--------------+---------+--------------------+--------+----------+-------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100 | Using where |
| 1 | SIMPLE | t1 | NULL | ref | t1_col_index | t1_col_index | 303 | <subquery2>.t4_col | 1001 | 100 | NULL |
| 2 | MATERIALIZED | t4 | NULL | ALL | NULL | NULL | NULL | NULL | 100409 | 100 | NULL |
+----+--------------+-------------+------------+------+---------------+--------------+---------+--------------------+--------+----------+-------------+
3 rows in set
先执行id=2,即全表扫描t4表,获取到中间表,中间表和t1表做关联查询(join),因为t1表中有普通索引,最终外表会执行普通索引,内表不会走索引。
场景8:外表列添加索引,内表依旧二级索引列覆盖
在t4上建立索引,使其执行覆盖索引
ALTER TABLE `t4` ADD INDEX t4_col_index ( `t4_col` ) ;
mysql> EXPLAIN SELECT * FROM t1 WHERE t1_col IN (SELECT t4_col FROM t4);
+----+--------------+-------------+------------+--------+---------------+--------------+---------+----------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+--------+---------------+--------------+---------+----------------+--------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | ALL | t1_col_index | NULL | NULL | NULL | 998026 | 100 | Using where |
| 1 | SIMPLE | <subquery2> | NULL | eq_ref | <auto_key> | <auto_key> | 303 | mydb.t1.t1_col | 1 | 100 | NULL |
| 2 | MATERIALIZED | t4 | NULL | index | t4_col_index | t4_col_index | 303 | NULL | 100409 | 100 | Using index |
+----+--------------+-------------+------------+--------+---------------+--------------+---------+----------------+--------+----------+-------------+
3 rows in set
mysql> SHOW WARNINGS;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `mydb`.`t1`.`id` AS `id`,`mydb`.`t1`.`t1_col` AS `t1_col`,`mydb`.`t1`.`t1_col2` AS `t1_col2` from `mydb`.`t1` semi join (`mydb`.`t4`) where (`<subquery2>`.`t4_col` = `mydb`.`t1`.`t1_col`) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
依旧会存在子查询的问题,先按照索引顺序全表扫描t4表,获取到中间表。t1表和中间表关联查询。
结论:
in是否走索引,依赖的是mysql的优化器,mysql优化器会将in优化为join语句。可能是单独的join语句;也可能是先执行in()语句,获取中间表,中间表在于外表进行join连接。
3. exists和in的使用场景
- exists子查询返回的是boolean类型,由执行计划可知,会对外表进行全表扫描,处理内表时会使用索引。适用于内表大,而外表小的情况。
- in是外表和内表做join连接,根据mysql优化器生成sql,但是一般情况下,in中的条件不能超过1000个。适合in()数量少(内部小),而外表大的情况
4. not in和not exists的区别
- 执行速度


结论:由上图可知,通常情况下,not exists的执行效率要高于not in。
- 条件中存在null

对于not in,若条件中存在null值,那么最终结果会直接停止执行并返回null结果。
网友评论