为什么查询走索引会更慢--回表问题

作者: 洋仔聊编程 | 来源:发表于2018-12-20 19:30 被阅读72次

为什么查询走索引会更慢--回表问题
63 MySQL实战性能优化-optimizer_trace
MySQL中的回表查询与索引覆盖
MySQL in 走不走索引？
mysql查询优化
MySQL正确使用索引
MYSQL查询优化
MYSQL覆盖索引
对于Mysql索引的理解（二）
sqlserver group by 非聚集多字段组合索引性能慢

环境

数据库：TiDB数据库（和mysql数据库类似的数据库）
- 表名：index_basedata
- 表数据：13 000 000条数据
- 表索引：包含一个普通索引，索引列 ”year“
- 测试sql：
  - SQL1 : select brand from index_basedata where year = 2018 group by day limit 5;
  - SQL2 : select brand from index_basedata where month = 201807 group by day limit 5;
  - SQL3 : select brand from index_basedata where year = 2018 limit 5;
  - SQL4 : select brand from index_basedata where month = 201807 limit 5;
  - sql1与sql2对比， sql3与sql4对比

问题

测试sql执行时间：
* SQL1 : 23.6s
* SQL2 : 4.5s
* SQL3 : 0.007s
* SQL4 : 1.4s
- explain解释：了解TiDB数据库相关与explain请移步：https://pingcap.com/docs-cn/overview/#tidb-%e6%95%b4%e4%bd%93%e6%9e%b6%e6%9e%84
  - sql1:select brand from xcar_index_basedata_noprice where year = 2018 group by day limit 5;
  - sql2：select brand from xcar_index_basedata_noprice where month = 201807 group by day limit 5;
  - sql3:select brand from xcar_index_basedata_noprice where year = 2018 limit 5;
  - sql4:select brand from xcar_index_basedata_noprice where month = 201807 limit 5;
- 从图中可以看出，sql1和sql3使用了索引，sql2和sql4没有使用索引。
- 对于sql1和sql2，本应该使用所以的查询时间少，但是使用了索引的sql1使用的时间是没有使用索引的sql2查询时间的5倍，为什么？
- 对于sql3和sql4，恢复了正常，使用索引比不使用索引查询速度快，为什么上述两个现象会相差如此之大？

解答（以下为个人理解，不同理解请不吝指教）

在sql1和sql2中，sql1索引列获取数据的速度大于sql2中获得数据的速度。但是在group by时在sql1中，使用索引得到的地址，需要回表才可以得到真实的数据，需要根据地址去获取数据，数据回表问题严重。
在sql2中获取的是直接数据，group by 不需要回表。
sql2,sql3,sql4表现正常。

什么是回表？

通俗的讲：如果索引的列在select所需获得列中就不需要回表，如果select所需获得列中有大量的非索引列，索引就需要到表中找到相应的列的信息，这就叫回表。
案例：
测试环境：与上述相同
测试sql：
- sql1：select brand from index_basedata where year = 2018 group by day limit 5;
- 执行时间：21.8s
- explain一下：
- 使用了索引“year”, 则索引列为year，但是select brand from..中brand并不是索引列，就需要回表（通过图也可以看出，进行了tablescan，另外其中的IndexLookUp也说明了进行了回表），所以花费时间长
- sql2：select brand from index_basedata where year = 2018 group by year limit 5;
- 执行时间：21.7s
- explain一下：
- 使用了索引“year”, 则索引列为year，但是select brand from..中brand并不是索引列，就需要回表（通过图也可以看出，进行了tablescan,另外其中的IndexLookUp也说明了进行了回表），所以花费时间长，另外，对于sql2中的group by使用的是索引列，所以使用的StreamAgg，不同于sql1
- sql3：select year from index_basedata where year = 2018 group by year limit 5;
- 执行时间：2.5s
- explain一下：
- 可以看到：没有tablescan，也没有使用IndexLookUp而是IndexReader说明直接从索引中读取索引列并使用。
总结：在上述案例中，sql3使用了索引列，没有进行回表，sql1与sql2进行了回表，所以花费时间长。所以说，发生严重的回表的时候，查询速度比不使用索引还慢。

那么，下面的5个sql执行时间能不能理解呢？

select year from xcar_index_basedata_noprice where year = 2018 group by year limit 5;
select month from xcar_index_basedata_noprice where month = 201807 group by month limit 5
select brand_id from xcar_index_basedata_noprice where month = 201807 group by month limit 5;
select brand_id from xcar_index_basedata_noprice where year = 2018 group by year limit 5;
select brand_id from xcar_index_basedata_noprice where year = 2018 group by month limit 5;
对应执行时间：