版权声明:本文为博主原创文章,未经博主允许不得转载。https://www.jianshu.com/u/c801d8028b0c
Hive Partitions
mysql> use metastore;
select count(*) from DBS;
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.00 sec)
mysql> select *from dbs;
+-------+-----------------------+-----------------------------------------------------+----------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+-----------------------------------------------------+----------+------------+------------+
| 1 | Default Hive database | hdfs://nameservice1/user/hive/warehouse | default | public | ROLE |
| 24340 | NULL | hdfs://nameservice1/user/hive/warehouse/bi_svcdb.db | bi_svcdb | hive | USER |
| 73526 | NULL | hdfs://nameservice1/user/hive/warehouse/bi_alydb.db | bi_alydb | hive | USER |
| 81180 | NULL | hdfs://nameservice1/user/hive/warehouse/bi_kyndb.db | bi_kyndb | hive | USER |
+-------+-----------------------+-----------------------------------------------------+----------+------------+------------+
4 rows in set (0.00 sec)
查看库中表数量
select TBLS.DB_ID,NAME,count(*) TAB_NUM from TBLS join DBS on TBLS.DB_ID = DBS.DB_ID group by DB_ID,NAME order by TAB_NUM
+-------+----------+---------+
| DB_ID | NAME | TAB_NUM |
+-------+----------+---------+
| 81180 | bi_kyndb | 87 |
| 73526 | bi_alydb | 88 |
| 24340 | bi_svcdb | 133 |
+-------+----------+---------+
Hive的表分区数量需要控制
通过Hive的元数据库可以统计目前整个Hive中的表和分区数量。经过查看,目前存在一些分区数量超过1000个分区的表,这些表可能会影响集群的整体访问效率。目前在集群中可以看到的超过1000个分区的表如下。
select TBLS.TBL_ID, DBS.NAME, TBL_NAME, count(*) as PART_NUM from PARTITIONS join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID join DBS on TBLS.DB_ID = DBS.DB_ID group by TBL_ID, TBL_NAME, DBS.NAME order by PART_NUM desc limit 50;
+--------+----------+------------------------------------------+----------+
| TBL_ID | NAME | TBL_NAME | PART_NUM |
+--------+----------+------------------------------------------+----------+
| 24551 | bi_svcdb | tbl_glhis_ct_settle_dtl_bas | 3290 |
| 24552 | bi_svcdb | tbl_glhis_ct_settle_dtl_bas_sor | 3290 |
| 53103 | bi_svcdb | tmp_tbl_glhis_bat_dtl_bas | 3239 |
| 61607 | bi_svcdb | tmp_rtdtrs_dtl_cups_fee | 2497 |
| 64914 | bi_svcdb | tmp_rtdtrs_dtl_cups | 2497 |
| 53042 | bi_svcdb | tbl_bisvc_cups_fee_day_dtl | 2408 |
| 24554 | bi_svcdb | tbl_mahis_trans_log_ssa | 2286 |
| 60705 | bi_svcdb | tmp_tbl_mahis_bat_trans_log | 2223 |
| 64990 | bi_svcdb | tmp_rtdtrs_dtl_cross_dist_scratch | 2191 |
| 53057 | bi_svcdb | tbl_bisvc_mchnt_stat_day_scl | 1828 |
| 53040 | bi_svcdb | tbl_bisvc_achis_ma_cups_trans_dtl | 1828 |
| 53054 | bi_svcdb | tbl_bisvc_ins_stat_day_scl | 1828 |
| 60702 | bi_svcdb | tmp_stdtrs_scl_mchnt_stat | 1777 |
| 60696 | bi_svcdb | tmp_stdtrs_scl_ins_stat | 1774 |
| 73851 | bi_svcdb | tbl_achis_trans_flow_ssa | 1570 |
| 60704 | bi_svcdb | tmp_tbl_achis_sor_trans_flow | 1507 |
| 67800 | bi_svcdb | tmp_dtdtrs_dtl_achis_ma_cups | 1462 |
| 53062 | bi_svcdb | tbl_ibhis_crs_dst_match_dtl | 1282 |
| 24560 | bi_svcdb | tbl_swhis_cuslt_swt_log | 996 |
| 53059 | bi_svcdb | tbl_glhis_ct_settle_dtl_fee | 886 |
| 53060 | bi_svcdb | tbl_glhis_lego_dtl_fee | 855 |
| 53061 | bi_svcdb | tbl_glhis_lego_svc_dtl_fee | 785 |
| 99666 | bi_alydb | tmp_stdtrs_bsl_token | 783 |
| 53079 | bi_svcdb | tbl_qrhis_trans_flow | 717 |
Hive中的数据库中总分区数量建议控制在10万个以内,目前存在的数据库尚未有超过该值的。
select DBS.DB_ID, NAME, count(*) as part_num from PARTITIONS, TBLS, DBS where PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.DB_ID = DBS.DB_ID group by DB_ID, NAME order by part_num desc limit 30;
+-------+----------+----------+
| DB_ID | NAME | part_num |
+-------+----------+----------+
| 24340 | bi_svcdb | 46232 |
| 73526 | bi_alydb | 2001 |
+-------+----------+----------+
网友评论