Python如何识别 MySQL 中的冗余索引

作者：Bing@DBA 时间：2024-01-13 13:44:46　

前言

最近在搞标准化巡检平台，通过 MySQL 的元数据分析一些潜在的问题。冗余索引也是一个非常重要的巡检目，表中索引过多，会导致表空间占用较大，索引的数量与表的写入速度与索引数成线性关系(微秒级)，如果发现有冗余索引，建议立即审核删除。

PS：之前见过一个客户的数据库上面竟然创建 300 多个索引！？当时的想法是 “他们在玩排列组合呢” 表写入非常慢，严重影响性能和表维护的复杂度。

脚本介绍

表结构

下方是演示的表结构：

CREATE TABLE `index_test03` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
`create_time` varchar(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uqi_name` (`name`),
KEY `idx_name` (`name`),
KEY `idx_name_createtime`(name, create_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

MySQL 元数据

MySQL 可以通过 information_schema.STATISTICS 表查询索引信息：

SELECT * from information_schema.STATISTICS where TABLE_SCHEMA = 'test02' and TABLE_NAME = 'index_test03';

TABLE_CATALOG	TABLE_SCHEMA	TABLE_NAME	NON_UNIQUE	INDEX_SCHEMA	INDEX_NAME	SEQ_IN_INDEX	COLUMN_NAME	COLLATION	SUB_PART	PACKED	INDEX_TYPE
def	test02	index_test03	0	test02	PRIMARY	1	id	A	NULL	NULL	BTREE
def	test02	index_test03	0	test02	uqi_name	1	name	A	NULL	NULL	BTREE
def	test02	index_test03	1	test02	idx_name	1	name	A	NULL	NULL	BTREE
def	test02	index_test03	1	test02	idx_name_createtime	1	name	A	NULL	NULL	BTREE
def	test02	index_test03	1	test02	idx_name_createtime	2	create_time	A	NULL	NULL	BTREE

脚本通过获得 STATISTICS 表中的索引信息来分析表中是否存在冗余索引，分析粒度为表级别。

DEMO 演示

需要使用 pandas 模块。

import pandas as pd

df_table_level = pd.read_excel('/Users/cooh/Desktop/STATISTICS.xlsx')

table_indexes = df_table_level['INDEX_NAME'].drop_duplicates().tolist()

_indexes = list()
for index_name in table_indexes:
index_info = {'index_cols': df_table_level[df_table_level['INDEX_NAME'] == index_name]['COLUMN_NAME'].tolist(),
'non_unique': df_table_level[df_table_level['INDEX_NAME'] == index_name]['NON_UNIQUE'].tolist()[0],
'index_name': index_name
}
_indexes.append(index_info)

content = ''
election_dict = {i['index_name']: 0 for i in _indexes}

while len(_indexes) > 0:
choice_index_1 = _indexes.pop(0)

for choice_index_2 in _indexes:
# 对比两个索引字段的个数，使用字段小的进行迭代
min_len = min([len(choice_index_1['index_cols']), len(choice_index_2['index_cols'])])

# 获得相似字段的个数据
similarity_col = 0
for i in range(min_len):
# print(i)
if choice_index_1['index_cols'][i] == choice_index_2['index_cols'][i]:
similarity_col += 1

# 然后进行逻辑判断
if similarity_col == 0:
# print('毫无冗余')
pass
else:
# 两个索引的字段包含内容都相同，说明两个索引完全相同，接下来就需要从中选择一个删除
if len(choice_index_1['index_cols']) == similarity_col and len(
choice_index_2['index_cols']) == similarity_col:
# 等于 0 表示有唯一约束
if choice_index_1['non_unique'] == 1:
content += '索引 {0} 与索引 {1} 重复, '.format(choice_index_2['index_name'], choice_index_1['index_name'])
election_dict[choice_index_1['index_name']] += 1
elif choice_index_2['non_unique'] == 1:
content += '索引 {0} 与索引 {1} 重复, '.format(choice_index_1['index_name'], choice_index_2['index_name'])
election_dict[choice_index_2['index_name']] += 1
else:
content += '索引 {0} 与索引 {1} 重复, '.format(choice_index_2['index_name'], choice_index_1['index_name'])
election_dict[choice_index_1['index_name']] += 1

elif len(choice_index_1['index_cols']) == similarity_col and choice_index_1['non_unique'] != 0:
content += '索引 {0} 与索引 {1} 重复, '.format(choice_index_2['index_name'], choice_index_1['index_name'])
election_dict[choice_index_1['index_name']] += 1

elif len(choice_index_2['index_cols']) == similarity_col and choice_index_1['non_unique'] != 0:
content += '索引 {0} 与索引 {1} 重复, '.format(choice_index_1['index_name'], choice_index_2['index_name'])
election_dict[choice_index_2['index_name']] += 1

redundancy_indexes = list()
for _k_name, _vote in election_dict.items():
if _vote > 0:
redundancy_indexes.append(_k_name)

content += '建议删除索引：{0}'.format(', '.join(redundancy_indexes))

print(content)

输出结果：

索引 uqi_name 与索引 idx_name 重复, 索引 idx_name_createtime 与索引 idx_name 重复, 建议删除索引：idx_name

SQL 查询冗余索引

MySQL 5.7 是可以直接通过 sys 元数据库中的视图来查冗余索引的，但是云上 RDS 用户看不到 sys 库。所以才被迫写这个脚本，因为实例太多了，一个一个看不现实。如果你是自建的 MySQL，就不用费那么大劲了，直接使用下面 SQL 来统计。

select * from sys.schema_redundant_indexes;

后记

删除索引属于高危操作，删除前需要多次 check 后再删除。上面是一个 demo 可以包装成函数，使用 pandas 以表为粒度传入数据，就可以嵌入到程序中。有问题欢迎评论沟通。

来源：https://blog.csdn.net/qq_42768234/article/details/127366182

标签：python,mysql,冗余索引

投稿

Python如何识别 MySQL 中的冗余索引

前言

脚本介绍

表结构

MySQL 元数据

DEMO 演示

SQL 查询冗余索引

后记

猜你喜欢

网站数据库，是选SQL Server还是Access好

Python flask框架post接口调用示例

网页用多宽才更合适？

如何利用PHP实现上传图片功能详解

Python学习之名字,作用域,名字空间(下)

pandas group分组与agg聚合的实例

设计的俗化特征

sqlserver中查询横表变竖表的sql语句简析

Python深度学习实战PyQt5布局管理项目示例详解

详解python基础中的for循环

语义、标准和样式

asp如何判断一个电子信箱格式是否有效？

python文件操作之目录遍历实例分析

浅析Python面向对象编程

如何使用VSCode愉快的写Python于调试配置步骤

mysql实现合并同一ID对应多条数据的方法

Python Decorator装饰器的创建方法及常用场景分析

分享一个超好用的php header下载函数

mysql8.0.12如何重置root密码

Python流行ORM框架sqlalchemy安装与使用教程