Batch-modifying labelImg-generated XML files with Python

Author: Miscellaneous0712  Date: 2022-09-03 12:04:23

Overview

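After labeling my images with labelImg, I wanted to train on only a few of the classes rather than all of them, without relabeling and regenerating the .xml files. The approach here is to delete the unwanted label classes, together with their attributes, directly from the existing .xml files.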

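Label names can also end up with inconsistent capitalization (an easy slip on a large labeling job), so the program also rewrites the label values to lower case, because when I train with py-faster-rcnn the labels must all be lower case.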
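The case-normalization step on its own can be sketched as follows. This is a minimal illustration rather than the code used later in this post: the helper name `lowercase_labels` and the inline sample are made up for the example, and it assumes each `<object>` stores its label in a `<name>` child, as in the annotation shown in this post.

```python
import xml.etree.ElementTree as ET


def lowercase_labels(root):
    """Lower-case the text of every <object>/<name> element in place.

    Hypothetical helper for illustration; returns how many labels changed.
    """
    changed = 0
    for name in root.findall("object/name"):
        if name.text is not None and name.text != name.text.lower():
            name.text = name.text.lower()
            changed += 1
    return changed


# Made-up sample with mixed-case labels.
SAMPLE = """<annotation>
  <object><name>People</name></object>
  <object><name>CAT</name></object>
  <object><name>dog</name></object>
</annotation>"""

root = ET.fromstring(SAMPLE)
num_changed = lowercase_labels(root)
labels = [n.text for n in root.findall("object/name")]
print(num_changed, labels)
```

Only the label text is touched; the bounding boxes and other attributes are left as they are.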

Take the following .xml file as an example; I deliberately capitalized some of the labels:

<annotation verified="yes">
    <filename>test.jpg</filename>
    <path>C:\Users\yasin\Desktop\test</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>400</width>
        <height>300</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>People</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>80</xmin>
            <ymin>69</ymin>
            <xmax>144</xmax>
            <ymax>89</ymax>
        </bndbox>
    </object>
    <object>
        <name>CAT</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>40</xmin>
            <ymin>69</ymin>
            <xmax>143</xmax>
            <ymax>16</ymax>
        </bndbox>
    </object>
    <object>
        <name>dog</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>96</xmin>
            <ymin>82</ymin>
            <xmax>176</xmax>
            <ymax>87</ymax>
        </bndbox>
    </object>
</annotation>
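Before deciding which classes to keep, it can help to see which labels an annotation actually contains. The sketch below counts them case-insensitively. `count_labels` is a hypothetical helper, and the inline XML is a trimmed copy of the example above with only the label names kept.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_labels(root):
    """Count <object> labels, ignoring case (hypothetical helper)."""
    return Counter(
        name.text.lower()
        for name in root.findall("object/name")
        if name.text  # skip empty or missing label text
    )


# Trimmed copy of the example annotation: label names only.
EXAMPLE = """<annotation verified="yes">
  <object><name>People</name></object>
  <object><name>CAT</name></object>
  <object><name>dog</name></object>
</annotation>"""

counts = count_labels(ET.fromstring(EXAMPLE))
print(dict(counts))
```

For a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring(EXAMPLE)`.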

Implementation

Suppose we only want to keep the people and cat classes in each image and delete everything else. The code is as follows:

from xml.etree.ElementTree import ElementTree
from os import walk, path


def read_xml(in_path):
    """Parse an XML file and return its ElementTree."""
    tree = ElementTree()
    tree.parse(in_path)
    return tree


def write_xml(tree, out_path):
    """Write the tree as UTF-8 with an XML declaration."""
    tree.write(out_path, encoding="utf-8", xml_declaration=True)


def find_nodes(tree, xpath):
    """Return every node matching xpath, relative to the root."""
    return tree.findall(xpath)


def del_node_by_target_classes(nodelist, target_classes_lower, tree_root):
    """Remove <object> nodes outside the target classes; lower-case the rest."""
    for node in nodelist:
        if node.tag != "object":
            continue
        # Look <name> up by tag: Element.getchildren() was removed in Python 3.9.
        name = node.find("name")
        if name is None or name.text is None:
            continue  # leave objects without a label untouched
        label = name.text.lower()
        if label in target_classes_lower:
            name.text = label
        else:
            tree_root.remove(node)


def get_fileNames(rootdir):
    """Collect the path and base name of every .xml file under rootdir."""
    data_path = []
    prefixs = []
    for root, dirs, files in walk(rootdir, topdown=True):
        for name in files:
            pre, ending = path.splitext(name)
            if ending != ".xml":
                continue
            data_path.append(path.join(root, name))
            prefixs.append(pre)
    return data_path, prefixs


if __name__ == "__main__":
    # Collect all the XML paths; the prefixes are not used here.
    paths_xml, prefixs = get_fileNames("/home/yasin/old_labels/")

    target_classes = ["PEOPLE", "CAT"]  # the classes you want to keep
    # Compare in lower case, whatever case was typed above.
    target_classes_lower = [c.lower() for c in target_classes]

    for i, xml_path in enumerate(paths_xml):
        tree = read_xml(xml_path)
        tree_root = tree.getroot()

        # <object> elements are direct children of <annotation>.
        object_nodes = find_nodes(tree, "object")

        # Drop the unwanted classes and lower-case the rest, in place.
        del_node_by_target_classes(object_nodes, target_classes_lower, tree_root)

        # Save under a zero-padded sequential name (000000.xml for the first file).
        # The output directory must already exist.
        write_xml(tree, "/home/yasin/new_labels/{}.xml".format("%06d" % i))

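Running the code above turns the example .xml into the file below: every class other than people and cat (here, dog) has been deleted, and the labels of the classes that were kept are now lower case: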

<?xml version='1.0' encoding='utf-8'?>
<annotation verified="yes">
    <filename>test.jpg</filename>
    <path>C:\Users\yasin\Desktop\test</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>400</width>
        <height>300</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>people</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>80</xmin>
            <ymin>69</ymin>
            <xmax>144</xmax>
            <ymax>89</ymax>
        </bndbox>
    </object>
    <object>
        <name>cat</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>40</xmin>
            <ymin>69</ymin>
            <xmax>143</xmax>
            <ymax>16</ymax>
        </bndbox>
    </object>
</annotation>
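A quick way to confirm a result like this is to read the labels back and compare them with the classes that were meant to be kept. The sketch below does that for in-memory strings. `unexpected_labels` is a hypothetical helper, and the two inline documents are trimmed stand-ins for the files before and after processing, reduced to their label names.

```python
import xml.etree.ElementTree as ET


def unexpected_labels(root, keep):
    """Return the labels not in `keep`, compared exactly (case matters).

    Hypothetical helper; a missing label is reported as an empty string.
    """
    found = {name.text or "" for name in root.findall("object/name")}
    return sorted(found - set(keep))


KEEP = {"people", "cat"}

# Trimmed stand-ins for an annotation before and after processing.
BEFORE = """<annotation>
  <object><name>People</name></object>
  <object><name>CAT</name></object>
  <object><name>dog</name></object>
</annotation>"""

AFTER = """<annotation>
  <object><name>people</name></object>
  <object><name>cat</name></object>
</annotation>"""

leftover_before = unexpected_labels(ET.fromstring(BEFORE), KEEP)
leftover_after = unexpected_labels(ET.fromstring(AFTER), KEEP)
print(leftover_before, leftover_after)
```

Because the comparison is case-sensitive, a label that was kept but not lower-cased would also be reported, which checks both effects of the script at once.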

Source: https://blog.csdn.net/zhou4411781/article/details/96650819

