Deduplicating the images in a directory with Python and OpenCV
Author: Sand_Ng  Date: 2023-07-06 20:04:13
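Environment:
Platform: Ubuntu 14 / i5 / 4 GB RAM
Python version: Python 2.7
OpenCV version: 2.13.4
Dependencies:
If Python is not installed yet, install it together with the required packages: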
sudo apt-get install python
sudo apt-get install python-dev
sudo apt-get install python-pip
sudo pip install numpy matplotlib
sudo apt-get install libcv-dev
sudo apt-get install python-opencv
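Deduplicating images with a perceptual hash
How it works: every image is compared against every other image in the directory, so the running time grows quadratically with the number of images (n images need about n(n-1)/2 comparisons). It is slow for large directories, but it saves doing the work by hand.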
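How the perceptual hash works (this simple variant is usually called an average hash):
1. Scale every image to be compared down to an 8×8 grayscale image.
2. Compare each of the 64 pixels with the mean gray value: a pixel at or above the mean gives 1, otherwise 0. The 64 bits form the image's fingerprint.
3. Compute the Hamming distance between two fingerprints, i.e. the number of bit positions in which they differ.
4. If the distance is at most 5 (the code tests diff <= 5), the two images are treated as identical (or at least very similar).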
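Steps 2 to 4 can be sketched in plain Python without OpenCV, assuming the image has already been shrunk to an 8×8 grayscale grid given as 8 lists of 8 integers (0 to 255). The function names `average_hash`, `hamming`, `is_similar`, `pack` and `hamming_packed` are hypothetical illustrations, not part of the original script; the threshold of 5 follows the article.

```python
from __future__ import print_function


def average_hash(gray8x8):
    """Step 2: 64 bits, 1 where the pixel is >= the mean of all pixels."""
    pixels = [p for row in gray8x8 for p in row]
    mean = sum(pixels) / float(len(pixels))
    return [1 if p >= mean else 0 for p in pixels]


def hamming(a, b):
    """Step 3: number of positions where two fingerprints differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def is_similar(a, b, threshold=5):
    """Step 4: the article's rule, at most `threshold` differing bits."""
    return hamming(a, b) <= threshold


def pack(bits):
    """Optional: pack the bits into one int so a comparison is one XOR."""
    return int("".join(str(b) for b in bits), 2)


def hamming_packed(x, y):
    """Hamming distance of two packed fingerprints: count the 1 bits of x XOR y."""
    return bin(x ^ y).count("1")


# Usage: a brightness ramp (0, 4, ..., 252 in row-major order) and a copy
# in which the darkest pixel is set to 255. The bright pixel raises the
# mean from 126 to about 130, so pixel 32 (value 128) also flips.
img1 = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
img2 = [row[:] for row in img1]
img2[0][0] = 255
h1, h2 = average_hash(img1), average_hash(img2)
print(hamming(h1, h2), is_similar(h1, h2))   # 2 True
print(hamming_packed(pack(h1), pack(h2)))    # 2
```

Changing a single pixel can flip more than one bit, because the mean moves too. Packing the fingerprint into an integer is a common speed-up when many pairs have to be compared.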
#!/usr/bin/python
# -*- coding: UTF-8 -*-
from __future__ import print_function
import os
import sys

import cv2
import numpy as np


def gethash(filepath):
    """Return the 64-element 0/1 fingerprint of an image, or None if
    OpenCV cannot read the file (e.g. it is not an image)."""
    img = cv2.imread(filepath)
    if img is None:
        return None
    # Steps 1-2: shrink to 8x8, convert to grayscale, compare with the mean
    gray = cv2.cvtColor(cv2.resize(img, (8, 8)), cv2.COLOR_BGR2GRAY)
    mean = cv2.mean(gray)[0]
    return (gray >= mean).astype(np.uint8).flatten()


def hamming(arr1, arr2):
    # Step 3: number of positions where the two fingerprints differ
    return int(np.count_nonzero(arr1 != arr2))


def cmpandremove2(path):
    """Faster variant: hash every image once, then compare the cached
    fingerprints. Later files (in sorted order) within 5 bits of an
    earlier kept file are deleted."""
    hashes = {}
    for name in sorted(os.listdir(path)):
        filepath = os.path.join(path, name)
        arr = gethash(filepath)
        if arr is None:
            continue
        print("get", filepath)
        hashes[name] = arr
    keys = sorted(hashes)
    index = 0
    while index < len(keys):
        curkey = keys[index]
        print(curkey)
        # Step 4: files after the current one that differ in <= 5 bits
        dellist = [j for j in keys[index + 1:]
                   if hamming(hashes[curkey], hashes[j]) <= 5]
        for j in dellist:
            filepath = os.path.join(path, j)
            print("remove", filepath)
            os.remove(filepath)
            hashes.pop(j)
        if dellist:
            keys = sorted(hashes)
        index += 1


def cmpandremove(path):
    """Original variant: for each image, re-read and compare every other
    image in the directory. Returns 1 if anything was removed, else 0."""
    flag = 0
    index = 0
    dirs = sorted(os.listdir(path))
    while index < len(dirs):
        prepath = os.path.join(path, dirs[index])
        print(prepath)
        prearr = gethash(prepath)
        if prearr is None:
            index += 1
            continue
        removepath = []
        for index2 in range(len(dirs)):
            if index2 == index:
                continue
            curpath = os.path.join(path, dirs[index2])
            curarr = gethash(curpath)
            if curarr is None:
                continue
            if hamming(prearr, curarr) <= 5:
                print("the same")
                removepath.append(curpath)
                flag = 1
        index += 1
        for filepath in removepath:
            print("remove", filepath)
            os.remove(filepath)
        if removepath:
            # Refresh the listing. The removed files all sort after the
            # current one, so `index` still points at the next survivor.
            dirs = sorted(os.listdir(path))
    return flag


def main(argv):
    if len(argv) <= 1:
        print("usage: %s <image-dir>" % argv[0])
        return -1
    path = argv[1]
    if not os.path.isdir(path):
        print("not a directory:", path)
        return -1
    # One pass is enough because the similarity test is symmetric. The
    # original kept a disabled loop that repeated passes until nothing
    # was removed:
    #     while cmpandremove(path) != 0:
    #         pass
    # cmpandremove2(path) gives the same result with far fewer imread calls.
    cmpandremove(path)
    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv))
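The selection rule that cmpandremove2 applies can be separated from the file I/O, which makes it easy to check on hand-made fingerprints. A minimal sketch, assuming the fingerprints have already been computed and stored in a dict mapping file name to a 0/1 sequence; `files_to_remove` is a hypothetical name, not part of the original script:

```python
def files_to_remove(fingerprints, threshold=5):
    """fingerprints: dict mapping file name -> sequence of 0/1 bits.
    Walk the names in sorted order and mark every later file that is
    within `threshold` differing bits of a kept file for removal."""
    names = sorted(fingerprints)
    removed = set()
    for i, cur in enumerate(names):
        if cur in removed:
            continue  # already dropped as a duplicate of an earlier file
        for other in names[i + 1:]:
            if other in removed:
                continue
            diff = sum(1 for a, b in zip(fingerprints[cur], fingerprints[other])
                       if a != b)
            if diff <= threshold:
                removed.add(other)
    return sorted(removed)


fp = {
    "a.jpg": [0] * 64,
    "b.jpg": [0] * 60 + [1] * 4,   # 4 bits away from a.jpg
    "c.jpg": [1] * 64,
    "d.jpg": [1] * 61 + [0] * 3,   # 3 bits away from c.jpg
}
print(files_to_remove(fp))  # ['b.jpg', 'd.jpg']
```

Because the earliest name in sorted order is always the copy that is kept, which file survives depends only on the file names, not on image size or quality.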
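To save even more manual work, the following bash script walks every directory you want to deduplicate and runs the Python script on each one. It assumes the Python script above has been saved as ~/similar.py and made executable (chmod +x). Note that it only processes the subdirectories of the given directory, not the top-level directory itself.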
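#!/bin/bash
# Runs ~/similar.py on every subdirectory (recursively) of the directory
# given as the first argument. Variables are quoted so that paths
# containing spaces work.
indir=$1
function intest()
{
    for file in "$1"/*
    do
        echo "$file"
        if test -d "$file"
        then
            ~/similar.py "$file/"
            intest "$file"
        fi
    done
}
intest "$indir"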
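If you would rather stay in Python, a rough equivalent of intest() can be built on os.walk. This is a sketch, assuming a callback that deduplicates a single directory (for example cmpandremove from the script above); `walk_subdirs` is a hypothetical name. It visits the same subdirectories as the bash version but in a different order, and unlike `$1/*` it also includes hidden directories.

```python
import os


def walk_subdirs(root, dedup_dir):
    """Call dedup_dir(d) for every directory below root. As in the bash
    intest() function, root itself is skipped. Returns the visited list."""
    visited = []
    for dirpath, dirnames, _ in os.walk(root):
        dirnames.sort()  # deterministic order; os.walk descends in this order
        for name in dirnames:
            sub = os.path.join(dirpath, name)
            dedup_dir(sub)
            visited.append(sub)
    return visited
```

A call would look like `walk_subdirs("/data/photos", cmpandremove)`, where the path is only a placeholder.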
Source: https://blog.csdn.net/shan_xg/article/details/79448314
Tags: python, opencv, images, deduplication
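You may also like
Cropping a specified region of an image with Python OpenCV
2021-12-30 01:41:35
Computing the integer square root in Python with binary search
2023-01-17 01:33:49
Executing multiple MySQL statements at once: implementation and common problems
2024-01-12 20:03:23
"You are not authorized to view this page": causes and fixes
2008-03-24 16:57:00
Python 3: formatting overwritten when appending to an Excel file (example code)
2022-10-18 14:23:25
Linux VPS backup tutorial: scheduled automatic backups of databases and website files
2024-01-14 21:41:42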
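Understanding the functions in the tf.keras.layers module
2023-04-02 04:26:29
Example of scraping Baidu Images with Python
2021-11-22 00:46:04
In IE, a radio button or checkbox with the checked attribute is not shown as selected initially
2024-05-10 14:06:42
Permanently changing the pip mirror on Windows and Linux
2021-02-19 09:08:59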
Processing pixel values in a local image region with Python + OpenCV
2023-10-26 12:59:22
A detailed guide to the Python progress bar tqdm
2022-12-13 19:45:08
CSS layout viewer
2008-10-29 11:22:00
Monitoring Salt job status in Python and pushing the job data to Redis
2022-09-14 05:19:47
Steps for installing numpy and pandas for Python
2023-05-19 09:43:09
Several ways to access a database with JdbcTemplate in Spring
2024-01-29 09:29:50
Front-end face detection with Vue + tracking.js
2024-05-05 09:24:56
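A brief look at SQL Server locks: exclusive, shared, update, optimistic and pessimistic locks
2024-01-14 01:53:11
Finding the MySQL installation path in C#
2024-01-24 05:48:15
Sample code for a dynamic array in Python
2023-10-22 16:02:07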