Python Clustering Algorithms: A DBSCAN Worked Example

Author: intergret  Date: 2021-03-26 00:11:10

This article walks through a Python implementation of the DBSCAN clustering algorithm. It is shared here for reference; the details follow.

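DBSCAN is a simple, density-based clustering algorithm. This implementation uses the center-based approach: the density of a data point is measured by the number of other data points that fall inside the square (the neighborhood) of side 2*Eps centered on that point. Based on this density, every data point falls into one of three classes: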

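Core point: the number of other points in its neighborhood is at least the given threshold MinPts.
Border point: not a core point, but its neighborhood contains at least one core point.
Noise point: neither a core point nor a border point.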
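
To make the three definitions concrete, here is a minimal sketch that labels a handful of hand-picked points. The function name `classify_points`, the sample coordinates, and the parameter values are illustrative assumptions, not part of the original script; the neighborhood test is the same square of side 2*Eps described above.

```python
def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' using a square neighborhood of side 2*eps."""
    n = len(points)
    # neighbors[i] lists the indices of the other points inside point i's square
    neighbors = [
        [j for j in range(n)
         if j != i
         and abs(points[i][0] - points[j][0]) <= eps
         and abs(points[i][1] - points[j][1]) <= eps]
        for i in range(n)
    ]
    # Core points have at least min_pts other points in their neighborhood
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors[i]):
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# Four points in a tight square, one point hanging off its edge, one far away
sample = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (10, 10)]
print(classify_points(sample, eps=1, min_pts=3))
# → ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point (2, 1) has only two neighbors, so it is not a core point, but both of those neighbors are core points, which makes it a border point.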

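With the points classified this way, clustering proceeds as follows: each core point is placed in the same cluster as all core points in its neighborhood, and each border point is placed in the same cluster as one of the core points in its neighborhood.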
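
The two clustering steps can also be sketched with a union-find (disjoint-set) structure. This is a swapped-in technique chosen for clarity, not the relabeling approach the script itself uses; `build_clusters` and the hand-made `neighbors` and `core` inputs are hypothetical.

```python
def build_clusters(neighbors, core):
    """Group neighboring core points, then attach each border point to one core neighbor.

    neighbors: dict mapping a point index to the indices in its neighborhood
    core: set of indices of the core points
    Returns a dict {point index: cluster label}; noise points are left out.
    """
    parent = {i: i for i in core}

    def find(i):
        # Follow parent links to the representative of i's cluster, shortening the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 1: core points that are neighbors end up in the same cluster
    for i in core:
        for j in neighbors.get(i, []):
            if j in core:
                parent[find(j)] = find(i)

    labels = {i: find(i) for i in core}

    # Step 2: a border point joins the cluster of the first core point in its neighborhood
    for i, near in neighbors.items():
        if i not in core:
            for j in near:
                if j in core:
                    labels[i] = labels[j]
                    break
    return labels


# Hand-made example: points 0-2 and 4-6 are core, 3 is a border point, 7 is noise
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2],
             4: [5, 6], 5: [4, 6], 6: [4, 5], 7: []}
core = {0, 1, 2, 4, 5, 6}
labels = build_clusters(neighbors, core)
print(labels)
```

Points 0 to 3 share one label, points 4 to 6 share another, and point 7 gets no label because no core point lies in its neighborhood.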


# coding=utf-8
import pylab as pl
from collections import defaultdict,Counter
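# Each line of the "points" file holds one point as "x#y"; blank lines are skipped
with open("points", "r") as pointFile:
    points = [[int(coord) for coord in line.split("#")[:2]] for line in pointFile if line.strip()]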
# For each point, collect its neighbors. The neighborhood is the square of side 2*Eps centered on the point
Eps = 10
surroundPoints = defaultdict(list)
for idx1, point1 in enumerate(points):
    for idx2, point2 in enumerate(points):
        # Visit each unordered pair once and record the relation in both directions
        if idx1 < idx2 and abs(point1[0] - point2[0]) <= Eps and abs(point1[1] - point2[1]) <= Eps:
            surroundPoints[idx1].append(idx2)
            surroundPoints[idx2].append(idx1)
# A point is a core point if its neighborhood holds more than 4 other points, i.e. at least MinPts
MinPts = 5
corePointIdx = [pointIdx for pointIdx, surPointIdxs in surroundPoints.items() if len(surPointIdxs) >= MinPts]
# A non-core point whose neighborhood contains a core point is a border point
borderPointIdx = []
for pointIdx, surPointIdxs in surroundPoints.items():
    if pointIdx not in corePointIdx:
        for onesurPointIdx in surPointIdxs:
            if onesurPointIdx in corePointIdx:
                borderPointIdx.append(pointIdx)
                break
# A noise point is neither a border point nor a core point
noisePointIdx = [pointIdx for pointIdx in range(len(points)) if pointIdx not in corePointIdx and pointIdx not in borderPointIdx]
corePoint = [points[pointIdx] for pointIdx in corePointIdx]
borderPoint = [points[pointIdx] for pointIdx in borderPointIdx]
noisePoint = [points[pointIdx] for pointIdx in noisePointIdx]
# pl.plot([eachpoint[0] for eachpoint in corePoint], [eachpoint[1] for eachpoint in corePoint], 'or')
# pl.plot([eachpoint[0] for eachpoint in borderPoint], [eachpoint[1] for eachpoint in borderPoint], 'oy')
# pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')
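# Start with every point in its own cluster; groups[idx] is the cluster label of point idx
groups = [idx for idx in range(len(points))]
# Put each core point in the same cluster as every core point in its neighborhood
for pointidx, surroundIdxs in surroundPoints.items():
    for oneSurroundIdx in surroundIdxs:
        if (pointidx in corePointIdx and oneSurroundIdx in corePointIdx and pointidx < oneSurroundIdx):
            # Save both labels first: groups[oneSurroundIdx] is itself overwritten during the relabeling pass
            oldLabel, newLabel = groups[oneSurroundIdx], groups[pointidx]
            for idx in range(len(groups)):
                if groups[idx] == oldLabel:
                    groups[idx] = newLabel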
# Put each border point in the same cluster as one core point in its neighborhood
for pointidx, surroundIdxs in surroundPoints.items():
    for oneSurroundIdx in surroundIdxs:
        if (pointidx in borderPointIdx and oneSurroundIdx in corePointIdx):
            groups[pointidx] = groups[oneSurroundIdx]
            break
# Keep the wantGroupNum largest clusters
wantGroupNum = 3
finalGroup = Counter(groups).most_common(wantGroupNum)
finalGroup = [onecount[0] for onecount in finalGroup]
group1 = [points[idx] for idx in range(len(points)) if groups[idx]==finalGroup[0]]
group2 = [points[idx] for idx in range(len(points)) if groups[idx]==finalGroup[1]]
group3 = [points[idx] for idx in range(len(points)) if groups[idx]==finalGroup[2]]
pl.plot([eachpoint[0] for eachpoint in group1], [eachpoint[1] for eachpoint in group1], 'or')
pl.plot([eachpoint[0] for eachpoint in group2], [eachpoint[1] for eachpoint in group2], 'oy')
pl.plot([eachpoint[0] for eachpoint in group3], [eachpoint[1] for eachpoint in group3], 'og')
# Plot the noise points in black
pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')
pl.show()
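
The script reads its input from a file named "points". Judging from the parsing expression, each line holds one point as two integers separated by "#". The sketch below generates such a file for experimentation, assuming that format; the helper names, blob centers, and spread are hypothetical, and the file is written to a temporary directory rather than the working directory.

```python
import os
import random
import tempfile


def write_points_file(path, points):
    """Write one point per line as "x#y"."""
    with open(path, "w") as f:
        for x, y in points:
            f.write("%d#%d\n" % (x, y))


def read_points_file(path):
    """Read the file back into a list of [x, y] integer pairs."""
    with open(path, "r") as f:
        return [[int(v) for v in line.split("#")[:2]] for line in f if line.strip()]


def make_blobs(centers, per_blob, spread, seed=0):
    """Scatter per_blob integer points uniformly within +/- spread of each center."""
    rng = random.Random(seed)
    return [
        (cx + rng.randint(-spread, spread), cy + rng.randint(-spread, spread))
        for cx, cy in centers
        for _ in range(per_blob)
    ]


# Three blobs of 40 points each, written out and read back
demo_points = make_blobs([(50, 50), (150, 60), (100, 160)], per_blob=40, spread=15)
demo_path = os.path.join(tempfile.mkdtemp(), "points")
write_points_file(demo_path, demo_points)
loaded = read_points_file(demo_path)
print(len(loaded), loaded[:3])
```

Copying the generated file next to the script under the name "points" gives it three well-separated groups to find with Eps = 10 and MinPts = 5.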

A screenshot of the result:

[Image: Python clustering algorithms, DBSCAN worked example]

I hope this article helps with your Python programming.
