Python基础教程 (Beginning Python), Project 4: News Aggregation

Author: the5fire  Date: 2021-10-03 01:31:33

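The fourth exercise in the book "Python基础教程" (Beginning Python) is a news aggregator. It is built around Usenet, a kind of service that is rarely seen today; I, at least, have never used it. The program's main job is to collect information from specified sources (here, Usenet newsgroups) and then deliver it to specified destinations (two forms are used here: plain text and an HTML file). Its purpose is somewhat similar to today's blog subscription tools, also known as RSS readers.

Here is the code first (it is written for Python 2), followed by an analysis: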

from nntplib import NNTP
from time import strftime, time, localtime
from email import message_from_string
from urllib import urlopen
import textwrap
import re

day = 24 * 60 * 60  # number of seconds in one day


def wrap(string, max=70):
    '''
    Wrap a string to a maximum line width.
    '''
    return '\n'.join(textwrap.wrap(string, max)) + '\n'


class NewsAgent:
    '''
    Distributes news items from news sources to news destinations.
    '''
    def __init__(self):
        self.sources = []
        self.destinations = []

    def addSource(self, source):
        self.sources.append(source)

    def addDestination(self, dest):
        self.destinations.append(dest)

    def distribute(self):
        # Collect the items from every source into one list, then hand
        # that list to every destination.
        items = []
        for source in self.sources:
            items.extend(source.getItems())
        for dest in self.destinations:
            dest.receiveItems(items)


class NewsItem:
    def __init__(self, title, body):
        self.title = title
        self.body = body


class NNTPSource:
    def __init__(self, servername, group, window):
        self.servername = servername
        self.group = group
        self.window = window  # how many days back to look

    def getItems(self):
        start = localtime(time() - self.window * day)
        date = strftime('%y%m%d', start)
        hour = strftime('%H%M%S', start)
        server = NNTP(self.servername)
        ids = server.newnews(self.group, date, hour)[1]
        for id in ids:
            lines = server.article(id)[3]
            message = message_from_string('\n'.join(lines))
            title = message['subject']
            body = message.get_payload()
            if message.is_multipart():
                body = body[0]
            yield NewsItem(title, body)
        server.quit()


class SimpleWebSource:
    def __init__(self, url, titlePattern, bodyPattern):
        self.url = url
        self.titlePattern = re.compile(titlePattern)
        self.bodyPattern = re.compile(bodyPattern)

    def getItems(self):
        text = urlopen(self.url).read()
        titles = self.titlePattern.findall(text)
        bodies = self.bodyPattern.findall(text)
        # Pair the n-th title with the n-th body.
        for title, body in zip(titles, bodies):
            yield NewsItem(title, wrap(body))


class PlainDestination:
    def receiveItems(self, items):
        for item in items:
            print item.title
            print '-' * len(item.title)
            print item.body


class HTMLDestination:
    def __init__(self, filename):
        self.filename = filename

    def receiveItems(self, items):
        out = open(self.filename, 'w')
        print >> out, '''
<html>
<head>
<title>Today's News</title>
</head>
<body>
<h1>Today's News</h1>
'''
        # Table of contents: one link per item.
        print >> out, '<ul>'
        id = 0
        for item in items:
            id += 1
            print >> out, '<li><a href="#%i">%s</a></li>' % (id, item.title)
        print >> out, '</ul>'
        # The items themselves, each with an anchor matching its link.
        id = 0
        for item in items:
            id += 1
            print >> out, '<h2><a name="%i">%s</a></h2>' % (id, item.title)
            print >> out, '<pre>%s</pre>' % item.body
        print >> out, '''
</body>
</html>
'''
        out.close()


def runDefaultSetup():
    agent = NewsAgent()

    # A web source: titles and bodies are extracted with regular expressions.
    bbc_url = 'http://news.bbc.co.uk/text_only.stm'
    bbc_title = r'(?s)a href="[^"]*">\s*<b>\s*(.*?)\s*</b>'
    bbc_body = r'(?s)</a>\s*<br/>\s*(.*?)\s*<'
    bbc = SimpleWebSource(bbc_url, bbc_title, bbc_body)
    agent.addSource(bbc)

    # An NNTP source: the last day's articles from one newsgroup.
    clpa_server = 'news2.neva.ru'
    clpa_group = 'alt.sex.telephone'
    clpa_window = 1
    clpa = NNTPSource(clpa_server, clpa_group, clpa_window)
    agent.addSource(clpa)

    agent.addDestination(PlainDestination())
    agent.addDestination(HTMLDestination('news.html'))
    agent.distribute()


if __name__ == '__main__':
    runDefaultSetup()

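Looking at the program as a whole first, the key part is NewsAgent. It stores the news sources and the destinations, then calls the source classes (NNTPSource and SimpleWebSource) and the classes that write out the news (PlainDestination and HTMLDestination) in turn. From this it is also clear that NNTPSource is dedicated to fetching messages from a news server, while SimpleWebSource fetches data from a URL. The roles of PlainDestination and HTMLDestination are obvious: the former prints the fetched content to the terminal, and the latter writes it to an HTML file.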
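To make the SimpleWebSource step more concrete, here is a minimal sketch of its extraction logic that runs without any network access. It is written for Python 3 (the listing itself is Python 2), and several things are assumptions for illustration: the SAMPLE_PAGE fragment is made up and the real page markup may differ, extract_items is a hypothetical helper that does not appear in the program, and NewsItem is reduced to a namedtuple. Only the two regular expressions follow the listing.

```python
import re
import textwrap
from collections import namedtuple

# Simplified stand-in for the NewsItem class (assumption for this sketch).
NewsItem = namedtuple("NewsItem", ["title", "body"])

# Made-up page fragment; the markup of the real site may look different.
SAMPLE_PAGE = """
<a href="/story1.stm"><b>First headline</b></a><br/>
Body text of the first story.<br/>
<a href="/story2.stm"><b>Second headline</b></a><br/>
Body text of the second story.<br/>
"""

# The same two patterns that runDefaultSetup passes to SimpleWebSource.
TITLE_PATTERN = re.compile(r'(?s)a href="[^"]*">\s*<b>\s*(.*?)\s*</b>')
BODY_PATTERN = re.compile(r'(?s)</a>\s*<br/>\s*(.*?)\s*<')


def extract_items(text, width=70):
    """Hypothetical helper: pair the n-th title with the n-th body."""
    titles = TITLE_PATTERN.findall(text)
    bodies = BODY_PATTERN.findall(text)
    items = []
    for title, body in zip(titles, bodies):
        wrapped = "\n".join(textwrap.wrap(body, width)) + "\n"
        items.append(NewsItem(title, wrapped))
    return items


items = extract_items(SAMPLE_PAGE)
for item in items:
    print(item.title)
    print("-" * len(item.title))
    print(item.body)
```

Because the titles and bodies are found by two independent patterns and then paired with zip, a page where the two patterns match a different number of times would silently lose the extra matches. That is worth keeping in mind when adapting the patterns to another site.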

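With that analysis in hand, look at the main program: all it does is add news sources and output destinations to the NewsAgent and then call distribute().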
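The wiring done by the main program can be tried out without a news server or a web page. The sketch below is a Python 3 reduction of the agent, with two hypothetical classes that are not part of the original program: StaticSource yields items from an in-memory list, and CollectingDestination records titles instead of printing them. The method names addSource, addDestination, getItems and receiveItems follow the listing.

```python
class NewsItem:
    def __init__(self, title, body):
        self.title = title
        self.body = body


class NewsAgent:
    def __init__(self):
        self.sources = []
        self.destinations = []

    def addSource(self, source):
        self.sources.append(source)

    def addDestination(self, dest):
        self.destinations.append(dest)

    def distribute(self):
        # getItems() may be a generator, so the items are gathered into a
        # list first; every destination can then iterate over all of them.
        items = []
        for source in self.sources:
            items.extend(source.getItems())
        for dest in self.destinations:
            dest.receiveItems(items)


class StaticSource:
    """Hypothetical source backed by a list of (title, body) pairs."""
    def __init__(self, pairs):
        self.pairs = pairs

    def getItems(self):
        for title, body in self.pairs:
            yield NewsItem(title, body)


class CollectingDestination:
    """Hypothetical destination that remembers the titles it received."""
    def __init__(self):
        self.titles = []

    def receiveItems(self, items):
        self.titles.extend(item.title for item in items)


agent = NewsAgent()
agent.addSource(StaticSource([("Headline A", "Body A")]))
agent.addSource(StaticSource([("Headline B", "Body B")]))
first = CollectingDestination()
second = CollectingDestination()
agent.addDestination(first)
agent.addDestination(second)
agent.distribute()
print(first.titles)
print(second.titles)
```

Both destinations receive the items from both sources, in the order the sources were added.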

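This is indeed a simple program, but note that it is already layered: the sources, the agent, and the destinations are separate pieces that only meet through getItems() and receiveItems().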
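One consequence of this separation is that a new output format only needs a class with a receiveItems method; nothing else has to change. As a rough illustration, the Python 3 sketch below builds the same kind of page as HTMLDestination but keeps it in memory. HTMLStringDestination is a hypothetical name that is not in the original program, NewsItem is reduced to a namedtuple, and the html.escape calls are an addition of this sketch (the listing writes titles and bodies unescaped).

```python
import html
from collections import namedtuple

# Simplified stand-in for the NewsItem class (assumption for this sketch).
NewsItem = namedtuple("NewsItem", ["title", "body"])


class HTMLStringDestination:
    """Hypothetical destination: builds the page as a string, not a file."""

    def __init__(self):
        self.page = ""

    def receiveItems(self, items):
        items = list(items)  # the items are traversed twice below
        parts = [
            "<html>",
            "<head><title>Today's News</title></head>",
            "<body>",
            "<h1>Today's News</h1>",
            "<ul>",
        ]
        # Table of contents: link number n points at anchor n.
        for number, item in enumerate(items, start=1):
            parts.append('<li><a href="#%i">%s</a></li>'
                         % (number, html.escape(item.title)))
        parts.append("</ul>")
        # The items themselves, each under its numbered anchor.
        for number, item in enumerate(items, start=1):
            parts.append('<h2><a name="%i">%s</a></h2>'
                         % (number, html.escape(item.title)))
            parts.append("<pre>%s</pre>" % html.escape(item.body))
        parts.extend(["</body>", "</html>"])
        self.page = "\n".join(parts)


dest = HTMLStringDestination()
dest.receiveItems([NewsItem("Tom & Jerry", "x < y")])
print(dest.page)
```

An instance of such a class could be passed to addDestination exactly like PlainDestination or HTMLDestination, since the agent only ever calls receiveItems on it.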

Source: https://www.the5fire.com/python-pro4-newsagent.html

Tags: python, Beginning Python, news aggregation
