Example Python code for scraping mobile phone listing URLs from JD.com (京东商城)
Date: 2022-11-11 18:23:04
# -*- coding: UTF-8 -*-
'''
Created on 2013-12-5
@author: good-temper
'''
# Note: this script targets Python 2 (urllib2, print statement).
import urllib2
import bs4
import time


def getPage(urlStr):
    '''
    Fetch a page and return its raw content.
    '''
    content = urllib2.urlopen(urlStr).read()
    return content
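

def getNextPageUrl(currPageNum):
    # URL pattern: http://list.jd.com/9987-653-655-0-0-0-0-0-0-0-1-1-<page number>-1-1-72-4137-33.html
    url = u'http://list.jd.com/9987-653-655-0-0-0-0-0-0-0-1-1-' + str(currPageNum + 1) + '-1-1-72-4137-33.html'
    # Check whether there is a next page: a page carrying a
    # 'next-disabled' span is treated as the end of the listing,
    # so that page itself is not returned.
    content = getPage(url)
    soup = bs4.BeautifulSoup(content, 'html.parser')
    disabled = soup.findAll('span', {'class': 'next-disabled'})
    if len(disabled) == 0:
        return url
    return ''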


def analyzeList():
    pageNum = 0
    urls = []
    url = getNextPageUrl(pageNum)
    while url != '':
        soup = bs4.BeautifulSoup(getPage(url), 'html.parser')
        # Collect the first link inside each <div class="p-name">
        pagelist = soup.findAll('div', {'class': 'p-name'})
        for elem in pagelist:
            urls.append(elem.find('a')['href'])
        pageNum = pageNum + 1
        print pageNum  # progress: number of pages processed so far
        url = getNextPageUrl(pageNum)
    return urls


def analyzeContent(url):
    # Placeholder: parsing of an individual product page is not implemented
    return ''


def writeToFile(urls, path):
    # Append one URL per line; 'with' closes the file even if a write fails
    with open(path, 'a') as f:
        for elem in urls:
            f.write(elem + '\n')


if __name__ == '__main__':
    urls = analyzeList()
    print 'Fetched ' + str(len(urls)) + ' URLs in total\n'
    writeToFile(urls, u'E:\\jd_phone_list.dat')
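
The script above is written for Python 2 and depends on the network and on bs4. As a rough illustration of the same parsing idea, here is a minimal Python 3 sketch that uses only the standard library and runs against an inline HTML string instead of a live page. The helper names (`build_list_url`, `ProductLinkParser`, `parse_list_page`) and the sample markup and item URLs are assumptions made up for this example; the `p-name` and `next-disabled` class names and the URL pattern are taken from the script above, and the real page layout may well have changed since it was written.

```python
from html.parser import HTMLParser

# URL pattern taken from the script above; only the page number varies.
LIST_URL_TEMPLATE = ('http://list.jd.com/9987-653-655-0-0-0-0-0-0-0-1-1-'
                     '{page}-1-1-72-4137-33.html')


def build_list_url(page_num):
    """Return the listing URL for a page number (hypothetical helper)."""
    return LIST_URL_TEMPLATE.format(page=page_num)


class ProductLinkParser(HTMLParser):
    """Collect the first <a href> inside each <div class="p-name"> and note
    whether a <span class="next-disabled"> marker appears on the page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.is_last_page = False
        self._div_depth = 0       # > 0 while inside a p-name div
        self._link_taken = False  # only the first link per p-name div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'span' and 'next-disabled' in classes:
            self.is_last_page = True
        if tag == 'div':
            if self._div_depth > 0:
                self._div_depth += 1      # nested div inside a p-name div
            elif 'p-name' in classes:
                self._div_depth = 1
                self._link_taken = False
        elif tag == 'a' and self._div_depth > 0 and not self._link_taken:
            href = attrs.get('href')
            if href:
                self.links.append(href)
                self._link_taken = True

    def handle_endtag(self, tag):
        if tag == 'div' and self._div_depth > 0:
            self._div_depth -= 1


def parse_list_page(html):
    """Return (product links, is_last_page) for one listing page."""
    parser = ProductLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links, parser.is_last_page


# Made-up sample markup standing in for a downloaded listing page.
sample = """
<div class="p-name"><a href="http://item.jd.com/1.html">Phone A</a></div>
<div class="p-name"><a href="http://item.jd.com/2.html">Phone B</a></div>
<div class="other"><a href="http://example.com/ad.html">Ad</a></div>
<span class="next-disabled">next</span>
"""
links, is_last_page = parse_list_page(sample)
print(build_list_url(1))
print(links, is_last_page)
```

Returning the links together with the last-page flag from a single parse means each page only needs to be downloaded once, whereas the original script fetches every page twice (once to test for a next page and once to extract links).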