Building a Python crawler for beauty images on Huaban (花瓣网)
Author: hebedich Date: 2023-05-20 01:51:55
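Huaban loads its images lazily, so the original source code could only download a little over 20 images. After the changes it can download essentially all of them. It is still somewhat slow, which I will optimize later.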
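The lazy loading can be followed by asking for each further batch with the id of the last pin already seen passed as the `max` parameter, as in the example URL in the script. Here is a minimal sketch of that cursor-style paging, separated from any network code. The names `next_page_url`, `iter_pages` and `fetch_pins` are illustrative assumptions, not part of the original script, and `fetch_pins` stands for any callable that returns a list of `(pin_id, key, type)` tuples for a URL.

```python
def next_page_url(base_url, last_pin_id, limit=20):
    # Same query string the script builds: the last pin id seen becomes the cursor
    return base_url + '/?max=' + str(last_pin_id) + '&limit=' + str(limit) + '&wfl=1'


def iter_pages(base_url, fetch_pins, limit=20):
    # fetch_pins is a hypothetical callable: URL -> list of (pin_id, key, type) tuples
    groups = fetch_pins(base_url)
    while groups:
        yield groups
        # Continue after the last pin on the page that was just yielded
        groups = fetch_pins(next_page_url(base_url, groups[-1][0], limit))
```

Passing the fetch function in as an argument keeps the paging logic testable without touching the site. The loop stops as soon as a page comes back empty.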
import os
import re

import requests

# Local directory where the downloaded images are saved
path = r"C:\wqa\beautify"
url = 'http://huaban.com/favorite/beauty'
# Example of a paginated request:
# http://huaban.com/explore/zhongwenlogo/?ig1un9tq&max=327773629&limit=20&wfl=1
i_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36"}
count = 0


def urlHandle(url):
    # Fetch one page and extract (pin_id, key, image type) for every pin on it
    html = requests.get(url, headers=i_headers).text
    reg = re.compile(r'"pin_id":(\d+),.+?"file":{"farm":"farm1", "bucket":"hbimg",.+?"key":"(.*?)",.+?"type":"image/(.*?)"', re.S)
    groups = re.findall(reg, html)
    return groups


def imgHandle(groups):
    # Download the '_fw236' variant of every pin's image
    if groups:
        os.makedirs(path, exist_ok=True)
        for att in groups:
            pin_id = att[0]
            att_url = att[1] + '_fw236'
            img_type = att[2]
            img_url = 'http://img.hb.aicdn.com/' + att_url
            r = requests.get(img_url)
            # os.path.join adds the separator that plain string concatenation left out
            with open(os.path.join(path, att_url + '.' + img_type), 'wb') as fd:
                # iter_content() defaults to 1-byte chunks; larger chunks write much faster
                for chunk in r.iter_content(chunk_size=8192):
                    fd.write(chunk)


groups = urlHandle(url)
imgHandle(groups)
# Keep requesting the next page, passing the last pin_id as the max parameter,
# until a page returns no pins
while groups:
    count += 1
    print(count)
    pin_id = groups[-1][0]
    print(pin_id)
    urltemp = url + '/?max=' + str(pin_id) + '&limit=' + str(20) + '&wfl=1'
    print(urltemp)
    groups = urlHandle(urltemp)
    # print(groups)
    imgHandle(groups)
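To make the regular expression in the script easier to follow, the sketch below runs the same pattern against a made-up fragment and shows how each match turns into an image URL. `SAMPLE` is a hypothetical string shaped only so that the pattern matches. It is not a real Huaban response. The helper names `parse_pins` and `image_url` are also assumptions introduced for illustration.

```python
import re

# Same pattern as in the script: captures pin id, storage key and image subtype
PIN_RE = re.compile(
    r'"pin_id":(\d+),.+?"file":{"farm":"farm1", "bucket":"hbimg",'
    r'.+?"key":"(.*?)",.+?"type":"image/(.*?)"',
    re.S,
)

# Hypothetical fragment, invented for this example only
SAMPLE = (
    '"pin_id":101, "user_id":1, "file":{"farm":"farm1", "bucket":"hbimg", '
    '"key":"abc123", "type":"image/jpeg", "width":658}'
    '"pin_id":102, "user_id":2, "file":{"farm":"farm1", "bucket":"hbimg", '
    '"key":"def456", "type":"image/png", "width":500}'
)


def parse_pins(html):
    # One (pin_id, key, type) tuple per pin found in the page text
    return PIN_RE.findall(html)


def image_url(key):
    # The script appends '_fw236' to the key to form the image URL
    return 'http://img.hb.aicdn.com/' + key + '_fw236'


pins = parse_pins(SAMPLE)
print(pins)  # [('101', 'abc123', 'jpeg'), ('102', 'def456', 'png')]
print([image_url(key) for _, key, _ in pins])
```

Because the pattern hard-codes `"farm":"farm1"` and `"bucket":"hbimg"`, any pin stored elsewhere would be skipped silently, which is worth checking if fewer images arrive than expected.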
Tags: python, crawler