Getting past anti-scraping checks in Python by disguising request headers: an example
Author: JackReach   Time: 2022-03-11 09:56:04
0x00 Environment
OS: Windows 10
Editor: JetBrains PyCharm Community Edition 2017.1.2 x64
Python version: python-3.6.2
Packet capture tool: Fiddler 4
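0x01 The idea behind disguising headers
Data is sent to the server over HTTP. Below are the request headers of an undisguised Python request, as captured with Fiddler: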
GET /u012870721 HTTP/1.1
Accept-Encoding: identity
Host: blog.csdn.net
User-Agent: Python-urllib/3.6
Connection: close
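The Python-urllib/3.6 value is not something the script set explicitly; it is urllib's built-in default User-Agent. A minimal check, using only the standard library, is to look at the addheaders list of a default opener (the same kind of OpenerDirector that urlopen() uses internally):

```python
import sys
from urllib import request

# build_opener() returns an OpenerDirector like the one urlopen() uses
opener = request.build_opener()

# addheaders lists the headers the opener adds to every request;
# by default it holds only the User-agent entry
print(opener.addheaders)  # on Python 3.6: [('User-agent', 'Python-urllib/3.6')]

default_ua = dict(opener.addheaders)["User-agent"]
# urllib builds the value from the running interpreter's major.minor version
expected = "Python-urllib/%d.%d" % sys.version_info[:2]
print(default_ua == expected)  # True
```

urllib only adds this default when the request does not already carry a User-Agent header, which is why passing a browser User-Agent via Request(headers=...) is enough to replace it.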
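With a User-Agent of Python-urllib/3.6, we have obviously given ourselves away. So what can we do? Simulate a browser, that is, disguise the request as one. Below are the headers a browser sends when visiting the page: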
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
Referer: http://write.blog.csdn.net/postlist
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.8
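Headers like these are usually copied by hand into a Python dict. As a sketch, assuming the headers are pasted as raw "Name: value" lines from a capture tool such as Fiddler, a small helper can do the conversion; parse_raw_headers is a hypothetical name, not part of urllib:

```python
import re

# Matches a request line such as "GET /u012870721 HTTP/1.1"
REQUEST_LINE = re.compile(r"^[A-Z]+ \S+ HTTP/\d(\.\d)?$")

# Raw header lines as copied from the browser capture above
RAW_HEADERS = """\
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
Referer: http://write.blog.csdn.net/postlist
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.8
"""


def parse_raw_headers(raw):
    """Hypothetical helper: turn 'Name: value' lines into a headers dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and the request line itself
        if not line or REQUEST_LINE.match(line):
            continue
        # Split on the first colon only, so values like URLs stay intact
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line
        headers[name.strip()] = value.strip()
    return headers


print(parse_raw_headers(RAW_HEADERS)["Referer"])  # http://write.blog.csdn.net/postlist
```

The resulting dict can be passed straight to urllib.request.Request(headers=...).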
0x02 Code implementation
from urllib import request

html_url = "http://blog.csdn.net/u012870721"

# Disguised request headers, copied from what a real browser sends
header = {
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip,deflate",
    "Accept-Language": "zh-CN,zh;q=0.8",
}

# Attach the headers so the request no longer announces itself as Python-urllib
req = request.Request(url=html_url, headers=header)
# Note: urllib does not decompress the body, so with gzip/deflate accepted,
# resp.read() may return compressed bytes
resp = request.urlopen(req)
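Because the request advertises Accept-Encoding: gzip,deflate while urllib does not decompress responses on its own, the body read from resp may still be compressed. A minimal sketch of undoing that with the standard gzip and zlib modules; decode_body is a hypothetical helper name, and the raw-deflate fallback is an assumption meant to cover servers that omit the zlib wrapper:

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Hypothetical helper: undo a response's Content-Encoding."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            # Standard HTTP "deflate" is a zlib-wrapped stream
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    return raw  # identity or no Content-Encoding: nothing to undo


# Usage with the request above (needs network access):
#   resp = request.urlopen(req)
#   html = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
#   print(html.decode("utf-8", errors="replace"))
```

A simpler alternative is to leave Accept-Encoding out of the header dict; http.client then sends Accept-Encoding: identity, as in the undisguised capture, and the body arrives uncompressed.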
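The headers actually sent after disguising. Connection shows close rather than keep-alive because urllib always overrides that header with close, since it does not keep connections open: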
GET /u012870721 HTTP/1.1
Host: blog.csdn.net
Connection: close
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip,deflate
Accept-Language: zh-CN,zh;q=0.8
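The capture above comes from Fiddler. As a hedged alternative that needs no capture tool, the same request can be pointed at a throwaway local server that echoes back the headers it received; EchoHeadersHandler and capture_sent_headers are hypothetical names, and the opener deliberately ignores proxy settings (Fiddler itself works as a proxy) so the request goes straight to 127.0.0.1:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Replies to GET with the request headers it received, as JSON."""

    def do_GET(self):
        body = json.dumps(dict(self.headers.items())).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def capture_sent_headers(headers):
    """Send a GET with the given headers to a local server; return what arrived."""
    server = HTTPServer(("127.0.0.1", 0), EchoHeadersHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/u012870721" % server.server_address[1]
        req = request.Request(url=url, headers=headers)
        # ProxyHandler({}) disables any configured proxy for this opener
        opener = request.build_opener(request.ProxyHandler({}))
        with opener.open(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    # Same shape as the article's header dict (abbreviated)
    header = {
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
        "Accept-Encoding": "gzip,deflate",
        "Accept-Language": "zh-CN,zh;q=0.8",
    }
    for name, value in capture_sent_headers(header).items():
        print("%s: %s" % (name, value))
```

Running it should show the browser User-Agent and Connection: close, matching what Fiddler recorded.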
Source: https://blog.csdn.net/u012870721/article/details/77750454
Tags: python, web scraping, headers
You may also like
A detailed look at attribute selectors in CSS3
2008-04-24 14:30:00
Preventing line wraps in a div or with CSS, and automatic wrapping
2009-09-07 12:52:00
Indenting and formatting XML with XSLT
2008-09-04 10:34:00
An introduction to the Python Zmail module, with usage examples
2023-07-23 23:42:37
Using regular expressions to find entries that do not contain a specific string
2010-03-02 22:06:00
A roundup of Python modules (commonly used third-party libraries)
2023-05-21 16:25:37
An example of adding custom methods to Django models
2021-12-07 17:14:58
How custom sorting with Python's CategoricalDtype works
2021-05-31 22:30:32
Steps for installing the openpyxl module offline in Python
2021-08-10 16:04:04
How to import from a relative path in Python
2023-06-15 03:16:10
Automatically fetching form data with Ajax in Yii2
2023-11-21 00:59:56
Tutorial: saving and loading files with NumPy in Python
2023-05-22 03:55:53
How to use the Python commands module
2022-02-26 19:38:14
A guide to functional programming in Python (part 1): overview of functional programming
2023-07-08 01:20:25
Box office figures for recent releases on Maoyan Movies with Python
2023-07-02 18:05:01
Switching images on mouse events in JS (without a timer)
2023-09-07 02:44:58
How django-rest-framework parses request parameters, in detail
2023-03-26 18:18:00
Oracle data manipulation and data control languages explained
2008-01-16 19:18:00
A brief look at the new homepage of Alibaba-owned Yahoo China
2008-07-16 12:19:00
Python cryptography: decrypting and testing a simple substitution cipher
2023-09-30 08:13:00