Python 3 web crawler: example code for sending cookies
Author: yang  Date: 2021-11-10 18:06:58
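The word "cookie" literally means a small snack. A cookie is a piece of information that a web server stores on the client when the client visits it, rather like a snack the server hands to its visitor. The server can use cookies to track client state, which is especially useful wherever individual clients need to be told apart (e-commerce, for example).
When a client first requests the server, the server stores a cookie containing information about that client on the client side. From then on, every HTTP request the client sends to that server includes the cookie, and the server parses it to recover the information about the client.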
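The exchange described above can be sketched with the standard library alone. This is a minimal illustration, not how any particular server or browser implements it: the function name `cookie_header_from_set_cookie` and the sample `Set-Cookie` values are made up for the example. It parses the values a server might send and builds the `Cookie` header a client would send back on later requests.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Build a Cookie request header value from Set-Cookie response values.

    Hypothetical helper for illustration: it ignores attributes such as
    Path or Max-Age and keeps only the name=value pairs.
    """
    jar = SimpleCookie()
    for raw in set_cookie_values:
        jar.load(raw)  # parses "name=value; Attr=..." into a Morsel
    return '; '.join(f'{name}={morsel.value}' for name, morsel in jar.items())


# Made-up values standing in for what a server could send on the first response
first_response = [
    'sessionid=abc123; Path=/; HttpOnly',
    'theme=dark; Max-Age=3600',
]
header_value = cookie_header_from_set_cookie(first_response)
print(header_value)
```

A real client also has to honor the domain, path and expiry attributes before sending a cookie back, which is what `http.cookiejar` and `requests.Session` do for you.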
Let's look at how to send cookies from a Python 3 crawler:
1. Write the Cookie directly in the request header
# coding:utf-8
import requests
from bs4 import BeautifulSoup
# Cookie header value copied from the browser; it must be one line with no newlines
cookie = ('cisession=19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60;'
          'CNZZDATA1000201968=1815846425-1478580135-https%253A%252F%252Fwww.baidu.com%252F%7C1483922031;'
          'Hm_lvt_f805f7762a9a237a0deac37015e9f6d9=1482722012,1483926313;'
          'Hm_lpvt_f805f7762a9a237a0deac37015e9f6d9=1483926368')
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36',
    'Connection': 'keep-alive',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cookie': cookie}
url = 'https://www.jb51.net/article/191947.htm'
wbdata = requests.get(url,headers=header).text
soup = BeautifulSoup(wbdata,'lxml')
print(soup)
2. Pass the Cookie through the cookies parameter of requests
# coding:utf-8
import requests
from bs4 import BeautifulSoup
# Same cookies as in method 1, as a name -> value dict
cookie = {
    "cisession": "19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60",
    "CNZZDATA1000201968": "1815846425-1478580135-https%253A%252F%252Fwww.baidu.com%252F%7C1483922031",
    "Hm_lvt_f805f7762a9a237a0deac37015e9f6d9": "1482722012,1483926313",
    "Hm_lpvt_f805f7762a9a237a0deac37015e9f6d9": "1483926368"
}
url = 'https://www.jb51.net/article/191947.htm'
wbdata = requests.get(url,cookies=cookie).text
soup = BeautifulSoup(wbdata,'lxml')
print(soup)
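The two approaches above take the same data in different shapes: method 1 wants the raw header string, method 2 wants a dict. Below is a small sketch of converting one into the other, so that a string copied from the browser's developer tools can be passed as `cookies=`. The helper name `parse_cookie_header` is an assumption introduced here, not part of requests.

```python
def parse_cookie_header(raw):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}.

    Hypothetical helper: splits on ';' and then on the first '=' only,
    so values that themselves contain '=' or ',' are kept intact.
    """
    cookies = {}
    for pair in raw.split(';'):
        pair = pair.strip()
        if not pair or '=' not in pair:
            continue  # skip empty segments and malformed pairs
        name, value = pair.split('=', 1)
        cookies[name.strip()] = value.strip()
    return cookies


raw_cookie = ('cisession=19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60;'
              'Hm_lvt_f805f7762a9a237a0deac37015e9f6d9=1482722012,1483926313')
cookie_dict = parse_cookie_header(raw_cookie)
print(cookie_dict)
```

The resulting dict can then be used as `requests.get(url, cookies=cookie_dict)`, exactly as in method 2.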
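Extended example:
Logging in to the Harbin Institute of Technology (HIT) ACM site with cookies
Get the site's login URL
http://acm.hit.edu.cn/hoj/system/login
Check the POST data to be sent
user and password
Code: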
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
__author__ = 'pi'
__email__ = 'pipisorry@126.com'
"""
import urllib.request, urllib.parse, urllib.error
import http.cookiejar
LOGIN_URL = 'http://acm.hit.edu.cn/hoj/system/login'
values = {'user': '******', 'password': '******'} # , 'submit' : 'Login'
postdata = urllib.parse.urlencode(values).encode()
user_agent = r'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
headers = {'User-Agent': user_agent, 'Connection': 'keep-alive'}
cookie_filename = 'cookie.txt'
cookie = http.cookiejar.MozillaCookieJar(cookie_filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
request = urllib.request.Request(LOGIN_URL, postdata, headers)
try:
    response = opener.open(request)
    page = response.read().decode()
    # print(page)
except urllib.error.URLError as e:
    # Only HTTPError carries a status code; a plain URLError has just a reason
    print(getattr(e, 'code', None), ':', e.reason)
cookie.save(ignore_discard=True, ignore_expires=True)  # save the cookies to cookie.txt
print(cookie)
for item in cookie:
    print('Name = ' + item.name)
    print('Value = ' + item.value)
get_url = 'http://acm.hit.edu.cn/hoj/problem/solution/?problem=1'  # use the cookies to request another URL
get_request = urllib.request.Request(get_url, headers=headers)
get_response = opener.open(get_request)
print(get_response.read().decode())
# print('You have not solved this problem' in get_response.read().decode())
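The script above saves the cookies to cookie.txt but never reads them back. As a rough sketch of the other half, the code below loads a saved file into a fresh `MozillaCookieJar` and builds an opener around it, so a later run could skip the login step. To stay runnable offline it writes its own demo file first; `make_cookie`, `save_cookies`, `load_opener`, the `example.com` domain and the cookie values are all placeholders invented for this illustration.

```python
import http.cookiejar
import os
import tempfile
import urllib.request


def make_cookie(name, value, domain):
    """Build a session cookie by hand (stand-in for one set by a real login)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def save_cookies(path, cookies):
    """Write cookies to a Mozilla-format file, as cookie.save() does above."""
    jar = http.cookiejar.MozillaCookieJar(path)
    for c in cookies:
        jar.set_cookie(c)
    # Session cookies are marked discard=True, so ignore_discard is needed
    jar.save(ignore_discard=True, ignore_expires=True)


def load_opener(path):
    """Load a saved cookie file and return (jar, opener) that will send it."""
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    handler = urllib.request.HTTPCookieProcessor(jar)
    return jar, urllib.request.build_opener(handler)


# Demo round trip in a temporary directory, with no network access
demo_path = os.path.join(tempfile.mkdtemp(), 'cookie.txt')
save_cookies(demo_path, [make_cookie('session', 'abc123', 'example.com')])
jar, opener = load_opener(demo_path)
restored = {c.name: c.value for c in jar}
print(restored)
```

With a real cookie.txt from the login script, `load_opener('cookie.txt')` would give an opener whose `open()` calls send the stored cookies, provided the server has not expired the session in the meantime.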
Source: https://www.py.cn/spider/guide/18504.html
Tags: Python3, crawler, cookie