How to speed up the requests module with a thread pool
Author: 人生苦短,我用python  Date: 2023-06-12 11:13:44
Plain approach: scraping Pear Video (梨视频)
import re
import time
import random
import requests
from lxml import etree

start_time = time.time()
url = "https://www.pearvideo.com/category_3"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
}
# Regex that captures the video URL embedded in a detail page's source
ex = 'srcUrl="(.*?)",vdoUrl=srcUrl'


def request_video(url):
    """
    Send a request to the video URL and return the raw bytes.
    """
    return requests.get(url=url, headers=headers).content


def save_video(content):
    """
    Save the video's binary data to a local file.
    """
    # Note: a random three-digit name can collide and overwrite an earlier file
    video_name = str(random.randint(100, 999)) + ".mp4"
    with open(video_name, 'wb') as f:
        f.write(content)


# Fetch the source of the index page
page_text = requests.get(url=url, headers=headers).text
tree = etree.HTML(page_text)
li_list = tree.xpath('//ul[@class="listvideo-list clearfix"]/li')
video_url_list = list()
for li in li_list:
    detail_url = "https://www.pearvideo.com/" + li.xpath('./div/a/@href')[0]
    # Fetch the source of this video's detail page
    detail_page_text = requests.get(url=detail_url, headers=headers).text
    # Extract the video URL with the regex
    video_url = re.findall(ex, detail_page_text, re.S)[0]
    video_url_list.append(video_url)
    # Download and save one video at a time
    content = request_video(video_url)
    save_video(content)
print("Elapsed time: ", time.time() - start_time)

Elapsed time: 147.22410440444946
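save_video above names each file with a random three-digit number, so two videos can receive the same name and one silently overwrites the other. A minimal sketch of one alternative is to derive the name from the video URL. The helper `video_filename` and the example URLs are hypothetical and not part of the original script.

```python
import hashlib
import posixpath
from urllib.parse import urlparse


def video_filename(video_url):
    """Return a file name derived from the URL (hypothetical helper)."""
    # Use the last path segment when it already looks like an .mp4 file name.
    base = posixpath.basename(urlparse(video_url).path)
    if base.endswith(".mp4"):
        return base
    # Otherwise fall back to a hash of the full URL, which is stable per URL.
    return hashlib.sha256(video_url.encode("utf-8")).hexdigest() + ".mp4"


print(video_filename("https://example.com/mp4/short/abc-123.mp4"))
```

With such a helper, save_video would need the URL as well as the content, which is a signature change the original listing does not make.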
Using a thread pool: scraping Pear Video (梨视频)
# Scrape Pear Video using a thread pool
import re
import time
import random
import requests
from lxml import etree
from multiprocessing.dummy import Pool  # thread-based Pool

start_time = time.time()
url = "https://www.pearvideo.com/category_3"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
}
# Regex that captures the video URL embedded in a detail page's source
ex = 'srcUrl="(.*?)",vdoUrl=srcUrl'


def request_video(url):
    """
    Send a request to the video URL and return the raw bytes.
    """
    return requests.get(url=url, headers=headers).content


def save_video(content):
    """
    Save the video's binary data to a local file.
    """
    # Note: a random three-digit name can collide and overwrite an earlier file
    video_name = str(random.randint(100, 999)) + ".mp4"
    with open(video_name, 'wb') as f:
        f.write(content)


# Fetch the source of the index page
page_text = requests.get(url=url, headers=headers).text
tree = etree.HTML(page_text)
li_list = tree.xpath('//ul[@class="listvideo-list clearfix"]/li')
video_url_list = list()
for li in li_list:
    detail_url = "https://www.pearvideo.com/" + li.xpath('./div/a/@href')[0]
    # Fetch the source of this video's detail page
    detail_page_text = requests.get(url=detail_url, headers=headers).text
    # Extract the video URL with the regex
    video_url = re.findall(ex, detail_page_text, re.S)[0]
    video_url_list.append(video_url)

pool = Pool(4)
# Download the binary data of all videos with the thread pool
content_list = pool.map(request_video, video_url_list)
# Save the binary data to local files with the thread pool
pool.map(save_video, content_list)
# Release the worker threads once both batches are done
pool.close()
pool.join()
print("Elapsed time: ", time.time() - start_time)
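To see the effect of the pool without touching the network, the sketch below swaps the video request for a hypothetical `fake_download` that only sleeps for 0.2 seconds. The function name, the placeholder URL strings, and the sleep duration are assumptions for illustration, not part of the original script. It shows that `pool.map` returns results in input order and that four worker threads overlap their waiting time.

```python
import time
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing API


def fake_download(url):
    """Hypothetical stand-in for request_video: sleep to mimic network latency."""
    time.sleep(0.2)
    return ("data for " + url).encode("utf-8")


urls = ["video_%d" % i for i in range(8)]

# Sequential baseline: each call waits for the previous one to finish.
start = time.time()
sequential = [fake_download(u) for u in urls]
sequential_time = time.time() - start

# Thread pool: four threads wait concurrently, so the total time shrinks.
start = time.time()
with Pool(4) as pool:
    pooled = pool.map(fake_download, urls)
pooled_time = time.time() - start

print("sequential: %.2fs, pooled: %.2fs" % (sequential_time, pooled_time))
```

Threads help here because the work is dominated by waiting on I/O. Note that the original script also keeps every downloaded video in memory in content_list until the second `pool.map` writes them out.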
Source: https://www.cnblogs.com/youhongliang/p/12708250.html
Tags: thread pool, requests, module