Python RuntimeError problems and how to fix them

Author: 舔狗一无所有  Posted: 2022-01-01 00:58:08

Here is the error message that appears:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.
 
        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:
 
            if __name__ == '__main__':
                freeze_support()
                ...
 
        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Here is the original code that produces the error:

import multiprocessing as mp
import time
from urllib.request import urlopen,urljoin
from bs4 import BeautifulSoup
import re

base_url = "https://morvanzhou.github.io/"

#crawl: fetch a page
def crawl(url):
   response = urlopen(url)
   time.sleep(0.1)
   return response.read().decode()

#parse: parse a page
def parse(html):
   soup = BeautifulSoup(html,'html.parser')
   urls = soup.find_all('a',{"href":re.compile('^/.+?/$')})
   title = soup.find('h1').get_text().strip()
   page_urls = set([urljoin(base_url,url['href'])for url in urls])
   url = soup.find('meta',{'property':"og:url"})['content']
   return title,page_urls,url

unseen = set([base_url])
seen = set()
restricted_crawl = True

pool = mp.Pool(4)
count, t1 = 1, time.time()
while len(unseen) != 0:                 # still get some url to visit
   if restricted_crawl and len(seen) > 20:
      break
   print('\nDistributed Crawling...')
   crawl_jobs = [pool.apply_async(crawl, args=(url,)) for url in unseen]
   htmls = [j.get() for j in crawl_jobs]      # request connection

   print('\nDistributed Parsing...')
   parse_jobs = [pool.apply_async(parse, args=(html,)) for html in htmls]
   results = [j.get() for j in parse_jobs]    # parse html

   print('\nAnalysing...')
   seen.update(unseen)         # seen the crawled
   unseen.clear()              # nothing unseen

   for title, page_urls, url in results:
      print(count, title, url)
      count += 1
      unseen.update(page_urls - seen)     # get new url to crawl
print('Total time: %.1f s' % (time.time()-t1))    # 16 s !!!

Here is the corrected code:

import multiprocessing as mp
import time
from urllib.request import urlopen,urljoin
from bs4 import BeautifulSoup
import re
 
base_url = "https://morvanzhou.github.io/"
 
#crawl: fetch a page
def crawl(url):
    response = urlopen(url)
    time.sleep(0.1)
    return response.read().decode()
 
#parse: parse a page
def parse(html):
    soup = BeautifulSoup(html,'html.parser')
    urls = soup.find_all('a',{"href":re.compile('^/.+?/$')})
    title = soup.find('h1').get_text().strip()
    page_urls = set([urljoin(base_url,url['href'])for url in urls])
    url = soup.find('meta',{'property':"og:url"})['content']
    return title,page_urls,url
 
def main():
    unseen = set([base_url])
    seen = set()
    restricted_crawl = True
 
    pool = mp.Pool(4)
    count, t1 = 1, time.time()
    while len(unseen) != 0:                 # still get some url to visit
        if restricted_crawl and len(seen) > 20:
            break
        print('\nDistributed Crawling...')
        crawl_jobs = [pool.apply_async(crawl, args=(url,)) for url in unseen]
        htmls = [j.get() for j in crawl_jobs]      # request connection
 
        print('\nDistributed Parsing...')
        parse_jobs = [pool.apply_async(parse, args=(html,)) for html in htmls]
        results = [j.get() for j in parse_jobs]    # parse html
 
        print('\nAnalysing...')
        seen.update(unseen)         # seen the crawled
        unseen.clear()              # nothing unseen
 
        for title, page_urls, url in results:
            print(count, title, url)
            count += 1
            unseen.update(page_urls - seen)     # get new url to crawl
    print('Total time: %.1f s' % (time.time()-t1))    # 16 s !!!
 
 
if __name__ == '__main__':
    main()

In short: wrap your top-level script code in a function, then add

if __name__ == '__main__':
    main()

at the bottom of the file. On Windows, multiprocessing starts child processes with the "spawn" method, which re-imports the main module in each child; without this guard, the pool-creating code at module level runs again inside every child process, producing the RuntimeError above.
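The same guard pattern in a minimal, self-contained sketch (the `square` worker and the pool size are illustrative, not from the original post):

```python
import multiprocessing as mp

def square(x):
    # Worker must be defined at module level so child processes
    # can find it after the module is re-imported under "spawn".
    return x * x

def main():
    # All process-spawning code lives inside main(), so it only
    # runs when the file is executed directly.
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3, 4]))

if __name__ == '__main__':
    main()
```

Running the file prints the squared list; importing it from another module spawns no processes at all, which is exactly what the guard guarantees.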

Python error: RuntimeError (numpy sanity check)

This covers errors of the form RuntimeError: ... fails to pass a sanity check due to a bug in the windows runtime.

Causes of this error

1. An incompatibility between your Python and numpy versions; for example, Python 3.9 with numpy 1.19.4 produces this error for me.

2. numpy 1.19.4 has this problem with many Python versions: that release added a sanity check that fails on Windows builds affected by a runtime fmod bug.

Solution

In PyCharm, downgrade numpy under File->Settings->Project:pycharmProjects->Project Interpreter.

1. Open the interpreter settings.

2. Double-click numpy to edit its version.

3. Tick the version checkbox (you can only change the version once it is ticked), then install the lower version you need.

Once that is done, re-run the program and the error should be gone.
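Outside PyCharm, the same fix is a pip downgrade (for example `pip install numpy==1.19.3`, the release just before the affected one). A small sketch for checking whether the installed release is the affected one; the helper name is ours, not numpy's:

```python
import numpy as np

def is_affected_numpy(version: str) -> bool:
    # numpy 1.19.4 is the release known to trip this sanity
    # check on affected Windows builds.
    return version == "1.19.4"

print(np.__version__, "affected:", is_affected_numpy(np.__version__))
```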

Source: https://blog.csdn.net/weixin_42099082/article/details/89365643
