Spark Programming in Python: A Worked Example
Author: 王小雷-多面手 · Date: 2023-06-02 06:12:52
1. Developing, testing, and submitting PySpark applications in Jupyter Notebook
1.1. Launching
IPYTHON_OPTS="notebook" /opt/spark/bin/pyspark
To turn the notebook into a submittable script, download it as a .py file (in Jupyter: File → Download as → Python (.py)); the notebook itself is saved with the .ipynb extension by default.
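Note that IPYTHON_OPTS only works on Spark 1.x; it was removed in Spark 2.0. On newer versions the equivalent launch (a minimal sketch, assuming Jupyter is installed in the driver's Python environment) is:
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" /opt/spark/bin/pyspark
You can also export the notebook from the command line with jupyter nbconvert --to script pysparkdemo.ipynb instead of using the menu.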
2. Submitting the application from the shell
wxl@wxl-pc:/opt/spark/bin$ ./spark-submit /home/wxl/Downloads/pysparkdemo.py
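spark-submit also accepts the usual Spark options before the script path; for example, to choose the master on the command line (an illustrative invocation; note that a master hard-coded in the script via SparkContext('local[2]', ...) takes precedence over this flag):
wxl@wxl-pc:/opt/spark/bin$ ./spark-submit --master "local[2]" /home/wxl/Downloads/pysparkdemo.py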
3. Errors encountered and their resolution
3.1. The error
The pyspark shell (and a notebook launched through it) already creates a SparkContext named sc, so constructing another one inside the notebook fails with:
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*]) created by <module> at /usr/local/lib/python2.7/dist-packages/IPython/utils/py3compat.py:288
3.2. The fix: successful run
In the notebook, add the following right after the import:
try:
    sc.stop()
except:
    pass
sc = SparkContext('local[2]', 'First Spark App')
This fix comes from Stack Overflow.
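On newer Spark versions there is a cleaner alternative (a sketch, assuming a version that provides SparkContext.getOrCreate; Spark 2.0+ does): reuse the existing context instead of stopping and recreating it.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster('local[2]').setAppName('First Spark App')
# Returns the already-running context if there is one (its existing settings
# win over conf); otherwise creates a fresh context from conf.
sc = SparkContext.getOrCreate(conf)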
4. Source code
pysparkdemo.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from pyspark import SparkContext"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"try:\n",
" sc.stop()\n",
"except:\n",
" pass\n",
"sc=SparkContext('local[2]','First Spark App')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"data = sc.textFile(\"data/UserPurchaseHistory.csv\").map(lambda line: line.split(\",\")).map(lambda record: (record[0], record[1], record[2]))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total purchases: 5\n"
]
}
],
"source": [
"numPurchases = data.count()\n",
"print \"Total purchases: %d\" % numPurchases"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
pysparkdemo.py
# coding: utf-8
# In[1]:
from pyspark import SparkContext
# In[2]:
try:
sc.stop()
except:
pass
sc=SparkContext('local[2]','First Spark App')
# In[3]:
data = sc.textFile("data/UserPurchaseHistory.csv").map(lambda line: line.split(",")).map(lambda record: (record[0], record[1], record[2]))
# In[4]:
numPurchases = data.count()
print "Total purchases: %d" % numPurchases
# In[ ]:
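Each line of data/UserPurchaseHistory.csv is assumed here to have the form user,product,price, so the pipeline above yields an RDD of (user, product, price) tuples. Under that assumption, a short illustrative extension in the same Python 2 style as the script (the aggregations below are mine, not part of the original):
uniqueUsers = data.map(lambda record: record[0]).distinct().count()
totalRevenue = data.map(lambda record: float(record[2])).sum()
# Count purchases per product and pick the most popular one.
products = data.map(lambda record: (record[1], 1)).reduceByKey(lambda a, b: a + b).collect()
mostPopular = sorted(products, key=lambda item: item[1], reverse=True)[0]
print "Unique users: %d" % uniqueUsers
print "Total revenue: %.2f" % totalRevenue
print "Most popular product: %s with %d purchases" % (mostPopular[0], mostPopular[1])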
Source: https://xiaolei.blog.csdn.net/article/details/51935530
Tags: spark, programming, python