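Common Misconceptions When Reading Values from DataFrame and SparkSql
Author: silentwolfyh  Date: 2021-03-21 22:04:18
1. A DataFrame operation does not return a plain object holding the values.
2. Data queried through the DataFrame API comes back as another DataFrame dataset.
3. A DataFrame is only executed when it reaches an Action operator.
4. Data queried through SparkSql also comes back as a DataFrame dataset.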
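Because both the DataFrame API and SparkSql hand back a DataFrame, the values themselves only become available after an Action such as collect() returns Row objects to the driver. The following is a minimal sketch of that idea, not the author's code. It assumes the Spark 1.x API used in this post (SQLContext, registerTempTable; newer Spark versions offer createOrReplaceTempView instead), a local master, and two invented sample rows. The object name ValueExtractionSketch and its methods are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; the sample rows are invented for illustration.
object ValueExtractionSketch {

  // select() is a transformation and returns another DataFrame;
  // collect() is the Action that brings Row objects back to the driver.
  def countriesOf(df: DataFrame): Array[String] = {
    val rows: Array[Row] = df.select("country").collect()
    rows.map(_.getString(0))
  }

  // Returns the country values read via the DataFrame API and via SparkSql.
  def demo(): (Array[String], Array[String]) = {
    val conf = new SparkConf().setAppName("ValueExtractionSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      val df = sqlContext.createDataFrame(Seq(
        ("2016-06-14 10:00:00", "CN", "Asia"),
        ("2016-06-14 10:00:01", "US", "America")
      )).toDF("timestamp", "country", "area")

      df.registerTempTable("infotable")
      // sql() returns a DataFrame as well, not the values themselves.
      val viaSql: DataFrame =
        sqlContext.sql("SELECT timestamp, country, area FROM infotable")

      (countriesOf(df).sorted, countriesOf(viaSql).sorted)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (fromApi, fromSql) = demo()
    println("DataFrame API: " + fromApi.mkString(", "))
    println("SparkSql:      " + fromSql.mkString(", "))
  }
}
```

Note that collect() copies every row to the driver, so on a large dataset it is usually combined with a filter or a limit first.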
Raw data
scala> val parquetDF = sqlContext.read.parquet("hdfs://hadoop14:9000/yuhui/parquet/part-r-00004.gz.parquet")
parquetDF: org.apache.spark.sql.DataFrame = [timestamp: string, appkey: string, app_version: string, channel: string, lang: string, os_type: string, os_version: string, display: string, device_type: string, mac: string, network: string, nettype: string, suuid: string, register_days: int, country: string, area: string, province: string, city: string, event: string, use_interval_cat: string, use_duration_cat: string, use_interval: bigint, use_duration: bigint, os_upgrade_from: string, app_upgrade_from: string, page_name: string, event_name: string, error_type: string]
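Positional getters on a Row, such as getString(i), depend on the column order printed in this schema. The sketch below is plain Scala with the column names copied from the schema above, and shows how a zero-based position can be looked up by name. The object name ColumnIndexSketch is hypothetical. With a live DataFrame, df.columns should return the same array of names.

```scala
// Hypothetical helper; the column list is copied from the schema above.
object ColumnIndexSketch {

  // Column names in the order printed in the schema.
  val columns: Array[String] = Array(
    "timestamp", "appkey", "app_version", "channel", "lang", "os_type",
    "os_version", "display", "device_type", "mac", "network", "nettype",
    "suuid", "register_days", "country", "area", "province", "city",
    "event", "use_interval_cat", "use_duration_cat", "use_interval",
    "use_duration", "os_upgrade_from", "app_upgrade_from", "page_name",
    "event_name", "error_type")

  // Zero-based position of a column, or -1 if the name is not present.
  def indexOf(name: String): Int = columns.indexOf(name)

  def main(args: Array[String]): Unit =
    Seq("timestamp", "country", "area").foreach { name =>
      println(name + " -> " + indexOf(name))
    }
}
```

Running main prints positions 0, 14 and 15 for timestamp, country and area.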
Code
package DataFrame

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by yuhui on 2016/6/14.
 */
object DataFrameTest {

  def main(args: Array[String]): Unit = {
    DataFrameInto()
  }

  def DataFrameInto(): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.parquet("hdfs://hadoop14:9000/yuhui/parquet")

    // 1. Transformation only, no Action: printinfo is never called.
    //df.map(line => printinfo(line.getString(0)))
    // 2. Action on the full rows, fields read by position.
    //df.foreach(line => printinfo(line.getString(0)+" , "+line.getString(14)+" , "+line.getString(15)))
    // 3. DataFrame API select followed by an Action.
    //df.select("timestamp","country","area").foreach(line=>printinfo(line.toString))
    // 4. SparkSql query followed by an Action.
    df.registerTempTable("infotable")
    sqlContext.sql("SELECT timestamp , country , area from infotable").foreach(line=>printinfo(line.toString))
  }

  // Prints one message with a fixed prefix.
  def printinfo(msg: String): Unit = {
    println("printinfo function-->" + msg)
  }
}
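Code walkthrough
1. df.map(line => printinfo(line.getString(0)))
This code never executes printinfo(): it contains only the map operator, which is a lazy transformation, and no Action operator to trigger the computation.
2. df.foreach(line => printinfo(line.getString(0)+" , "+line.getString(14)+" , "+line.getString(15)))
foreach is a Spark Action operator, so each row is received and processed. Positions 0, 14 and 15 are the timestamp, country and area columns in the schema shown under "Raw data". Execution result: [screenshot not included]
3. df.select("timestamp","country","area").foreach(line=>printinfo(line.toString))
The columns are selected through the DataFrame API and then printed by a Spark Action operator. Execution result: [screenshot not included]
4. sqlContext.sql("SELECT timestamp , country , area from infotable").foreach(line=>printinfo(line.toString))
The same columns are selected through SparkSql, which also returns a DataFrame, and then printed by a Spark Action operator. Execution result: [screenshot not included]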
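The difference between item 1 and items 2 to 4 can be sketched as follows. This is an illustration under stated assumptions, not the author's code: it uses a local master and two invented rows, the object name LazyMapSketch is hypothetical, and it calls df.rdd.map instead of df.map so that the same lines should also compile on newer Spark versions, where map on a DataFrame needs an encoder. It also uses collect() before printing, since on a cluster the output of a println inside foreach goes to the executors' stdout rather than the driver's console.

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; the two sample rows are invented for illustration.
object LazyMapSketch {

  def printinfo(msg: String): Unit = println("printinfo --> " + msg)

  // Returns the number of rows counted after the map step.
  def demo(): Long = {
    val conf = new SparkConf().setAppName("LazyMapSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      val df = sqlContext.createDataFrame(Seq(
        ("2016-06-14 10:00:00", "CN", "Asia"),
        ("2016-06-14 10:00:01", "US", "America")
      )).toDF("timestamp", "country", "area")

      // Transformation only: Spark records the step but does not run it,
      // so printinfo is not called by this statement.
      val mapped = df.rdd.map { line =>
        printinfo(line.getString(0))
        line.getString(0)
      }

      // Action: count() forces the map step to run once per row.
      val processed = mapped.count()

      // collect() is also an Action; it returns the rows to the driver,
      // so this printinfo output appears in the driver's console.
      df.select("timestamp", "country", "area")
        .collect()
        .foreach(line => printinfo(line.toString))

      processed
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println("rows processed: " + demo())
}
```

If the count() line is removed, the map step is never scheduled and only the rows printed after collect() appear.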
Source: https://blog.csdn.net/silentwolfyh/article/details/51669839
Tags: DataFrame, SparkSql, reading values