pyspark: DataFrame join operations

Author: 张虾米试错 | Published: 2019-03-01 20:18

This post looks at what a DataFrame actually contains after a join.

left join

# Assumes the pyspark shell, where `sc` (the SparkContext) is already defined.
from pyspark.sql import Row
rdd = sc.parallelize([Row(name='Alice', age=5, height=80),Row(name='Alice', age=10, height=80)])
df = rdd.toDF()
rdd1 = sc.parallelize([Row(name='Alice', weight=45)])
df1 = rdd1.toDF()
df.join(df1, ["name"], "left").show()
"""
+-----+---+------+------+
| name|age|height|weight|
+-----+---+------+------+
|Alice|  5|    80|    45|
|Alice| 10|    80|    45|
+-----+---+------+------+

"""
rdd2 = sc.parallelize([Row(name='Alice', weight=45), Row(name='Alice', weight=45)])
df2 = rdd2.toDF()
df.join(df2, ["name"], "left").show()
"""
+-----+---+------+------+
| name|age|height|weight|
+-----+---+------+------+
|Alice|  5|    80|    45|
|Alice|  5|    80|    45|
|Alice| 10|    80|    45|
|Alice| 10|    80|    45|
+-----+---+------+------+
"""

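In the examples above, a plain join (inner, the default) returns the same results, because every row of df has a match on name.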
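
The second result has four rows because each of the two rows in df matches both rows in df2, and every match produces one output row. A minimal plain-Python sketch of that matching rule follows. It is not Spark's implementation; `left_join` and the list-of-dicts representation are hypothetical and exist only for illustration.

```python
def left_join(left, right, key):
    """Left join two lists of dicts on `key` (illustrative sketch only)."""
    out = []
    # Columns that exist only on the right side (everything except the key)
    right_cols = {c for r in right for c in r if c != key}
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if not matches:
            # No match: keep the left row and fill the right columns with None
            out.append({**l, **{c: None for c in right_cols}})
        for r in matches:
            # Every matching right row yields one output row
            out.append({**l, **{c: v for c, v in r.items() if c != key}})
    return out


df_rows = [{"name": "Alice", "age": 5, "height": 80},
           {"name": "Alice", "age": 10, "height": 80}]
df2_rows = [{"name": "Alice", "weight": 45},
            {"name": "Alice", "weight": 45}]

result = left_join(df_rows, df2_rows, "name")
print(len(result))  # 2 left rows x 2 matching right rows = 4
```

If df2 should contribute at most one row per name, removing duplicates from it before the join (for example with `dropDuplicates`) avoids the multiplication.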

outer join

rdd = sc.parallelize([Row(name='Alice', age=5, height=80),Row(name='Bob', age=5, height=80)])
df = rdd.toDF()
rdd = sc.parallelize([Row(name='Alice', age=5, height=80),Row(name='Cycy', age=10, height=80)])
df1 = rdd.toDF()
df.join(df1, ["name"], "outer").show()
"""
+-----+----+------+----+------+
| name| age|height| age|height|
+-----+----+------+----+------+
| Cycy|null|  null|  10|    80|
|  Bob|   5|    80|null|  null|
|Alice|   5|    80|   5|    80|
+-----+----+------+----+------+
"""

rdd1 = sc.parallelize([Row(name='Alice', age=5, height=80),Row(name='Cycy', age=10, height=80)])
df2 = rdd1.toDF()
# show() prints the table itself and returns None, so no print is needed.
df.join(df1, ["name"], "outer").join(df2, ["name"], "outer").show()
"""
+-----+----+------+----+------+----+------+
| name| age|height| age|height| age|height|
+-----+----+------+----+------+----+------+
| Cycy|null|  null|  10|    80|  10|    80|
|  Bob|   5|    80|null|  null|null|  null|
|Alice|   5|    80|   5|    80|   5|    80|
+-----+----+------+----+------+----+------+
"""
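
The first outer join result can be reproduced with a small plain-Python sketch: every name that appears on either side is kept, and the columns of the side with no match are filled with None (shown as null by Spark). This is only an illustration of the semantics, not Spark's implementation. `full_outer_join` is a hypothetical helper, it assumes each name appears at most once per side, and it sorts the names, whereas Spark does not guarantee any row order.

```python
def full_outer_join(left, right):
    """Full outer join of two {key: row_tuple} mappings (illustrative sketch only)."""
    width_l = len(next(iter(left.values())))
    width_r = len(next(iter(right.values())))
    rows = []
    for key in sorted(set(left) | set(right)):
        # A side without this key contributes a row of None values
        l = left.get(key, (None,) * width_l)
        r = right.get(key, (None,) * width_r)
        rows.append((key,) + l + r)
    return rows


# name -> (age, height), mirroring df and df1 from the outer join example
df_rows = {"Alice": (5, 80), "Bob": (5, 80)}
df1_rows = {"Alice": (5, 80), "Cycy": (10, 80)}

joined = full_outer_join(df_rows, df1_rows)
for row in joined:
    print(row)
```

Note that the joined DataFrames above keep both copies of age and height under the same column names, so renaming those columns on one side before joining makes later column references unambiguous.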

Article link: https://www.haomeiwen.com/subject/ghfduqtx.html