美文网首页
DAY 6 R 语言

DAY 6 R 语言

作者: Peng_001 | 来源:发表于2020-05-07 10:43 被阅读0次

Joining Data with dplyr

inner join

inner_join,通过某个列表信息作为纽带,将两个不同的表格连接起来。

sets %>%
  inner_join(theme, by = c("theme_id" = "id"), suffix = c("_set", "_theme"))
# 通过theme_id与id将sets 与theme 连接起来。

c("first_table_column" = "second_table_column"). 需要注意的是,by后面内容有顺序要求。通过by 后面内容中的向量使用,分割,可以设定列表通过多行内容连接。

  1. 连接多行列表



    只要是相连的两列有相同内容的信息即可,即便所有列表可能不全部包含该列信息。

left join

如果相连接的列表中有某个列表没有某个内容,inner join 会自动忽略掉那一行oberservation 的信息。


使用left join 可以保留first table 中的信息,而被连接列表中没有的信息则显示为NA


例子
inventory_version_1 <- inventories %>%
    filter(version == 1)

# Join versions to sets
sets %>%
  left_join(inventory_version_1, by = "set_num") %>%
  # Filter for where version is na
  filter(is.na(version))

right join

与left join 相反,保留连接的second table中内容


  • 三者对比


replace_na

replace_na(list(n=0))
例子

batman_parts %>%
  # Combine the star_wars_parts table 
  full_join(star_wars_parts, by = c("part_num", "color_id"), suffix = c("_batman", "_star_wars")) %>%
  # Replace NAs with 0s in the n_batman and n_star_wars columns 
  replace_na(list(n_batman = 0, n_star_wars = 0))

join table to themselves


通过自连接可以将这种具有层级关系的表格连接在一起。
子数据关于父数据的id(parent_id)直接与对于的父数据id 相等。
例子

# themes 表格
      id name           parent_id
   <dbl> <chr>              <dbl>
 1     1 Technic               NA
 2     2 Arctic Technic         1
 3     3 Competition            1
 4     4 Expert Builder         1
 5     5 Model                  1
 6     6 Airport                5
 7     7 Construction           5
 8     8 Farm                   5
 9     9 Fire                   5
10    10 Harbor                 5
themes %>% 
    # Inner join the themes table
    inner_join(themes, by = c("id" = "parent_id"),suffix = c("_parent", "_child")) %>%
    # Filter for the "Harry Potter" parent name 
    filter(name_parent == "Harry Potter")
# 自连接后输出
    id name_parent  parent_id id_child name_child          
  <dbl> <chr>            <dbl>    <dbl> <chr>               
1   246 Harry Potter        NA      247 Chamber of Secrets  
2   246 Harry Potter        NA      248 Goblet of Fire      
3   246 Harry Potter        NA      249 Order of the Phoenix
4   246 Harry Potter        NA      250 Prisoner of Azkaban 
5   246 Harry Potter        NA      251 Sorcerer's Stone    
6   246 Harry Potter        NA      667 Fantastic Beasts

full join

全连接,不管连的(first table)还是被连的(second table)若数据存在缺失,均连接起来。
缺失内容用NA表示。

batman_parts %>%
  # Combine the star_wars_parts table 
  full_join(star_wars_parts, by = c("part_num", "color_id"), suffix = c("_batman", "_star_wars")) %>%
  # Replace NAs with 0s in the n_batman and n_star_wars columns 
  replace_na(list(n_batman = 0, n_star_wars = 0))

inner_join, left/right_join, full_join 都可以称为mutating join。

filtering join

semi join & anti join


例子

# Filter the batwing set for parts that are also in the batmobile set
batwing %>%
semi_join(batmobile, by = c("part_num"))

# Filter the batwing set for parts that aren't in the batmobile set
batwing %>%
anti_join(batmobile, by = c("part_num"))

总结


ps: 使用n() 获得总共的变量数量

questions_with_tags %>%
    # Group by tag_name
    group_by(tag_name) %>%
    # Get mean score and num_questions
    summarize(score = mean(score),
              num_questions = n()) %>%
    # Sort num_questions in descending order
    arrange(desc(num_questions))

其他语句

bind_rows() 可以将两个表格的内容上下拼接起来(行与行)
例子

# Combine the two tables into posts_with_tags
posts_with_tags <- bind_rows(questions_with_tags %>% mutate(type = "question"),
                              answers_with_tags %>% mutate(type = "answer"))

# Add a year column, then aggregate by type, year, and tag_name
posts_with_tags %>%
  mutate(year = year(creation_date)) %>%
  count(type, year, tag_name)

总结

相关文章

网友评论

      本文标题:DAY 6 R 语言

      本文链接:https://www.haomeiwen.com/subject/ubcughtx.html