dplyr 0.8.1 grouping functions update

Changes to group_modify() and group_map()

RStudio has just released dplyr 0.8.1, a minor update that rethinks the new purrr-style functions for iterating over grouped tibbles.

The changes include:

  • group_map() is now used for iterating on grouped tibbles. It makes no assumptions about the return type of each operation and combines the results in a list, similar to purrr::map().
  • The previous behaviour has been renamed to group_modify(), which always returns a grouped tibble: each operation must return a data frame, and the results are combined with the grouping structure reconstructed, similar to purrr::modify().
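As a minimal sketch of the difference in return types, using the built-in iris data (the names by_species, means_list and means_tbl are illustrative and not taken from the release notes):

```r
library(dplyr)

by_species <- group_by(iris, Species)

# group_map(): the lambda may return anything; the results come back as a
# list with one element per group.
means_list <- group_map(by_species, ~ mean(.x$Sepal.Length))

# group_modify(): the lambda must return a data frame; the per-group results
# are row-bound and the grouping column is added back.
means_tbl <- group_modify(
  by_species,
  ~ data.frame(mean_sepal_length = mean(.x$Sepal.Length))
)

length(means_list)  # one element per species
names(means_tbl)    # "Species" "mean_sepal_length"
```

In practice, group_map() suits operations whose results are not data frames (models, plots, plain vectors), while group_modify() suits per-group summaries that should stay in a tibble.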

An extract from the wiki states that:

group_modify() is an evolution of do(), if you have used that before.

A typical implementation changes from do(function(.$column)) to group_modify( ~ {function(.$column)}), both yielding the same result. group_map() similarly applies a function to each group of the grouped tibble, but it returns a plain list with one element per group, so the grouping variables are not reattached to the results.
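The translation can be sketched on the built-in iris data as follows. petal_range() is a hypothetical helper written for this illustration; any function that returns a data frame would do.

```r
library(dplyr)

by_species <- group_by(iris, Species)

# Hypothetical helper for illustration: returns a one-row data frame.
petal_range <- function(x) {
  data.frame(min = min(x), max = max(x))
}

# do(): `.` is the data for the current group.
via_do <- do(by_species, petal_range(.$Petal.Length))

# group_modify(): `.x` is the group's data without the grouping columns.
via_modify <- group_modify(by_species, ~ petal_range(.x$Petal.Length))

# group_map(): same lambda, but the results stay in a list and the group
# keys (available inside the lambda as `.y`) are not attached to them.
via_map <- group_map(by_species, ~ petal_range(.x$Petal.Length))
```

Here via_do and via_modify hold the same three rows with a Species column, whereas via_map is a list of three one-row data frames containing only min and max.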

Let’s prepare data for the brief demonstration.

library(dplyr)  # %>%, group_by(), do(), group_modify()
library(tidyr)  # gather(), spread()

iris_grouped <-
  iris %>% 
  
  # Make the table long and thin by collapsing numeric attributes and associated
  # values describing Sepal and Petal length and width into the `Field` and `val`
  # columns respectively.
  gather(Field, val, -Species) %>% 
  
  # Group by Species and Field to execute an operation over the val data for
  # each respective group
  group_by(Species, Field)
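A quick sanity check of the grouping structure, sketched here as a self-contained snippet (it rebuilds iris_grouped so that it runs on its own):

```r
library(dplyr)
library(tidyr)

iris_grouped <- iris %>%
  gather(Field, val, -Species) %>%
  group_by(Species, Field)

group_vars(iris_grouped)  # the grouping columns
n_groups(iris_grouped)    # 3 species x 4 measurements
group_size(iris_grouped)  # number of rows in each group
```

Each Species and Field combination forms one group, which is the unit that do(), group_modify() and group_map() iterate over.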

iris_grouped %>%
  
  # The function returns the first 4 rows of the data frame, formatting the
  # output to HTML using knitr and kableExtra, and printing the passed header
  # as shown below
  func_tidy_present(., 
                    header = "Tidied *Gathered & Grouped* Dataframe")

Tidied Gathered & Grouped Dataframe
Species Field val
setosa Sepal.Length 5.1
setosa Sepal.Length 4.9
setosa Sepal.Length 4.7
setosa Sepal.Length 4.6



The demonstration below shows how small the change is: the do() call and its group_modify() replacement differ only in the wrapper and the lambda syntax, and both return the same summary.

iris_grouped %>%
  
  # Apply the function to each group using the do() method
  do(psych::describe(.$val)) %>%
  
  # Tidy up to present
  func_tidy_present(., header = "Result of the *Do* method")

Result of the Do method
Species Field vars n mean sd median trimmed mad min max range skew kurtosis se
setosa Petal.Length 1 50 1.46 0.17 1.5 1.46 0.15 1.0 1.9 0.9 0.10 0.65 0.02
setosa Petal.Width 1 50 0.25 0.11 0.2 0.24 0.00 0.1 0.6 0.5 1.18 1.26 0.01
setosa Sepal.Length 1 50 5.01 0.35 5.0 5.00 0.30 4.3 5.8 1.5 0.11 -0.45 0.05
setosa Sepal.Width 1 50 3.43 0.38 3.4 3.42 0.37 2.3 4.4 2.1 0.04 0.60 0.05

iris_grouped %>% 
  
  # Apply the function to each group using the group_modify() method
  group_modify( ~ {psych::describe(.$val)}) %>% 
  
  # Tidy up to present
  func_tidy_present(., header = "Result of the *Group Modify* method")

Result of the Group Modify method
Species Field vars n mean sd median trimmed mad min max range skew kurtosis se
setosa Petal.Length 1 50 1.46 0.17 1.5 1.46 0.15 1.0 1.9 0.9 0.10 0.65 0.02
setosa Petal.Width 1 50 0.25 0.11 0.2 0.24 0.00 0.1 0.6 0.5 1.18 1.26 0.01
setosa Sepal.Length 1 50 5.01 0.35 5.0 5.00 0.30 4.3 5.8 1.5 0.11 -0.45 0.05
setosa Sepal.Width 1 50 3.43 0.38 3.4 3.42 0.37 2.3 4.4 2.1 0.04 0.60 0.05



quantile() returns a named numeric vector rather than a data frame, so its result is enframed inside group_modify() and then spread by Field before presentation.

iris_grouped %>%
  
  # Apply the function for each group and enframe results within the
  # group_modify function
  group_modify( ~ {
    quantile(.x$val, probs = c(0.25, 0.5, 0.75)) %>%
      tibble::enframe(name = "prob", value = "quantile")
  }) %>%
  spread(Field, quantile) %>%
  
  # Tidy up to present
  func_tidy_present(.,
                    header = "Result of the *Group Modify* method applying the *quantile* function",
                    return_row_count = 9)

Result of the Group Modify method applying the quantile function
Species prob Petal.Length Petal.Width Sepal.Length Sepal.Width
setosa 25% 1.40 0.2 4.80 3.20
setosa 50% 1.50 0.2 5.00 3.40
setosa 75% 1.58 0.3 5.20 3.68
versicolor 25% 4.00 1.2 5.60 2.52
versicolor 50% 4.35 1.3 5.90 2.80
versicolor 75% 4.60 1.5 6.30 3.00
virginica 25% 5.10 1.8 6.23 2.80
virginica 50% 5.55 2.0 6.50 3.00
virginica 75% 5.88 2.3 6.90 3.18
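
To see why the enframe step is needed in the quantile example, here is a minimal sketch on a single column of the ungrouped iris data (the names qs and qs_tbl are illustrative):

```r
library(tibble)

# quantile() returns a named numeric vector, which group_modify() would
# reject because it is not a data frame.
qs <- quantile(iris$Sepal.Length, probs = c(0.25, 0.5, 0.75))
is.data.frame(qs)  # FALSE

# enframe() turns the names and values into two columns of a tibble.
qs_tbl <- enframe(qs, name = "prob", value = "quantile")
names(qs_tbl)      # "prob" "quantile"
```

The resulting two-column tibble is what group_modify() row-binds across groups before spread() widens it by Field.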
