dplyr_wrapper {fxtract}    R Documentation

Wrapper for dplyr's summarize

Description

This function is a convenience wrapper around dplyr's summarize(). The user only needs to define a function that takes the dataset as input and returns a named vector or a list (with atomic entries of length 1).

Usage

dplyr_wrapper(data, group_by, fun, check_fun = TRUE)

Arguments

data

('dataframe'). A dataframe with a grouping variable.

group_by

('character()'). Name(s) of the column(s) containing the identifiers by which the dataset should be grouped, e.g. user IDs.

fun

('function'). A function that takes a dataframe as input and returns a (named) vector or list of the desired length.

check_fun

('logical(1)'). If TRUE, fun(data) is evaluated on the whole dataset and the result is checked for the correct form. Set to FALSE if evaluating fun on the whole dataset takes too long.
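
As a rough illustration of what "correct form" could mean for the output of fun, the sketch below tests for a named vector or a named list with atomic entries of length 1. The helper is_valid_fun_output is hypothetical and is not part of fxtract; the check that fxtract performs internally may differ, in particular regarding whether names are required.

```r
# Hypothetical validator, NOT fxtract's internal check.
# Assumption: a valid result is a fully named atomic vector, or a fully
# named list whose entries are all atomic and of length 1.
is_valid_fun_output = function(res) {
  has_names = !is.null(names(res)) && all(nzchar(names(res)))
  if (is.list(res)) {
    entries_ok = all(vapply(res, function(e) is.atomic(e) && length(e) == 1, logical(1)))
  } else {
    entries_ok = is.atomic(res)
  }
  has_names && entries_ok
}

# Try the function on a small subset before running it on the whole dataset.
fun_ok = function(data) c(mean_sepal_length = mean(data$Sepal.Length))
is_valid_fun_output(fun_ok(head(iris)))
```

Running such a check on a small subset is one way to gain some confidence in fun before calling dplyr_wrapper() with check_fun = FALSE on a large dataset.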

Value

('dataframe')
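
To make the idea of applying fun group by group more concrete, here is a minimal base R sketch that splits a dataframe by the grouping column(s), applies fun to each piece, and binds the results into one dataframe with one row per group. The helper group_apply_sketch is hypothetical; it is not how fxtract implements dplyr_wrapper(), and the exact column layout returned by dplyr_wrapper() is an assumption here.

```r
# Hypothetical helper, NOT part of fxtract: grouped application in base R.
group_apply_sketch = function(data, group_by, fun) {
  # Split into one dataframe per combination of grouping values,
  # dropping combinations that do not occur in the data.
  pieces = split(data, data[group_by], drop = TRUE)
  rows = lapply(pieces, function(piece) {
    keys = piece[1, group_by, drop = FALSE]    # grouping values of this piece
    feats = as.data.frame(as.list(fun(piece))) # named vector or list -> one row
    cbind(keys, feats)
  })
  out = do.call(rbind, rows)
  rownames(out) = NULL
  out
}

fun_mean = function(data) c(mean_sepal_length = mean(data$Sepal.Length))
group_apply_sketch(iris, "Species", fun_mean)
```

Converting the result of fun with as.list() first means that both named vectors and named lists end up as a single row, which mirrors the two return types that fun may use.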

Examples

# Number of rows per user in which Chrome was the running app
fun1 = function(data) {
  c(uses_chrome = nrow(
    dplyr::filter(data, RUNNING_TASKS_baseActivity_mPackage == "com.android.chrome"))
  )
}
dplyr_wrapper(data = studentlife_small, group_by = "userId", fun = fun1)

# mean, max, sd of a column
fun2 = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    max_sepal_length = max(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length)
  )
}
dplyr_wrapper(data = iris, group_by = "Species", fun = fun2)

# return a list instead of a vector
fun3 = function(data) {
  list(mean_sepal_length = mean(data$Sepal.Length),
    max_sepal_length = max(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length)
  )
}
dplyr_wrapper(data = iris, group_by = "Species", fun = fun3)

# group by two columns
df = data.frame(id = c(rep(1, 10), rep(2, 10)))
df$task = rep(c(rep("task1", 5), rep("task2", 5)), 2)
df$hour = rep(c(rep("hour1", 3), rep("hour2", 2), rep("hour1", 2), rep("hour2", 3)), 2)
df$x = 1:20
fun4 = function(data) c(mean_x = mean(data$x))
dplyr_wrapper(data = df, group_by = c("id", "task"), fun = fun4)


[Package fxtract version 0.9.2 Index]