in_parallel {purrr}    R Documentation

Parallelization in purrr

Description

[Experimental]

All map functions allow parallelized operation using mirai.

Wrap functions passed to the .f argument of map() and its variants with in_parallel().

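in_parallel() is a purrr adverb that plays two roles: it signals to purrr verbs such as map() that the computation should be performed in parallel, and it helps you create a self-contained function that is isolated from your workspace. Isolation matters because the function is serialized and sent to the parallel processes, so it must carry everything it needs and nothing more.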

For maps to actually be performed in parallel, the user must also set mirai::daemons(), otherwise they fall back to sequential processing. mirai::require_daemons() may be used to enforce the use of parallel processing. See the section 'Daemons settings' below.
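As a minimal sketch of the workflow described above, the following sets daemons, runs a parallel map, and tears the daemons down again. The choice of two daemons and the squaring function are illustrative assumptions, not recommendations.

```r
library(purrr)

# Start two local daemons (background R processes); 2 is an arbitrary choice.
mirai::daemons(2)

# The function is declared inline and uses only base R, so it is
# self-contained and can be sent to the daemons as-is.
squares <- map_dbl(1:4, in_parallel(\(x) x^2))
squares

# Tear down the daemons when finished.
mirai::daemons(0)
```

If the mirai::daemons(2) call is omitted, the same map still runs, but sequentially.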

Usage

in_parallel(.f, ...)

Arguments

.f

A fresh formula or function. "Fresh" here means that it should be declared in the call to in_parallel().

...

Named arguments to declare in the environment of the function.

Value

A 'crate' (classed function).
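The returned object can be inspected like any other function. The sketch below assumes the class name is "crate", as the description above suggests; the names offset and add_offset are hypothetical.

```r
library(purrr)

offset <- 10
add_offset <- in_parallel(\(x) x + offset, offset = offset)

# A crate is still an ordinary function, with an extra class attached.
is.function(add_offset)
inherits(add_offset, "crate")

# It can be called directly, without any daemons, which is handy for testing.
add_offset(1)
```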

Creating self-contained functions

in_parallel() is a simple wrapper around carrier::crate(); refer to the carrier package for more details.

Example usage:

# The function needs to be freshly-defined, so instead of:
mtcars |> map_dbl(in_parallel(sum))
# Use an anonymous function:
mtcars |> map_dbl(in_parallel(\(x) sum(x)))

# Package functions need to be explicitly namespaced, so instead of:
map(1:3, in_parallel(\(x) vec_init(integer(), x)))
# Use :: to namespace all package functions:
map(1:3, in_parallel(\(x) vctrs::vec_init(integer(), x)))

fun <- function(x) { x + x %% 2 }
# Operating in parallel, locally-defined objects will not be found:
map(1:3, in_parallel(\(x) x + fun(x)))
# Use the ... argument to supply those objects:
map(1:3, in_parallel(\(x) x + fun(x), fun = fun))

When to use

Parallelizing a map across n processes does not automatically reduce its running time to 1/n. Setting up the parallel task and communicating with the parallel processes adds overhead that eats into the benefit, and can outweigh it for very short tasks or for tasks that move large amounts of data. The threshold at which parallelization becomes clearly beneficial depends on your setup and task, but as a rough guide each map iteration should take on the order of 100 microseconds to 1 millisecond or more.
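One way to check whether a given task benefits is simply to time it both ways. The sketch below uses a hypothetical workload (a 0.2 second sleep per iteration) and an arbitrary four daemons. Timings vary by machine, and the first parallel call also includes the time the daemons take to start up.

```r
library(purrr)

mirai::daemons(4)

# Sequential baseline: 8 iterations of roughly 0.2 seconds each.
sequential <- system.time(
  map(1:8, \(x) { Sys.sleep(0.2); x })
)[["elapsed"]]

# The same work wrapped in in_parallel(), spread across the daemons.
parallel <- system.time(
  map(1:8, in_parallel(\(x) { Sys.sleep(0.2); x }))
)[["elapsed"]]

c(sequential = sequential, parallel = parallel)

mirai::daemons(0)
```

Repeating the comparison with a much cheaper function body is a quick way to see the overhead dominate.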

Daemons settings

How and where parallelization occurs is determined by mirai::daemons(). This is a function from the mirai package that sets up daemons (persistent background processes that receive parallel computations) on your local machine or across the network.

Daemons must be set prior to performing any parallel map operation, otherwise in_parallel() will fall back to sequential processing. To ensure that maps are always performed in parallel, put mirai::require_daemons() before the map.
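The guard described above can be sketched as follows. The tryCatch() wrapper is only there to show the failure without stopping the script; in real code mirai::require_daemons() would normally be called bare so that the error surfaces.

```r
# Start from a clean state so the guard below is guaranteed to fail.
mirai::daemons(0)

# Without daemons, require_daemons() signals an error instead of letting
# the map fall back to sequential processing.
outcome <- tryCatch(
  {
    mirai::require_daemons()
    "daemons are set"
  },
  error = function(e) "no daemons set"
)
outcome

# With daemons set, the guard passes and the map runs in parallel.
mirai::daemons(2)
mirai::require_daemons()
doubled <- purrr::map_int(1:3, purrr::in_parallel(\(x) x * 2L))
mirai::daemons(0)
doubled
```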

It is usual to set daemons once per session. You can leave them running on your local machine as they consume almost no resources whilst waiting to receive tasks. The following sets up 6 daemons locally:

mirai::daemons(6)

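Function arguments: n is the number of daemons to launch on your local machine, e.g. mirai::daemons(6). As a rule of thumb, this should be at most one less than the number of cores on your machine, leaving one core for the main R process. url and remote are used to set up and launch daemons for distributed computing over the network. See the mirai::daemons() documentation for more details.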

Resetting daemons:

Daemons persist for the duration of your session. To reset and tear down any existing daemons:

mirai::daemons(0)

All daemons automatically terminate when your session ends. You do not need to explicitly terminate daemons in this instance, although it is still good practice to do so.
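A short sketch of the set and reset cycle, using mirai::status() to look at the number of connected daemons. The count of two daemons is arbitrary, and connections_after is a hypothetical variable name.

```r
# Launch two local daemons (an arbitrary number for illustration).
mirai::daemons(2)

# Inspect the current state. `connections` counts daemons that have connected
# so far; it may briefly read lower than 2 while they start up.
mirai::status()$connections

# Reset: tear down all daemons.
mirai::daemons(0)

# After a reset no daemons remain connected.
connections_after <- mirai::status()$connections
connections_after
```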

Note: setting daemons should always be left to the user. If you use parallel maps within a package, do not call mirai::daemons() anywhere in the package: it is up to the user to decide how parallel processing is set up, e.g. with local or remote daemons. This also helps prevent inadvertently spawning too many daemons when functions call each other recursively.
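The division of responsibility might look like the sketch below. col_means() is a hypothetical package function, not part of purrr: it uses in_parallel() but never touches daemons, so the caller decides whether it runs in parallel.

```r
# Hypothetical function exported by a package. It makes no mirai::daemons()
# calls; whether it runs in parallel is decided entirely by the caller.
col_means <- function(data) {
  purrr::map_dbl(data, purrr::in_parallel(\(col) mean(col)))
}

# User code: opt in to parallel processing, then call the package function.
mirai::daemons(2)
means <- col_means(mtcars)
mirai::daemons(0)

means
```

Called without the surrounding mirai::daemons() lines, col_means() gives the same result sequentially.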

References

purrr's parallelization is powered by mirai. See the mirai website for more details.

See Also

map() for usage examples.

Examples


# Run in interactive sessions only, as this spawns additional processes

slow_lm <- function(formula, data) {
  Sys.sleep(0.5)
  lm(formula, data)
}

# Example of a 'crate' returned by in_parallel(). The object print method
# shows the size of the crate and any objects contained within:
crate <- in_parallel(\(df) slow_lm(mpg ~ disp, data = df), slow_lm = slow_lm)
crate

# Use mirai::mirai() to test that a crate is self-contained
# by running it in a daemon and collecting its return value:
mirai::mirai(crate(mtcars), crate = crate) |> mirai::collect_mirai()
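To round off the example, the same crate pattern can be used inside a map. This is a sketch that redefines slow_lm so that it runs on its own; the two daemons and the split by cyl are illustrative choices.

```r
library(purrr)

slow_lm <- function(formula, data) {
  Sys.sleep(0.5)
  lm(formula, data)
}

mirai::daemons(2)

# One model per cylinder group; slow_lm is shipped to the daemons via `...`.
models <- mtcars |>
  split(mtcars$cyl) |>
  map(in_parallel(\(df) slow_lm(mpg ~ disp, data = df), slow_lm = slow_lm))

mirai::daemons(0)

names(models)
```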


[Package purrr version 1.1.0 Index]