chopin-package {chopin}R Documentation

Computation of spatial data by hierarchical and objective partitioning of inputs for parallel processing

Description

The chopin package provides a set of functions to compute on divided geospatial data.

Basic functionalities

chopin workflow

library(chopin)
library(terra)
library(sf)
library(collapse)
library(dplyr)
library(future)
library(future.mirai)
library(future.apply)
lastpar <- par(mfrow = c(1, 1))
options(sf_use_s2 = FALSE)

Example data

temp_dir <- tempdir(check = TRUE)

url_nccnty <-
  paste0(
   "https://raw.githubusercontent.com/",
   "ropensci/chopin/refs/heads/main/",
   "tests/testdata/nc_hierarchy.gpkg"
 )  

url_ncelev <-
  paste0(
   "https://raw.githubusercontent.com/",
   "ropensci/chopin/refs/heads/main/",
   "tests/testdata/nc_srtm15_otm.tif"
 )

nccnty_path <- file.path(temp_dir, "nc_hierarchy.gpkg")
ncelev_path <- file.path(temp_dir, "nc_srtm15_otm.tif")

# download data
download.file(url_nccnty, nccnty_path, quiet = TRUE)
download.file(url_ncelev, ncelev_path, quiet = TRUE)
nccnty <- terra::vect(nccnty_path)
ncelev <- terra::rast(ncelev_path)
ncsamp <-
  terra::spatSample(
    nccnty,
    1e4L
  )
ncsamp$pid <- 1:nrow(ncsamp)

Creating grids

ncgrid <- par_pad_grid(ncsamp, mode = "grid", nx = 4L, ny = 2L, padding = 10000)
plot(ncgrid$original)

Extracting values from raster

future::plan(future.mirai::mirai_multisession, workers = 2L)
pg <-
  par_grid(
    grids = ncgrid,
    pad_y = FALSE,
    .debug = TRUE,
    fun_dist = extract_at,
    x = ncelev_path,
    y = sf::st_as_sf(ncsamp),
    id = "pid",
    radius = 1e4,
    func = "mean"
  )

Hierarchical processing

nccnty <- sf::st_read(nccnty_path, layer = "county")
nctrct <- sf::st_read(nccnty_path, layer = "tracts")
px <-
  par_hierarchy(
    # from here the par_hierarchy-specific arguments
    regions = nctrct,
    regions_id = "GEOID",
    length_left = 5,
    pad = 10000,
    pad_y = FALSE,
    .debug = TRUE,
    # from here are the dispatched function definition
    # for parallel workers
    fun_dist = extract_at,
    # below should follow the arguments of the dispatched function
    x = ncelev,
    y = sf::st_as_sf(ncsamp),
    id = "pid",
    radius = 1e4,
    func = "mean"
  )

Multiraster processing

ncelev <- terra::rast(ncelev_path)
tdir <- tempdir(check = TRUE)
terra::writeRaster(ncelev, file.path(tdir, "test1.tif"), overwrite = TRUE)
terra::writeRaster(ncelev, file.path(tdir, "test2.tif"), overwrite = TRUE)
terra::writeRaster(ncelev, file.path(tdir, "test3.tif"), overwrite = TRUE)
terra::writeRaster(ncelev, file.path(tdir, "test4.tif"), overwrite = TRUE)
terra::writeRaster(ncelev, file.path(tdir, "test5.tif"), overwrite = TRUE)
rasts <- list.files(tdir, pattern = "tif$", full.names = TRUE)

pm <-
  par_multirasters(
    filenames = rasts,
    fun_dist = extract_at,
    x = NA,
    y = sf::st_as_sf(ncsamp)[1:500, ],
    id = "pid",
    radius = 1e4,
    func = "mean",
    .debug = TRUE
  )
par(lastpar)

Function selection guide for ⁠par_*()⁠

We provide two flowcharts to help users choose the right function for parallel processing. The raster-oriented flowchart is for users who want to start with raster data, and the vector-oriented flowchart is for users with large vector data.

In raster-oriented selection, we suggest four factors to consider:

For vector-oriented selection, we suggest three factors to consider:

Caveats

Why parallelization is slower than the ordinary function run? Parallelization may underperform when the datasets are too small to take advantage of divide-and-compute approach, where parallelization overhead is involved. Overhead here refers to the required amount of computational resources for transferring objects to multiple processes. Since the demonstrations above use quite small datasets, the advantage of parallelization was not as noticeable as it was expected. Should a large amount of data (spatial/temporal resolution or number of files, for example) be processed, users could find the efficiency of this package. A vignette in this package demonstrates use cases extracting various climate/weather datasets.

Notes on data restrictions

chopin works best with two-dimensional (planar) geometries. Users should disable s2 spherical geometry mode in sf by setting sf::sf_use_s2(FALSE). Running any chopin functions at spherical or three-dimensional (e.g., including M/Z dimensions) geometries may produce incorrect or unexpected results.

Author(s)

Maintainer: Insang Song geoissong@gmail.com (ORCID)

Authors:

Other contributors:

See Also

Useful links:


[Package chopin version 0.9.4 Index]