distance_thinning {GeoThinneR}    R Documentation

Perform Distance-Based Thinning

Description

This function applies a distance-based thinning algorithm using a kd-tree or brute-force neighbor search. Two modified kd-tree algorithms (local kd-trees and estimating the maximum number of neighbors) are also implemented; these scale better for large datasets. The function removes points that are closer to each other than a specified distance while retaining as many points as possible.

Usage

distance_thinning(
  coordinates,
  thin_dist = 10,
  trials = 10,
  all_trials = FALSE,
  search_type = c("local_kd_tree", "k_estimation", "kd_tree", "brute"),
  target_points = NULL,
  distance = c("haversine", "euclidean"),
  R = 6371,
  n_cores = 1
)

Arguments

coordinates

A matrix of coordinates to thin, with two columns representing longitude and latitude.

thin_dist

A positive numeric value representing the thinning distance in kilometers.

trials

An integer specifying the number of trials to run for thinning. Default is 10.

all_trials

A logical indicating whether to return results of all attempts ('TRUE') or only the best attempt with the most points retained ('FALSE'). Default is 'FALSE'.

search_type

A character string indicating the neighbor search method, one of '"local_kd_tree"', '"k_estimation"', '"kd_tree"', or '"brute"'. Default is '"local_kd_tree"'. See Details.

target_points

Optional integer specifying the number of points to retain. If 'NULL' (default), the function tries to maximize the number of points retained.

distance

Distance metric to use, either '"haversine"' or '"euclidean"'. Default is '"haversine"', which is intended for geographic coordinates.

R

Radius of the Earth in kilometers (default: 6371 km).

n_cores

Number of cores for parallel processing (only for '"local_kd_tree"'). Default is 1.
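
As an illustration of what the 'distance' and 'R' arguments refer to, the following is a minimal base R sketch of the Haversine great-circle distance. The helper name 'haversine_km' is hypothetical and is not part of GeoThinneR; the package's internal implementation may differ.

```r
# Hypothetical helper (not a GeoThinneR function): Haversine distance in km
# between points given as longitude/latitude in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * R * asin(pmin(1, sqrt(a)))
}

# Approximate coordinates of Madrid and Barcelona
d <- haversine_km(-3.70, 40.42, 2.17, 41.39)
print(d)
```

Two points are candidates for thinning when this distance is below 'thin_dist'. Changing 'R' rescales every distance proportionally.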

Details

- '"kd_tree"': Uses a single kd-tree for efficient nearest-neighbor searches.
- '"local_kd_tree"': Builds multiple smaller kd-trees for better scalability.
- '"k_estimation"': Approximates a maximum number of neighbors per point to reduce search complexity.
- '"brute"': Computes all pairwise distances (inefficient for large datasets).
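
To make the brute-force idea concrete, here is a minimal base R sketch of one trial. It computes all pairwise Haversine distances and then repeatedly drops the point with the most remaining neighbors closer than the thinning distance, breaking ties at random. The function name 'thin_brute' and the greedy removal rule are assumptions made for illustration; they are not taken from the package source, and the package's actual procedure may differ.

```r
# Hypothetical sketch (not the GeoThinneR implementation): one trial of
# brute-force thinning on a two-column matrix of longitude/latitude in degrees.
thin_brute <- function(coords, thin_dist, R = 6371) {
  rad <- coords * pi / 180
  # All pairwise Haversine distances, O(n^2) in time and memory
  dlon <- outer(rad[, 1], rad[, 1], "-")
  dlat <- outer(rad[, 2], rad[, 2], "-")
  a <- sin(dlat / 2)^2 +
    outer(cos(rad[, 2]), cos(rad[, 2])) * sin(dlon / 2)^2
  d <- 2 * R * asin(pmin(1, sqrt(a)))

  too_close <- d < thin_dist
  diag(too_close) <- FALSE

  keep <- rep(TRUE, nrow(coords))
  repeat {
    # For each kept point, count kept neighbors within thin_dist
    n_close <- rowSums(too_close[, keep, drop = FALSE]) * keep
    if (max(n_close) == 0) break
    worst <- which(n_close == max(n_close))
    # Avoid sample()'s special case for a single number
    drop <- if (length(worst) > 1) sample(worst, 1) else worst
    keep[drop] <- FALSE
  }
  keep
}

set.seed(1)
pts <- rbind(c(0, 0), c(0.01, 0), c(10, 10))  # first two are about 1.1 km apart
keep <- thin_brute(pts, thin_dist = 10)
print(keep)
```

Because ties are broken at random, repeated trials can keep different points, which is why running several trials and retaining the best one can be useful. The kd-tree based options avoid building the full distance matrix.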

Value

A list. If 'all_trials' is 'FALSE', the list contains a single logical vector indicating which points are kept in the best trial. If 'all_trials' is 'TRUE', the list contains a logical vector for each trial.
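
The returned logical vectors can be used to subset the original coordinate matrix. The sketch below uses a hand-written list as a stand-in for the function's output, so the values in 'kept' are hypothetical and only illustrate the structure described above.

```r
# Three example points (longitude, latitude)
coords <- cbind(c(-3.70, -3.69, 2.17), c(40.42, 40.41, 41.39))

# Stand-in for a result obtained with all_trials = TRUE:
# one logical vector per trial (values are made up for illustration)
kept <- list(c(TRUE, FALSE, FALSE), c(FALSE, TRUE, TRUE))

# Pick the trial that retains the most points, then subset the coordinates
n_kept <- vapply(kept, sum, integer(1))
best <- kept[[which.max(n_kept)]]
thinned <- coords[best, , drop = FALSE]
print(thinned)
```

With 'all_trials = FALSE' the list holds a single vector, so 'kept[[1]]' can be used directly in place of 'best'.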

Examples

# Generate sample coordinates: column 1 is longitude, column 2 is latitude
set.seed(123)
coords <- cbind(
  runif(10, min = -180, max = 180), # longitude
  runif(10, min = -90, max = 90)    # latitude
) # 10 random points

# Perform thinning with local kd-trees
result_partitioned <- distance_thinning(coords, thin_dist = 5000, trials = 5,
                                        search_type = "local_kd_tree", all_trials = TRUE)
print(result_partitioned)

# Perform thinning estimating max number of neighbors
result_estimated <- distance_thinning(coords, thin_dist = 5000, trials = 5,
                                      search_type = "k_estimation", all_trials = TRUE)
print(result_estimated)


[Package GeoThinneR version 2.0.0 Index]