distance_thinning {GeoThinneR} | R Documentation
Perform Distance-Based Thinning
Description
This function applies a distance-based thinning algorithm using either a kd-tree or a brute-force neighbor search. Two modified kd-tree algorithms that scale better to large datasets are also implemented: local kd-trees and estimation of the maximum number of neighbors. The function removes points that lie closer than a specified distance to one another while maximizing spatial representation.
Usage
distance_thinning(
coordinates,
thin_dist = 10,
trials = 10,
all_trials = FALSE,
search_type = c("local_kd_tree", "k_estimation", "kd_tree", "brute"),
target_points = NULL,
distance = c("haversine", "euclidean"),
R = 6371,
n_cores = 1
)
Arguments
coordinates |
A matrix of coordinates to thin, with two columns representing longitude and latitude. |
thin_dist |
A positive numeric value representing the thinning distance in kilometers. |
trials |
An integer specifying the number of trials to run for thinning. Default is 10. |
all_trials |
A logical indicating whether to return results of all attempts ('TRUE') or only the best attempt with the most points retained ('FALSE'). Default is 'FALSE'. |
search_type |
A character string indicating the neighbor search method. One of '"local_kd_tree"', '"k_estimation"', '"kd_tree"', or '"brute"'. The default is '"local_kd_tree"'. See Details. |
target_points |
Optional integer specifying the number of points to retain. If 'NULL' (default), the function tries to maximize the number of points retained. |
distance |
Distance metric to use, either '"haversine"' or '"euclidean"'. The default is '"haversine"', which is appropriate for geographic (longitude/latitude) coordinates. |
R |
Radius of the Earth in kilometers (default: 6371 km). |
n_cores |
Number of cores for parallel processing (only for '"local_kd_tree"'). Default is 1. |
Details
- '"kd_tree"': Uses a single kd-tree for efficient nearest-neighbor searches.
- '"local_kd_tree"': Builds multiple smaller kd-trees for better scalability.
- '"k_estimation"': Approximates a maximum number of neighbors per point to reduce search complexity.
- '"brute"': Computes all pairwise distances (inefficient for large datasets).
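To make the '"brute"' idea concrete, here is a minimal base R sketch of distance-based thinning with Haversine distances. It is an illustration only and is not the package's implementation: 'haversine_km' and 'greedy_thin' are hypothetical helper names, the points are visited in a fixed order rather than over randomized trials, and the rule "keep a point if it is at least 'thin_dist' km from every point already kept" is an assumption based on the Description.

```r
# Great-circle distance in km between lon/lat points given in degrees.
# `radius` mirrors the documented default R = 6371.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: greedy brute-force thinning in a fixed visiting order.
# A point is kept only if it lies at least `thin_dist` km from every point
# that has already been kept.
greedy_thin <- function(coords, thin_dist, radius = 6371) {
  keep <- logical(nrow(coords))
  for (i in seq_len(nrow(coords))) {
    kept <- which(keep)
    d <- haversine_km(coords[i, 1], coords[i, 2],
                      coords[kept, 1], coords[kept, 2], radius)
    keep[i] <- all(d >= thin_dist)  # TRUE when nothing has been kept yet
  }
  keep
}

# Three points on the equator (longitude, latitude); the second is about
# 1.1 km from the first, the third is more than 1000 km away.
pts <- rbind(c(0, 0), c(0.01, 0), c(10, 0))
keep <- greedy_thin(pts, thin_dist = 10)
```

Each point is compared against every point kept so far, so the cost grows quadratically with the number of points. That cost is what the kd-tree based search types are described as reducing.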
Value
A list. If 'all_trials' is 'FALSE', the list contains a single logical vector indicating which points are kept in the best trial. If 'all_trials' is 'TRUE', the list contains a logical vector for each trial.
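As a rough sketch of how the returned list might be used, the snippet below picks the trial that retains the most points and subsets the coordinate matrix with it. The list 'trials_out' is a hand-built stand-in for the structure described above (one logical vector per trial), not real output from 'distance_thinning()', and the variable names are illustrative.

```r
# Coordinates with longitude in column 1 and latitude in column 2.
coords <- cbind(lon = c(0, 0.01, 10), lat = c(0, 0, 0))

# Stand-in for the documented return value with all_trials = TRUE:
# one logical vector per trial (illustrative values, not real output).
trials_out <- list(
  c(TRUE, FALSE, TRUE),
  c(FALSE, TRUE, TRUE),
  c(TRUE, FALSE, FALSE)
)

# Count retained points per trial and select the first trial with the maximum.
n_kept <- vapply(trials_out, sum, integer(1))
best <- trials_out[[which.max(n_kept)]]

# Subset the original coordinates; drop = FALSE keeps a matrix even if
# only one row is retained.
thinned <- coords[best, , drop = FALSE]
```

With 'all_trials = FALSE' the list is documented to hold a single logical vector, so the same subsetting would apply to its first element.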
Examples
# Generate sample coordinates (column 1 = longitude, column 2 = latitude)
set.seed(123)
coords <- cbind(
  runif(10, min = -180, max = 180),  # longitude in degrees
  runif(10, min = -90, max = 90)     # latitude in degrees
) # 10 random points

# Perform thinning with local kd-trees
result_partitioned <- distance_thinning(coords, thin_dist = 5000, trials = 5,
                                        search_type = "local_kd_tree", all_trials = TRUE)
print(result_partitioned)

# Perform thinning estimating max number of neighbors
result_estimated <- distance_thinning(coords, thin_dist = 5000, trials = 5,
                                      search_type = "k_estimation", all_trials = TRUE)
print(result_estimated)