eucop_data_preparation {RRgeo}    R Documentation

Import and preprocess mammal occurrence data

Description

This function automatically imports and preprocesses the fossil mammal occurrences and the paleoclimatic/vegetational data available in the EutherianCop dataset (Mondanaro et al., 2025). It also provides two approaches, both applied within a user-defined study area, for either sampling a specified number of pseudoabsences or defining background points. Users can thus assemble a list of sf objects ready to train ENFA, ENphylo, or any other SDM algorithm of their choice.

Usage

eucop_data_preparation(input.dir, species_name, variables = "all", which.vars = NULL,
                       calibration = FALSE, add.modern.occs = FALSE,
                       combine.ages = NULL, remove.duplicates = TRUE,
                       bk_points = NULL, output.dir)

Arguments

input.dir

the path to the directory where the EutherianCop mammal occurrences and paleoclimatic data are to be stored.

species_name

character. The name(s) of the species, one or more, to be processed by eucop_data_preparation.

variables

character. The name of paleoclimatic simulations to be used. The viable options are "climveg", "bio", or "all".

which.vars

character vector indicating the names of the variables to be downloaded. The list of accepted names can be found [here](https://www.nature.com/articles/s41597-024-04181-4/tables/1).

calibration

logical. If TRUE, eucop_data_preparation calibrates the conventional radiocarbon (14C) age estimates included in the EutherianCop raw data file.

add.modern.occs

logical. If TRUE, eucop_data_preparation adds the modern records (if present) for the species listed in species_name.

combine.ages

one of "mean" or "median". The method used to aggregate multiple ages for each site, or for each layer within a site.

remove.duplicates

logical. If TRUE, eucop_data_preparation removes duplicated records for each grid cell within a given time bin.

bk_points

a list of parameters used to add background/pseudoabsence (i.e. absence) points, following the procedure described in Mondanaro et al. (2024). The list includes:

  • buff: the proportional distance used to set a buffer around the minimum convex polygon that encompasses all occurrences of the target species.

  • bk_strategy: the strategy used to add the absence points. It must be either "background" or "pseudoabsence".

  • bk_n: the maximum number of absence points to sample (see Details).

If provided as an empty list(), the function automatically sets buff = 0.1, bk_strategy = "background", and bk_n = 10000.
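As a minimal sketch of how the bk_points list described above might be assembled, the following builds one list per strategy using only the three documented element names. The object names and the value bk_n = 1000 are illustrative assumptions, not package defaults.

```r
# Background strategy, written out with the documented default values.
bk_background <- list(buff = 0.1, bk_strategy = "background", bk_n = 10000)

# Pseudoabsence strategy; bk_n = 1000 is an arbitrary illustrative value.
bk_pseudo <- list(buff = 0.1, bk_strategy = "pseudoabsence", bk_n = 1000)

# An empty list asks the function to fall back on its documented defaults.
bk_default <- list()
```

Any of these objects could then be passed as the bk_points argument.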

output.dir

the path to the directory where eucop_data_preparation stores the results.

Details

The variables argument allows the selection of climatic and environmental variables ("climveg"), bioclimatic variables ("bio"), or both sets of variables.

Through the bk_strategy element of bk_points, eucop_data_preparation offers two different approaches to generate absence points. The definition of the study area is the same for both methods. Under bk_strategy = "background", bk_n defines the maximum number of background points sampled from the study area within each time bin. Under bk_strategy = "pseudoabsence", bk_n represents the maximum number of pseudoabsence points across all time bins. This flexibility allows users to accommodate the different requirements for training traditional envelope models (e.g. ENFA, ENphylo) and common correlative or machine learning models (e.g. generalized linear models, MaxEnt, Random Forest).
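The practical difference between the two meanings of bk_n can be seen with simple arithmetic. The sketch below assumes a hypothetical record spanning five time bins; the variable names are illustrative and not part of the package.

```r
n_bins <- 5     # hypothetical number of time bins covered by the occurrences
bk_n   <- 1000  # value supplied in the bk_points list

# "background": bk_n caps the points sampled within each time bin,
# so the overall total can reach bk_n times the number of bins.
max_total_background <- bk_n * n_bins

# "pseudoabsence": bk_n caps the total across all time bins.
max_total_pseudoabsence <- bk_n
```

Both figures are upper bounds, since bk_n is described as a maximum in either case.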

Additionally, if bk_points is not NULL, the ages of presences and pseudoabsences or background points are forced to 1 kyr resolution according to the temporal resolution of the paleoclimatic/vegetational or bioclimatic data.
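To illustrate what a 1 kyr resolution means for individual ages, the sketch below snaps a few hypothetical ages, expressed in years, to the nearest 1000 years. Both the units and the rounding-to-nearest rule are assumptions made for illustration; the rule actually applied by eucop_data_preparation is not stated here.

```r
# Hypothetical ages in years before present (illustrative values only).
ages_yr <- c(12340, 12870, 45510)

# Assumed rule: round to the nearest 1 kyr step.
ages_binned <- round(ages_yr / 1000) * 1000
ages_binned
```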

Value

eucop_data_preparation does not store any results in the global environment. Instead, a set of GeoPackage files, one per selected species, is saved in the directory specified by output.dir. The file names depend on the combination of arguments chosen: they include the suffix "cal" or "uncal", depending on whether calibration is performed (calibration), and "combined" or "multi", depending on whether ages are aggregated (combine.ages). In all cases, the output files include information about ages, a column called "OBS" holding species occurrence data in binary format, the spatial geometry, and all the information derived from the EutherianCop dataset.
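Because the results are written to disk rather than returned, they have to be read back before modelling. The sketch below shows one possible way to do so with sf::st_read. The helper read_eucop_outputs is hypothetical (it is not part of RRgeo), and it assumes the files carry the conventional .gpkg extension.

```r
library(sf)

# Hypothetical helper (not part of RRgeo): read every GeoPackage found in a
# directory and return a list of sf objects named after their files.
read_eucop_outputs <- function(dir) {
  files <- list.files(dir, pattern = "\\.gpkg$", full.names = TRUE)
  layers <- lapply(files, st_read, quiet = TRUE)
  names(layers) <- basename(files)
  layers
}

# Usage, assuming eucop_data_preparation has already written to this folder:
# occs <- read_eucop_outputs("YOUR_DIRECTORY")
# table(occs[[1]]$OBS)  # tabulate the binary OBS column
```

Selecting files by extension rather than by name avoids depending on the exact suffixes added to each file.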

Author(s)

Alessandro Mondanaro, Silvia Castiglione, Pasquale Raia

References

Mondanaro, A., Di Febbraro, M., Castiglione, S., Belfiore, A. M., Girardi, G., Melchionna, M., Serio, C., Esposito, A., & Raia, P. (2024). Modelling reveals the effect of climate and land use change on Madagascar’s chameleons fauna. Communications Biology, 7: 889. doi:10.1038/s42003-024-06597-5.

Mondanaro, A., Girardi, G., Castiglione, S., Timmermann, A., Zeller, E., Venugopal, T., Serio, C., Melchionna, M., Esposito, A., Di Febbraro, M., & Raia, P. (2025). EutherianCoP. An integrated biotic and climate database for conservation paleobiology based on eutherian mammals. Scientific Data, 12: 6. doi:10.1038/s41597-024-04181-4.

See Also

eucop_data_preparation vignette

Examples



# Use a temporary directory for both the downloaded inputs and the outputs
newwd <- tempdir()
# newwd <- "YOUR_DIRECTORY"

eucop_data_preparation(input.dir = newwd, species_name = "Ursus ingressus",
                       variables = "bio", which.vars = "bio1",
                       calibration = FALSE, combine.ages = "mean",
                       bk_points = NULL, output.dir = newwd)



[Package RRgeo version 0.0.5 Index]