target_encoding_lab {collinear}R Documentation

Target Encoding Lab: Transform Categorical Variables to Numeric

Description

Target encoding involves replacing the values of categorical variables with numeric ones derived from a "target variable", usually a model's response.

In essence, target encoding works as follows:

The methods to compute the group statistic implemented here are:

Accepts a parallelization setup via future::plan() and a progress bar via progressr::handlers() (see examples).

Usage

target_encoding_lab(
  df = NULL,
  response = NULL,
  predictors = NULL,
  methods = c("loo", "mean", "rank"),
  smoothing = 0,
  white_noise = 0,
  seed = 0,
  overwrite = FALSE,
  quiet = FALSE
)

Arguments

df

(required; data frame, tibble, or sf) A data frame with responses and predictors. Default: NULL.

response

(optional, character string) Name of a numeric response variable in df. Default: NULL.

predictors

(optional; character vector) Names of the predictors to select from df. If omitted, all numeric columns in df are used instead. If argument response is not provided, non-numeric variables are ignored. Default: NULL

methods

(optional; character vector or NULL). Name of the target encoding methods. If NULL, target encoding is ignored, and df is returned with no modification. Default: c("loo", "mean", "rank")

smoothing

(optional; integer vector) Argument of the method "mean". Groups smaller than this number have their means pulled towards the mean of the response across all cases. Default: 0

white_noise

(optional; numeric vector) Argument of the methods "mean", "rank", and "loo". Maximum white noise to add, expressed as a fraction of the range of the response variable. Range from 0 to 1. Default: 0.

seed

(optional; integer vector) Random seed to facilitate reproducibility when white_noise is not 0. If NULL, the function selects one at random, and the selected seed does not appear in the encoded variable names. Default: 0

overwrite

(optional; logical) If TRUE, the original predictors in df are overwritten with their encoded versions, but only one encoding method, smoothing, white noise, and seed are allowed. Otherwise, encoded predictors with their descriptive names are added to df. Default: FALSE

quiet

(optional; logical) If FALSE, messages generated during the execution of the function are printed to the console Default: FALSE

Value

data frame

Author(s)

Blas M. Benito, PhD

References

See Also

Other target_encoding: target_encoding_mean()

Examples


data(
  vi,
  vi_predictors
  )

#subset to limit example run time
vi <- vi[1:1000, ]

#applying all methods for a continuous response
df <- target_encoding_lab(
  df = vi,
  response = "vi_numeric",
  predictors = "koppen_zone",
  methods = c(
    "mean",
    "loo",
    "rank"
  ),
  white_noise = c(0, 0.1, 0.2)
)

#identify encoded predictors
predictors.encoded <- grep(
  pattern = "*__encoded*",
  x = colnames(df),
  value = TRUE
)



[Package collinear version 2.0.0 Index]