factor.encoder {midr}    R Documentation

Encoder for Qualitative Variables

Description

factor.encoder() returns an encoder for a qualitative variable.

Usage

factor.encoder(
  x,
  k,
  use.catchall = TRUE,
  catchall = "(others)",
  tag = "x",
  frame = NULL,
  weights = NULL
)

factor.frame(levels, catchall = "(others)", tag = "x")

Arguments

x

a vector to be encoded as a qualitative variable.

k

an integer specifying the maximum number of distinct levels. If not positive, all unique values of x are used as levels.

use.catchall

logical. If TRUE and the number of unique values exceeds k, the less frequent levels are dropped and replaced by the catchall level.

catchall

a character string to be used as the catchall level.

tag

character string. The name of the variable.

frame

a "factor.frame" object or a character vector that defines the levels of the variable.

weights

optional. A numeric vector of sample weights for each value of x.

levels

a vector to be used as the levels of the variable.

Details

factor.encoder() extracts the unique values (levels) of the vector x and returns a list that includes encode(), a function that converts a vector into a dummy matrix by one-hot encoding. If use.catchall is TRUE and the number of levels exceeds k, only the k - 1 most frequent levels are kept and all other values are replaced by the catchall level.
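
The level selection and one-hot encoding described above can be sketched in base R. This is a minimal illustration, not the package's implementation: sketch_levels() and sketch_encode() are hypothetical helpers, and the handling of NA (an all-zero row) is an assumption of the sketch rather than documented behavior of factor.encoder().

```r
# Hypothetical helper (not part of midr): choose the levels as described in Details.
sketch_levels <- function(x, k, catchall = "(others)") {
  counts <- sort(table(as.character(x)), decreasing = TRUE)  # frequency of each unique value
  lvs <- names(counts)
  if (k > 0 && length(lvs) > k) {
    # keep the k - 1 most frequent levels and append the catchall level
    lvs <- c(lvs[seq_len(k - 1)], catchall)
  }
  lvs
}

# Hypothetical helper (not part of midr): one-hot encode x against fixed levels.
sketch_encode <- function(x, lvs, catchall = "(others)") {
  x <- as.character(x)
  if (catchall %in% lvs) {
    # values outside the kept levels are replaced by the catchall level
    x[!is.na(x) & !(x %in% lvs)] <- catchall
  }
  mat <- outer(x, lvs, FUN = "==")  # TRUE where a value matches a level
  mat[is.na(mat)] <- FALSE          # assumption of this sketch: NA matches no level
  mat <- mat * 1                    # logical -> numeric 0/1
  colnames(mat) <- lvs
  mat
}

x <- c("a", "a", "a", "b", "b", "c", "d")
lvs <- sketch_levels(x, k = 3)
m <- sketch_encode(c("a", "c", NA), lvs)
print(lvs)
print(m)
```

In this sketch, four unique values with k = 3 leave two named levels plus the catchall, so each encoded row has three columns.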

Value

factor.encoder() returns a list containing the following components:

frame

an object of class "factor.frame".

encode

a function to encode x into a dummy matrix.

n

the number of encoding levels.

type

the type of encoding.

factor.frame() returns a "factor.frame" object containing the encoding information.
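
The components listed above might be inspected as in the following sketch. It assumes the midr package is installed; the requireNamespace() guard is added here only so the snippet is skipped otherwise, and no particular output is implied.

```r
# Runs only if midr is installed (guard added for this sketch).
if (requireNamespace("midr", quietly = TRUE)) {
  enc <- midr::factor.encoder(x = iris$Species, use.catchall = FALSE, tag = "Species")
  print(class(enc$frame))  # class of the stored frame object
  print(enc$n)             # number of encoding levels
  print(enc$type)          # type of encoding
  mat <- enc$encode(x = c("setosa", "versicolor"))
  print(dim(mat))          # dimensions of the resulting dummy matrix
}
```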

Examples

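# Encode Species with all of its unique values as levels (no catchall level)
data(iris, package = "datasets")
enc <- factor.encoder(x = iris$Species, use.catchall = FALSE, tag = "Species")
enc$frame
# "ensata" and NA are not among the levels of Species
enc$encode(x = c("setosa", "virginica", "ensata", NA, "versicolor"))

# Define the levels and a custom catchall level with a "factor.frame" object
frm <- factor.frame(c("setosa", "virginica"), "other iris")
enc <- factor.encoder(x = iris$Species, frame = frm)
enc$encode(c("setosa", "virginica", "ensata", NA, "versicolor"))

# Pass the levels directly as a character vector
enc <- factor.encoder(x = iris$Species, frame = c("setosa", "versicolor"))
enc$encode(c("setosa", "virginica", "ensata", NA, "versicolor"))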

[Package midr version 0.5.0 Index]