factor.encoder {midr} | R Documentation |
Encoder for Qualitative Variables
Description
factor.encoder() returns an encoder for a qualitative variable.
Usage
factor.encoder(
  x,
  k,
  use.catchall = TRUE,
  catchall = "(others)",
  tag = "x",
  frame = NULL,
  weights = NULL
)
factor.frame(levels, catchall = "(others)", tag = "x")
Arguments
x |
a vector to be encoded as a qualitative variable. |
k |
an integer specifying the maximum number of distinct levels. If not positive, all unique values of x are used. |
use.catchall |
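logical. If TRUE and the number of levels exceeds k, only the most frequent k - 1 levels are kept and all other values are replaced by the catchall level. |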
catchall |
a character string to be used as the catchall level. |
tag |
character string. The name of the variable. |
frame |
a "factor.frame" object or a character vector that defines the levels of the variable. |
weights |
optional. A numeric vector of sample weights for each value of x. |
levels |
a vector to be used as the levels of the variable. |
Details
factor.encoder() extracts the unique values (levels) from the vector x and returns a list containing the encode() function to convert a vector into a dummy matrix using one-hot encoding. If use.catchall is TRUE and the number of levels exceeds k, only the most frequent k - 1 levels are used and the other values are replaced by the catchall.
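The level selection and one-hot encoding described above can be illustrated in base R. This is only a sketch of the idea, not the midr implementation: the helpers sketch_levels() and sketch_encode() are hypothetical names, and the handling of NA and unseen values (an all-zero row) is an assumption made for this illustration.

```r
# Hypothetical helper, not part of midr: keep the k - 1 most frequent
# levels and add a catchall level when there are more than k levels.
sketch_levels <- function(x, k, catchall = "(others)") {
  x <- as.character(x)
  tab <- sort(table(x), decreasing = TRUE)  # frequencies; NA is ignored
  lvs <- names(tab)
  if (k > 0 && length(lvs) > k) {
    lvs <- c(lvs[seq_len(k - 1)], catchall)
  }
  lvs
}

# Hypothetical helper, not part of midr: one-hot encode x against a
# fixed set of levels. Assumption: NA (and, when there is no catchall
# level, any unseen value) is encoded as a row of zeros.
sketch_encode <- function(x, levels, catchall = "(others)") {
  x <- as.character(x)
  if (catchall %in% levels) {
    x[!is.na(x) & !(x %in% levels)] <- catchall
  }
  mat <- outer(x, levels, FUN = "==")  # logical matrix, one column per level
  mat[is.na(mat)] <- FALSE
  storage.mode(mat) <- "integer"
  colnames(mat) <- levels
  mat
}

lv <- sketch_levels(c("a", "a", "a", "b", "b", "c", "d"), k = 3)
m <- sketch_encode(c("a", "d", NA, "b"), lv)
print(lv)
print(m)
```

Here the four distinct values exceed k = 3, so only the two most frequent values keep their own column and the remaining ones share the catchall column.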
Value
factor.encoder() returns a list containing the following components:
frame |
an object of class "factor.frame". |
encode |
a function to encode a vector into a dummy matrix using one-hot encoding. |
n |
the number of encoding levels. |
type |
the type of encoding. |
factor.frame() returns a "factor.frame" object containing the encoding information.
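As a rough sketch, the components listed above could be inspected as follows. The code is guarded so that it does nothing when midr is not installed, and the choice of k = 2 is arbitrary; the printed values are not shown here because they depend on the package.

```r
# Runs only if midr is installed; otherwise 'enc' stays NULL.
enc <- NULL
if (requireNamespace("midr", quietly = TRUE)) {
  enc <- midr::factor.encoder(x = iris$Species, k = 2, tag = "Species")
  print(class(enc$frame))  # documented as an object of class "factor.frame"
  print(enc$n)             # the number of encoding levels
  print(enc$type)          # the type of encoding
  str(enc$encode(c("setosa", "ensata", NA)))
}
```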
Examples
data(iris, package = "datasets")
# Build an encoder from the data, without a catchall level
enc <- factor.encoder(x = iris$Species, use.catchall = FALSE, tag = "Species")
enc$frame
enc$encode(x = c("setosa", "virginica", "ensata", NA, "versicolor"))
# Define the levels explicitly with a factor.frame and a custom catchall level
frm <- factor.frame(c("setosa", "virginica"), "other iris")
enc <- factor.encoder(x = iris$Species, frame = frm)
enc$encode(c("setosa", "virginica", "ensata", NA, "versicolor"))
# A character vector of levels can also be passed directly as the frame
enc <- factor.encoder(x = iris$Species, frame = c("setosa", "versicolor"))
enc$encode(c("setosa", "virginica", "ensata", NA, "versicolor"))