missingFix {folda} | R Documentation |
Impute Missing Values and Add Missing Flags to a Data Frame
Description
This function imputes missing values in a data frame, using separate methods for numerical and categorical variables, and can optionally add flag columns that indicate which values were missing. Numerical variables are imputed with the mean or median. Categorical variables are imputed with the mode or assigned a new level. Constant columns (all NA, or fully observed with a single value) are also removed.
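The behaviour described above can be sketched in base R. This is only an illustration under stated assumptions, not the folda implementation: the helper names (`impute_median_flag`, `impute_new_level`, `is_constant`) and the `"Missing"` level label are hypothetical and chosen for this example.

```r
# Illustrative base-R sketch; NOT the folda implementation.
# All helper names and the "Missing" label are hypothetical.

# Median imputation that also returns a 0/1 missingness flag.
impute_median_flag <- function(x) {
  is_na <- is.na(x)
  x[is_na] <- median(x, na.rm = TRUE)
  list(value = x, flag = as.integer(is_na))
}

# Replace missing categorical values with a new level.
impute_new_level <- function(f, level = "Missing") {
  f <- as.character(f)
  f[is.na(f)] <- level
  factor(f)
}

# A column is constant if it is all NA, or fully observed with one value.
is_constant <- function(x) {
  all(is.na(x)) || (!anyNA(x) && length(unique(x)) == 1)
}

num <- c(NA, 2, 3, 10, NA)
fac <- factor(c("A", NA, NA, "B", "B"), levels = LETTERS[1:3])

res_num <- impute_median_flag(num)
res_num$value  # missing entries replaced by the median of the observed values
res_num$flag   # 1 where the original value was missing

res_fac <- impute_new_level(fac)
levels(res_fac)  # the new level appears alongside the observed ones

is_constant(rep(NA, 5))  # TRUE: all NA
is_constant(1:5)         # FALSE: values differ
```

Returning the flag next to the imputed value keeps the fact that a value was missing available to a downstream model, which is the purpose of the flag columns described above.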
Usage
missingFix(data, missingMethod = c("medianFlag", "newLevel"))
Arguments
data |
A data frame containing the data to be processed. Missing values (NA) are imputed according to missingMethod. |
missingMethod |
A character vector of length 2 specifying the methods for imputing missing values. The first element specifies the method for numerical variables and the second the method for categorical variables. The default is c("medianFlag", "newLevel"). |
Value
A list with two elements:
data |
The input data frame with missing values imputed, constant columns removed, and flag columns added if applicable. |
ref |
A reference row containing the imputed values and flag levels, which can be used for future predictions or reference. |
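As a rough illustration of why a reference row is useful for future predictions, the base-R sketch below fills missing values in new data from stored fill values. The layout of `ref_row` is an assumption made for this example and is not necessarily the structure of the `ref` element returned by missingFix.

```r
# Hypothetical reference row: one stored fill value per column.
# This layout is assumed for illustration; it is not folda's documented `ref` format.
ref_row <- data.frame(X5 = 3, X6 = "Missing", stringsAsFactors = FALSE)

# New observations that arrive at prediction time.
new_data <- data.frame(
  X5 = c(NA, 7),
  X6 = c("A", NA),
  stringsAsFactors = FALSE
)

# Fill each column's missing entries with the stored reference value,
# so new data are imputed the same way as the training data.
for (col in names(ref_row)) {
  missing_idx <- is.na(new_data[[col]])
  new_data[[col]][missing_idx] <- ref_row[[col]]
}

new_data
```

Storing the fill values once and reusing them avoids recomputing a mean, median, or mode on the new data, which could differ from the values used during training.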
Examples
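dat <- data.frame(
  X1 = rep(NA, 5),                                                # all NA (constant column)
  X2 = factor(rep(NA, 5), levels = LETTERS[1:3]),                 # all NA (constant column)
  X3 = 1:5,                                                       # numerical, no missing values
  X4 = LETTERS[1:5],                                              # character, no missing values
  X5 = c(NA, 2, 3, 10, NA),                                       # numerical with missing values
  X6 = factor(c("A", NA, NA, "B", "B"), levels = LETTERS[1:3])    # categorical with missing values
)
# Uses the default missingMethod = c("medianFlag", "newLevel")
missingFix(dat)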