declared {declared} | R Documentation |
The labelled vectors are mainly used to analyse social science data, and the missing values declaration is an important step in the analysis.
declared(
  x = double(),
  labels = NULL,
  na_values = NULL,
  na_range = NULL,
  label = NULL,
  ...
)

is_declared(x)

is.declared(x)

as_declared(x, ...)

as.declared(x, ...)

as_haven(x, ...)

undeclare(x, drop = FALSE, ...)
x |
A numeric vector to label, or a declared labelled vector (for |
labels |
A named vector or |
na_values |
A vector of values that should also be considered as missing. |
na_range |
A numeric vector of length two giving the (inclusive) extents
of the range. Use |
label |
A short, human-readable description of the vector. |
drop |
Logical, drop all attributes. |
... |
Other arguments used by various other methods. |
The declared objects are very similar to the haven_labelled_spss objects from package haven. They are created with exactly the same arguments, but differ fundamentally in how (declared) missing values are treated.
In package haven, existing values are treated as if they were missing. By contrast, in package declared the NA values are treated as existing values.
This difference is fundamental and points to an inconsistency in package haven: while existing values can be identified as missing using the function is.na(), they are in fact present in the vector, and other packages (most importantly the base ones) do not know these values should be treated as missing.
Consequently, the existing values are interpreted as missing only by package haven. Statistical procedures will use those values as if they were valid values.
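To make the consequence concrete, here is a minimal base R sketch. It does not use haven; it only mimics the idea of tagging an existing value as missing through metadata. The attribute name na_values is borrowed from the arguments above, and the vector is made up for illustration.

```r
# Base R only: a plain numeric vector where -1 stands for "don't know".
x <- c(1, 2, 3, 4, 5, -1)

# Tag -1 as missing through metadata (illustrative, not haven's implementation).
attr(x, "na_values") <- -1

# Base functions ignore the tag, so -1 enters the calculation.
mean_all <- mean(x)

# The tagged value has to be excluded by hand to get the intended result.
valid <- !(x %in% attr(x, "na_values"))
mean_valid <- mean(x[valid])

print(c(all = mean_all, valid = mean_valid))
```

The two means differ because the tagged value is still physically present in the vector.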
Package declared approaches the problem in exactly the opposite way: instead of treating existing values as missing, it treats (certain) NA values as existing. It does that by storing an attribute containing the indices of those NA values which are to be treated as declared missing values, and it refreshes this attribute each time the declared object is changed.
This is a trade-off and has important implications when subsetting datasets: all declared variables get this attribute refreshed, which consumes some time depending on the number of variables in the data.
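The mechanism described above can be inspected directly. The following is a small sketch, assuming the package declared is installed and that the index is stored in the attribute na_index mentioned on this page; the exact structure of that attribute is not documented here, so it is only printed, not relied upon.

```r
library(declared)

x <- declared(
  c(1:5, -1),
  labels = c(Good = 1, Bad = 5, DK = -1),
  na_values = -1
)

# The declared missing value is reported as NA by base R.
is.na(x)

# Assumption: the positions of the declared missing values live in "na_index".
attr(x, "na_index")

# Subsetting changes the object, so the attribute is refreshed.
y <- x[c(1, 6)]
is.na(y)
attr(y, "na_index")
```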
The function undeclare() replaces the NA entries with their original numeric values, and drops all attributes related to missing values: na_values, na_range and na_index. The result can be a regular vector (thus dropping all attributes, including the class "declared") by setting the argument drop to TRUE.
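A short sketch of both behaviours, assuming the package declared is installed; the example vector is the same one used in the examples on this page.

```r
library(declared)

x <- declared(
  c(1:5, -1),
  labels = c(Good = 1, Bad = 5, DK = -1),
  na_values = -1
)

# Default: still a declared object, but without the missing value attributes.
u <- undeclare(x)
class(u)
attr(u, "na_values")

# drop = TRUE: a regular vector holding the original numeric values.
v <- undeclare(x, drop = TRUE)
class(v)
v
```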
declared(), as_declared() and as.declared() return a labelled vector.
is_declared() and is.declared() return a logical scalar.
undeclare() returns an object of class declared without the declared missing values.
as_haven() returns an object of class haven_labelled_spss.
x <- declared(
  c(1:5, -1),
  labels = c(Good = 1, Bad = 5, DK = -1),
  na_values = -1
)

x

is.na(x)

x > 0

x == -1

# Values are actually placeholder for categories, so labels work as if they were factors:
x == "DK"

# when newly added values are already declared as missing, they are automatically coerced
c(x, 2, -1)

# switch NAs with their original values
undeclare(x)

set.seed(123)
DF <- data.frame(
  Area = declared(
    sample(1:2, 123, replace = TRUE),
    labels = c(Urban = 1, Rural = 2)
  ),
  Gender = declared(
    sample(c(1:2, -1), 123, replace = TRUE),
    labels = c(Male = 1, Female = 2, Nonresponse = -1),
    na_values = -1
  ),
  Age = sample(18:90, 123, replace = TRUE),
  Children = sample(0:5, 123, replace = TRUE)
)

using(DF, mean(Children), split.by = Area)

using(DF, mean(Age), split.by = Gender & Area)