declared {declared}    R Documentation

Labelled vectors with declared missing values

Description

Labelled vectors are mainly used to analyse social science data, where declaring the missing values is an important step in the analysis.

Usage

declared(
    x = double(),
    labels = NULL,
    na_values = NULL,
    na_range = NULL,
    label = NULL,
    ...
)

is_declared(x)

is.declared(x)

as_declared(x, ...)

as.declared(x, ...)

as_haven(x, ...)

undeclare(x, drop = FALSE, ...)

Arguments

x

A numeric vector to label, or a declared labelled vector (for undeclare)

labels

A named vector or NULL. The vector should be the same type as x. Unlike factors, labels don't need to be exhaustive: only a fraction of the values might be labelled.

na_values

A vector of values that should also be considered as missing.

na_range

A numeric vector of length two giving the (inclusive) extents of the range. Use -Inf and Inf if you want the range to be open ended.

label

A short, human-readable description of the vector.

drop

Logical; if TRUE, drop all attributes.

...

Other arguments used by various other methods.
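As a brief illustration of the na_range argument, the sketch below declares a whole interval of codes as missing. The data and the labels are made up for the example.

```r
library(declared)

# Made-up survey codes: 1 to 5 are valid answers, -1 and -9 are missing codes
x <- declared(
    c(1:5, -1, -9),
    labels = c(Good = 1, Bad = 5, DK = -1, Refused = -9),
    na_range = c(-9, -1)
)

# Both codes fall inside the inclusive range, so both count as missing
is.na(x)
```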

Details

The declared objects are very similar to the haven_labelled_spss objects from package haven. They have exactly the same arguments, but they differ fundamentally in the treatment of (declared) missing values.

In package haven, existing values are treated as if they were missing. By contrast, in package declared the NA values are treated as existing values.

This difference is fundamental and points to an inconsistency in package haven: while existing values can be identified as missing using the function is.na(), they are in fact present in the vector and other packages (most importantly the base ones) do not know these values should be treated as missing.

Consequently, the existing values are interpreted as missing only by package haven. Statistical procedures will use those values as if they were valid values.
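The behaviour described above can be observed with a small example, assuming package haven is installed. The use of unclass() here is only a device to look at the underlying data; it is not a claim about how any particular statistical procedure accesses the vector.

```r
library(haven)

h <- labelled_spss(
    c(1:5, -1),
    labels = c(Good = 1, Bad = 5, DK = -1),
    na_values = -1
)

# haven reports the declared value as missing
is.na(h)

# but the value -1 is still physically stored in the vector, so code that
# works on the underlying data includes it in the computation
raw <- unclass(h)
sum(raw)
```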

Package declared approaches the problem in exactly the opposite way: instead of treating existing values as missing, it treats (certain) NA values as existing. It does that by storing an attribute containing the indices of those NA values which are to be treated as declared missing values, and it refreshes this attribute each time the declared object is changed.
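The idea of storing real NA values together with an index of their original codes can be sketched in base R. The helpers declare_sketch() and undeclare_sketch() are hypothetical names written for this illustration; they are not part of package declared and do not reproduce its actual implementation.

```r
# Hypothetical helper: replace the declared codes with NA and remember
# where they were (the names of the index keep the original codes)
declare_sketch <- function(x, na_values) {
    idx <- which(x %in% na_values)
    names(idx) <- x[idx]
    x[idx] <- NA
    attr(x, "na_index") <- idx
    x
}

# Hypothetical helper: put the original codes back and drop the index
undeclare_sketch <- function(x) {
    idx <- attr(x, "na_index")
    x[idx] <- as.numeric(names(idx))
    attr(x, "na_index") <- NULL
    x
}

x <- declare_sketch(c(1:5, -1), na_values = -1)

# Base functions see a genuine NA, so na.rm works as usual
m <- mean(x, na.rm = TRUE)

# The original code is recoverable from the index
restored <- undeclare_sketch(x)
```

Because the missing entries are genuine NA values, any base or contributed function that honours NA treats them correctly, which is the point of the design.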

This is a trade-off with important implications when subsetting datasets: every declared variable gets this attribute refreshed, which takes some time depending on the number of variables in the data.

The function undeclare() replaces the NA entries with their original numeric values, and drops all attributes related to missing values: na_values, na_range and na_index. The result can be a regular vector (thus dropping all attributes, including the class "declared") by activating the argument drop.
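A short sketch of the two forms of undeclare(), based on the description above; the vector is the same toy example used elsewhere on this page.

```r
library(declared)

x <- declared(
    c(1:5, -1),
    labels = c(Good = 1, Bad = 5, DK = -1),
    na_values = -1
)

# Default: the original values are restored, the class is kept,
# and the attributes related to missing values are removed
u <- undeclare(x)
inherits(u, "declared")
attr(u, "na_values")

# drop = TRUE: all attributes are removed, leaving a regular vector
v <- undeclare(x, drop = TRUE)
class(v)
```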

Value

declared(), as_declared() and as.declared() will return a labelled vector.

is_declared() and is.declared() will return a logical scalar.

undeclare() will return an object of class declared without the declared missing values, or a regular vector if drop = TRUE.

as_haven() returns an object of class haven_labelled_spss.

Examples


x <- declared(
    c(1:5, -1),
    labels = c(Good = 1, Bad = 5, DK = -1),
    na_values = -1
)

x

is.na(x)

x > 0

x == -1

# Values are actually placeholders for categories, so labels work as if they were factors:
x == "DK"


# When newly added values are already declared as missing, they are automatically coerced
c(x, 2, -1)

# switch NAs with their original values
undeclare(x)


set.seed(123)
DF <- data.frame(
    Area = declared(
        sample(1:2, 123, replace = TRUE),
        labels = c(Urban = 1, Rural = 2)
    ),
    Gender = declared(
        sample(c(1:2, -1), 123, replace = TRUE),
        labels = c(Male = 1, Female = 2, Nonresponse = -1),
        na_values = -1
    ),
    Age = sample(18:90, 123, replace = TRUE),
    Children = sample(0:5, 123, replace = TRUE)
)

using(DF, mean(Children), split.by = Area)

using(DF, mean(Age), split.by = Gender & Area)


[Package declared version 0.13 Index]