wlasso {svyVarSel}    R Documentation

Weighted LASSO prediction models for complex survey data

Description

This function fits LASSO prediction models (linear or logistic) to complex survey data, taking the sampling weights into account in the estimation process, and selects the lambda that minimizes the error estimated with one of several replicate weights methods.

Usage

wlasso(
  data = NULL,
  col.y = NULL,
  col.x = NULL,
  cluster = NULL,
  strata = NULL,
  weights = NULL,
  design = NULL,
  family = c("gaussian", "binomial"),
  lambda.grid = NULL,
  method = c("dCV", "JKn", "bootstrap", "subbootstrap", "BRR", "split", "extrapolation"),
  k = 10,
  R = 1,
  B = 200,
  dCV.sw.test = FALSE,
  train.prob = 0.7,
  method.split = c("dCV", "bootstrap", "subbootstrap"),
  print.rw = FALSE
)

Arguments

data

A data frame with information about the response variable and covariates, as well as the sampling weights and the strata and cluster indicators. It can be NULL if the sampling design is given in the design argument.

col.y

A numeric value indicating the number of the column in which information on the response variable can be found or a character string indicating the name of that column.

col.x

A numeric vector indicating the numbers of the columns in which information on the covariates can be found or a vector of character strings indicating the names of these columns.

cluster

A character string indicating the name of the column with cluster identifiers. It could be NULL if the sampling design is indicated in the design argument.

strata

A character string indicating the name of the column with strata identifiers. It could be NULL if the sampling design is indicated in the design argument.

weights

A character string indicating the name of the column with sampling weights. It could be NULL if the sampling design is indicated in the design argument.

design

An object of class survey.design generated by survey::svydesign(). It can be NULL if the cluster, strata, weights and data arguments are given.

family

A character string indicating the model family used to fit the LASSO models. Choose between gaussian (to fit linear models) and binomial (to fit logistic models).

lambda.grid

A numeric vector indicating a grid for penalization parameters. The default option is lambda.grid = NULL, which considers the default grid selected by the function glmnet::glmnet().
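As an illustration of a user-supplied grid, the sketch below builds a log-spaced vector of penalization parameters in base R. The bounds (1 down to 1e-4) and the length (50) are arbitrary assumptions chosen for illustration; they are not package defaults.

```r
# Log-spaced grid of candidate penalties, in decreasing order.
# Bounds and length are illustrative assumptions, not package defaults.
lambda.grid <- 10^seq(0, -4, length.out = 50)

# Inspect the range covered by the grid
range(lambda.grid)
```

The resulting vector can then be supplied as lambda.grid = lambda.grid in a wlasso() call such as the one in the Examples section.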

method

A character string indicating the method used to define the replicate weights. Choose one of the following: dCV, JKn, bootstrap, subbootstrap, BRR, split, extrapolation.

k

A numeric value indicating the number of folds to be defined. Default is k=10. Only applies for the dCV method.

R

A numeric value indicating the number of times the sample is partitioned. Default is R=1. Only applies for dCV, split or extrapolation methods.

B

A numeric value indicating the number of bootstrap resamples. Default is B=200. Only applies for bootstrap and subbootstrap methods.

dCV.sw.test

A logical value indicating how the error is estimated for the dCV method. FALSE (the default option) estimates the error for each test set and defines the cross-validated error based on the average strategy. TRUE estimates the cross-validated error based on the pooling strategy.
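The difference between the two strategies can be sketched in base R. This is one plausible reading of "average" versus "pooling", not the package's internal code: the per-unit losses, the weights and the fold assignment below are all hypothetical toy values.

```r
set.seed(1)
n    <- 12
loss <- runif(n)                      # hypothetical per-unit test losses
w    <- runif(n, 1, 5)                # hypothetical sampling weights
fold <- rep(1:3, times = c(2, 4, 6))  # hypothetical test folds of unequal size

# Average strategy: weighted mean loss within each test fold,
# then the plain mean of the fold-level errors
fold_err <- sapply(split(seq_len(n), fold),
                   function(i) sum(w[i] * loss[i]) / sum(w[i]))
err_average <- mean(fold_err)

# Pooling strategy: a single weighted mean over all test units
err_pooled <- sum(w * loss) / sum(w)
```

Under these assumptions the pooled error is a weighted mean of the fold-level errors, with each fold weighted by its total sampling weight, so the two strategies differ whenever the folds carry unequal total weight.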

train.prob

A numeric value between 0 and 1, indicating the proportion of clusters (for the method split) or strata (for the method extrapolation) to be set in the training sets. Default is train.prob = 0.7. Only applies for split and extrapolation methods.
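To make the role of train.prob concrete, the following base R sketch assigns whole clusters to a training or a test set. It is a simplified illustration under stated assumptions: the toy data frame is hypothetical, strata are ignored, and the package may draw the clusters differently (for example, within strata).

```r
set.seed(42)

# Hypothetical toy data: 10 clusters with 3 units each
toy <- data.frame(cluster = rep(1:10, each = 3))
train.prob <- 0.7

# Sample a proportion train.prob of the clusters for training
clusters       <- unique(toy$cluster)
n_train        <- round(train.prob * length(clusters))
train_clusters <- sample(clusters, size = n_train)

# Every unit follows its cluster, so clusters are never split across sets
toy$set <- ifelse(toy$cluster %in% train_clusters, "train", "test")
table(toy$set)
```

Sampling clusters rather than individual units keeps all units of a cluster in the same set, which is the point of defining the proportion at the cluster level.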

method.split

A character string indicating the way in which replicate weights should be defined in the split method. Choose one of the following: dCV, bootstrap or subbootstrap. Only applies for split method.

print.rw

A logical value. If TRUE, the data set with the replicate weights is saved in the output object. Default is print.rw = FALSE.

Value

The output object of the function wlasso() is an object of class wlasso. This object is a list containing 4 or 5 elements, depending on the value set to the argument print.rw. Below we describe the contents of these elements:

Examples

data(simdata_lasso_binomial)
mcv <- wlasso(data = simdata_lasso_binomial,
              col.y = "y", col.x = 1:50,
              family = "binomial",
              cluster = "cluster", strata = "strata", weights = "weights",
              method = "dCV", k=10, R=1)

# Or equivalently:

mydesign <- survey::svydesign(ids=~cluster, strata = ~strata, weights = ~weights,
                              nest = TRUE, data = simdata_lasso_binomial)
mcv <- wlasso(col.y = "y", col.x = 1:50, design = mydesign,
              family = "binomial",
              method = "dCV", k=10, R=1)


[Package svyVarSel version 1.0.1 Index]