div {DistributionIV}R Documentation

Distributional IV Model Function

Description

This function fits a distributional IV model to the data. It allows for the tuning of several parameters related to model complexity and model training. Variables are per default internally standardized (predictions are on the original scale).

Usage

div(
  Z,
  X,
  Y,
  W = NULL,
  epsx_dim = 50,
  epsy_dim = 50,
  epsh_dim = 50,
  hidden_dim = 100,
  num_layer = 3,
  num_epochs = 1000,
  lr = 10^(-3),
  beta = 1,
  silent = FALSE,
  standardize = TRUE
)

Arguments

Z

A data frame, matrix, vector, or factor variable representing the instrumental variable.

X

A data frame, matrix, vector, or factor variable representing the predictor.

Y

A data frame, matrix, vector, or factor variable representing the target variable.

W

(Optional) A data frame, matrix, vector, or factor variable representing the exogenous variable(s).

epsx_dim

The dimension of the noise corresponding to the predictor introduced in the model (default: 50).

epsy_dim

The dimension of the noise corresponding to the outcome introduced in the model (default: 50).

epsh_dim

The dimension of the noise corresponding to the hidden confounder introduced in the model (default: 50).

hidden_dim

The size of the hidden layer in the model (default: 100).

num_layer

The number of layers in the model (default: 3).

num_epochs

The number of epochs to be used in training (default: 1000).

lr

The learning rate to be used in training (default: 10^-3).

beta

The beta scaling factor for energy loss, numeric value from (0,2) (default: 1).

silent

A boolean indicating whether to suppress output during model training (default: FALSE).

standardize

A boolean indicating whether to standardize the input data (default: TRUE).

Value

A distributional IV model object with class 'DIV'.

Examples


# Simulate data -------------------------------------------------------------
p_Z <- 4
p_X <- 2

set.seed(2209)
n_train <- 1000
Z <- matrix(rnorm(n_train * p_Z, mean = 2), ncol = p_Z)
H <- rnorm(n_train, mean = 0, sd = 1.5)
X1 <- 0.1 * (Z[, 1] + rnorm(n_train, sd = 0.1)) ^ 2 +
 (Z[, 2] + rnorm(n_train, sd = 1)) ^ 2 + H + rnorm(n_train, sd = 0.1)
X2 <- 0.5 * (Z[, 3] + Z[, 4]) ^ 2 + 0.1 * H ^ 2 + rnorm(n_train, sd = 0.1)
X <- matrix(cbind(X1, X2), ncol = p_X)
Y <- 0.5 * X[, 1] + 0.2 * (X[, 2] + rnorm(n_train, sd = 0.2) + H) ^ 2 +
 rnorm(n_train, sd = 0.1)

n_test <- n_train
Ztest <- matrix(rnorm(n_test * p_Z, mean = 2), ncol = p_Z)
Htest <- rnorm(n_test, mean = 0, sd = 1.5)
X1test <- 0.1 * (Ztest[, 1] + rnorm(n_test, sd = 0.1)) ^ 2 +
 (Ztest[, 2] + rnorm(n_test, sd = 1)) ^ 2 + Htest + rnorm(n_test, sd = 0.1)
X2test <- 0.5 * (Ztest[, 3] + Ztest[, 4]) ^ 2 + 0.1 * Htest ^ 2 + rnorm(n_test, sd = 0.1)
Xtest <- matrix(cbind(X1test, X2test), ncol = p_X)
Ytest <- 0.5 * Xtest[, 1] + 0.2 * (Xtest[, 2] + rnorm(n_test, sd = 0.2) + Htest) ^ 2 +
 rnorm(n_test, sd = 0.1)

# Fit generativeIV model ----------------------------------------------------
# Consider increasing number of epochs. Here: num_epochs = 100 for fast computation only.
DIV_model <- div(Z = Z, X = X, Y = Y, num_epochs = 100)
print(DIV_model)

# Prediction on test data ---------------------------------------------------
Yhat <- predict(object = DIV_model, Xtest = Xtest, type = "mean")
cat("\n Correlation between predicted and realized values: ", signif(cor(Yhat, Ytest), 3))
plot(Yhat, Ytest, xlab = "model prediction", ylab = "observation")
# Quantile prediction -------------------------------------------------------
Yhat_quant <- predict(object = DIV_model, Xtest = Xtest, type = "quantile")
ord <- order(Yhat)
matplot(Yhat[ord], Yhat_quant[ord,], type = "l", col = 2, lty = 1,
xlab = "model prediction", ylab = "observation")
points(Yhat[ord], Ytest[ord], pch = 20, cex = 0.5)

#' # Sampling from estimated model ---------------------------------------------
Ysample <- predict(object = DIV_model, Xtest = Xtest, type = "sample", nsample = 1)

#' # Plots of realized & sampled values against first variable -----------------
oldpar <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
plot(Xtest[, 1], Ytest, xlab = "Predictor Variable 1", ylab = "Observation")
plot(Xtest[, 1], Ysample, xlab = "Predictor Variable 1", ylab = "Model sample")
par(oldpar)



[Package DistributionIV version 0.1.0 Index]