roseRF_gplm {roseRF}		R Documentation
ROSE random forest estimator for the generalised partially linear model
Description
Estimates the parameter of interest \theta_0 in the generalised partially linear model

g(\mathbb{E}[Y|X,Z]) = X\theta_0 + f_0(Z),

for some (strictly increasing, differentiable) link function g, which can be reposed in terms of the ‘nuisance functions’ (\mathbb{E}[X|Z], \mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z]) as

g(\mathbb{E}[Y|X,Z]) - \mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z] = (X-\mathbb{E}[X|Z])\theta_0.
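This reposed form follows by taking conditional expectations given Z in the model equation, which gives

\mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z] = \mathbb{E}[X|Z]\theta_0 + f_0(Z);

subtracting this from g(\mathbb{E}[Y|X,Z]) = X\theta_0 + f_0(Z) eliminates the unknown f_0(Z) and leaves the displayed equation in \theta_0 alone.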
Usage
roseRF_gplm(
  y_on_xz_formula,
  y_on_xz_learner,
  y_on_xz_pars = list(),
  Gy_on_z_formula,
  Gy_on_z_learner,
  Gy_on_z_pars = list(),
  x_formula,
  x_learner,
  x_pars = list(),
  M1_formula = x_formula,
  M1_learner = x_learner,
  M1_pars = x_pars,
  M2_formula = NA,
  M2_learner = NA,
  M2_pars = list(),
  M3_formula = NA,
  M3_learner = NA,
  M3_pars = list(),
  M4_formula = NA,
  M4_learner = NA,
  M4_pars = list(),
  M5_formula = NA,
  M5_learner = NA,
  M5_pars = list(),
  link = "identity",
  data,
  K = 5,
  S = 1,
  max.depth = 10,
  num.trees = 500,
  min.node.size = max(10, ceiling(0.01 * (K - 1)/K * nrow(data))),
  replace = TRUE,
  sample.fraction = 0.8
)
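For orientation, a minimal call might look as follows. This is an illustrative sketch only: the data frame my_data, the variable names (y, x, z1, z2), the learner string "randomforest" and the response convention used on the left-hand side of Gy_on_z_formula are assumptions, not values documented on this page; remaining arguments are left at their defaults.

# Illustrative sketch only: variable names, learner string and formula
# conventions are placeholders, not taken from the package documentation.
fit <- roseRF_gplm(
  y_on_xz_formula = y ~ x + z1 + z2,   # model for E[Y|X,Z]
  y_on_xz_learner = "randomforest",
  Gy_on_z_formula = y ~ z1 + z2,       # model for E[g(E[Y|X,Z])|Z] (response convention assumed)
  Gy_on_z_learner = "randomforest",
  x_formula       = x ~ z1 + z2,       # model for E[X|Z]
  x_learner       = "randomforest",
  link            = "identity",
  data            = my_data,
  K               = 5,
  S               = 1
)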
Arguments
y_on_xz_formula
a two-sided formula object describing the model for the regression of Y on (X,Z), i.e. for \mathbb{E}[Y|X,Z].
y_on_xz_learner
a string specifying the regression method used to fit the regression given by y_on_xz_formula.
y_on_xz_pars
a list of hyperparameters passed to the y_on_xz_learner regression method.
Gy_on_z_formula
a two-sided formula object describing the model for the regression of g(\mathbb{E}[Y|X,Z]) on Z, i.e. for \mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z].
Gy_on_z_learner
a string specifying the regression method used to fit the regression given by Gy_on_z_formula.
Gy_on_z_pars
a list of hyperparameters passed to the Gy_on_z_learner regression method.
x_formula
a two-sided formula object describing the model for the regression of X on Z, i.e. for \mathbb{E}[X|Z].
x_learner
a string specifying the regression method used to fit the regression given by x_formula.
x_pars
a list of hyperparameters passed to the x_learner regression method.
M1_formula
a two-sided formula object for the model \mathbb{E}[M_1(X)|Z]. Defaults to x_formula, corresponding to M_1(X)=X.
M1_learner
a string specifying the regression method for the M1_formula regression. Defaults to x_learner.
M1_pars
a list of hyperparameters passed to the M1_learner regression method. Defaults to x_pars.
M2_formula
an optional two-sided formula object for the model \mathbb{E}[M_2(X)|Z] (an illustrative sketch follows this argument list). Default is NA (no M_2 regression is included).
M2_learner
a string specifying the regression method for the M2_formula regression.
M2_pars
a list of hyperparameters passed to the M2_learner regression method.
M3_formula
an optional two-sided formula object for the model \mathbb{E}[M_3(X)|Z]. Default is NA.
M3_learner
a string specifying the regression method for the M3_formula regression.
M3_pars
a list of hyperparameters passed to the M3_learner regression method.
M4_formula
an optional two-sided formula object for the model \mathbb{E}[M_4(X)|Z]. Default is NA.
M4_learner
a string specifying the regression method for the M4_formula regression.
M4_pars
a list of hyperparameters passed to the M4_learner regression method.
M5_formula
an optional two-sided formula object for the model \mathbb{E}[M_5(X)|Z]. Default is NA.
M5_learner
a string specifying the regression method for the M5_formula regression.
M5_pars
a list of hyperparameters passed to the M5_learner regression method.
link
the link function g. Default is "identity".
data
a data frame containing the variables for the generalised partially linear model.
K
the number of folds used for K-fold cross-fitting. Default is 5.
S
the number of repeats of the sample splits used for cross-fitting, to mitigate the randomness the splits introduce into the estimator. Default is 1.
max.depth
Maximum depth of each tree in the ROSE random forests. Default is 10.
num.trees
Number of trees used for a single ROSE random forest. Default is 500.
min.node.size
Minimum node size of a leaf in each tree. Default is max(10, ceiling(0.01 * (K - 1)/K * nrow(data))).
replace
Whether sampling for a single random tree is performed with (bootstrap) or without replacement. Default is TRUE.
sample.fraction
Proportion of the data used for each random tree. Default is 0.8.
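As a sketch of how an additional user-chosen function M_2 (so J = 2) might be supplied, the call below adds a regression of x^2 on the covariates. As with the earlier sketch, the variable names, the learner string and the formula conventions (including the use of I(x^2) on the left-hand side) are illustrative assumptions rather than values documented on this page.

# Hypothetical J = 2 specification with M_2(X) = X^2 (placeholders throughout).
fit2 <- roseRF_gplm(
  y_on_xz_formula = y ~ x + z1 + z2,
  y_on_xz_learner = "randomforest",
  Gy_on_z_formula = y ~ z1 + z2,
  Gy_on_z_learner = "randomforest",
  x_formula       = x ~ z1 + z2,
  x_learner       = "randomforest",
  M2_formula      = I(x^2) ~ z1 + z2,  # regression of M_2(X) = X^2 on Z
  M2_learner      = "randomforest",
  data            = my_data
)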
Details
The estimator of the parameter of interest \theta_0 solves the estimating equation

\sum_{i}\psi(Y_i,X_i,Z_i; \theta,\hat{\eta}(Z_i),\hat{w}(Z_i)) = 0,

\psi(Y,X,Z;\theta,\eta_0,w) := \sum_{j=1}^J w_j(Z) \big( M_j(X) - \mathbb{E}[M_j(X)|Z] \big) g'\big(\mu(X,Z;\theta,\eta_0)\big) \big(Y-\mu(X,Z;\theta,\eta_0)\big),

\mu(X,Z;\theta,\eta_0) := g^{-1}\big(\mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z] + (X-\mathbb{E}[X|Z])\theta\big),

\eta_0 := \big(\mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z=\cdot], \mathbb{E}[X|Z=\cdot]\big),

where M_1(X),\ldots,M_J(X) denote user-chosen functions of X and w(Z)=\big(w_1(Z),\ldots,w_J(Z)\big) denotes weights estimated via ROSE random forests. The default takes J=1 and M_1(X)=X; if taking J\geq 2 we recommend care in checking the applicability and appropriateness of any additional user-chosen regression tasks.
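For intuition, with the identity link, J=1, M_1(X)=X and constant weights w_1 \equiv 1, the estimating equation is linear in \theta and admits the closed-form solution sketched below. This is a stand-alone illustration of the score \psi, not code from the package; the nuisance estimates are taken as given.

# Stand-alone sketch: identity link, J = 1, M_1(X) = X, constant weights w = 1.
# Gy_hat estimates E[g(E[Y|X,Z])|Z] (which equals E[Y|Z] for the identity link)
# and x_hat estimates E[X|Z]; both are assumed to have been fitted already.
theta_identity <- function(y, x, Gy_hat, x_hat, w = rep(1, length(y))) {
  x_res <- x - x_hat    # X - E[X|Z]
  y_res <- y - Gy_hat   # Y - E[g(E[Y|X,Z])|Z]
  # Solve sum_i w_i * x_res_i * (y_res_i - x_res_i * theta) = 0 for theta:
  sum(w * x_res * y_res) / sum(w * x_res^2)
}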
The parameter of interest \theta_0 is estimated within a DML2 / K-fold cross-fitting framework, to allow for arbitrary (faster than n^{1/4}-consistent) learners for \hat{\eta}, i.e. solving the estimating equation

\sum_{k \in [K]}\sum_{i \in I_k}\psi(Y_i,X_i,Z_i; \theta,\hat{\eta}^{(k)}(Z_i),\hat{w}^{(k)}(Z_i)) = 0,

where I_1,\ldots,I_K denote a partition of the index set for the datapoints (Y_i,X_i,Z_i), \hat{\eta}^{(k)} denotes an estimator for \eta_0 trained on the data indexed by I_k^c, and \hat{w}^{(k)} denotes a ROSE random forest (again trained on the data indexed by I_k^c).
Value
A list containing:
theta
The estimator of \theta_0.
stderror
Huber-robust estimate of the standard error of the \theta_0-estimator.
coefficients
Table of the \theta_0 coefficient estimate, standard error, z-value and p-value.
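The returned components can be accessed in the usual way, e.g. with fit as in the sketch after the Usage section:

fit$theta         # point estimate of theta_0
fit$stderror      # Huber-robust standard error estimate
fit$coefficients  # estimate, standard error, z-value and p-value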