cvSDTree {SDModels} | R Documentation |
Cross-validation for the SDTree
Description
Estimates the optimal complexity parameter for the SDTree using cross-validation. The transformations are estimated for each training set and validation set separately to ensure independence of the validation set.
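The per-fold estimation described above can be illustrated with a minimal base R sketch. It is not the package's internal code: the helper trim_transform and the objects fold_id and fold_data are hypothetical names, and the transform follows the general form of the Trim transform in Ćevid et al. (2020), where singular values above a chosen quantile are shrunk down to that quantile. The point is only that each fold estimates one transformation from its training rows and a separate one from its validation rows.

```r
# Hypothetical helper (not part of SDModels): Trim-style spectral transformation.
# Singular values of X above the chosen quantile are shrunk to that quantile.
trim_transform <- function(X, trim_quantile = 0.5) {
  s <- svd(X)
  tau <- quantile(s$d, trim_quantile)
  shrink <- pmin(s$d, tau) / s$d
  U <- s$u
  # Q = I - U diag(1 - shrink) U^T
  diag(nrow(X)) - U %*% (t(U) * (1 - shrink))
}

set.seed(1)
n <- 60
p <- 5
nfolds <- 3
X <- matrix(rnorm(n * p), nrow = n)
y <- sign(X[, 1]) * 3 + rnorm(n)

# Randomly assign every observation to one of the folds
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

fold_data <- lapply(seq_len(nfolds), function(k) {
  val <- fold_id == k
  # One transformation from the training rows, a separate one from the validation rows
  Q_train <- trim_transform(X[!val, , drop = FALSE])
  Q_val <- trim_transform(X[val, , drop = FALSE])
  list(
    X_train = Q_train %*% X[!val, , drop = FALSE],
    y_train = Q_train %*% y[!val],
    X_val = Q_val %*% X[val, , drop = FALSE],
    y_val = Q_val %*% y[val]
  )
})
```

In this sketch the validation rows never enter the estimate of Q_train, which is the independence property the description refers to.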
Usage
cvSDTree(
formula = NULL,
data = NULL,
x = NULL,
y = NULL,
max_leaves = NULL,
cp = 0,
min_sample = 5,
mtry = NULL,
fast = TRUE,
Q_type = "trim",
trim_quantile = 0.5,
q_hat = 0,
Qf = NULL,
A = NULL,
gamma = 0.5,
gpu = FALSE,
mem_size = 1e+07,
max_candidates = 100,
nfolds = 3,
cp_seq = NULL,
mc.cores = 1,
Q_scale = TRUE
)
Arguments
formula |
Object of class |
data |
Training data of class |
x |
Predictor data, alternative to |
y |
Response vector, alternative to |
max_leaves |
Maximum number of leaves for the grown tree. |
cp |
Complexity parameter, minimum loss decrease to split a node.
A split is only performed if the loss decrease is larger than |
min_sample |
Minimum number of observations per leaf.
A split is only performed if both resulting leaves have at least
|
mtry |
Number of randomly selected covariates to consider for a split,
if |
fast |
If |
Q_type |
Type of deconfounding, one of 'trim', 'pca', 'no_deconfounding'.
'trim' corresponds to the Trim transform (Ćevid et al. 2020)
as implemented in the Doubly debiased lasso (Guo et al. 2022),
'pca' to the PCA transformation (Paul et al. 2008).
See |
trim_quantile |
Quantile for Trim transform,
only needed for trim and DDL_trim, see |
q_hat |
Assumed confounding dimension, only needed for pca,
see |
Qf |
Spectral transformation, if |
A |
Numerical Anchor of class |
gamma |
Strength of distributional robustness, |
gpu |
If |
mem_size |
Number of split candidates that can be evaluated at once. This is a trade-off between memory and speed; it can be decreased if either the memory is not sufficient or the GPU is too small. |
max_candidates |
Maximum number of split points that are proposed at each node for each covariate. |
nfolds |
Number of folds for cross-validation. It is recommended not to use more than 5 folds if the number of covariates is larger than the number of observations. In this case the spectral transformation could differ too much if the validation data is substantially smaller than the training data. |
cp_seq |
Sequence of complexity parameters cp to compare using cross-validation,
if |
mc.cores |
Number of cores to use for parallel computation. |
Q_scale |
Should data be scaled to estimate the spectral transformation?
Default is |
Value
A list containing
cp_min |
The optimal complexity parameter. |
cp_table |
A table containing the complexity parameter, the mean and the standard deviation of the loss on the validation sets for the complexity parameters. If multiple complexity parameters result in the same loss, only the one with the largest complexity parameter is shown. |
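As a hedged usage sketch, the two list elements can be read directly from the result. The element names cp_min and cp_table come from this page. The call to SDTree assumes that it accepts the same x, y, cp and Q_type arguments as cvSDTree, and the object names cv_res and tree are illustrative.

```r
library(SDModels)

set.seed(1)
n <- 50
X <- matrix(rnorm(n * 5), nrow = n)
y <- sign(X[, 1]) * 3 + rnorm(n, 0, 5)

# Cross-validate the complexity parameter without deconfounding
cv_res <- cvSDTree(x = X, y = y, Q_type = 'no_deconfounding')

# Selected complexity parameter
cv_res$cp_min

# Validation loss summary for each compared complexity parameter
cv_res$cp_table

# Assumption: SDTree takes x, y, cp and Q_type like cvSDTree does
tree <- SDTree(x = X, y = y, cp = cv_res$cp_min, Q_type = 'no_deconfounding')
```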
Author(s)
Markus Ulmer
References
Guo Z, Ćevid D, Bühlmann P (2022).
“Doubly debiased lasso: High-dimensional inference under hidden confounding.”
The Annals of Statistics, 50(3).
ISSN 0090-5364, doi:10.1214/21-AOS2152.
Paul D, Bair E, Hastie T, Tibshirani R (2008).
““Preconditioning” for feature selection and regression in high-dimensional problems.”
The Annals of Statistics, 36(4).
ISSN 0090-5364, doi:10.1214/009053607000000578.
Ćevid D, Bühlmann P, Meinshausen N (2020).
“Spectral Deconfounding via Perturbed Sparse Linear Models.”
J. Mach. Learn. Res., 21(1).
ISSN 1532-4435, http://jmlr.org/papers/v21/19-545.html.
See Also
SDTree
prune.SDTree
regPath.SDTree
Examples
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 5), nrow = n)
y <- sign(X[, 1]) * 3 + rnorm(n, 0, 5)
cp <- cvSDTree(x = X, y = y, Q_type = 'no_deconfounding')
cp