rrda.cv {rrda} | R Documentation |
Cross-validation for Ridge Redundancy Analysis
Description
This function performs cross-validation to evaluate Ridge Redundancy Analysis (RRDA) models. It calculates the mean squared error (MSE) for each combination of rank and ridge penalty value across the cross-validation folds. The function also supports centering and scaling of the input matrices.
The range of lambda for the cross-validation is calculated automatically, following the method of "glmnet" (Friedman et al., 2010). Given a matrix of response variables (Y; an n times q matrix) and a matrix of explanatory variables (X; an n times p matrix), the largest lambda for the validation is obtained as follows:
\lambda_{\text{max}} = \frac{\max_{j \in \{1, 2, \dots, p\}} \sqrt{\sum_{k=1}^{q} \left( \sum_{i=1}^{n} (x_{ij}\cdot y_{ik}) \right)^2}}{N \times 10^{-3}}
Then, we define \lambda_{\text{min}} = 10^{-4}\lambda_{\text{max}}, and the sequence of \lambda values is generated over this range.
To reduce computation, variables are sampled from X and Y for the lambda range estimate when the matrices are large (by default, when the number of variables exceeds 1000). Alternatively, the range of lambda can be specified manually.
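The lambda range described above can be sketched in base R. This is an illustration only, not the package's internal code: the helper name lambda_range and the argument n_lambda are hypothetical, N is assumed to be the number of observations (rows of X), the grid is assumed to be log-spaced as in glmnet, and column sampling is assumed to happen only when a matrix has more columns than the sampling threshold.

```r
# Sketch of the automatic lambda range (hypothetical helper, not part of rrda)
set.seed(1)
n <- 20; p <- 8; q <- 5
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
Y <- scale(matrix(rnorm(n * q), n, q), center = TRUE, scale = FALSE)

lambda_range <- function(Y, X, sample.X = 1000, sample.Y = 1000, n_lambda = 50) {
  # Assumption: subsample columns only when a matrix exceeds the threshold
  if (ncol(X) > sample.X) X <- X[, sample(ncol(X), sample.X), drop = FALSE]
  if (ncol(Y) > sample.Y) Y <- Y[, sample(ncol(Y), sample.Y), drop = FALSE]
  N <- nrow(X)                 # assumption: N is the number of observations
  XtY <- crossprod(X, Y)       # p x q matrix with entries sum_i x_ij * y_ik
  lambda_max <- max(sqrt(rowSums(XtY^2))) / (N * 1e-3)
  lambda_min <- 1e-4 * lambda_max
  # Assumption: log-spaced grid from lambda_max down to lambda_min
  exp(seq(log(lambda_max), log(lambda_min), length.out = n_lambda))
}

lambda_seq <- lambda_range(Y, X)
range(lambda_seq)
```

A vector built this way (or any other numeric vector) could then be passed through the lambda argument to override the automatic range.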
Usage
rrda.cv(
Y,
X,
maxrank = NULL,
lambda = NULL,
nfold = 5,
folds = NULL,
sample.X = 1000,
sample.Y = 1000,
scale.X = FALSE,
scale.Y = FALSE,
center.X = TRUE,
center.Y = TRUE,
verbose = TRUE
)
Arguments
Y |
A numeric matrix of response variables. |
X |
A numeric matrix of explanatory variables. |
maxrank |
A numeric vector specifying the maximum rank of the coefficient Bhat. Default is NULL. |
lambda |
A numeric vector of ridge penalty values. Default is NULL. |
nfold |
The number of folds for cross-validation. Default is 5. |
folds |
A vector specifying the folds. Default is NULL. |
sample.X |
The number of variables sampled from X for the lambda range estimate. Default is 1000. |
sample.Y |
The number of variables sampled from Y for the lambda range estimate. Default is 1000. |
scale.X |
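Logical indicating if X should be scaled. Default is FALSE. |
scale.Y |
Logical indicating if Y should be scaled. Default is FALSE. |
center.X |
Logical indicating if X should be centered. Default is TRUE. |
center.Y |
Logical indicating if Y should be centered. Default is TRUE. |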
verbose |
Logical indicating if progress messages should be printed. Default is TRUE. |
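As an illustration of the folds argument, a fold assignment can be built in base R. This sketch assumes that folds is a vector of length n giving the fold index (1 to nfold) of each observation; that format is an assumption and is not confirmed by this page. The helper make_folds is hypothetical.

```r
# Hypothetical helper: randomly assign n observations to nfold balanced folds
make_folds <- function(n, nfold = 5) {
  sample(rep(seq_len(nfold), length.out = n))
}

set.seed(10)
folds <- make_folds(n = 10, nfold = 5)
table(folds)  # number of observations per fold

# Assumed usage (not run here): rrda.cv(Y = Y, X = X, folds = folds)
```

Fixing the folds in this way would make results comparable across repeated calls, since every call evaluates the same partition of the data.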
Value
A list containing the cross-validated MSE matrix, lambda values, rank values, and the optimal lambda and rank.
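The selection of the optimal lambda and rank from an MSE matrix can be sketched as follows. The toy matrix, its layout (rows are ranks, columns are lambda values), and the object names are assumptions made for illustration; the actual structure of the returned list may differ.

```r
# Toy cross-validated MSE matrix (assumed layout: ranks in rows, lambdas in columns)
ranks <- 1:3
lambdas <- c(10, 1, 0.1)
mse <- matrix(c(5.0, 4.2, 4.8,
                4.1, 3.0, 3.9,
                4.3, 3.4, 4.0),
              nrow = 3, byrow = TRUE)

# Locate the cell with the smallest MSE and map it back to rank and lambda
best <- arrayInd(which.min(mse), dim(mse))
opt <- list(rank = ranks[best[1]], lambda = lambdas[best[2]], MSE = min(mse))
str(opt)
```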
Examples
set.seed(10)
simdata <- rdasim1(n = 10, p = 30, q = 30, k = 3) # simulated data
X <- simdata$X
Y <- simdata$Y
cv_result <- rrda.cv(Y = Y, X = X, maxrank = 5, nfold = 5) # cv
rrda.summary(cv_result = cv_result) # cv result
## Complete Example ##
# library(future) # <- if you want to compute in parallel
# plan(multisession) # <- if you want to compute in parallel
# cv_result <- rrda.cv(Y = Y, X = X, maxrank = 5, nfold = 5) # cv
# plan(sequential) # <- to return to sequential computing
# rrda.summary(cv_result = cv_result) # cv result
p <- rrda.plot(cv_result) # cv result plot
print(p)
h <- rrda.heatmap(cv_result) # cv result heatmap
print(h)
estimated_lambda <- cv_result$opt_min$lambda # selected parameter
estimated_rank <- cv_result$opt_min$rank # selected parameter
Bhat <- rrda.fit(Y = Y, X = X, nrank = estimated_rank, lambda = estimated_lambda) # fitting
Bhat_mat <- rrda.coef(Bhat) # coefficient matrix
Yhat_mat <- rrda.predict(Bhat = Bhat, X = X) # prediction
Yhat <- Yhat_mat[[1]][[1]][[1]] # predicted values
cor_Y_Yhat <- diag(cor(Y, Yhat)) # correlation
summary(cor_Y_Yhat)