rrda.cv {rrda} R Documentation

Cross-validation for Ridge Redundancy Analysis

Description

This function performs cross-validation to evaluate the performance of Ridge Redundancy Analysis (RRDA) models. It calculates the mean squared error (MSE) for each combination of rank and ridge penalty value across the cross-validation folds. The function also supports centering and scaling of the input matrices.

The range of lambda for the cross-validation is automatically calculated following the method of "glmnet" (Friedman et al., 2010). Given a matrix of response variables (Y; an n × q matrix) and a matrix of explanatory variables (X; an n × p matrix), the largest lambda for the validation is obtained as follows:

\lambda_{\text{max}} = \frac{\max_{j \in \{1, 2, \dots, p\}} \sqrt{\sum_{k=1}^{q} \left( \sum_{i=1}^{n} (x_{ij}\cdot y_{ik}) \right)^2}}{n \times 10^{-3}}

Then, we define \lambda_{\text{min}} = 10^{-4}\lambda_{\text{max}}, and the sequence of \lambda values is generated over the range from \lambda_{\text{min}} to \lambda_{\text{max}}.
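The calculation above can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the toy matrices, the centering step (mirroring the defaults center.X = TRUE and center.Y = TRUE), the denominator taken as the number of observations times 10^{-3}, and the choice of 50 log-spaced values (the glmnet convention) are all assumptions.

```r
# Toy data: n observations, p explanatory variables, q response variables
set.seed(1)
n <- 20; p <- 8; q <- 5
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
Y <- scale(matrix(rnorm(n * q), n, q), center = TRUE, scale = FALSE)

# Entry (j, k) of crossprod(X, Y) equals sum_i x_ij * y_ik
XtY <- crossprod(X, Y)

# Euclidean norm over the q responses for each variable j, then the maximum over j
row_norms <- sqrt(rowSums(XtY^2))
lambda_max <- max(row_norms) / (n * 1e-3)
lambda_min <- 1e-4 * lambda_max

# Assumption: 50 log-spaced values from lambda_max down to lambda_min;
# the length and spacing actually used by rrda.cv may differ
n_lambda <- 50
lambda_seq <- exp(seq(log(lambda_max), log(lambda_min), length.out = n_lambda))
range(lambda_seq)
```

A sequence built this way could be passed through the lambda argument when the automatic range is not wanted.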

To reduce computation, variables are sampled from X and Y when the matrices are large (by default, when the number of variables exceeds 1000; see sample.X and sample.Y). Alternatively, the range of lambda can be specified manually.
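The variable sampling can be pictured with a short base R sketch. The helper sample_columns is hypothetical and is not part of rrda; the assumption here is that sampling means drawing columns uniformly at random without replacement, and only when a matrix has more columns than the limit.

```r
set.seed(2)
n <- 10; p <- 2500; q <- 40
X <- matrix(rnorm(n * p), n, p)
Y <- matrix(rnorm(n * q), n, q)

# Hypothetical helper: keep at most `size` randomly chosen columns of M
sample_columns <- function(M, size = 1000) {
  if (ncol(M) <= size) return(M)          # small matrices are left untouched
  M[, sample.int(ncol(M), size), drop = FALSE]
}

Xs <- sample_columns(X, size = 1000)  # 2500 columns reduced to 1000
Ys <- sample_columns(Y, size = 1000)  # 40 columns, returned unchanged
dim(Xs)
dim(Ys)
```

The reduced matrices would then be used only for estimating the lambda range; the cross-validation itself is described as running on the full X and Y.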

Usage

rrda.cv(
  Y,
  X,
  maxrank = NULL,
  lambda = NULL,
  nfold = 5,
  folds = NULL,
  sample.X = 1000,
  sample.Y = 1000,
  scale.X = FALSE,
  scale.Y = FALSE,
  center.X = TRUE,
  center.Y = TRUE,
  verbose = TRUE
)

Arguments

Y

A numeric matrix of response variables.

X

A numeric matrix of explanatory variables.

maxrank

A numeric vector specifying the maximum rank of the coefficient Bhat. Default is NULL, which sets it to min(15, min(dim(X), dim(Y))).

lambda

A numeric vector of ridge penalty values. Default is NULL, where the lambda values are automatically chosen.

nfold

The number of folds for cross-validation. Default is 5.

folds

A vector specifying the folds. Default is NULL, which randomly assigns folds.

sample.X

The number of variables sampled from X to estimate the lambda range. Default is 1000.

sample.Y

The number of variables sampled from Y to estimate the lambda range. Default is 1000.

scale.X

Logical indicating if X should be scaled. If TRUE, scales X. Default is FALSE.

scale.Y

Logical indicating if Y should be scaled. If TRUE, scales Y. Default is FALSE.

center.X

Logical indicating if X should be centered. If TRUE, centers X. Default is TRUE.

center.Y

Logical indicating if Y should be centered. If TRUE, centers Y. Default is TRUE.

verbose

Logical indicating whether the function displays information about the function call. Default is TRUE.
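For reproducible or custom splits, a fold vector can be built by hand before calling the function. The sketch below assumes that folds expects one integer label between 1 and nfold per row of X and Y; this format is not stated in this page and should be checked against the package before use.

```r
set.seed(3)
n <- 23      # number of observations (rows of X and Y)
nfold <- 5

# Repeat the labels 1..nfold up to length n, then shuffle them,
# so that fold sizes differ by at most one
folds <- sample(rep(seq_len(nfold), length.out = n))
table(folds)
```

If that format is what the function expects, the vector would be supplied as rrda.cv(Y = Y, X = X, folds = folds), which keeps the split identical across repeated runs.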

Value

A list containing the cross-validated MSE matrix, lambda values, rank values, and the optimal lambda and rank.

Examples


set.seed(10)
simdata <- rdasim1(n = 10, p = 30, q = 30, k = 3)
X <- simdata$X
Y <- simdata$Y
cv_result <- rrda.cv(Y = Y, X = X, maxrank = 5, nfold = 5)
rrda.summary(cv_result = cv_result)

##Complete Example##
# library(future) # <- if you want to compute in parallel

# plan(multisession) # <- if you want to compute in parallel
# cv_result<- rrda.cv(Y = Y, X = X, maxrank = 5, nfold = 5) # cv
# plan(sequential) # <- to return to sequential computing

# rrda.summary(cv_result = cv_result) # cv result

p <- rrda.plot(cv_result) # cv result plot
print(p)
h <- rrda.heatmap(cv_result) # cv result heatmap
print(h)

estimated_lambda <- cv_result$opt_min$lambda # selected parameter
estimated_rank <- cv_result$opt_min$rank # selected parameter

Bhat <- rrda.fit(Y = Y, X = X, nrank = estimated_rank, lambda = estimated_lambda) # fitting
Bhat_mat <- rrda.coef(Bhat)
Yhat_mat <- rrda.predict(Bhat = Bhat, X = X) # prediction
Yhat <- Yhat_mat[[1]][[1]][[1]] # predicted values

cor_Y_Yhat <- diag(cor(Y, Yhat)) # correlation
summary(cor_Y_Yhat)

[Package rrda version 0.1.1 Index]