ppls.splines.cv {ppls}    R Documentation

Cross-Validation for Penalized PLS with Spline-Transformed Predictors

Description

Performs k-fold cross-validation to select the number of components and the penalty parameter for a penalized partial least squares (PPLS) model fitted to B-spline-transformed predictors.

Usage

ppls.splines.cv(
  X,
  y,
  lambda = 1,
  ncomp = NULL,
  degree = 3,
  order = 2,
  nknot = NULL,
  k = 5,
  kernel = FALSE,
  scale = FALSE,
  reduce.knots = FALSE,
  select = FALSE
)

Arguments

X

A numeric matrix of input predictors.

y

A numeric response vector.

lambda

A numeric vector of penalty parameters. Default is 1.

ncomp

Integer. Maximum number of PLS components. If NULL (the default), min(nrow(X) - 1, ncol(X)) is used.

degree

Integer. Degree of B-splines (e.g., 3 for cubic splines). Default is 3.

order

Integer. Order of the differences used in the penalty matrix. Default is 2.

nknot

Integer or vector. Number of knots per variable (before adjustment). If NULL, defaults to rep(20, ncol(X)).

k

Number of folds for cross-validation. Default is 5.

kernel

Logical. Whether to use the kernel representation of PPLS. Default is FALSE.

scale

Logical. Whether to standardize predictors to unit variance. Default is FALSE.

reduce.knots

Logical. If TRUE, adaptively reduces the number of knots when overfitting is detected. Default is FALSE.

select

Logical. If TRUE, applies block-wise variable selection. Default is FALSE.
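
To make degree and order concrete, here is a minimal base R sketch for a single predictor. It uses splines::bs as a stand-in for X2s and a squared difference matrix as a stand-in for Penalty.matrix. Both are assumptions about the general technique, not the package's code, and n_basis and pen_order are hypothetical names (how nknot maps to the number of basis functions is not covered here).

```r
library(splines)  # ships with R

set.seed(1)
x <- runif(50)    # one simulated predictor

degree    <- 3    # cubic B-splines, as in the `degree` argument
pen_order <- 2    # difference order, as in the `order` argument
n_basis   <- 8    # hypothetical number of basis functions

# B-spline design matrix: one column per basis function (stand-in for X2s)
B <- bs(x, df = n_basis, degree = degree, intercept = TRUE)

# Difference matrix of the chosen order, and the resulting penalty
# P = D'D (assumed stand-in for Penalty.matrix)
D <- diff(diag(n_basis), differences = pen_order)
P <- crossprod(D)

# Roughness of a coefficient vector b is t(b) %*% P %*% b;
# a second-order penalty leaves linear coefficient sequences unpenalized
b_linear <- seq_len(n_basis)
roughness <- drop(t(b_linear) %*% P %*% b_linear)
```

A larger lambda puts more weight on this roughness term, which pulls the fitted function for each predictor toward a smoother shape.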

Details

This function performs the following steps for each cross-validation fold:

  1. Transforms predictors using B-spline basis functions via X2s.

  2. Computes the penalty matrix using Penalty.matrix.

  3. Fits penalized PLS models using penalized.pls, one for each value of lambda, with up to ncomp components.

  4. Evaluates prediction performance on the test fold using new.penalized.pls.

The optimal parameters are those minimizing the average squared prediction error across all folds.
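
The fold loop described above can be outlined in base R. This is a sketch under stated assumptions, not the package's implementation: fit_predict is a hypothetical stand-in (ridge regression on the first m principal component scores) that replaces the X2s, Penalty.matrix, penalized.pls, and new.penalized.pls chain, and the random fold assignment is also assumed.

```r
# Hypothetical stand-in model: ridge regression on the first m
# principal component scores. NOT penalized PLS; it only gives the
# loop something to fit that depends on both lambda and m.
fit_predict <- function(Xtr, ytr, Xte, lambda, m) {
  ctr  <- colMeans(Xtr)
  Xc   <- sweep(Xtr, 2, ctr)
  V    <- svd(Xc)$v[, seq_len(m), drop = FALSE]
  Ttr  <- Xc %*% V
  beta <- solve(crossprod(Ttr) + lambda * diag(m),
                crossprod(Ttr, ytr - mean(ytr)))
  as.numeric(mean(ytr) + sweep(Xte, 2, ctr) %*% V %*% beta)
}

set.seed(123)
n <- 100; p <- 30
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)

lambda <- c(1, 10, 100)
ncomp  <- 4
k      <- 5

# Assign each observation to one of k folds at random
fold <- sample(rep(seq_len(k), length.out = n))

# Rows = lambda values, columns = number of components
error.cv <- matrix(0, nrow = length(lambda), ncol = ncomp)

for (f in seq_len(k)) {
  test <- fold == f
  for (i in seq_along(lambda)) {
    for (m in seq_len(ncomp)) {
      pred <- fit_predict(X[!test, , drop = FALSE], y[!test],
                          X[test, , drop = FALSE], lambda[i], m)
      # Accumulate the fold's mean squared error, averaged over folds
      error.cv[i, m] <- error.cv[i, m] + mean((y[test] - pred)^2) / k
    }
  }
}

min(error.cv)  # smallest average squared prediction error on the grid
```

Keeping the whole grid in one matrix means every (lambda, number of components) pair is scored on the same folds, so the entries are directly comparable.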

Value

A list with the following components:

error.cv

Matrix of cross-validated squared prediction errors, averaged over folds, with one row per lambda value and one column per number of components.

min.ppls

The minimum cross-validated error.

lambda.opt

Optimal lambda value.

ncomp.opt

Optimal number of components.
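
The four components are related as in the following sketch. The error matrix holds made-up numbers for illustration only, and the sketch assumes that column j corresponds to j components and that ties go to the first minimum; the package may handle either detail differently.

```r
# Hypothetical 3 x 4 error matrix (made-up values, not real output)
lambda <- c(1, 10, 100)
error.cv <- matrix(c(1.20, 1.10, 1.15, 1.30,
                     1.18, 1.05, 1.12, 1.25,
                     1.22, 1.16, 1.19, 1.28),
                   nrow = 3, byrow = TRUE)

# Row and column index of the smallest entry (first one if tied)
idx <- which(error.cv == min(error.cv), arr.ind = TRUE)[1, ]

min.ppls   <- error.cv[idx["row"], idx["col"]]  # smallest CV error
lambda.opt <- lambda[idx["row"]]                # row index -> lambda value
ncomp.opt  <- unname(idx["col"])                # column index -> components
```

In this toy matrix the minimum sits in the second row and second column, so lambda.opt is 10 and ncomp.opt is 2.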

References

N. Kraemer, A.-L. Boulesteix, and G. Tutz (2008). Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data. Chemometrics and Intelligent Laboratory Systems, 94(1), 60–69. doi:10.1016/j.chemolab.2008.06.009

See Also

X2s, Penalty.matrix, penalized.pls, penalized.pls.cv

Examples

# Simulated data
set.seed(123)
X <- matrix(rnorm(30 * 100), ncol = 30)
y <- rnorm(100)

# Run CV with 3 lambdas and max 4 components
result <- ppls.splines.cv(X, y, lambda = c(1, 10, 100), ncomp = 4)
result$lambda.opt
result$ncomp.opt


[Package ppls version 2.0.0 Index]