sim.data.ppls {ppls}    R Documentation

Simulate Data for Penalized Partial Least Squares (PPLS)

Description

Generates training and test datasets with non-linear relationships between the predictors and the response, as used in PPLS simulation studies.

Usage

sim.data.ppls(ntrain, ntest, stnr, p, a = NULL, b = NULL)

Arguments

ntrain

Integer. Number of training observations.

ntest

Integer. Number of test observations.

stnr

Numeric. Signal-to-noise ratio (higher means less noise).

p

Integer. Number of predictors (must be >= 5).

a

Optional numeric vector of length 5. Linear coefficients for the first 5 variables. If NULL, drawn uniformly from [-1, 1].

b

Optional numeric vector of length 5. Nonlinear sine coefficients. If NULL, drawn uniformly from [-1, 1].

Details

The function simulates a response variable y from additive linear and sinusoidal effects of the first 5 predictors:

f(x) = \sum_{j=1}^{5} \left( a_j x_j + \sin(6 b_j x_j) \right)

The response y is then generated by adding Gaussian noise scaled to match the specified signal-to-noise ratio (stnr).

The remaining p - 5 variables do not enter f(x) and act as pure noise variables, which makes the dataset suitable for evaluating variable selection or regularization methods.
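
The data-generating process described above can be sketched in base R. This is an illustrative re-implementation, not the package source: the name sim_sketch is hypothetical, and the noise scaling sigma = sd(f) / stnr is an assumption, since this page does not state how stnr is converted into a noise standard deviation.

```r
# Illustrative sketch only; NOT the ppls package source.
# sim_sketch is a hypothetical name introduced for this example.
sim_sketch <- function(ntrain, ntest, stnr, p, a = NULL, b = NULL) {
  stopifnot(p >= 5)
  # Draw coefficients uniformly from [-1, 1] when not supplied
  if (is.null(a)) a <- runif(5, -1, 1)
  if (is.null(b)) b <- runif(5, -1, 1)

  # Predictors: uniform on [-1, 1]
  n <- ntrain + ntest
  X <- matrix(runif(n * p, -1, 1), nrow = n, ncol = p)

  # f(x) = sum_j (a_j x_j + sin(6 b_j x_j)) over the first five columns;
  # sweep() multiplies column j of X5 by b[j]
  X5 <- X[, 1:5, drop = FALSE]
  f <- as.numeric(X5 %*% a + rowSums(sin(6 * sweep(X5, 2, b, `*`))))

  # ASSUMPTION: signal-to-noise ratio taken as sd(f) / sigma
  sigma <- sd(f) / stnr
  y <- f + rnorm(n, mean = 0, sd = sigma)

  train <- seq_len(ntrain)
  test <- ntrain + seq_len(ntest)
  list(Xtrain = X[train, , drop = FALSE], ytrain = y[train],
       Xtest = X[test, , drop = FALSE], ytest = y[test],
       sigma = sigma, a = a, b = b)
}

set.seed(1)
sketch <- sim_sketch(ntrain = 50, ntest = 20, stnr = 3, p = 10)
dim(sketch$Xtrain)
```

Columns 6 to p of the predictor matrix never enter f, which is what makes them pure noise variables in this sketch.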

Value

A list with the following components:

Xtrain

ntrain x p matrix of training predictors (uniform in [-1, 1]).

ytrain

Numeric vector of training responses.

Xtest

ntest x p matrix of test predictors.

ytest

Numeric vector of test responses.

sigma

Standard deviation of the added noise.

a

Linear coefficients used in the simulation.

b

Nonlinear sine coefficients used in the simulation.

See Also

ppls.splines.cv, graphic.ppls.splines

Examples

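set.seed(123)  # make the random draws reproducible
sim <- sim.data.ppls(ntrain = 100, ntest = 100, stnr = 3, p = 10)
str(sim)       # inspect the components of the returned list
plot(sim$Xtrain[, 1], sim$ytrain, xlab = "x1", ylab = "y", main = "Effect of x1 on y")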
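
As a further illustration of how the returned components fit together, the following sketch computes a mean-only baseline error on the test set, which any fitted model should improve on. It assumes the ppls package is installed and that the list components are named as documented under Value; the variable name baseline_mse is introduced here for illustration only.

```r
# Mean-only baseline: predict every test response by the training mean.
# Guarded so the snippet still runs when ppls is not installed.
baseline_mse <- NA_real_
if (requireNamespace("ppls", quietly = TRUE)) {
  set.seed(123)
  sim <- ppls::sim.data.ppls(ntrain = 100, ntest = 100, stnr = 3, p = 10)
  ytrain <- as.numeric(sim$ytrain)
  ytest <- as.numeric(sim$ytest)
  baseline_mse <- mean((ytest - mean(ytrain))^2)
  # Compare the baseline error with the noise variance used in the simulation
  print(c(baseline_mse = baseline_mse, noise_variance = sim$sigma^2))
}
```

The gap between the baseline error and sigma^2 is the share of test-set variance that a model could in principle explain.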


[Package ppls version 2.0.0 Index]