sim.data.ppls {ppls} | R Documentation |
Simulate Data for Penalized Partial Least Squares (PPLS)
Description
Generates a training dataset and a test dataset with nonlinear relationships between predictors and response, as used in PPLS simulation studies.
Usage
sim.data.ppls(ntrain, ntest, stnr, p, a = NULL, b = NULL)
Arguments
ntrain |
Integer. Number of training observations. |
ntest |
Integer. Number of test observations. |
stnr |
Numeric. Signal-to-noise ratio (higher means less noise). |
p |
Integer. Number of predictors (must be |
a |
Optional numeric vector of length 5. Linear coefficients for the first 5 variables. If |
b |
Optional numeric vector of length 5. Nonlinear sine coefficients. If |
Details
The function simulates a response variable y as a combination of additive linear and sinusoidal effects of the first 5 predictors:
f(x) = \sum_{j=1}^{5} \left( a_j x_j + \sin(6 b_j x_j) \right)
The response y is then generated by adding Gaussian noise scaled to match the specified signal-to-noise ratio (stnr).
The remaining p - 5 variables are included as noise variables, which makes the dataset suitable for evaluating variable selection or regularization methods.
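The data-generating process described above can be sketched in base R. This is a hypothetical re-implementation for illustration, not the package source: the name simulate_ppls_sketch, the noise scaling sigma^2 = var(f) / stnr, and the default draw of a and b from a uniform distribution on [-1, 1] are all assumptions.

```r
# Hypothetical sketch of the simulation described in Details.
# Assumptions (not confirmed by the documentation):
#   - sigma is chosen so that var(f) / sigma^2 equals stnr
#   - missing a and b are drawn from Uniform(-1, 1)
simulate_ppls_sketch <- function(ntrain, ntest, stnr, p, a = NULL, b = NULL) {
  # The signal uses the first 5 columns, so this sketch needs at least 5
  stopifnot(p >= 5, stnr > 0)
  if (is.null(a)) a <- runif(5, -1, 1)
  if (is.null(b)) b <- runif(5, -1, 1)

  # Predictors are uniform on [-1, 1], as stated under Value
  Xtrain <- matrix(runif(ntrain * p, -1, 1), nrow = ntrain, ncol = p)
  Xtest  <- matrix(runif(ntest * p, -1, 1), nrow = ntest, ncol = p)

  # f(x) = sum_j (a_j x_j + sin(6 b_j x_j)) over the first 5 columns only
  signal <- function(X) {
    X5 <- X[, 1:5, drop = FALSE]
    as.vector(X5 %*% a + rowSums(sin(6 * sweep(X5, 2, b, `*`))))
  }
  ftrain <- signal(Xtrain)
  ftest  <- signal(Xtest)

  # Assumed noise scaling: signal variance divided by the requested ratio
  sigma <- sqrt(var(ftrain) / stnr)

  list(
    Xtrain = Xtrain, ytrain = ftrain + rnorm(ntrain, 0, sigma),
    Xtest  = Xtest,  ytest  = ftest + rnorm(ntest, 0, sigma),
    sigma = sigma, a = a, b = b
  )
}

set.seed(1)
sketch <- simulate_ppls_sketch(ntrain = 50, ntest = 20, stnr = 3, p = 8)
dim(sketch$Xtrain)
```

Columns 6 to p never enter the signal, so any method that assigns them large effects is fitting noise.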
Value
A list with the following components:
- Xtrain
ntrain x p matrix of training predictors (uniform on [-1, 1]).
- ytrain
Numeric vector of training responses.
- Xtest
ntest x p matrix of test predictors.
- ytest
Numeric vector of test responses.
- sigma
Standard deviation of the added noise.
- a
Linear coefficients used in the simulation.
- b
Nonlinear sine coefficients used in the simulation.
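Because the returned list carries a and b, the noise-free signal can be recomputed from the predictors and compared with the responses. The sketch below assumes the formula given in Details matches the implementation exactly. The helper name recover_signal is hypothetical and not part of the package.

```r
# Hypothetical helper: recompute f(x) from the predictors and the returned
# coefficients, using the formula from Details (first 5 columns only).
recover_signal <- function(X, a, b) {
  X5 <- X[, 1:5, drop = FALSE]
  as.vector(X5 %*% a + rowSums(sin(6 * sweep(X5, 2, b, `*`))))
}

# Only runs when the ppls package is installed
if (requireNamespace("ppls", quietly = TRUE)) {
  set.seed(123)
  sim <- ppls::sim.data.ppls(ntrain = 100, ntest = 100, stnr = 3, p = 10)
  resid_train <- sim$ytrain - recover_signal(sim$Xtrain, sim$a, sim$b)
  # If the assumption holds, the two values should be of similar size
  print(c(empirical = sd(resid_train), reported = sim$sigma))
}
```

A large gap between the two values would suggest that the implementation differs from the formula as written here.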
See Also
ppls.splines.cv, graphic.ppls.splines
Examples
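library(ppls)
set.seed(123)  # make the simulated data reproducible
sim <- sim.data.ppls(ntrain = 100, ntest = 100, stnr = 3, p = 10)
str(sim)  # inspect the components of the returned list
# Scatter plot of the first predictor against the training response
plot(sim$Xtrain[, 1], sim$ytrain, main = "Effect of x1 on y")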