simulate_spareg_data {spareg} | R Documentation
Simulate Sparse Regression Data
Description
Generates synthetic data for sparse linear regression problems. Returns training and test sets along with model parameters.
Usage
simulate_spareg_data(
n,
p,
ntest,
a = min(100, p/4),
snr = 10,
rho = 0.5,
mu = 1,
seed = NULL
)
Arguments
n
Integer. Number of training samples.
p
Integer. Number of predictors (features).
ntest
Integer. Number of test samples.
a
Integer. Number of non-zero coefficients in the true beta vector. Default is min(100, p/4).
snr
Numeric. Signal-to-noise ratio. Default is 10.
rho
Numeric between 0 and 1. Pairwise correlation among predictors. Default is 0.5. The predictors follow a compound symmetry correlation matrix (every off-diagonal entry equals rho), and each predictor has variance 1.
mu
Numeric. Intercept term (mean of response). Default is 1.
seed
Integer. Random seed for reproducibility. Default is NULL.
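The compound symmetry structure described for rho can be illustrated with a small base R sketch. This is not the package's source code: the Gaussian predictors, the Cholesky-based sampling, and the variable names Sigma, R and x are assumptions made for illustration.

```r
# Illustrative sketch only; not taken from the spareg implementation.
p   <- 5      # number of predictors (kept small for display)
rho <- 0.5    # pairwise correlation among predictors
n   <- 1000   # number of rows to draw

# Compound symmetry: 1 on the diagonal, rho in every off-diagonal entry.
Sigma <- matrix(rho, nrow = p, ncol = p)
diag(Sigma) <- 1

# Draw rows from N(0, Sigma) through the Cholesky factor
# (assumes Gaussian predictors; requires rho < 1 so Sigma is positive definite).
set.seed(1)
R <- chol(Sigma)                        # upper triangular, t(R) %*% R equals Sigma
x <- matrix(rnorm(n * p), n, p) %*% R   # each row has covariance Sigma

# Empirical correlations should be close to rho off the diagonal.
round(cor(x)[1:3, 1:3], 2)
```

With unit variances, the covariance and correlation matrices coincide, which is why a single matrix Sigma serves both roles here.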
Value
A list with the following components:
- x
Training design matrix (n x p).
- y
Training response vector (length n).
- xtest
Test design matrix (ntest x p).
- ytest
Test response vector (length ntest).
- mu
Intercept used in data generation.
- beta
True coefficient vector (length p).
- sigma2
Noise variance used in data generation. Equals beta' Sigma beta / snr, where Sigma is the predictor correlation matrix defined by rho.
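The sigma2 formula can be checked by hand on a small case. The sketch below is a worked illustration, not the package's code: the coefficient vector (ones in the first a positions) is a hypothetical choice, and Sigma is taken to be the compound symmetry matrix implied by rho.

```r
# Worked illustration of sigma2 = beta' Sigma beta / snr; not the spareg source.
p   <- 8     # number of predictors
a   <- 2     # number of non-zero coefficients
rho <- 0.5   # pairwise correlation
snr <- 10    # signal-to-noise ratio

# Compound symmetry correlation matrix with unit variances.
Sigma <- matrix(rho, p, p)
diag(Sigma) <- 1

# Hypothetical sparse coefficient vector: a ones followed by zeros.
beta <- c(rep(1, a), rep(0, p - a))

# Signal variance as a quadratic form.
signal_var <- drop(t(beta) %*% Sigma %*% beta)

# Equivalent closed form under compound symmetry:
# (1 - rho) * sum(beta^2) + rho * (sum(beta))^2
closed_form <- (1 - rho) * sum(beta^2) + rho * sum(beta)^2

# Noise variance implied by the requested signal-to-noise ratio.
sigma2 <- signal_var / snr

c(signal_var = signal_var, closed_form = closed_form, sigma2 = sigma2)
# signal_var and closed_form are both 3; sigma2 is 0.3
```

The closed form avoids building the p x p matrix, which may matter when p is in the thousands.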
Examples
set.seed(123)
data <- simulate_spareg_data(n = 200, p = 2000, ntest = 100)
str(data)