simulate_spareg_data {spareg}    R Documentation

Simulate Sparse Regression Data

Description

Generates synthetic data for sparse linear regression problems. Returns training and test sets along with model parameters.

Usage

simulate_spareg_data(
  n,
  p,
  ntest,
  a = min(100, p/4),
  snr = 10,
  rho = 0.5,
  mu = 1,
  beta_vals = NULL,
  seed = NULL
)

Arguments

n

Integer. Number of training samples.

p

Integer. Number of predictors (features).

ntest

Integer. Number of test samples.

a

Integer. Number of non-zero coefficients in the true beta vector. Default is min(100, p/4).

snr

Numeric. Signal-to-noise ratio. Default is 10.

rho

Numeric between 0 and 1. Pairwise correlation among predictors. Default is 0.5. A compound symmetry correlation matrix is used (ones on the diagonal, rho everywhere else), so every predictor has variance 1.

mu

Numeric. Intercept term (mean of response). Default is 1.

beta_vals

Numeric vector. Possible values for the non-zero coefficients in the true beta vector. Defaults to NULL, in which case the values -3, -2, -1, 1, 2, 3 are used.

seed

Integer. Random seed for reproducibility. Default is NULL.
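
The compound symmetry structure described for rho can be illustrated with a short base R sketch. It is not taken from the spareg source: the variable names are made up for the illustration, and drawing Gaussian predictors through a Cholesky factor is an assumption, since this page does not state the predictor distribution.

```r
# Illustrative sketch only; not the package's implementation.
p   <- 5     # number of predictors
rho <- 0.5   # common pairwise correlation

# Compound symmetry: 1 on the diagonal, rho everywhere else.
Sigma <- matrix(rho, nrow = p, ncol = p)
diag(Sigma) <- 1

# Assumption: Gaussian predictors. chol() returns an upper-triangular R
# with t(R) %*% R equal to Sigma, so rows of Z %*% R have covariance Sigma.
set.seed(1)
n <- 1000
R <- chol(Sigma)
x <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% R

# Off-diagonal sample correlations should be close to rho.
round(cor(x), 2)
```

Because the diagonal of Sigma is 1, the correlation matrix and the covariance matrix coincide, which is what fixes the predictor variance at 1.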

Value

A list with the following components:

x

Training design matrix (n x p).

y

Training response vector (length n).

xtest

Test design matrix (ntest x p).

ytest

Test response vector (length ntest).

mu

Intercept used in data generation.

beta

True coefficient vector (length p).

sigma2

Noise variance used in data generation. Equals beta' Sigma beta / snr, where Sigma is the covariance matrix of the predictors.
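
As a worked illustration of the sigma2 formula, the following base R sketch computes beta' Sigma beta / snr for a small hand-picked example. The coefficient vector and dimensions are hypothetical, chosen only so the arithmetic is easy to follow; Sigma is assumed to be the compound symmetry matrix implied by rho.

```r
# Illustrative sketch only; values are made up for the example.
p   <- 6
rho <- 0.5
snr <- 10

# Compound symmetry matrix: 1 on the diagonal, rho elsewhere.
Sigma <- matrix(rho, nrow = p, ncol = p)
diag(Sigma) <- 1

# Hypothetical sparse coefficient vector with three non-zero entries.
beta <- c(3, -2, 1, 0, 0, 0)

# Signal variance: beta' Sigma beta.
# Here (1 - rho) * sum(beta^2) + rho * sum(beta)^2 = 0.5 * 14 + 0.5 * 4 = 9.
signal_var <- drop(t(beta) %*% Sigma %*% beta)

# Noise variance implied by the requested signal-to-noise ratio: 9 / 10 = 0.9.
sigma2 <- signal_var / snr
sigma2
```

A larger snr therefore shrinks the noise variance proportionally, while a larger rho or larger coefficients raise it at a fixed snr.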

Examples

set.seed(123)
data <- simulate_spareg_data(n = 200, p = 2000, ntest = 100)
str(data)
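
One possible sanity check on the returned list is sketched below. It assumes a linear model of the form y = mu + x beta + noise, which this page implies but does not spell out. The helper oracle_test_mse is hypothetical and not part of spareg; it uses only the documented components xtest, ytest, mu and beta.

```r
# Hypothetical helper (not part of spareg): test-set mean squared error
# obtained when predicting with the true intercept and coefficients.
oracle_test_mse <- function(d) {
  pred <- d$mu + drop(d$xtest %*% d$beta)
  mean((d$ytest - pred)^2)
}

# Runs only if spareg is installed.
if (requireNamespace("spareg", quietly = TRUE)) {
  sim <- spareg::simulate_spareg_data(n = 200, p = 2000, ntest = 100, seed = 123)
  # Under the assumed model the two numbers should be of similar size.
  print(c(oracle_mse = oracle_test_mse(sim), sigma2 = sim$sigma2))
}
```

With only 100 test samples the two values will not match exactly; the comparison is a rough plausibility check rather than a test.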


[Package spareg version 1.1.0 Index]