simulate_spareg_data {spareg}    R Documentation

Simulate Sparse Regression Data

Description

Generates synthetic data for sparse linear regression problems. Returns training and test sets along with the true model parameters (intercept, coefficient vector, and noise variance).

Usage

simulate_spareg_data(
  n,
  p,
  ntest,
  a = min(100, p/4),
  snr = 10,
  rho = 0.5,
  mu = 1,
  seed = NULL
)

Arguments

n

Integer. Number of training samples.

p

Integer. Number of predictors (features).

ntest

Integer. Number of test samples.

a

Integer. Number of non-zero coefficients in the true beta vector. Default is min(100, p/4).

snr

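Numeric. Signal-to-noise ratio, i.e. the ratio of the signal variance beta' Sigma beta to the noise variance sigma2 (see sigma2 under Value). Default is 10.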

rho

Numeric between 0 and 1. Pairwise correlation coefficient among predictors. Default is 0.5. A compound-symmetry correlation matrix is used: every pair of distinct predictors has correlation rho, and the variance of each predictor is fixed to 1.

mu

Numeric. Intercept term (mean of response). Default is 1.

seed

Integer. Random seed for reproducibility. Default is NULL.
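
The compound-symmetry structure described for rho can be sketched in base R as follows. This is an illustration only, not the package's internal code: the names Sigma, Z, and the small values of n and p are chosen here for the example, and Gaussian predictors are an assumption.

```r
p <- 5
rho <- 0.5

# Compound symmetry: 1 on the diagonal, rho in every off-diagonal entry
Sigma <- matrix(rho, nrow = p, ncol = p)
diag(Sigma) <- 1

# Draw n correlated rows via the Cholesky factor (assumes Gaussian predictors)
set.seed(1)
n <- 1000
Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
x <- Z %*% chol(Sigma)  # chol() returns R with t(R) %*% R == Sigma

# Empirical correlations should be close to rho off the diagonal
round(cor(x), 2)
```

With rho = 0 the predictors are uncorrelated; as rho approaches 1 they become nearly collinear.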

Value

A list with the following components:

x

Training design matrix (n x p).

y

Training response vector (length n).

xtest

Test design matrix (ntest x p).

ytest

Test response vector (length ntest).

mu

Intercept used in data generation.

beta

True coefficient vector (length p).

sigma2

Noise variance used in data generation. Equals beta' Sigma beta / snr, where Sigma is the predictor correlation matrix determined by rho.
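
The relation between snr and sigma2 can be checked by hand on a small case. The sketch below is an assumption-laden illustration: the coefficient vector (first a entries equal to 1) is hypothetical and need not match how the package chooses beta.

```r
p <- 8
a <- 2
rho <- 0.5
snr <- 10

# Compound-symmetry correlation matrix with unit variances
Sigma <- matrix(rho, nrow = p, ncol = p)
diag(Sigma) <- 1

# Hypothetical sparse coefficient vector: a ones followed by zeros
beta <- c(rep(1, a), rep(0, p - a))

# Signal variance beta' Sigma beta: here 1 + 1 + 2 * 0.5 = 3
signal_var <- drop(crossprod(beta, Sigma %*% beta))

# Noise variance implied by the requested signal-to-noise ratio
sigma2 <- signal_var / snr
sigma2  # 0.3
```

A larger snr gives a smaller sigma2, so the response is closer to its noiseless linear predictor.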

Examples

set.seed(123)
data <- simulate_spareg_data(n = 200, p = 2000, ntest = 100)
str(data)
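
A further sketch, using the seed argument instead of set.seed() and checking the returned components against the dimensions listed under Value. The object name sim is chosen here for illustration, and the block is guarded so that it runs only when the spareg package is installed.

```r
# Runs only if the spareg package is available
if (requireNamespace("spareg", quietly = TRUE)) {
  sim <- spareg::simulate_spareg_data(n = 200, p = 2000, ntest = 100, seed = 123)

  # Dimensions as documented under Value
  stopifnot(
    nrow(sim$x) == 200, ncol(sim$x) == 2000,
    length(sim$y) == 200,
    nrow(sim$xtest) == 100, ncol(sim$xtest) == 2000,
    length(sim$ytest) == 100,
    length(sim$beta) == 2000
  )

  # Number of non-zero true coefficients (controlled by the argument a)
  sum(sim$beta != 0)
}
```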


[Package spareg version 1.0.0 Index]