cat_glm_initialization {catalytic}        R Documentation

Initialization for Catalytic Generalized Linear Models (GLMs)

Description

This function prepares and initializes a catalytic Generalized Linear Model (GLM) by processing the input data, extracting the necessary variables, generating a synthetic dataset, and fitting a model.

Usage

cat_glm_initialization(
  formula,
  family = "gaussian",
  data,
  syn_size = NULL,
  custom_variance = NULL,
  gaussian_known_variance = FALSE,
  x_degree = NULL,
  resample_only = FALSE,
  na_replace = stats::na.omit
)

Arguments

formula

A formula specifying the GLM. It should include the response and predictor variables.

family

The GLM family to use. Defaults to "gaussian".

data

A data frame containing the data for modeling.

syn_size

An integer specifying the size of the synthetic dataset to be generated. If NULL (the default), four times the number of predictor columns is used.

custom_variance

A custom variance value to be applied if using a Gaussian model. Defaults to NULL.

gaussian_known_variance

A logical value indicating whether the data variance is known. Defaults to FALSE. Only applicable to the Gaussian family.

x_degree

A numeric vector giving the degree of polynomial expansion for each predictor. If NULL (the default), degree 1 is used for each predictor.

resample_only

A logical indicating whether to perform resampling only. Default is FALSE.

na_replace

A function to handle NA values in the data. Default is stats::na.omit.
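
As a rough base-R illustration of two of the arguments above, the sketch below shows what the default `na_replace` handler (`stats::na.omit`) does to a data frame, what a custom handler might look like, and how the documented default for `syn_size` works out for a small dataset. The data frame `df` and the helper `impute_mean` are hypothetical and not part of the package. Counting predictors as "all columns other than the response" is an assumption about how the package counts them.

```r
# Hypothetical data with one missing predictor value
df <- data.frame(
  X1 = c(0.5, NA, 1.2, -0.3),
  X2 = c(1.0, 2.0, 3.0, 4.0),
  Y  = c(0.1, 0.4, 0.9, 1.6)
)

# Default handler: drop every row that contains an NA
dropped <- stats::na.omit(df)

# Hypothetical alternative handler: replace NAs in numeric columns
# with the column mean instead of dropping rows
impute_mean <- function(d) {
  for (col in names(d)) {
    if (is.numeric(d[[col]])) {
      d[[col]][is.na(d[[col]])] <- mean(d[[col]], na.rm = TRUE)
    }
  }
  d
}
imputed <- impute_mean(df)

# Documented default for syn_size: four times the number of predictor columns.
# Assumption: predictors are all columns other than the response Y.
predictors <- setdiff(names(df), "Y")
default_syn_size <- 4 * length(predictors)
```

Any function that takes a data frame and returns a data frame without missing values, such as the hypothetical `impute_mean`, would presumably be a candidate for `na_replace`. Whether the package imposes further requirements on the handler is not stated here.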

Value

A list containing the values of all the input arguments and the following components:

Examples

gaussian_data <- data.frame(
  X1 = stats::rnorm(10),
  X2 = stats::rnorm(10),
  Y = stats::rnorm(10)
)

cat_init <- cat_glm_initialization(
  formula = Y ~ 1, # Formula for the simple model
  data = gaussian_data,
  syn_size = 100, # Synthetic data size
  custom_variance = NULL, # User-supplied variance value
  gaussian_known_variance = TRUE, # Whether the data variance is known
  x_degree = c(1, 1), # Degrees for polynomial expansion of predictors
  resample_only = FALSE, # Whether to perform resampling only
  na_replace = stats::na.omit # How to handle NA values in data
)
cat_init
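
The call above passes `x_degree = c(1, 1)` and `syn_size = 100`. To give a feel for what polynomial expansion and resampling of predictors can look like, here is a minimal base-R sketch. It is not the package's implementation: the helper `expand_poly`, the `_pow` column naming, and the independent column-wise resampling are all assumptions made for illustration.

```r
set.seed(1)  # only to make this sketch reproducible

# Hypothetical observed predictors, mirroring X1 and X2 in the example
obs_x <- data.frame(X1 = stats::rnorm(10), X2 = stats::rnorm(10))

# Hypothetical helper: for predictor j, add columns for powers 1..degree[j]
expand_poly <- function(x, degree) {
  stopifnot(length(degree) == ncol(x))
  cols <- list()
  for (j in seq_along(degree)) {
    for (p in seq_len(degree[j])) {
      nm <- if (p == 1) names(x)[j] else paste0(names(x)[j], "_pow", p)
      cols[[nm]] <- x[[j]]^p
    }
  }
  as.data.frame(cols)
}

# Degree 1 for X1 and degree 2 for X2 gives three columns
expanded <- expand_poly(obs_x, degree = c(1, 2))

# Assumed resampling scheme: draw each predictor column independently,
# with replacement, until syn_size synthetic rows are available
syn_size <- 100
syn_x <- as.data.frame(
  lapply(obs_x, function(col) sample(col, syn_size, replace = TRUE))
)
```

With `degree = c(1, 1)`, as in the example, the hypothetical `expand_poly` returns the predictors unchanged.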

[Package catalytic version 0.1.0 Index]