runVICatMixVarSel {VICatMix}    R Documentation

runVICatMixVarSel

Description

Fit the VICatMixVarSel model (the VICatMix mixture model with variable selection) to a data frame of categorical covariates. Optionally, an outcome variable can be supplied to perform semi-supervised profile regression.

Usage

runVICatMixVarSel(
  data,
  K,
  alpha,
  a = 2,
  maxiter = 2000,
  tol = 5e-08,
  outcome = NA,
  verbose = FALSE
)
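The constraints documented for this function (K an integer greater than 1; alpha greater than 0, ideally below 1) can be checked before starting a potentially long run. The following is a minimal base-R sketch: checkVarSelArgs is a hypothetical helper, not part of VICatMix, and it does not reproduce whatever checks the package performs internally.

```r
# Hypothetical helper (not part of VICatMix): checks the documented
# constraints on `data`, `K` and `alpha` before a potentially long run.
checkVarSelArgs <- function(data, K, alpha) {
  if (!(is.data.frame(data) || is.matrix(data))) {
    stop("`data` must be a data frame or matrix")
  }
  # K: a single whole number greater than 1
  if (!is.numeric(K) || length(K) != 1 || is.na(K) ||
      K != round(K) || K <= 1) {
    stop("`K` must be a single integer greater than 1")
  }
  # alpha: a single number strictly greater than 0
  if (!is.numeric(alpha) || length(alpha) != 1 || is.na(alpha) ||
      alpha <= 0) {
    stop("`alpha` must be a single number greater than 0")
  }
  # Values below 1 are recommended, so only warn otherwise
  if (alpha >= 1) {
    warning("`alpha` < 1 is recommended")
  }
  invisible(TRUE)
}
```

For instance, checkVarSelArgs(generatedData$data, 10, 0.01) could be run just before the runVICatMixVarSel() call in the Examples.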

Arguments

data

A data frame or matrix with N rows (observations) and P columns (categorical covariates).

K

Maximum number of clusters desired. Must be an integer greater than 1.

alpha

The Dirichlet prior parameter. Must be > 0. A value < 1 is recommended, since a small concentration parameter favours solutions in which surplus clusters are left empty.

a

Hyperparameter of the hyperprior used for variable selection. Default is 2.

maxiter

The maximum number of iterations for the algorithm. Default is 2000.

tol

Convergence tolerance for the algorithm. Default is 5x10^-8.

outcome

Optional outcome variable. Default is NA (no outcome); supplying an outcome triggers semi-supervised profile regression.

verbose

Default is FALSE. Set to TRUE to print the ELBO value at each iteration.
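Why alpha < 1 is recommended can be seen by simulating from the prior alone. Assuming alpha is the concentration of a symmetric Dirichlet prior on the K mixture weights, values below 1 put most prior mass on weight vectors dominated by a few components. The sketch below uses the standard gamma construction (normalising independent Gamma(alpha, 1) draws); rsymDirichlet is a hypothetical helper, and this illustrates only the prior, not the variational fit performed by runVICatMixVarSel.

```r
# Hypothetical helper: n draws from a symmetric Dirichlet(alpha, ..., alpha)
# over K components, via normalised independent Gamma(alpha, 1) variables.
rsymDirichlet <- function(n, K, alpha) {
  g <- matrix(rgamma(n * K, shape = alpha, rate = 1), nrow = n, ncol = K)
  g / rowSums(g)  # each row is one weight vector summing to 1
}

set.seed(12)
K <- 10
# alpha = 0.1 rather than the 0.01 used in the Examples: Gamma draws with a
# very small shape can underflow to exactly 0 in double precision.
sparse <- rsymDirichlet(1000, K, alpha = 0.1)  # alpha < 1
flat   <- rsymDirichlet(1000, K, alpha = 5)    # alpha > 1

# Average weight of the largest component in each draw
mean(apply(sparse, 1, max))
mean(apply(flat, 1, max))

# Average number of components carrying at least 1% of the weight
mean(rowSums(sparse >= 0.01))
mean(rowSums(flat >= 0.01))
```

With alpha < 1 the largest component typically takes a much bigger share and fewer components carry non-negligible weight, which is the behaviour that lets surplus clusters among the K available empty out.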

Value

A list with the following components (maxNCat denotes the maximum number of categories of any covariate in the data):

labels

A numeric vector listing the cluster assignments for the observations.

ELBO

A numeric vector tracking the value of the ELBO in every iteration.

Cl

A numeric vector tracking the number of clusters in every iteration.

model

A list containing all variational model parameters and the cluster labels:

alpha

A K-length vector of Dirichlet parameters for alpha.

eps

A K x maxNCat x P array of Dirichlet parameters for epsilon.

c

A P-length vector of expected values for the variable selection parameter, gamma.

labels

A numeric vector listing the cluster assignments for the observations.

nullphi

A P x maxNCat matrix of maximum likelihood parameters for irrelevant variables.

rnk

An N x K matrix of responsibilities for the latent variables Z.

factor_labels

A data frame showing how variable categories correspond to numeric factor labels in the model.
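The components above can be condensed into a short summary. The following is a minimal base-R sketch: summariseVarSel is a hypothetical helper, not part of VICatMix, and the 0.5 cut-off on c is an arbitrary illustrative choice (the Examples only state that values at or near 1 indicate a relevant variable).

```r
# Hypothetical helper (not part of VICatMix): condense the list returned by
# runVICatMixVarSel() using the components documented under Value.
summariseVarSel <- function(result, threshold = 0.5) {
  model <- result$model
  list(
    # Indices of covariates whose E[gamma] exceeds the (arbitrary) threshold
    relevant = which(model$c > threshold),
    # Number of distinct labels in the final assignment, and their sizes
    n_clusters = length(unique(result$labels)),
    cluster_sizes = table(result$labels),
    # Last value of the ELBO trace
    final_ELBO = tail(result$ELBO, 1),
    # Largest responsibility per observation: how certain each assignment is
    assignment_prob = apply(model$rnk, 1, max)
  )
}
```

After running the Examples, summariseVarSel(result)$relevant lists the covariates judged relevant under this cut-off, and low values of assignment_prob flag observations whose cluster membership is uncertain.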

Examples

# example code

set.seed(12)
generatedData <- generateSampleDataBin(500, 4, c(0.1, 0.2, 0.3, 0.4), 90, 10)
result <- runVICatMixVarSel(generatedData$data, 10, 0.01)

print(result$labels)
#clustering labels

print(result$model$c)
#expected values for the variable selection parameter; values at (or close to) 1 indicate a relevant variable

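With verbose = FALSE, the ELBO and Cl components are the main convergence diagnostics. The sketch below checks traces such as result$ELBO and result$Cl from the example above. It assumes one ELBO value per iteration and a standard coordinate-ascent scheme, under which the ELBO should not decrease between iterations; a run whose length reaches maxiter may have stopped on the iteration cap rather than the tol criterion. checkTrace is a hypothetical helper, and its relative slack for numerical noise is an arbitrary assumption.

```r
# Hypothetical helper (not part of VICatMix): basic diagnostics for the
# ELBO and Cl traces returned by runVICatMixVarSel().
checkTrace <- function(ELBO, Cl, maxiter = 2000, slack = 1e-6) {
  steps <- diff(ELBO)
  list(
    iterations = length(ELBO),
    # Coordinate ascent should not lower the ELBO; allow tiny relative noise
    monotone = all(steps >= -slack * abs(head(ELBO, -1))),
    # Assumes one ELBO value per iteration, so reaching maxiter suggests the
    # run stopped on the iteration cap rather than on the tolerance
    hit_maxiter = length(ELBO) >= maxiter,
    # Number of clusters recorded at the last iteration
    final_clusters = tail(Cl, 1)
  )
}
```

For the example above, checkTrace(result$ELBO, result$Cl) gives a quick check, and plot(result$ELBO, type = "l") a visual one.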



[Package VICatMix version 1.0 Index]