runVICatMixVarSel {VICatMix} | R Documentation |
runVICatMixVarSel
Description
Runs the VICatMixVarSel model, which clusters the observations and performs variable selection, on a data frame. An outcome variable can optionally be supplied for semi-supervised profile regression.
Usage
runVICatMixVarSel(
data,
K,
alpha,
a = 2,
maxiter = 2000,
tol = 5e-08,
outcome = NA,
verbose = FALSE
)
Arguments
data |
A data frame or data matrix with N rows of observations and P columns of covariates. |
K |
Maximum number of clusters desired. Must be an integer greater than 1. |
alpha |
The Dirichlet prior parameter. Must be > 0; a value < 1 is recommended. |
a |
Hyperparameter for variable selection hyperprior. Default is 2. |
maxiter |
The maximum number of iterations for the algorithm. Default is 2000. |
tol |
A convergence parameter. Default is 5e-08 (that is, 5 x 10^-8). |
outcome |
Optional outcome variable. Default is NA; supplying an outcome triggers semi-supervised profile regression. |
verbose |
Default FALSE. Set to TRUE to output ELBO values for each iteration. |
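The constraints stated above for data, K and alpha can be checked before starting a run. The following is a minimal sketch in base R; the helper check_varsel_args is hypothetical and is not part of VICatMix, and it assumes alpha is supplied as a single number.

```r
# Hypothetical helper (not part of VICatMix): checks the documented
# constraints on data, K and alpha before a model run.
check_varsel_args <- function(data, K, alpha) {
  if (!is.data.frame(data) && !is.matrix(data)) {
    stop("data must be a data frame or a matrix")
  }
  if (!is.numeric(K) || length(K) != 1 || is.na(K) || K != round(K) || K <= 1) {
    stop("K must be a single integer greater than 1")
  }
  if (!is.numeric(alpha) || length(alpha) != 1 || is.na(alpha) || alpha <= 0) {
    stop("alpha must be a single number greater than 0")
  }
  if (alpha >= 1) {
    warning("a value of alpha < 1 is recommended")
  }
  invisible(TRUE)
}

# Toy categorical data: 6 observations (rows), 2 covariates (columns)
toy <- data.frame(
  x1 = c("a", "b", "a", "b", "a", "b"),
  x2 = c("u", "u", "v", "v", "u", "v")
)
args_ok <- check_varsel_args(toy, K = 10, alpha = 0.01)
```

If the check passes, the same values could then be passed on to runVICatMixVarSel.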
Value
A list with the following components (maxNCat refers to the maximum number of categories for any covariate in the data):
labels |
A numeric vector listing the cluster assignments for the observations. |
ELBO |
A numeric vector tracking the value of the ELBO in every iteration. |
Cl |
A numeric vector tracking the number of clusters in every iteration. |
model |
A list containing all variational model parameters and the cluster labels:
|
factor_labels |
A data frame showing how variable categories correspond to numeric factor labels in the model. |
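The labels, ELBO and Cl components described above can be condensed into a short summary of a run. The sketch below uses base R only; summarise_fit is a hypothetical helper, and mock_result is a stand-in list with made-up values that mimics the documented structure, not output from a real fit.

```r
# Hypothetical helper (not part of VICatMix): condenses the documented
# components of a fitted result into a few summary numbers.
summarise_fit <- function(result) {
  list(
    n_iterations  = length(result$Cl),                  # Cl has one entry per iteration
    final_ELBO    = result$ELBO[length(result$ELBO)],   # last recorded ELBO value
    n_clusters    = length(unique(result$labels)),      # clusters actually occupied
    cluster_sizes = table(result$labels)                # observations per cluster
  )
}

# Stand-in result with made-up values, mimicking the documented structure
mock_result <- list(
  labels = c(1, 1, 2, 2, 2, 3),
  ELBO   = c(-120.5, -101.2, -98.7, -98.6),
  Cl     = c(5, 4, 3, 3)
)

fit_summary <- summarise_fit(mock_result)
print(fit_summary$cluster_sizes)
```

The same helper could be applied to the list returned by runVICatMixVarSel, since it relies only on the component names listed in this section.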
Examples
# example code
set.seed(12)
generatedData <- generateSampleDataBin(500, 4, c(0.1, 0.2, 0.3, 0.4), 90, 10)
result <- runVICatMixVarSel(generatedData$data, K = 10, alpha = 0.01)
# Clustering labels
print(result$labels)
# Expected values of the variable selection parameter; a value of 1 (or close to 1) indicates that the variable is relevant
print(result$c)
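As noted in the example, expected values at or near 1 indicate a relevant variable. A minimal sketch of turning such values into a set of selected covariates follows. The values in c_values are made up for illustration, and the cut-off of 0.5 is an assumption rather than something the package prescribes.

```r
# Made-up expected values of the variable selection parameter,
# one per covariate (illustration only, not output from a real fit)
c_values <- c(0.99, 0.97, 0.02, 0.85, 0.01)

# Assumed cut-off: treat a covariate as relevant when its value exceeds 0.5
threshold <- 0.5
selected <- which(c_values > threshold)

# Column indices of the covariates flagged as relevant
print(selected)
```

A stricter or looser cut-off may be appropriate depending on how clearly the values separate into groups near 0 and near 1.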