generateSampleDataBin {VICatMix} | R Documentation |
generateSampleDataBin
Description
Generates sample clustered binary data with cluster labels. For each clustering variable, the probability of a '1' in each cluster is drawn from a Beta(1, 5) distribution, which favours small probabilities (sparse data) that vary across clusters. For noisy variables, the probability of a '1' is also drawn from a Beta(1, 5) distribution, but this probability is the same regardless of the cluster membership of the observation.
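The generative process described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name `sketchBinData` is hypothetical, and drawing cluster labels with `sample()` according to `w` is an assumption about how the mixture weights are used.

```r
# Hypothetical helper, NOT part of VICatMix: mirrors the documented process.
sketchBinData <- function(n, K, w, p, Irrp) {
  # Assumption: cluster labels are drawn according to the mixture weights w
  z <- sample(seq_len(K), size = n, replace = TRUE, prob = w)

  # K x p matrix of cluster-specific probabilities of a '1', each Beta(1, 5)
  theta <- matrix(rbeta(K * p, 1, 5), nrow = K, ncol = p)
  # One Beta(1, 5) probability per noisy variable, shared by all clusters
  phi <- rbeta(Irrp, 1, 5)

  # n x (p + Irrp) matrix of probabilities; noisy variables are the last columns
  probs <- cbind(theta[z, , drop = FALSE],
                 matrix(phi, nrow = n, ncol = Irrp, byrow = TRUE))

  # One Bernoulli draw per cell (both sides use column-major order)
  data <- matrix(rbinom(length(probs), size = 1, prob = probs),
                 nrow = n, ncol = p + Irrp)
  list(data = data, trueClusters = z)
}

set.seed(1)
sim <- sketchBinData(n = 200, K = 3, w = c(0.2, 0.3, 0.5), p = 10, Irrp = 4)
dim(sim$data)
```

Because Beta(1, 5) has mean 1/6, most simulated entries are 0, and only the first `p` columns carry information about cluster membership.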
Usage
generateSampleDataBin(n, K, w, p, Irrp, yout = FALSE)
Arguments
n |
Number of observations in dataset. |
K |
Number of clusters desired. |
w |
A vector of mixture weights (proportion of population in each cluster). |
p |
Number of clustering variables/covariates in dataset. |
Irrp |
Number of irrelevant/noisy variables/covariates in dataset. Note that these variables will be the final Irrp columns in the simulated dataset. Total data dimension is p + Irrp. |
yout |
Logical; default FALSE. Indicates whether a binary outcome associated with the clustering should also be generated. |
Value
A list with the following components:
data |
A matrix consisting of the simulated data. |
trueClusters |
A vector with the simulated cluster assignments. |
outcome |
If yout = TRUE, this will be a vector with the outcome variable. |
Examples
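# Simulate 1000 observations from 4 clusters with 100 clustering variables
# and no noisy variables
generatedData <- generateSampleDataBin(n = 1000, K = 4, w = c(0.1, 0.2, 0.3, 0.4), p = 100, Irrp = 0)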
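A further call, sketched here with illustrative argument values, adds noisy variables and requests the outcome, then inspects the documented components of the returned list. The `requireNamespace()` guard is only there so the snippet is skipped when VICatMix is not installed.

```r
# Runs only if the VICatMix package is installed
if (requireNamespace("VICatMix", quietly = TRUE)) {
  set.seed(42)
  # 500 observations, 3 clusters, 20 clustering variables,
  # 5 noisy variables (the final 5 columns), plus a binary outcome
  sim <- VICatMix::generateSampleDataBin(n = 500, K = 3, w = c(0.5, 0.3, 0.2),
                                         p = 20, Irrp = 5, yout = TRUE)
  dim(sim$data)            # total data dimension is p + Irrp columns
  table(sim$trueClusters)  # simulated cluster sizes
  table(sim$outcome)       # present because yout = TRUE
}
```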