generateSampleDataBin {VICatMix}    R Documentation

generateSampleDataBin

Description

Generate sample clustered binary data with cluster labels. For each clustering variable, the probability of a '1' in each cluster is drawn from a Beta(1, 5) distribution (mean 1/6), which favours small probabilities that vary across clusters. For noisy variables, the probability of a '1' is also drawn from a Beta(1, 5) distribution, but it is the same for every observation regardless of cluster membership.
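
The generative process described above can be sketched in base R as follows. This is an illustrative approximation only, not the VICatMix source code: the function name sketchBinaryClusters is hypothetical, and drawing cluster labels at random with probabilities w is an assumption (the package may assign labels differently).

```r
# Hypothetical sketch of the described process; NOT the package implementation.
sketchBinaryClusters <- function(n, K, w, p, Irrp) {
  # Assumption: each observation's cluster label is sampled with probabilities w.
  z <- sample(seq_len(K), size = n, replace = TRUE, prob = w)

  # K x p matrix: probability of a '1' for each cluster and clustering variable.
  theta <- matrix(rbeta(K * p, 1, 5), nrow = K, ncol = p)

  # One probability per noisy variable, shared by all clusters.
  thetaNoise <- rbeta(Irrp, 1, 5)

  # n x (p + Irrp) matrix of success probabilities, one row per observation.
  # The noisy variables occupy the final Irrp columns.
  probs <- cbind(
    theta[z, , drop = FALSE],
    matrix(thetaNoise, nrow = n, ncol = Irrp, byrow = TRUE)
  )

  # Draw each binary entry from a Bernoulli with the matching probability.
  data <- matrix(rbinom(length(probs), size = 1, prob = probs),
                 nrow = n, ncol = p + Irrp)

  list(data = data, trueClusters = z)
}

set.seed(1)
sketch <- sketchBinaryClusters(200, 3, c(0.2, 0.3, 0.5), 10, 4)
dim(sketch$data)
```

Because Beta(1, 5) has mean 1/6, most columns in such a simulation are expected to contain far more 0s than 1s.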

Usage

generateSampleDataBin(n, K, w, p, Irrp, yout = FALSE)

Arguments

n

Number of observations in dataset.

K

Number of clusters desired.

w

A vector of mixture weights (proportion of population in each cluster).

p

Number of clustering variables/covariates in dataset.

Irrp

Number of irrelevant/noisy variables/covariates in dataset. Note that these variables will be the final Irrp columns in the simulated dataset. Total data dimension is p + Irrp.

yout

Logical indicating whether a binary outcome associated with the clustering should also be generated. Defaults to FALSE.

Value

A list with the following components:

data

A matrix consisting of the simulated data.

trueClusters

A vector with the simulated cluster assignments.

outcome

If yout = TRUE, this will be a vector with the outcome variable.

Examples

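# Simulate 1000 observations from 4 clusters (weights 0.1 to 0.4) with 100 clustering variables and no noisy variables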
generatedData <- generateSampleDataBin(1000, 4, c(0.1, 0.2, 0.3, 0.4), 100, 0)
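
A further sketch of how the returned list might be inspected is given below. The helper summariseSim is hypothetical and not part of VICatMix; it relies only on the documented components data and trueClusters, and on the documented fact that the noisy variables are the final Irrp columns. The call to the package is guarded so that the snippet still runs when VICatMix is not installed.

```r
# Hypothetical helper (not part of VICatMix): summarise a simulated dataset.
summariseSim <- function(sim, p) {
  totalCols <- ncol(sim$data)
  list(
    # Expected to be n x (p + Irrp).
    dims = dim(sim$data),
    # Observed share of observations in each simulated cluster.
    clusterProportions = prop.table(table(sim$trueClusters)),
    # Indices of the noisy variables: the final columns after the first p.
    noisyColumns = if (totalCols > p) seq(p + 1, totalCols) else integer(0)
  )
}

# Only call the package if it is available on this machine.
if (requireNamespace("VICatMix", quietly = TRUE)) {
  sim <- VICatMix::generateSampleDataBin(500, 3, c(0.2, 0.3, 0.5), 20, 5,
                                         yout = TRUE)
  str(summariseSim(sim, p = 20))
  # With yout = TRUE the list also carries the binary outcome vector.
  print(table(sim$outcome))
}
```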


[Package VICatMix version 1.0 Index]