preprocess {zebu} | R Documentation |
Subroutine called by lassie. Discretizes, subsets and removes missing data from a data.frame.
preprocess(x, select, continuous, breaks, default_breaks = 4)
x |
data.frame or matrix. |
select |
optional vector of column numbers or column names specifying a subset of data to be used. By default, uses all columns. |
continuous |
optional vector of column numbers or column names specifying continuous variables that should be discretized. By default, assumes that every variable is categorical. |
breaks |
numeric vector or list passed on to |
default_breaks |
default break points for discretizations.
Same syntax as in |
List containing the following values:
raw: raw subsetted data.frame
pp: discretized, subsetted and complete data.frame
select
continuous
breaks
default_breaks
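The three steps named in the description (subset, discretize, remove missing data) can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: `sketch_preprocess` is a hypothetical helper, the use of base R's `cut` and `complete.cases` is an assumption, and only the `raw` and `pp` elements of the return value are mimicked.

```r
# Hypothetical helper, NOT part of zebu: mimics the documented steps with base R.
sketch_preprocess <- function(x, select = colnames(x), continuous = NULL, breaks = 4) {
  # Step 1: subset the requested columns
  raw <- as.data.frame(x)[, select, drop = FALSE]
  pp <- raw
  # Step 2: discretize the continuous variables
  # (assumption: base R cut(); a single number gives equal-width bins)
  for (j in continuous) {
    pp[[j]] <- cut(pp[[j]], breaks = breaks)
  }
  # Step 3: keep only complete rows
  pp <- pp[complete.cases(pp), , drop = FALSE]
  # Treat every remaining variable as categorical
  pp[] <- lapply(pp, factor)
  list(raw = raw, pp = pp)
}

df <- trees
df$Girth[2] <- NA  # introduce one missing value for illustration
out <- sketch_preprocess(df, select = c("Girth", "Height"),
                         continuous = c(1, 2), breaks = 3)
nrow(out$raw)  # rows before removing missing data
nrow(out$pp)   # rows after removing missing data
```

Here `continuous` indexes the columns of the subset, as in the Examples section, so `c(1, 2)` refers to 'Girth' and 'Height' after selection.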
# This is what happens behind the curtains in the 'lassie' function

# Here we compute the association between the 'Girth' and 'Height' variables
# of the 'trees' dataset

# 'select' and 'continuous' take column numbers or names
select <- c('Girth', 'Height') # select subset of trees
continuous <- c(1, 2) # both 'Girth' and 'Height' are continuous

# equal-width discretization with 3 bins
breaks <- 3

# Preprocess data: subset, discretize and remove missing data
pre <- preprocess(trees, select, continuous, breaks)

# Estimates marginal and multivariate probabilities from preprocessed data.frame
prob <- estimate_prob(pre$pp)

# Computes local and global association using Ducher's Z
lam <- local_association(prob, measure = 'z')