lassie {zebu} | R Documentation |
Estimates local (and global) association measures: Ducher's Z, pointwise mutual information, normalized pointwise mutual information, and chi-squared residuals.
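To make the four local measures concrete, the following is a minimal base R sketch that computes them for one contingency table. It is not the zebu implementation: the formulas follow the usual textbook definitions (and Ducher's Z as commonly stated), the choice of the 'cyl' and 'am' columns is arbitrary, and all variable names are illustrative.

```r
# Sketch only: base R, not the zebu implementation.
# Cross-tabulate two categorical variables from the built-in mtcars data.
counts <- table(cyl = mtcars$cyl, am = mtcars$am)

p_xy  <- counts / sum(counts)   # observed joint probabilities
p_x   <- rowSums(p_xy)          # marginal probabilities of the row variable
p_y   <- colSums(p_xy)          # marginal probabilities of the column variable
p_exp <- outer(p_x, p_y)        # expected joint probabilities under independence

# Pointwise mutual information: log ratio of observed to expected probability.
# (This table has no empty cells; empty cells would need special handling.)
pmi <- log(p_xy / p_exp)

# Normalized pointwise mutual information, bounded between -1 and 1.
npmi <- pmi / -log(p_xy)

# Ducher's Z: the deviation from independence, rescaled by the largest
# deviation that the marginal probabilities allow in that direction.
d      <- p_xy - p_exp
upper  <- outer(p_x, p_y, pmin)                # largest possible joint probability
lower  <- pmax(0, outer(p_x, p_y, "+") - 1)    # smallest possible joint probability
z      <- ifelse(d >= 0, d / (upper - p_exp), d / (p_exp - lower))

# Chi-squared (Pearson) residuals, computed on counts.
expected  <- p_exp * sum(counts)
chisq_res <- (counts - expected) / sqrt(expected)
```

All four measures share the same sign in every cell: positive when a combination occurs more often than expected under independence, negative when it occurs less often.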
lassie(x, select, continuous, breaks, measure = "z", default_breaks = 4)
x | data.frame or matrix. |
select | optional vector of column numbers or column names specifying a subset of data to be used. By default, uses all columns. |
continuous | optional vector of column numbers or column names specifying continuous variables that should be discretized. By default, assumes that every variable is categorical. |
breaks | numeric vector or list passed on to |
measure | name of measure to be used: |
default_breaks | default break points for discretizations. Same syntax as in |
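The discretization controlled by continuous, breaks and default_breaks can be illustrated with base R's cut. That lassie interprets break points exactly like cut is an assumption here, and the named list below (including the 'hp' entry) is a hypothetical illustration of per-variable breaks, not output from zebu.

```r
# Sketch only: base R's cut(), assumed to mirror how break points are interpreted.
mpg <- mtcars$mpg

# A single number requests that many equal-width bins.
mpg_equal <- cut(mpg, breaks = 4)

# A numeric vector gives explicit break points; values outside the
# outermost break points become NA.
mpg_custom <- cut(mpg, breaks = c(10, 15, 25, 30))

table(mpg_equal)
table(mpg_custom, useNA = "ifany")

# Hypothetical per-variable specification as a named list:
# explicit break points for 'mpg', three equal-width bins for 'hp'.
breaks <- list(mpg = c(10, 15, 25, 30), hp = 3)
binned <- Map(function(col, b) cut(mtcars[[col]], breaks = b),
              names(breaks), breaks)
```

With explicit break points, observations beyond the last break are dropped to NA, so it is worth checking the range of each variable before choosing them.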
An instance of S3 class lassie with the following objects:
data: raw and preprocessed data.frames (see preprocess).
prob: probability arrays (see estimate_prob).
global: global association (see local_association).
local: local association arrays (see local_association).
lassie_params: parameters used in lassie.
Results can be visualized using the plot.lassie and print.lassie methods. plot.lassie is only available in the bivariate case and returns a tile plot representing the probability or local association measure matrix. print.lassie shows an array or a data.frame.
Results can be saved using write.lassie.
The permtest function assesses the significance of local and global association values using p-values estimated by permutations.
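The idea behind a permutation p-value can be sketched in base R as follows. This is not the permtest implementation: the global statistic (mutual information), the number of permutations, and the helper name mutual_info are all assumptions made for illustration.

```r
# Sketch only: a generic permutation test, not zebu's permtest().
set.seed(1)
x <- factor(mtcars$cyl)
y <- factor(mtcars$am)

# Hypothetical helper: mutual information between two categorical vectors,
# i.e. the expectation of pointwise mutual information over all cells.
mutual_info <- function(x, y) {
  p_xy  <- table(x, y) / length(x)
  p_exp <- outer(rowSums(p_xy), colSums(p_xy))
  nz    <- p_xy > 0                      # empty cells contribute zero
  sum(p_xy[nz] * log(p_xy[nz] / p_exp[nz]))
}

observed <- mutual_info(x, y)

# Shuffling y breaks any association with x while keeping both marginals,
# which gives the distribution of the statistic under independence.
n_perm   <- 1000
permuted <- replicate(n_perm, mutual_info(x, sample(y)))

# Proportion of permutations at least as extreme as the observed value
# (the +1 terms avoid a p-value of exactly zero).
p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)
```

The same scheme applies to local values: recompute each cell's measure on every shuffled dataset and compare it with the observed cell value.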
The subgroups function identifies whether the association between variables depends on the value of another variable.
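As a rough illustration of that question, the base R sketch below recomputes a global association within each level of a third variable. It does not use or reproduce the subgroups function: the stratifying column 'vs', the mutual information statistic, and the helper name mutual_info are assumptions chosen for the example.

```r
# Sketch only: stratified association in base R, not zebu's subgroups().

# Hypothetical helper: mutual information between two categorical vectors.
mutual_info <- function(x, y) {
  p_xy  <- table(x, y) / length(x)
  p_exp <- outer(rowSums(p_xy), colSums(p_xy))
  nz    <- p_xy > 0                      # empty cells contribute zero
  sum(p_xy[nz] * log(p_xy[nz] / p_exp[nz]))
}

# Split the data by the third variable and measure the cyl/am association
# separately within each subgroup.
strata      <- split(mtcars, mtcars$vs)
mi_by_group <- sapply(strata, function(d) mutual_info(d$cyl, d$am))

mi_by_group
```

Clearly different values across subgroups suggest that the association depends on the third variable, although small subgroups make such comparisons noisy.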
# In this example, we will use the 'mtcars' dataset
library(zebu)

# Selecting a subset of mtcars.
# Takes column names or numbers.
# If nothing is specified, all variables are used.
select <- c('mpg', 'cyl') # or select <- c(1, 2)

# Specifying 'mpg' as a continuous variable.
# Takes column names or numbers.
# If nothing is specified, all variables are assumed to be categorical.
continuous <- 'mpg' # or continuous <- 1

# How should breaks be specified?
# Specifying equal-width discretization with 5 bins for all continuous variables ('mpg')
# breaks <- 5

# Specifying user-defined break points for all continuous variables.
# breaks <- c(10, 15, 25, 30)

# Same thing but only for 'mpg'.
# Here both notations are equivalent because 'mpg' is the only continuous variable.
# This notation is useful if you wish to specify different break points for different variables
# breaks <- list('mpg' = 5)
# breaks <- list('mpg' = c(10, 15, 25, 30))

# Calling lassie
# Not specifying breaks means that the value in default_breaks (4) will be used.
las <- lassie(mtcars, select = c(1, 2), continuous = 1)

# Print local association to console as an array
print(las)

# Print local association and probabilities
# Here only rows having a positive local association are printed
# The data.frame is also sorted by observed probability
print(las, type = 'df', range = c(0, 1), what_sort = 'obs')

# Plot results as heatmap
plot(las)

# Plot observed probabilities using different colors
plot(las, what_x = 'obs', low = 'white', mid = 'grey', high = 'black', text_colour = 'red')