Dat_Tree {fusedTree}    R Documentation
Construct design data used for fitting fusedTree models
Description
Prepares the full design data used to fit a fusedTree model: dummy-encoded indicators of leaf-node membership, optional continuous clinical variables, and an omics matrix with one block of columns per leaf node (block-diagonal once the samples are sorted by leaf node).
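To make this layout concrete, here is a minimal base-R sketch that builds a fusedTree-style design from a vector of leaf labels. The helper build_design() and its arguments are hypothetical and not part of fusedTree. It assumes one indicator column per leaf (no intercept) and that the omics block for leaf k occupies columns (k - 1) * p + 1 to k * p; Dat_Tree() itself may order or name columns differently.

```r
# Hypothetical helper, NOT the fusedTree implementation: builds a
# fusedTree-style design from leaf labels, omics data X and (optionally)
# continuous clinical covariates Zcont.
build_design <- function(leaf, X, Zcont = NULL) {
  leaf <- factor(leaf)
  K <- nlevels(leaf)  # number of leaf nodes
  p <- ncol(X)        # number of omics variables
  # N x K matrix of 0/1 leaf indicators (assumed: one column per leaf, no intercept)
  D <- outer(as.integer(leaf), seq_len(K), "==") * 1
  # Column j of the omics design belongs to leaf ceiling(j / p) and omics
  # variable ((j - 1) %% p) + 1; multiplying by the indicator zeroes out
  # every block except the one for the sample's own leaf.
  Omics <- D[, rep(seq_len(K), each = p), drop = FALSE] *
    X[, rep(seq_len(p), times = K), drop = FALSE]
  Clinical <- if (is.null(Zcont)) D else cbind(D, as.matrix(Zcont))
  list(Clinical = Clinical, Omics = Omics)
}

# Toy usage: 6 samples in 3 leaves, 2 omics variables
set.seed(1)
X <- matrix(rnorm(6 * 2), nrow = 6)
leaf <- c(1, 1, 2, 2, 3, 3)
des <- build_design(leaf, X)
dim(des$Omics)  # 6 6
```

Multiplying the indicator matrix into a column-replicated copy of X is one simple way to obtain the zero-padded blocks without looping over leaves.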
Usage
Dat_Tree(Tree, X, Z, LinVars = TRUE)
Arguments
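- Tree
A fitted tree object, created using … (the example below fits it with rpart::rpart() and prunes it with rpart::prune()).
- X
A numeric omics data matrix with dimensions
(sample size × number of omics variables). Must be a …
- Z
A … (in the example below, Z is a data frame of clinical covariates with one row per sample).
- LinVars
Logical. Whether to include the continuous clinical variables
linearly in the model, in addition to the tree-based clustering. Recommended,
because a tree can approximate a linear effect only by a step function. Defaults to TRUE.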
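The recommendation for LinVars = TRUE rests on the fact that a regression tree predicts a constant within each leaf, so a purely linear clinical effect is only approximated by a staircase. The sketch below uses simulated data (an assumption for illustration, not fusedTree code) and the rpart package to check that the tree's fitted values take at most as many distinct values as the tree has leaves, while lm() recovers the trend with a single slope.

```r
library(rpart)

set.seed(2)
n <- 200
z <- runif(n)
y <- 2 * z + rnorm(n, sd = 0.1)  # purely linear clinical effect (simulated)
d <- data.frame(y = y, z = z)

fit_tree <- rpart(y ~ z, data = d, control = rpart.control(minbucket = 10))
fit_lin <- lm(y ~ z, data = d)

pred_tree <- predict(fit_tree)                     # fitted values of the tree
n_leaves <- sum(fit_tree$frame$var == "<leaf>")    # rpart marks leaves as "<leaf>"
n_steps <- length(unique(pred_tree))               # distinct fitted values

# The tree approximates the line by at most n_leaves constant pieces,
# while lm() estimates one slope (close to the true value 2 here).
c(leaves = n_leaves, steps = n_steps)
coef(fit_lin)["z"]
```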
Details
This function lets users inspect the exact data structure used when
fitting a fusedTree model. PenOpt() and fusedTreeFit() call it
internally, so there is no need to call it yourself to set up the data
format; it is provided only so that users can check what these functions
work with.
Value
A list with the following components:
- Clinical
A matrix encoding the clinical structure: dummy variables indicating membership of each leaf node of the tree and, if LinVars = TRUE, the continuous clinical covariates. Each row corresponds to a sample.
- Omics
A matrix of omics data split by leaf node, with dimensions sample size × (number of leaf nodes × number of omics variables). For each sample, only the block of columns belonging to its leaf node holds its omics values; all other blocks are set to zero.
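One consequence of this layout: multiplying Omics by a stacked coefficient vector gives every sample the omics effects of its own leaf only. The toy example below writes out a small Omics matrix by hand (4 samples, 2 leaves, p = 2), assuming leaf k occupies columns (k - 1) * p + 1 to k * p; the helper block_cols() and the coefficient values are hypothetical and chosen only for illustration.

```r
p <- 2
leaf <- c(1, 2, 1, 2)
X <- matrix(1:8, nrow = 4)  # rows: (1,5), (2,6), (3,7), (4,8)

# Hand-built Omics: each row is filled only in its own leaf's block
Omics <- rbind(
  c(1, 5, 0, 0),
  c(0, 0, 2, 6),
  c(3, 7, 0, 0),
  c(0, 0, 4, 8)
)

# Hypothetical helper: column indices of the block for leaf k
block_cols <- function(k, p) ((k - 1) * p + 1):(k * p)

# Samples in leaf k carry their omics values in block k and zeros elsewhere
for (k in 1:2) {
  in_k <- leaf == k
  stopifnot(all(Omics[in_k, block_cols(k, p)] == X[in_k, ]))
  stopifnot(all(Omics[!in_k, block_cols(k, p)] == 0))
}

# Stacked leaf-specific effects: leaf 1 -> (1, -1), leaf 2 -> (0.5, 2)
beta <- c(1, -1, 0.5, 2)
eta <- drop(Omics %*% beta)
eta  # -4 13 -4 18
```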
Examples
library(fusedTree)
set.seed(1)  # for reproducibility
p <- 5       # number of omics variables (low for illustration)
p_Clin <- 5  # number of clinical variables
N <- 100     # sample size
# simulate from Friedman-like function
g <- function(z) {
15 * sin(pi * z[,1] * z[,2]) + 10 * (z[,3] - 0.5)^2 + 2 * exp(z[,4]) + 2 * z[,5]
}
Z <- as.data.frame(matrix(runif(N * p_Clin), nrow = N))
X <- matrix(rnorm(N * p), nrow = N) # omics data
betas <- c(1,-1,3,4,2) # omics effects
Y <- g(Z) + X %*% betas + rnorm(N) # continuous outcome
Y <- as.vector(Y)
dat <- cbind.data.frame(Y, Z)  # outcome plus clinical covariates, as rpart expects
library(rpart)
rp <- rpart::rpart(Y ~ ., data = dat,
                   control = rpart::rpart.control(xval = 5, minbucket = 10),
                   model = TRUE)
# prune to the complexity parameter with the lowest cross-validated error
cp <- rp$cptable[which.min(rp$cptable[, "xerror"]), "CP"]
Treefit <- rpart::prune(rp, cp = cp)
plot(Treefit)
Dat_fusedTree <- Dat_Tree(Tree = Treefit, X = X, Z = Z, LinVars = FALSE)
Omics <- Dat_fusedTree$Omics
Clinical <- Dat_fusedTree$Clinical