performHClustering {doblin} | R Documentation
Perform Hierarchical Clustering on Barcoded Lineages
Description
This function performs hierarchical clustering on time-series data representing barcoded lineages. A distance matrix is computed using either Pearson correlation or Dynamic Time Warping (DTW), and hierarchical clustering is applied using a specified agglomeration method. A dendrogram and heatmap are generated for visual inspection. If no threshold is specified, clusters are computed for all possible thresholds between 0.1 and the maximum tree height.
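The Pearson branch of this workflow can be sketched in base R. This is a minimal illustration, not the package's implementation: the toy matrix `trajectories`, the `1 - correlation` dissimilarity, and the 0.1 step between thresholds are all assumptions made for the example.

```r
# Toy data (hypothetical): rows are lineages, columns are time points.
set.seed(1)
time_points <- 1:8
trajectories <- rbind(
  lin_a = time_points + rnorm(8, sd = 0.1),
  lin_b = 2 * time_points + rnorm(8, sd = 0.1),
  lin_c = rev(time_points) + rnorm(8, sd = 0.1),
  lin_d = 3 * rev(time_points) + rnorm(8, sd = 0.1)
)

# cor() correlates columns, so transpose to compare lineages with each other.
correlation <- cor(t(trajectories), method = "pearson",
                   use = "pairwise.complete.obs")

# Assumed conversion from similarity to dissimilarity: 1 - r.
distance <- as.dist(1 - correlation)

# Agglomerative clustering with average linkage.
tree <- hclust(distance, method = "average")

# Cut the tree at each threshold from 0.1 up to the maximum merge height
# (the 0.1 step is an assumption of this sketch).
thresholds <- round(seq(0.1, max(tree$height), by = 0.1), 1)
assignments <- sapply(thresholds, function(h) cutree(tree, h = h))
colnames(assignments) <- as.character(thresholds)

# One row per lineage, one column of cluster labels per threshold.
assignments <- data.frame(
  lineage = rownames(trajectories),
  assignments,
  check.names = FALSE
)
```

Each threshold column holds integer cluster labels, so `table(assignments[["0.1"]])` gives the cluster sizes at height 0.1.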
Usage
performHClustering(
filtered_data,
agglomeration_method,
similarity_metric,
output_directory,
input_name,
missing_values = NULL,
dtw_norm = NULL
)
Arguments
filtered_data |
A data frame preprocessed with |
agglomeration_method |
A character string specifying the agglomeration method (e.g., |
similarity_metric |
A character string specifying the similarity metric ( |
output_directory |
A string specifying the directory where plots will be saved. |
input_name |
A string used as the base name for output files (e.g., "replicate1").
missing_values |
Optional. A character string specifying how missing values should be handled in Pearson correlation (e.g., |
dtw_norm |
Optional. A character string specifying the norm to use with DTW distance ("L1" for Manhattan, "L2" for Euclidean).
Required if |
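To illustrate how the choice of norm affects a DTW distance, here is a small base-R sketch. The helper `dtw_distance` is hypothetical and is not the routine this package calls; in particular, using squared differences as the local cost and taking the square root of the accumulated cost for "L2" is one common convention and is assumed here.

```r
# Hypothetical helper: DTW distance between two univariate series.
dtw_distance <- function(x, y, norm = c("L1", "L2")) {
  norm <- match.arg(norm)
  n <- length(x)
  m <- length(y)

  # Local cost between every pair of points: absolute difference for L1,
  # squared difference for L2 (assumed convention).
  diffs <- outer(x, y, "-")
  local_cost <- if (norm == "L1") abs(diffs) else diffs^2

  # Accumulated cost matrix with an extra border row and column of Inf.
  acc <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      best_previous <- min(acc[i, j], acc[i, j + 1], acc[i + 1, j])
      acc[i + 1, j + 1] <- local_cost[i, j] + best_previous
    }
  }

  total <- acc[n + 1, m + 1]
  if (norm == "L2") sqrt(total) else total
}

# Two trajectories with the same peak, shifted by one time point.
early_peak <- c(0, 1, 3, 1, 0, 0)
late_peak  <- c(0, 0, 1, 3, 1, 0)
shift_l1 <- dtw_distance(early_peak, late_peak, norm = "L1")

# Two flat trajectories at different levels: here the two norms disagree.
flat_l1 <- dtw_distance(c(0, 0), c(2, 2), norm = "L1")
flat_l2 <- dtw_distance(c(0, 0), c(2, 2), norm = "L2")
```

Because warping can align the shifted peaks point for point, the first pair has zero distance under either norm, whereas a constant offset between trajectories is penalised differently by "L1" and "L2".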
Value
A data frame with clustering assignments at multiple thresholds (columns named by height).
Examples
# Load demo barcode count data (installed with the package)
demo_file <- system.file("extdata", "demo_input.csv", package = "doblin")
input_dataframe <- readr::read_csv(demo_file, show_col_types = FALSE)

# Filter data to retain dominant and persistent barcodes
filtered_df <- filterData(
  input_df = input_dataframe,
  freq_threshold = 0.00005,
  time_threshold = 5,
  output_directory = tempdir(),
  input_name = "demo"
)

# Perform hierarchical clustering using Pearson correlation
cluster_assignments <- performHClustering(
  filtered_data = filtered_df,
  agglomeration_method = "average",
  similarity_metric = "pearson",
  output_directory = tempdir(),
  input_name = "demo",
  missing_values = "pairwise.complete.obs",
  dtw_norm = NULL
)