performHClustering {doblin}    R Documentation

Perform Hierarchical Clustering on Barcoded Lineages

Description

This function performs hierarchical clustering on time-series data representing barcoded lineages. A distance matrix is computed using either Pearson correlation or Dynamic Time Warping (DTW), and hierarchical clustering is applied using a specified agglomeration method. A dendrogram and heatmap are generated for visual inspection. If no threshold is specified, clusters are computed for all possible thresholds between 0.1 and the maximum tree height.
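The steps described above can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the package's actual implementation: the toy matrix `freqs`, the conversion of correlation to distance as `1 - r`, and the 0.1 step between thresholds are all assumptions made for the example.

```r
# Toy data: 6 hypothetical lineages (rows) observed at 8 time points (columns).
set.seed(1)
time <- 1:8
freqs <- rbind(
  up1   = 0.010 * time + rnorm(8, sd = 0.002),
  up2   = 0.020 * time + rnorm(8, sd = 0.002),
  up3   = 0.015 * time + rnorm(8, sd = 0.002),
  down1 = 0.10 - 0.010 * time + rnorm(8, sd = 0.002),
  down2 = 0.20 - 0.020 * time + rnorm(8, sd = 0.002),
  down3 = 0.15 - 0.015 * time + rnorm(8, sd = 0.002)
)

# Pearson-based distance (assumed form: 1 - correlation).
# cor() correlates columns, so transpose to put one lineage per column.
pearson_dist <- as.dist(1 - cor(t(freqs), method = "pearson"))

# Agglomerative clustering with average linkage.
tree <- hclust(pearson_dist, method = "average")

# Cut the tree at each threshold from 0.1 up to the maximum tree height.
# The 0.1 step size is an assumption of this sketch.
thresholds <- round(seq(0.1, max(tree$height), by = 0.1), 1)
cuts <- sapply(thresholds, function(h) cutree(tree, h = h))
colnames(cuts) <- as.character(thresholds)

# One row per lineage, one column of cluster labels per threshold.
assignments <- data.frame(lineage = rownames(freqs), cuts, check.names = FALSE)
```

In this toy case the rising and falling lineages are almost perfectly anti-correlated, so they separate into two clusters at low thresholds; real barcode trajectories will usually be less clear-cut.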

Usage

performHClustering(
  filtered_data,
  agglomeration_method,
  similarity_metric,
  output_directory,
  input_name,
  missing_values = NULL,
  dtw_norm = NULL
)

Arguments

filtered_data

A data frame preprocessed with filterData(), containing filtered lineage frequencies.

agglomeration_method

A character string specifying the agglomeration method (e.g., "ward.D", "complete").

similarity_metric

A character string specifying the similarity metric ("pearson" or "dtw").

output_directory

A string specifying the directory where plots will be saved.

input_name

A string used as the base name for output files (e.g., "replicate1").

missing_values

Optional. A character string specifying how missing values should be handled in Pearson correlation (e.g., "pairwise.complete.obs").

dtw_norm

Optional. A character string specifying the norm to use with DTW distance ("L1" for Manhattan, "L2" for Euclidean). Required if similarity_metric = "dtw".
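To illustrate what a value such as "pairwise.complete.obs" does, the sketch below uses `stats::cor()` directly. It assumes that `missing_values` is forwarded to the `use` argument of `cor()`, which the example value suggests but this page does not state; the two trajectories are hypothetical.

```r
# Two hypothetical lineage trajectories with gaps at different time points.
lineage_a <- c(0.10, 0.12, NA,   0.18, 0.21, 0.25)
lineage_b <- c(0.30, 0.27, 0.24, NA,   0.17, 0.12)
traj <- cbind(lineage_a, lineage_b)

# Default handling (use = "everything"): a missing value in either series
# makes the correlation between the two series NA.
cor_default <- cor(traj, method = "pearson")

# Pairwise deletion: the correlation is computed from the time points
# observed in both series (here points 1, 2, 5 and 6).
cor_pairwise <- cor(traj, method = "pearson", use = "pairwise.complete.obs")
```

With pairwise deletion, each pair of lineages may be compared over a different set of time points, which is worth keeping in mind when interpreting the resulting distance matrix.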

Value

A data frame with clustering assignments at multiple thresholds (columns named by height).

Examples

# Load demo barcode count data (installed with the package)
demo_file <- system.file("extdata", "demo_input.csv", package = "doblin")
input_dataframe <- readr::read_csv(demo_file, show_col_types = FALSE)

# Filter data to retain dominant and persistent barcodes
filtered_df <- filterData(
  input_df = input_dataframe,
  freq_threshold = 0.00005,
  time_threshold = 5,
  output_directory = tempdir(),
  input_name = "demo"
)

# Perform hierarchical clustering using Pearson correlation
cluster_assignments <- performHClustering(
  filtered_data = filtered_df,
  agglomeration_method = "average",
  similarity_metric = "pearson",
  output_directory = tempdir(),
  input_name = "demo",
  missing_values = "pairwise.complete.obs",
  dtw_norm = NULL
)

[Package doblin version 0.1.1 Index]