plotHCQuantification {doblin} | R Documentation |
Quantify and Visualize Hierarchical Clustering Results
Description
This script contains several functions to help quantify and visualize the results of hierarchical clustering on barcode time-series data. The main function is plotHCQuantification(), which computes a LOESS-smoothed average of barcode frequencies per cluster and evaluates inter-cluster distances across different clustering thresholds.
The melt_dist() function takes a distance matrix and converts it into a long-format data frame where each row corresponds to a unique pair of elements and their associated distance. It essentially "melts" the lower triangle of the matrix into a tidy format, which is useful for plotting or further analysis.
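The melting step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name melt_lower_triangle() and the toy data are hypothetical, and the sketch assumes the input carries row and column names (as.matrix() on a "dist" object supplies them).

```r
# Hypothetical helper, not doblin's melt_dist(): turn the lower triangle
# of a distance matrix into one row per unique pair of elements.
melt_lower_triangle <- function(d, dist_name = "dist") {
  m <- as.matrix(d)  # accepts a "dist" object or a named square matrix
  # (row, col) positions of every cell strictly below the diagonal
  idx <- which(lower.tri(m), arr.ind = TRUE)
  out <- data.frame(
    iso1 = rownames(m)[idx[, "row"]],
    iso2 = colnames(m)[idx[, "col"]],
    stringsAsFactors = FALSE
  )
  out[[dist_name]] <- as.numeric(m[idx])  # distance for each pair
  out
}

# Toy data: three points on a line, so the distances are easy to check
pts <- matrix(c(0, 1, 3), ncol = 1, dimnames = list(c("a", "b", "c"), NULL))
long <- melt_lower_triangle(dist(pts))
print(long)
```

Each unordered pair appears exactly once because only cells below the diagonal are kept; the diagonal (zero self-distances) and the mirrored upper triangle are dropped.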
The applyLOESS() function applies LOESS smoothing to barcode frequencies within each cluster over time, using only the persistent barcodes (those present at the last time point). Clusters are re-ranked within each threshold based on their average final frequency.
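The per-cluster smoothing and re-ranking can be sketched with stats::loess() on toy data. This is a rough illustration under stated assumptions, not the package's code: the column names (barcode, time, freq, cluster), the span of 0.75, and the reading of "present" as a frequency above zero are all hypothetical choices, and a single clustering threshold is used for brevity.

```r
# Toy long-format data; column names are assumptions for this sketch
set.seed(1)
toy <- expand.grid(time = 1:10, barcode = paste0("bc", 1:6),
                   stringsAsFactors = FALSE)
toy$cluster <- ifelse(toy$barcode %in% c("bc1", "bc2", "bc3"), 1, 2)
toy$freq <- ifelse(toy$cluster == 1, 0.02 * toy$time, 0.1) +
  rnorm(nrow(toy), sd = 0.005)
toy$freq[toy$barcode == "bc6" & toy$time == 10] <- 0  # bc6 goes extinct

# Keep only persistent barcodes (assumed here: frequency > 0 at the last time point)
last_t <- max(toy$time)
persistent <- unique(toy$barcode[toy$time == last_t & toy$freq > 0])
kept <- toy[toy$barcode %in% persistent, ]

# Fit one LOESS curve per cluster and evaluate it at each observed time point
smooth_cluster <- function(df) {
  fit <- stats::loess(freq ~ time, data = df, span = 0.75)
  grid <- data.frame(time = sort(unique(df$time)))
  data.frame(cluster = df$cluster[1], time = grid$time,
             model = predict(fit, newdata = grid))
}
smoothed <- do.call(rbind, lapply(split(kept, kept$cluster), smooth_cluster))

# Rank clusters by mean frequency at the last time point (1 = highest)
final_freq <- aggregate(freq ~ cluster, data = kept[kept$time == last_t, ],
                        FUN = mean)
final_freq$rank <- rank(-final_freq$freq)
print(final_freq)
```

With several thresholds, the same split-and-fit step would presumably run once per (cutoff, cluster) combination, with ranks assigned separately inside each cutoff.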
Usage
plotHCQuantification(clusters_filtered, output_directory, input_name)
melt_dist(dist, order = NULL, dist_name = "dist")
applyLOESS(clusters_filtered)
Arguments
clusters_filtered | A data frame filtered by |
output_directory | A string specifying the directory where plots will be saved. |
input_name | A string used as the base name for output files (e.g., "replicate1"). |
dist | A distance matrix (typically a result of a distance computation). |
order | Optional character vector indicating the order of row/column names to rearrange the matrix before melting. |
dist_name | A string naming the distance variable in the resulting data frame. Default is "dist". |
Value
plotHCQuantification(): No return value. The function saves a plot and a CSV file containing the smallest inter-cluster distances per threshold.
melt_dist(): A data frame with columns iso1, iso2, and the specified distance column.
applyLOESS(): A data frame with smoothed values for each cluster and time point; columns include cluster, cutoff, model, and time.
Examples
# Load demo barcode count data (installed with the package)
demo_file <- system.file("extdata", "demo_input.csv", package = "doblin")
input_dataframe <- readr::read_csv(demo_file, show_col_types = FALSE)
# Filter data to retain dominant and persistent barcodes
filtered_df <- filterData(
  input_df = input_dataframe,
  freq_threshold = 0.00005,
  time_threshold = 5,
  output_directory = tempdir(),
  input_name = "demo"
)
# Perform hierarchical clustering using Pearson correlation
cluster_assignments <- performHClustering(
  filtered_data = filtered_df,
  agglomeration_method = "average",
  similarity_metric = "pearson",
  output_directory = tempdir(),
  input_name = "demo",
  missing_values = "pairwise.complete.obs",
  dtw_norm = NULL
)
# Filter clusters to retain only those with at least 8 members,
# unless they contain a dominant lineage
filtered_clusters <- filterHC(
  series_filtered = filtered_df,
  clusters = cluster_assignments,
  n_members = 8,
  min_freq_ignored_clusters = 0.0001
)
# Quantify and visualize clustering quality across thresholds
plotHCQuantification(
  clusters_filtered = filtered_clusters,
  output_directory = tempdir(),
  input_name = "demo"
)