SFclust.permute {FKmL}    R Documentation

Perform Permutation-Based Clustering Evaluation for SFclust

Description

Performs a permutation-based analysis to evaluate clustering results across different values of the \ell_1 norm constraint (s). This function is designed to help determine the most appropriate \ell_1 norm value by comparing the observed clustering outcome with those obtained under random permutations.

The function computes gap statistics for each \ell_1 norm constraint value based on permuted versions of the input distance array, and identifies the optimal s as the one maximizing the gap statistic. Two ggplot objects are returned to visualize the gap patterns.

Usage

SFclust.permute(dist.ary, k, nperms, l1b)

Arguments

dist.ary

A 3-dimensional distance array representing pairwise distances between trajectories across multiple variables. Follows the same format used in SFclust.

k

An integer specifying the number of clusters.

nperms

An integer specifying the number of permutations to perform.

l1b

A numeric vector of \ell_1 norm constraint values to test during clustering. These values control the sparsity of the weights during clustering.
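As an illustration of the arguments above, the following sketch builds a small distance array from simulated trajectories and passes it to SFclust.permute. Several things here are assumptions rather than documented behaviour: the n x n x p layout of dist.ary (trajectory by trajectory by variable), the use of Euclidean distance, and the particular l1b values. Consult the SFclust help page for the exact format expected.

```r
# Simulate n trajectories observed at n_time points for p variables.
set.seed(1)
n <- 12; n_time <- 8; p <- 3
traj <- array(rnorm(n * n_time * p), dim = c(n, n_time, p))

# Assumed layout: dist.ary[i, j, v] is the distance between trajectories
# i and j on variable v. Euclidean distance is used purely for illustration.
dist.ary <- array(0, dim = c(n, n, p))
for (v in seq_len(p)) {
  dist.ary[, , v] <- as.matrix(dist(traj[, , v]))
}

# Candidate l1 norm constraint values (illustrative only).
l1b <- seq(1.1, sqrt(p), length.out = 4)

# Run the permutation analysis only if FKmL is installed.
if (requireNamespace("FKmL", quietly = TRUE)) {
  res <- FKmL::SFclust.permute(dist.ary, k = 2, nperms = 10, l1b = l1b)
  res$bestl1b
}
```

A small nperms keeps the example fast; a larger value gives more stable gap estimates at a higher computational cost.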

Details

This function helps assess the robustness of the clustering structure and select an appropriate level of sparsity.

If any clustering attempt fails (e.g., due to convergence issues or weight update errors), the affected \ell_1 norm values are reported in failed_l1b and their indices within l1b in failed_j.

The two ggplot objects returned (gapplot.l1b and gapplot.nnz) visualize the gap statistics. They are not printed automatically, so users can decide when and how to display them.

The function uses random sampling internally. For reproducible results, set the random seed with set.seed() before calling it.
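The points above might be handled in practice as in the following sketch. The helper summarise_failures is hypothetical (it is not part of FKmL), and mock_res is a hand-made list that only mimics the documented component names so that the example runs without the package.

```r
# Hypothetical helper: report which l1 norm values failed, using only the
# failed_l1b component described in the Value section.
summarise_failures <- function(res) {
  if (length(res$failed_l1b) == 0) {
    return("all l1b values were processed")
  }
  paste0("failed l1b values: ",
         paste(format(res$failed_l1b), collapse = ", "))
}

# Mock result standing in for the output of SFclust.permute().
mock_res <- list(failed_j = c(2L, 4L),
                 failed_l1b = c(1.3, 1.7),
                 l1bounds = c(1.1, 1.5))
msg <- summarise_failures(mock_res)
print(msg)

# With a real result (not run here):
# set.seed(123)                  # fix the random permutations
# res <- SFclust.permute(dist.ary, k, nperms, l1b)
# print(res$gapplot.l1b)         # plots are drawn only when printed
# print(res$gapplot.nnz)
```

Checking failed_l1b before reading bestl1b is a reasonable habit, since the best value is chosen only among the constraint values that were processed successfully.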

Value

A list containing the following components:

totss

A numeric vector of total within-cluster sum of squared distances for each \ell_1 norm value.

permtotss

A matrix of total sum of squared distances for each permutation and each \ell_1 norm value.

nnonzerowss

A numeric vector of the number of nonzero weights for each \ell_1 norm value.

gaps

A numeric vector of gap statistics, one per \ell_1 norm value: the difference between the observed clustering result and the results obtained under permutation.

sdgaps

A numeric vector of standard deviations of the gaps across permutations.

l1bounds

A vector of \ell_1 norm constraint values that were successfully processed without error.

bestl1b

The \ell_1 norm constraint value that yielded the largest gap.

failed_j

Indices of l1b values that caused errors during the clustering process.

failed_l1b

The actual \ell_1 norm values that caused errors.

gapplot.l1b

A ggplot object showing the gap statistics plotted against \ell_1 norm constraint values.

gapplot.nnz

A ggplot object showing the gap statistics plotted against the number of nonzero weights.
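To make the relationship between totss, permtotss, gaps, sdgaps, and bestl1b concrete, here is a minimal base R sketch on made-up numbers. The formula is an assumption borrowed from the gap statistic commonly used for sparse K-means (log of the observed criterion minus the mean log of the permuted criterion); the exact computation inside SFclust.permute is not stated on this page and may differ, as may the orientation of permtotss (rows as permutations is assumed here).

```r
# Toy inputs shaped like the documented components (values are made up).
totss <- c(10, 14, 15)               # one value per l1 norm constraint
permtotss <- rbind(c(9, 10, 12),     # rows: permutations (assumed)
                   c(8, 11, 13))     # columns: l1 norm constraints
l1bounds <- c(1.1, 1.4, 1.7)

# Assumed convention: gap = log(observed) - mean of log(permuted).
gaps <- log(totss) - colMeans(log(permtotss))

# Spread of the permuted log-criterion for each constraint value.
sdgaps <- apply(log(permtotss), 2, sd)

# The selected constraint is the one with the largest gap.
bestl1b <- l1bounds[which.max(gaps)]
print(bestl1b)
```

On these toy values the middle constraint gives the largest gap, so it would be the one selected.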


[Package FKmL version 0.1.1 Index]