mfkml {FKmL} | R Documentation |
Multidimensional Fréchet Distance-Based K-means for Longitudinal Data
Description
Extends kmlShape to multidimensional (p ≥ 2) longitudinal data.
It performs scale adjustment and trajectory alignment across all variables prior to clustering
to reduce distortions caused by differences in time grids and amplitude scales.
When variables exhibit substantially different ranges, standardization is required to prevent any single variable
from disproportionately influencing the clustering outcome.
The clustering process follows an iterative K-means framework, where cluster assignments are updated based on Fréchet distances. Cluster centers are computed using the weighted Fréchet mean, which accounts for variable weights assigned to individual trajectories. This allows the mean to be adjusted according to the relative importance of each trajectory in the clustering process.
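To make the distance used in the assignment step concrete, the following is a minimal base-R sketch of the discrete Fréchet distance between two trajectories, computed with the standard dynamic-programming recursion. It is an illustration only: the helper name discrete_frechet and the toy matrices P and Q are hypothetical, and the package's internal computation (for example, whether it uses a discrete or continuous variant) may differ.

```r
# Discrete Frechet distance between two polylines P and Q.
# P and Q are numeric matrices with one row per point
# (columns here: time and one variable; more variables work the same way).
discrete_frechet <- function(P, Q) {
  n <- nrow(P)
  m <- nrow(Q)
  # Pairwise Euclidean distances between every point of P and every point of Q
  d <- matrix(0, n, m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d[i, j] <- sqrt(sum((P[i, ] - Q[j, ])^2))
    }
  }
  # ca[i, j] holds the coupling distance between the first i points of P
  # and the first j points of Q
  ca <- matrix(0, n, m)
  ca[1, 1] <- d[1, 1]
  if (n > 1) for (i in 2:n) ca[i, 1] <- max(ca[i - 1, 1], d[i, 1])
  if (m > 1) for (j in 2:m) ca[1, j] <- max(ca[1, j - 1], d[1, j])
  if (n > 1 && m > 1) {
    for (i in 2:n) {
      for (j in 2:m) {
        ca[i, j] <- max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
      }
    }
  }
  ca[n, m]
}

# Two flat toy trajectories that differ only by a vertical shift of 1
P <- cbind(time = c(0, 1, 2), y = c(0, 0, 0))
Q <- cbind(time = c(0, 1, 2), y = c(1, 1, 1))
frechet_pq <- discrete_frechet(P, Q)
```

Because every coordinate enters the Euclidean point distance, a column with a much larger numeric range dominates the result, which is why scaling matters for this kind of distance.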
Usage
mfkml(dt, clt_n, scales, weight, maxIter = 50)
Arguments
dt |
A long-format data.frame containing the following columns in the specified order:
|
clt_n |
An integer specifying the number of clusters.
The number of unique trajectories must be greater than or equal to |
scales |
A numeric vector used for scaling the time and variable columns. The length of |
weight |
Specifies the weights used for calculating the weighted Fréchet mean. It can take one of the following forms:
|
maxIter |
The maximum number of iterations allowed before stopping if convergence is not reached. The default value is 50. |
Details
The input dataset (dt) must contain only numeric values (except for the ID column)
and must not include any missing values.
Each variable should be measured at least three times per trajectory,
since the method relies on trajectory shapes.
Two observations per trajectory are insufficient to capture shape trends (e.g., increasing, decreasing, or stable).
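The requirements above can be checked before clustering. The sketch below is one possible pre-flight check in base R, not part of the package. The helper check_input, the column names (ID, time, x1, x2), and the toy values are assumptions made for illustration; it also assumes one row per measurement occasion.

```r
# Hypothetical long-format data; column names and order are assumptions
dt <- data.frame(
  ID   = rep(c("a", "b"), each = 3),
  time = rep(1:3, times = 2),
  x1   = c(1.0, 2.0, 3.0, 3.0, 2.0, 1.0),
  x2   = c(10, 12, 11, 20, 18, 19),
  stringsAsFactors = FALSE
)

# Returns a named logical vector, one entry per requirement
check_input <- function(dt, id_col = "ID") {
  value_cols <- setdiff(names(dt), id_col)
  # Every column other than the ID column must be numeric
  all_numeric <- all(vapply(dt[value_cols], is.numeric, logical(1)))
  # No missing values anywhere in the data
  no_missing <- !anyNA(dt)
  # Rows per trajectory; each trajectory needs at least three measurements
  n_obs <- table(dt[[id_col]])
  enough_obs <- all(n_obs >= 3)
  c(all_numeric = all_numeric, no_missing = no_missing, enough_obs = enough_obs)
}

checks <- check_input(dt)
```

Any FALSE entry in checks points to the requirement that the data do not yet meet.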
Because the Fréchet distance is sensitive to measurement units, proper scaling is essential when applying the mfkml function.
The scales vector contains scaling factors for time and each variable,
which are used to rescale the corresponding columns.
This scaling prevents distortion due to differences in the units of time and variables,
allowing for more accurate shape-based comparisons.
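As an illustration of why unit differences matter, the sketch below compares the ranges of a time column and two variables and then rescales each column by its own range. This is only one common way to put columns on a comparable footing; how mfkml applies the entries of scales internally is not shown here, and the toy values and the helper col_range are hypothetical.

```r
# Hypothetical measurements in very different units
time <- c(0, 6, 12)        # e.g. months
x1   <- c(0.1, 0.2, 0.4)   # small-range variable
x2   <- c(100, 250, 400)   # large-range variable

# Width of the observed range of a numeric vector
col_range <- function(v) diff(range(v))
ranges <- c(time = col_range(time), x1 = col_range(x1), x2 = col_range(x2))

# Dividing each column by its range gives every column a unit range.
# Assumption: this is an illustrative preprocessing choice, not necessarily
# what mfkml does with its `scales` argument.
scaled <- data.frame(
  time = time / ranges[["time"]],
  x1   = x1 / ranges[["x1"]],
  x2   = x2 / ranges[["x2"]]
)
scaled_ranges <- vapply(scaled, col_range, numeric(1))
```

Before rescaling, x2 spans 300 units while x1 spans 0.3, so an unscaled point-to-point distance would be driven almost entirely by x2.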
This function involves random sampling internally.
For reproducible results, set the random seed before calling the function using set.seed().
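The effect of fixing the seed can be seen with any base-R random draw. The short sketch below uses sample() as a stand-in for the function's internal sampling; the seed value 123 is arbitrary.

```r
# Same seed, same random draw
set.seed(123)
first <- sample(10)

set.seed(123)
second <- sample(10)

same_draw <- identical(first, second)

# The same pattern applies to clustering: call set.seed() immediately
# before the call, e.g.
# set.seed(123)
# res <- mfkml(dt, clt_n, scales, weight)
```

Running the clustering twice after the same set.seed() call should therefore give the same result, provided nothing else consumes random numbers in between.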
Value
A list with the following components:
Cluster
A data.frame containing the ID and Cluster columns, which indicate the final cluster assignment for each trajectory.
Center
A data.frame representing the final cluster centers, with columns for the cluster IDs, time points, and variable values.
Iteration
The number of iterations the algorithm performed before reaching convergence.
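The components of the returned list can be inspected with ordinary data.frame operations. The sketch below uses a hand-built stand-in for the result rather than real output: the values are made up, the storage type of the Cluster column is an assumption, and the Center component is omitted because its exact column names are not given here.

```r
# Hand-built stand-in for a returned list (values are illustrative only)
res <- list(
  Cluster = data.frame(
    ID = c("a", "b", "c", "d"),
    Cluster = c(1, 2, 1, 2),
    stringsAsFactors = FALSE
  ),
  Iteration = 4
)

# Number of trajectories assigned to each cluster
cluster_sizes <- table(res$Cluster$Cluster)

# IDs of the trajectories assigned to cluster 1
ids_in_1 <- res$Cluster$ID[res$Cluster$Cluster == 1]

# Comparing Iteration with maxIter shows whether the iteration cap was reached
hit_cap <- res$Iteration >= 50
```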