mfkml {FKmL}    R Documentation

Multidimensional Fréchet Distance-Based K-means for Longitudinal Data

Description

Extends kmlShape to multidimensional (p ≥ 2 variables) longitudinal data. Before clustering, it performs scale adjustment and trajectory alignment across all variables to reduce distortions caused by differences in time grids and amplitude scales. When variables have substantially different ranges, standardization is required so that no single variable disproportionately influences the clustering outcome.

The clustering process follows an iterative K-means framework in which cluster assignments are updated based on Fréchet distances. Cluster centers are computed using the weighted Fréchet mean, which takes into account the weight assigned to each trajectory. This allows each center to reflect the relative importance of the trajectories in its cluster.
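To make the distance in the paragraph above concrete, here is a minimal sketch of the textbook discrete Fréchet distance between two trajectories. It is an illustration only: the helper name discrete_frechet is hypothetical, and the distance used inside mfkml (which follows kmlShape) may be defined differently.

```r
# Textbook discrete Frechet distance (illustrative; not taken from FKmL).
# P and Q are numeric matrices: one row per time point, one column per
# coordinate (for example scaled time plus the p variables).
discrete_frechet <- function(P, Q) {
  P <- as.matrix(P)
  Q <- as.matrix(Q)
  stopifnot(ncol(P) == ncol(Q), nrow(P) >= 1, nrow(Q) >= 1)
  n <- nrow(P)
  m <- nrow(Q)

  # Pairwise Euclidean distances between the points of P and Q.
  d <- matrix(0, n, m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d[i, j] <- sqrt(sum((P[i, ] - Q[j, ])^2))
    }
  }

  # Dynamic programme: ca[i, j] is the coupling distance between the
  # first i points of P and the first j points of Q.
  ca <- matrix(0, n, m)
  ca[1, 1] <- d[1, 1]
  if (n > 1) for (i in 2:n) ca[i, 1] <- max(ca[i - 1, 1], d[i, 1])
  if (m > 1) for (j in 2:m) ca[1, j] <- max(ca[1, j - 1], d[1, j])
  if (n > 1 && m > 1) {
    for (i in 2:n) {
      for (j in 2:m) {
        best_prev <- min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1])
        ca[i, j] <- max(best_prev, d[i, j])
      }
    }
  }
  ca[n, m]
}

# Two flat trajectories on the same time grid, one unit apart.
P <- cbind(time = c(0, 1, 2), x = c(0, 0, 0))
Q <- cbind(time = c(0, 1, 2), x = c(1, 1, 1))
discrete_frechet(P, Q)  # 1
```

Because time enters the distance as an ordinary coordinate in this sketch, changing the units of time or of a variable changes the result, which is why the scaling step matters.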

Usage

mfkml(dt, clt_n, scales, weight, maxIter = 50)

Arguments

dt

A long-format data.frame containing the following columns in the specified order:

  • ID: An identifier for each trajectory.

  • Time: The time points at which measurements were recorded (numeric or integer vector).

  • Variable1, Variable2, ... : The measured variables over time (numeric values).

The data.frame should not include any missing values. See 'Details' for structure requirements.
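As a sketch of the layout described above, the following builds a small long-format data.frame. All values are made up for illustration, and the variable names V1 and V2 are placeholders; only the column order (ID, then Time, then the variables) comes from this page.

```r
# Hypothetical toy data: 3 trajectories, 4 time points each, 2 variables.
dt <- data.frame(
  ID   = rep(c("a", "b", "c"), each = 4),
  Time = rep(0:3, times = 3),
  V1   = c(1, 2, 3, 4,   4, 3, 2, 1,   2, 2, 2, 2),
  V2   = c(10, 12, 15, 19,   20, 18, 15, 11,   14, 14, 15, 14)
)

# One row per (ID, Time) pair; ID first, Time second, variables afterwards.
str(dt)

# The input must not contain missing values.
stopifnot(!anyNA(dt))
```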

clt_n

An integer specifying the number of clusters. The number of unique trajectories must be greater than or equal to clt_n.

scales

A numeric vector used for scaling the time and variable columns. The length of scales must be equal to ncol(dt) - 1, where each value in scales corresponds to the scaling factor for the respective column (excluding the ID column). See 'Details' for structure requirements.
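The length requirement can be checked before calling the function. In this sketch the data and the scaling factors are placeholders; this page does not state how the factors are applied to each column, so the value 1 is used throughout rather than a guessed rescaling.

```r
# Hypothetical toy data with Time in months and two variables.
dt <- data.frame(
  ID   = rep(1:2, each = 3),
  Time = rep(c(0, 6, 12), times = 2),
  V1   = c(1, 2, 3,   3, 2, 1),
  V2   = c(100, 150, 200,   200, 150, 100)
)

# One factor per non-ID column, in column order: Time, V1, V2.
scales <- c(1, 1, 1)  # placeholder values

# The vector must be numeric and have exactly ncol(dt) - 1 entries.
stopifnot(is.numeric(scales), length(scales) == ncol(dt) - 1)
```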

weight

Specifies the weights used for calculating the weighted Fréchet mean. It can take one of the following forms:

  • A data.frame with two columns: ID and Weight, where each Weight value indicates the importance of the corresponding trajectory.

  • A numeric value of 1, indicating equal weights for all trajectories.

See 'Details' for structure requirements.
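The two accepted forms might be prepared as follows. The IDs and weight values are invented for illustration, and the check that every trajectory has exactly one weight is an assumption about what the function expects rather than something stated on this page.

```r
# Hypothetical trajectory identifiers, as they would appear in dt$ID.
ids <- c("a", "b", "c")

# Form 1: a two-column data.frame, ID then Weight.
# Here trajectory "a" counts twice as much as the others.
weight_df <- data.frame(ID = ids, Weight = c(2, 1, 1))

# Form 2: the single number 1, meaning equal weights for all trajectories.
weight_equal <- 1

# Assumed sanity check: one weight per trajectory, no duplicates.
stopifnot(setequal(weight_df$ID, ids), !anyDuplicated(weight_df$ID))
```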

maxIter

The maximum number of iterations allowed before stopping if convergence is not reached. The default value is 50.

Details

The input dataset (dt) must contain only numeric values (except for the ID column) and must not include any missing values. Each variable should be measured at least three times per trajectory, since the method relies on trajectory shapes. Two observations per trajectory are insufficient to capture shape trends (e.g., increasing, decreasing, or stable).
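The requirements in this paragraph can be checked up front. The helper below is a hypothetical sketch, not part of FKmL; it assumes the ID is the first column and that the remaining columns are Time and the variables.

```r
# Hypothetical pre-flight check for an mfkml-style input (not part of FKmL).
check_mfkml_input <- function(dt) {
  # ID, Time and at least one variable.
  stopifnot(is.data.frame(dt), ncol(dt) >= 3)

  # Everything except the ID column must be numeric.
  values <- dt[, -1, drop = FALSE]
  stopifnot(all(vapply(values, is.numeric, logical(1))))

  # No missing values anywhere.
  stopifnot(!anyNA(dt))

  # At least three observations per trajectory.
  n_obs <- table(as.character(dt[[1]]))
  stopifnot(all(n_obs >= 3))

  invisible(TRUE)
}

# Toy input that satisfies all checks (made-up values).
ok <- data.frame(
  ID   = rep(c("a", "b"), each = 3),
  Time = rep(1:3, times = 2),
  V1   = c(1, 2, 3,   3, 2, 1),
  V2   = c(5, 5, 6,   6, 5, 5)
)
check_mfkml_input(ok)
```

A trajectory with only two rows, a non-numeric variable column, or an NA would make the helper stop with an error before any clustering is attempted.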

Because the Fréchet distance is sensitive to measurement units, proper scaling is essential when applying the mfkml function. The scales vector contains scaling factors for time and each variable, which are used to rescale the corresponding columns. This scaling prevents distortion due to differences in the units of time and variables, allowing for more accurate shape-based comparisons.

This function involves random sampling internally. For reproducible results, set the random seed before calling the function using set.seed().
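The effect of set.seed() can be seen with any random draw, and the same pattern applies to a call to mfkml. In the sketch below the data are invented, the scaling factors are placeholders, and the call is guarded so that the code also runs when FKmL is not installed.

```r
# Resetting the seed reproduces the same random draw.
set.seed(42)
first <- sample(10)
set.seed(42)
second <- sample(10)
identical(first, second)  # TRUE

# Hypothetical toy data: 6 trajectories, 4 time points, 2 variables.
dt <- data.frame(
  ID   = rep(1:6, each = 4),
  Time = rep(0:3, times = 6),
  V1   = c(1, 2, 3, 4,  1, 2, 4, 5,  2, 3, 4, 6,
           5, 4, 2, 1,  6, 4, 3, 1,  5, 3, 2, 2),
  V2   = c(0, 1, 1, 2,  0, 0, 1, 2,  1, 1, 2, 3,
           3, 2, 1, 0,  3, 3, 1, 0,  2, 2, 1, 0)
)

# Set the seed immediately before the call so reruns give the same result.
set.seed(123)
if (requireNamespace("FKmL", quietly = TRUE)) {
  res <- FKmL::mfkml(dt, clt_n = 2, scales = c(1, 1, 1),
                     weight = 1, maxIter = 50)
}
```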

Value

A list with the following components:

Cluster

A data.frame containing the ID and Cluster columns, which indicate the final cluster assignment for each trajectory.

Center

A data.frame representing the final cluster centers, with columns for the cluster IDs, time points, and variable values.

Iteration

The number of iterations the algorithm performed before it converged or reached maxIter.
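The returned components can be inspected like any other list. The object below is a hand-built stand-in that uses the documented component names; it is not real output from mfkml. Treating an iteration count equal to maxIter as a sign that the cap was reached is an assumption.

```r
# Hand-built stand-in with the documented component names (not real output).
res <- list(
  Cluster   = data.frame(ID = c("a", "b", "c", "d"), Cluster = c(1, 1, 2, 2)),
  Center    = data.frame(),  # left empty: exact column names are not given here
  Iteration = 4
)

# Number of trajectories assigned to each cluster.
cluster_sizes <- table(res$Cluster$Cluster)
cluster_sizes

# Assumed check: did the run use up the iteration budget (default 50)?
max_iter <- 50
hit_cap <- res$Iteration >= max_iter
hit_cap
```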


[Package FKmL version 0.1.1 Index]