prep_training_data {veesa}    R Documentation

Align training data and apply a method of elastic fPCA

Description

Applies steps 2 and 3 of the VEESA pipeline (alignment and elastic fPCA) to the training data in preparation for inputting the data to the model in step 4.

Usage

prep_training_data(
  f,
  time,
  fpca_method,
  lambda = 0,
  penalty_method = c("roughness", "geodesic", "norm"),
  centroid_type = c("mean", "median"),
  center_warpings = TRUE,
  parallel = FALSE,
  cores = -1,
  optim_method = c("DP", "DPo", "DP2", "RBFGS"),
  max_iter = 20L,
  id = NULL,
  C = NULL,
  ci = c(-2, -1, 0, 1, 2)
)

Arguments

f

Matrix (size M x N) of training data with N functions and M samples.

time

Vector of size M corresponding to the M sample points.
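
The shapes expected for f and time can be illustrated with a minimal base R sketch. The toy data below (shifted sine curves, with M = 5 and N = 3) is purely hypothetical and is not taken from the package; it only shows that functions are stored in columns and that the number of rows matches the length of time.

```r
# Hypothetical toy data: N = 3 functions observed at M = 5 shared sample points
M <- 5
N <- 3
time <- seq(0, 1, length.out = M)

# Each column holds one function; each row corresponds to one sample point
f <- sapply(seq_len(N), function(j) sin(2 * pi * (time - 0.1 * j)))

# Confirm the shapes agree with the argument descriptions (M x N, length M)
stopifnot(is.matrix(f), nrow(f) == length(time), ncol(f) == N)
dim(f)
```

A check like this before calling prep_training_data can catch a matrix that was built with functions in rows instead of columns.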

fpca_method

Character string specifying the type of elastic fPCA method to use. Options are 'jfpca', 'hfpca', or 'vfpca'.

lambda

Numeric value specifying the elasticity. Default is 0.

penalty_method

String specifying the penalty term used in the cost function that is minimized during alignment. Choices are "roughness", which uses the norm of the second derivative; "geodesic", which uses the geodesic distance to the identity; and "norm", which uses the Euclidean distance to the identity. Default is "roughness".

centroid_type

String specifying the type of centroid to align to. Options are "mean" or "median". Default is "mean".
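
Arguments such as penalty_method and centroid_type list all of their choices as a vector in the usage signature. In R, defaults written this way are commonly resolved with match.arg(), which selects the first element when the argument is left unset. The sketch below uses a hypothetical helper, resolve_choices, to show that convention; whether veesa resolves these arguments in exactly this way is an assumption.

```r
# Hypothetical helper, not part of veesa: shows the usual match.arg() idiom
# for arguments whose default is a vector of allowed choices
resolve_choices <- function(penalty_method = c("roughness", "geodesic", "norm"),
                            centroid_type = c("mean", "median")) {
  # match.arg() returns the first choice when the argument is not supplied
  penalty_method <- match.arg(penalty_method)
  centroid_type <- match.arg(centroid_type)
  list(penalty_method = penalty_method, centroid_type = centroid_type)
}

# Leaving both arguments unset selects the first choice of each
defaults <- resolve_choices()

# Supplying a single string selects that choice explicitly
custom <- resolve_choices(penalty_method = "geodesic", centroid_type = "median")
```

Passing a single string, as in the second call, is the clearest way to make the chosen option explicit.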

center_warpings

Boolean specifying whether to center the estimated warping functions. Default is TRUE.

parallel

Boolean specifying whether to run calculations in parallel. Default is FALSE.

cores

Integer specifying the number of cores to use when running in parallel. Default is -1, which uses all cores.
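
To see how many cores a value of -1 would correspond to on a given machine, the sentinel can be resolved by hand. The helper below, resolve_cores, is hypothetical and only illustrates the "-1 means all cores" convention using parallel::detectCores(); it is an assumption about the convention, not a description of the package internals.

```r
# Hypothetical helper, not part of veesa: maps the -1 sentinel to a core count
resolve_cores <- function(cores = -1) {
  available <- parallel::detectCores()
  # detectCores() can return NA on some platforms; fall back to one core
  if (is.na(available)) available <- 1L
  if (cores == -1) {
    return(as.integer(available))
  }
  # Never request more cores than the machine reports
  as.integer(min(cores, available))
}

n_all <- resolve_cores(-1)
n_one <- resolve_cores(1)
```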

optim_method

Method used for optimization when computing the Karcher mean. Options are "DP", "DPo", "DP2", and "RBFGS".

max_iter

An integer value specifying the maximum number of iterations. Defaults to 20L.

id

Integration point for f0. Default is midpoint.

C

Balance value. Default is NULL.

ci

Geodesic standard deviations to be computed. Default is c(-2, -1, 0, 1, 2).

Value

List with three objects:

Examples

# Load packages
library(dplyr)
library(tidyr)

# Select a subset of functions from shifted peaks data
sub_ids <-
  shifted_peaks$data |>
  select(data, group, id) |>
  distinct() |>
  group_by(data, group) |>
  slice(1:4) |>
  ungroup()

# Create a smaller version of shifted data
shifted_peaks_sub <-
  shifted_peaks$data |>
  filter(id %in% sub_ids$id)

# Extract times
shifted_peaks_times <- unique(shifted_peaks_sub$t)

# Convert training data to matrix
shifted_peaks_train_matrix <-
  shifted_peaks_sub |>
  filter(data == "Training") |>
  select(-t) |>
  mutate(index = paste0("t", index)) |>
  pivot_wider(names_from = index, values_from = y) |>
  select(-data, -id, -group) |>
  as.matrix() |>
  t()

# Obtain veesa pipeline training data
veesa_train <-
  prep_training_data(
    f = shifted_peaks_train_matrix,
    time = shifted_peaks_times,
    fpca_method = "jfpca"
  )

[Package veesa version 0.1.6 Index]