sequence_stats {MSCA} | R Documentation |
Compute sequence statistics
Description
Computes descriptive statistics for sequences, including sequence frequency for any sequence length, and conditional probability and relative risk for sequences of length 2 (pairwise transitions).
Usage
sequence_stats(
  seq_data,
  min_seq_freq = 0.01,
  min_conditional_prob = 0,
  min_relative_risk = 0,
  forward = TRUE
)
Arguments
seq_data |
A list of data frames containing sequences, must be the output of |
min_seq_freq |
Numeric threshold (default = 0.01). Filters out sequences with relative frequency below this value. |
min_conditional_prob |
Numeric threshold (default = 0). Applies only for pairwise sequences ( |
min_relative_risk |
Numeric threshold (default = 0). Applies only for pairwise sequences ( |
forward |
If |
Details
For k = 2, the function computes:
- seq_freq: Proportion of all sequences that match the pair
- conditional_prob: P(to | from)
- relative_risk: conditional probability divided by the marginal probability of to
For k > 2, only seq_freq is computed.
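The three pairwise statistics can be illustrated with a small base R sketch. This is not the package's implementation: the toy `pairs` data frame, the helper names, and the denominators are assumptions made for illustration. In particular, the marginal probability of `to` is taken here as the share of all observed pairs that end in `to`, which may differ from how MSCA defines it.

```r
# Toy pairwise transitions (hypothetical data, not produced by MSCA).
pairs <- data.frame(
  from = c("A", "A", "A", "B", "B", "C"),
  to   = c("B", "B", "C", "C", "C", "B"),
  stringsAsFactors = FALSE
)
n_total <- nrow(pairs)

# Count each distinct (from, to) pair.
stats <- aggregate(n ~ from + to, data = transform(pairs, n = 1L), FUN = sum)
stats$from <- as.character(stats$from)
stats$to   <- as.character(stats$to)

# seq_freq: share of all observed pairs that match this pair.
stats$seq_freq <- stats$n / n_total

# conditional_prob: P(to | from) = count(from, to) / count(from).
from_counts <- table(pairs$from)
stats$conditional_prob <- stats$n / as.numeric(from_counts[stats$from])

# relative_risk: P(to | from) / P(to).
# Assumption: P(to) is the share of all pairs whose second element is `to`.
to_marginal <- table(pairs$to) / n_total
stats$relative_risk <- stats$conditional_prob / as.numeric(to_marginal[stats$to])

# Threshold filtering in the spirit of min_seq_freq (the value is illustrative).
min_seq_freq <- 0.2
kept <- stats[stats$seq_freq >= min_seq_freq, ]
print(kept)
```

Under these assumptions, a relative risk above 1 means `to` follows `from` more often than its overall share among pair endings would suggest, and a value below 1 means it follows less often.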
Value
A list of data frames, each containing the sequence statistics for one cluster.
See Also
get_cluster_sequences