sequence_length_summary_covariate {AnimalSequences} | R Documentation |
Summarize Sequence Lengths by Covariate
Description
This function calculates summary statistics for sequence lengths (the number of elements in each sequence), grouped by a specified covariate. For each covariate value it reports the mean, standard deviation, median, minimum, and maximum length, along with the number of distinct elements and a p-value comparing that number with shuffled sequences.
Usage
sequence_length_summary_covariate(sequences, covariate)
Arguments
sequences |
A character vector in which each entry is one sequence, written as elements separated by spaces. |
covariate |
A vector of covariates with the same length as 'sequences', used to group the sequences. |
Value
A data frame with the following columns:
covariate |
The value of the covariate. |
mean_seq_elements |
The mean length of sequences for this covariate value. |
sd_seq_elements |
The standard deviation of the sequence lengths for this covariate value. |
median_seq_elements |
The median length of sequences for this covariate value. |
min_seq_elements |
The minimum length of sequences for this covariate value. |
max_seq_elements |
The maximum length of sequences for this covariate value. |
distinct_elements |
The number of distinct elements occurring in the sequences for this covariate value. |
pvalue_distinct_elements |
The p-value from comparing the observed number of distinct elements with the number found in shuffled sequences, for this covariate value. |
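The length and distinct-element columns above can be illustrated with a short base-R sketch. This is not the AnimalSequences source code: it assumes that elements are separated by single spaces and that lengths are counted in elements, and the names 'elements', 'n_elements', 'groups' and 'length_summary' are hypothetical.

```r
# Illustrative base-R sketch only; not the package implementation.
sequences <- c('hello world', 'hello world hello', 'hello world hello world')
covariate <- c('A', 'B', 'A')

# Split each sequence into its elements (assumed delimiter: one space).
elements <- strsplit(sequences, " ", fixed = TRUE)
n_elements <- lengths(elements)

# Indices of the sequences belonging to each covariate value.
groups <- split(seq_along(sequences), covariate)

# One row of summary statistics per covariate value.
length_summary <- do.call(rbind, lapply(names(groups), function(g) {
  idx <- groups[[g]]
  data.frame(
    covariate = g,
    mean_seq_elements = mean(n_elements[idx]),
    sd_seq_elements = sd(n_elements[idx]),  # NA when a group holds one sequence
    median_seq_elements = median(n_elements[idx]),
    min_seq_elements = min(n_elements[idx]),
    max_seq_elements = max(n_elements[idx]),
    distinct_elements = length(unique(unlist(elements[idx])))
  )
}))
length_summary
```

In this sketch a covariate value with a single sequence has an undefined standard deviation, which base R reports as NA; whether the package does the same is not stated here.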
Examples
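# Three sequences of space-separated elements
sequences <- c('hello world', 'hello world hello', 'hello world hello world')
# One covariate value per sequence; sequences 1 and 3 share group 'A'
covariate <- c('A', 'B', 'A')
# Returns one row of summary statistics per covariate value
sequence_length_summary_covariate(sequences, covariate)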
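
This page does not say how the shuffled comparison behind pvalue_distinct_elements is carried out. The sketch below shows one possible reading as a permutation test in base R. It rests on several assumptions: covariate labels are shuffled across sequences, the test is one-sided (asking whether a group has as few or fewer distinct elements than expected by chance), and 999 shuffles are used. The toy data and the names 'count_distinct', 'n_shuffles', 'null_counts' and 'p_values' are hypothetical, and the package may shuffle differently.

```r
# Illustrative permutation test only; not the package implementation.
set.seed(1)  # make the shuffles reproducible

# Hypothetical toy data: two groups with non-overlapping repertoires.
sequences <- c('a b c', 'a b', 'a a b', 'd e f g', 'd e', 'f g h')
covariate <- c('A', 'A', 'A', 'B', 'B', 'B')
elements <- strsplit(sequences, " ", fixed = TRUE)

# Number of distinct elements within each group defined by 'labels'.
count_distinct <- function(labels) {
  vapply(split(elements, labels),
         function(x) length(unique(unlist(x))),
         integer(1))
}

observed <- count_distinct(covariate)

# Null distribution: shuffle the covariate labels (assumed scheme) and recount.
n_shuffles <- 999
null_counts <- replicate(n_shuffles, count_distinct(sample(covariate)))

# One-sided p-value per group, with the usual +1 correction so that
# the estimate is never exactly zero.
p_values <- (rowSums(null_counts <= observed) + 1) / (n_shuffles + 1)
p_values
```

Shuffling labels keeps the group sizes fixed, so each shuffle yields a count for every covariate value and the rows of 'null_counts' line up with 'observed'.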