top_perc {mintyr}    R Documentation
Select Top Percentage of Data and Statistical Summarization
Description
The top_perc function selects the top (or bottom) percentage of rows ranked by a specified trait column and computes summary statistics.
Selection and summarization can be carried out within groups defined by additional columns, and the type of summary statistics is configurable.
Optionally, the selected rows can be returned together with the statistics.
Usage
top_perc(data, perc, trait, by = NULL, type = "mean_sd", keep_data = FALSE)
Arguments
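data: A data frame containing the data to select from and summarize.
perc: Numeric vector of proportions between -1 and 1 giving the share of rows to select; positive values select the top, negative values the bottom (see Note).
trait: Character string naming the column by which rows are ranked and selected.
by: Optional character vector naming the grouping columns; if supplied, selection and summarization are carried out within each group. Defaults to NULL (no grouping).
type: Character string specifying the type of summary statistics to compute. Defaults to "mean_sd".
keep_data: Logical; if TRUE, the selected rows are returned along with the summary statistics. Defaults to FALSE.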
Value
A data frame or a list:
- If keep_data is FALSE, a data frame of summary statistics.
- If keep_data is TRUE, a list in which each element is itself a list containing the summary statistics (stat) and the selected rows (data).
Note
The perc parameter accepts values between -1 and 1. Positive values select the top percentage and negative values the bottom percentage; for example, perc = 0.1 selects the top 10% and perc = -0.2 the bottom 20%. The function performs initial checks to ensure that the required arguments are provided and valid.
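To make the sign convention concrete, here is a minimal base-R sketch of sign-based selection. It is not mintyr's implementation: the helper select_top_frac() is hypothetical, and it assumes rank-based selection with ties at the cut-off kept, following the convention documented for dplyr::top_frac() (listed under See Also).

```r
# Hypothetical helper, not part of mintyr: keep the top (perc > 0) or
# bottom (perc < 0) fraction of rows of `df`, ranked by column `trait`.
# Ties at the cut-off are all kept, as with dplyr::top_frac().
select_top_frac <- function(df, perc, trait) {
  stopifnot(is.data.frame(df), trait %in% names(df),
            is.numeric(perc), length(perc) == 1, abs(perc) <= 1)
  x <- df[[trait]]
  n_keep <- abs(perc) * nrow(df)          # target row count (ties may add more)
  key <- if (perc >= 0) -x else x         # negate so the largest values rank first
  r <- rank(key, ties.method = "min", na.last = "keep")
  df[which(r <= n_keep), , drop = FALSE]  # which() drops rows with NA ranks
}

# Usage: top 10% and bottom 20% of iris rows by Petal.Width
top10 <- select_top_frac(iris, 0.1, "Petal.Width")
bottom20 <- select_top_frac(iris, -0.2, "Petal.Width")
```

Using minimum ranks means every row tied with the last selected value is kept, so the result can contain slightly more than abs(perc) * nrow(df) rows.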
Grouping by additional columns (by) is optional; when supplied, rows are selected and summarized separately within each group. The type parameter specifies the type of summary statistics to compute, with "mean_sd" as the default. If keep_data is set to TRUE, the function returns both the summary statistics and the selected data for each percentage.
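Grouped selection followed by a summary can be illustrated in the same spirit. The sketch below uses a hypothetical base-R helper, summarise_top_frac(), that applies rank-based selection within each combination of the by columns and reports n, mean and sd of the kept trait values as a stand-in for the default "mean_sd" summary. It handles a single perc value, and its column layout is an assumption that need not match what top_perc() returns.

```r
# Hypothetical helper, not part of mintyr: within each group defined by `by`,
# keep the top (perc > 0) or bottom (perc < 0) fraction of rows ranked by
# `trait`, then summarise the kept values as n, mean and sd.
summarise_top_frac <- function(df, perc, trait, by = NULL) {
  groups <- if (is.null(by)) list(df) else split(df, df[by], drop = TRUE)
  rows <- lapply(groups, function(g) {
    x <- g[[trait]]
    key <- if (perc >= 0) -x else x       # negate so the largest values rank first
    r <- rank(key, ties.method = "min", na.last = "keep")
    kept <- x[which(r <= abs(perc) * nrow(g))]
    out <- data.frame(perc = perc, n = length(kept),
                      mean = mean(kept), sd = stats::sd(kept))
    # Prepend the grouping values, taken from the group's first row
    if (is.null(by)) out else cbind(g[1, by, drop = FALSE], out)
  })
  res <- do.call(rbind, rows)
  rownames(res) <- NULL
  res
}

# Usage: summary of the top 10% of Petal.Width within each Species
summarise_top_frac(iris, 0.1, "Petal.Width", by = "Species")
```

Splitting first and ranking inside each group means the percentage is applied to each group's own size, so a small group contributes proportionally fewer rows than a large one.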
See Also
- rstatix::get_summary_stats() for statistical summary computation
- dplyr::top_frac() for percentage-based data selection
Examples
# Example 1: Basic usage with a single trait
# Select the top 10% of observations based on Petal.Width;
# keep_data = TRUE returns both the summary statistics and the selected rows
top_perc(iris,
         perc = 0.1,              # Select top 10%
         trait = "Petal.Width",   # Column to rank and select by
         keep_data = TRUE)        # Return both stats and selected data
# Example 2: Grouping with the 'by' parameter
# Performs the same selection separately for each Species; because
# keep_data defaults to FALSE, only the summary statistics are returned
top_perc(iris,
         perc = 0.1,              # Select top 10%
         trait = "Petal.Width",   # Column to rank and select by
         by = "Species")          # Group by Species
# Example 3: Multiple percentages and grouping variables
# Reshape Sepal.Length and Sepal.Width from wide to long format, then select
# the top 10% and the bottom 20% of 'values' within each Species/names group
iris |>
  tidyr::pivot_longer(1:2,
                      names_to = "names",
                      values_to = "values") |>
  mintyr::top_perc(
    perc = c(0.1, -0.2),
    trait = "values",
    by = c("Species", "names"),
    type = "mean_sd")