count_n {spicy}    R Documentation
Row-wise Count of Specific or Special Values
Description
count_n() counts, for each row of a data frame or matrix, how many times one or more values appear across the selected columns. It supports type-safe comparison, case-insensitive string matching, and detection of special values such as NA, NaN, Inf, and -Inf.
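As a rough illustration of what a row-wise count means, the base-R sketch below counts, for each row, how many cells equal any of the supplied values after converting both sides to character. The helper row_count_sketch() is hypothetical and is not part of spicy. It mimics only a loose, coercing match and ignores column selection, special values, and strict typing.

```r
# Hypothetical helper, not part of spicy: loose row-wise count of `values`.
# Cells and `values` are both converted to character before comparing,
# so numeric 2 and character "2" are treated as equal.
row_count_sketch <- function(data, values) {
  data <- as.data.frame(data)
  hits <- matrix(FALSE, nrow = nrow(data), ncol = ncol(data))
  for (j in seq_along(data)) {
    # %in% returns FALSE (not NA) for missing cells
    hits[, j] <- as.character(data[[j]]) %in% as.character(values)
  }
  rowSums(hits)
}

df <- data.frame(
  x = c(1, 2, 2, 3, NA),
  y = c(2, 2, NA, 3, 2),
  z = c("2", "2", "2", "3", "2")
)
row_count_sketch(df, 2)
#> [1] 2 3 2 0 2
```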
Usage
count_n(
data = NULL,
select = tidyselect::everything(),
exclude = NULL,
count = NULL,
special = NULL,
allow_coercion = TRUE,
ignore_case = FALSE,
regex = FALSE,
verbose = FALSE
)
Arguments
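data
A data frame or matrix. Optional inside mutate().
select
Columns to include. Uses tidyselect helpers like starts_with() or everything(). When regex = TRUE, a regular expression matched against column names.
exclude
Character vector of column names to exclude after selection.
count
Value(s) to count. Ignored if …
special
Character vector of special values to count: "NA", "NaN", "Inf", "-Inf", or "all".
allow_coercion
Logical (default TRUE). If FALSE, values are matched strictly with identical(), so both value and type must agree.
ignore_case
Logical (default FALSE). If TRUE, character and factor values are compared in lowercase (via tolower()), overriding strict type matching.
regex
Logical (default FALSE). If TRUE, select is interpreted as a regular expression matched against column names.
verbose
Logical (default FALSE).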
Details
This function is particularly useful for summarizing data quality or patterns in row-wise structures, and is designed to work fluently inside dplyr::mutate() pipelines.
Internally, count_n() wraps the stable, dependency-free base function base_count_n(), which keeps the core logic flexible and easy to test.
Value
A numeric vector of row-wise counts (unnamed).
Note
This function is inspired by datawizard::row_count(), but provides additional flexibility:
- Element-wise type-safe matching using identical() when allow_coercion = FALSE. This ensures that both the value and its type match exactly, enabling precise comparisons in mixed-type columns.
- Support for multiple values in count, allowing queries like count = c(2, 3) or count = c("yes", "no") to count any of several values per row.
- Detection of special values such as NA, NaN, Inf, and -Inf through the special argument, a feature not available in row_count().
- Tidyverse-native behavior: can be used inside mutate() without explicitly passing a data argument.
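The special-value detection described in the list can be approximated in base R with is.na(), is.nan(), and is.infinite(). The sketch below is hypothetical (is_special() and count_special_sketch() are not part of spicy) and does not implement "all". Because is.na(NaN) is TRUE in base R, the sketch assumes that NaN should not also be counted as NA; this page does not state how count_n() resolves that overlap.

```r
# Hypothetical helper, not part of spicy: TRUE where `col` holds special value `kind`.
is_special <- function(col, kind) {
  if (!is.numeric(col)) {
    # This sketch only checks non-numeric columns for NA
    return(if (kind == "NA") is.na(col) else rep(FALSE, length(col)))
  }
  switch(kind,
    "NA"   = is.na(col) & !is.nan(col),  # assumption: NaN is not counted as NA
    "NaN"  = is.nan(col),
    "Inf"  = is.infinite(col) & col > 0,
    "-Inf" = is.infinite(col) & col < 0,
    stop("unknown special value: ", kind)
  )
}

# Hypothetical helper: add up the matches for every column and every kind.
count_special_sketch <- function(data, kinds) {
  data <- as.data.frame(data)
  counts <- numeric(nrow(data))
  for (col in data) {
    for (kind in kinds) counts <- counts + is_special(col, kind)
  }
  counts
}

df <- data.frame(
  num  = c(1, 2, NA, -Inf, NaN),
  char = c("a", "B", "b", "a", NA)
)
count_special_sketch(df, "NA")
#> [1] 0 0 1 0 1
count_special_sketch(df, c("NA", "NaN"))
#> [1] 0 0 1 0 2
```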
Value coercion behavior
R automatically coerces mixed-type vectors to a common type when they are created with c(). For example, count = c(2, "2") becomes c("2", "2"), because combining numeric and character values yields a character vector.
This coercion happens before count_n() is called, so the function cannot detect at runtime that the types were originally mixed.
To ensure accurate type-sensitive matching, do not mix types in count.
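The coercion described above is plain base-R behavior and can be checked without spicy. The snippet shows that c() turns a numeric 2 into "2", and contrasts == (which coerces) with identical() (which also compares types, as in strict mode).

```r
# Base R only: c() coerces mixed types before any function receives them.
vals <- c(2, "2")
class(vals)
#> [1] "character"
vals
#> [1] "2" "2"

# == coerces its operands, so numeric 2 equals character "2" ...
2 == "2"
#> [1] TRUE
# ... but identical() also compares types, so they differ.
identical(2, "2")
#> [1] FALSE
```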
Strict matching mode (allow_coercion = FALSE)
When strict matching is enabled, each value in count must match the type of the target column exactly.
For factor columns, this means that count must also be a factor. Supplying count = "b" (a character string) will not match a factor value, even if the label appears identical.
A natural first attempt is count = factor("b"). However, identical(), which is used internally for strict comparisons, also checks the internal structure of the factor, including the content and order of its levels. Because factor("b") has the single level "b", it will not match a "b" taken from a column whose levels are, say, c("a", "b", "c"). In general, comparisons fail whenever the levels differ, even when the label is the same.
To ensure a perfect match (label and levels), you can reuse a value taken directly from the data (e.g., df$x[2]). This guarantees that both the class and the factor levels align. However, this approach only works reliably if all selected columns have the same factor structure.
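The factor behavior described above comes from base R's identical(), so it can be reproduced directly. The snippet uses the same two factors as the examples on this page and compares a single "b" value against several candidates.

```r
# Base R only: how identical() treats factor values with different levels.
x <- factor(c("a", "b", "c"))   # levels "a", "b", "c"
y <- factor(c("b", "B", "a"))   # a different set of levels

x[2] == "b"                     # == compares labels
#> [1] TRUE
identical(x[2], "b")            # factor vs character
#> [1] FALSE
identical(x[2], factor("b"))    # factor("b") has the single level "b"
#> [1] FALSE
identical(x[2], factor("b", levels = levels(x)))
#> [1] TRUE
identical(y[1], x[2])           # same label "b", but the levels differ
#> [1] FALSE
```

The last comparison is why reusing df$x[2] only helps when every selected column shares the same levels.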
Case-insensitive matching (ignore_case = TRUE)
When ignore_case = TRUE, all values involved in the comparison are converted to lowercase using tolower() before matching.
This applies to both character and factor columns; factors are first converted to character internally.
Importantly, case-insensitive mode takes precedence over strict type comparison: values are no longer compared using identical(), but by lowercase string equality. This enables more flexible matching; for example, "b" and "B" will match even when allow_coercion = FALSE.
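The lowercase comparison can be sketched in a few lines of base R. match_ci() is a hypothetical helper, not part of spicy: it converts factors to character, lowercases both sides, and tests membership, which is one plausible reading of the behavior described above.

```r
# Hypothetical helper, not part of spicy: case-insensitive membership test.
# Factors become character first; then both sides are lowercased.
match_ci <- function(column, values) {
  tolower(as.character(column)) %in% tolower(as.character(values))
}

y <- factor(c("b", "B", "a"))
identical(y[1], factor("b"))    # strict comparison fails (levels differ)
#> [1] FALSE
match_ci(y, "b")                # "b" and "B" both match
#> [1]  TRUE  TRUE FALSE
```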
Example: strict vs. case-insensitive matching with factors
df <- tibble::tibble(
  x = factor(c("a", "b", "c")),
  y = factor(c("b", "B", "a"))
)
# Strict match fails with character input
count_n(df, count = "b", allow_coercion = FALSE)
#> [1] 0 0 0
# Match works only where factor levels match exactly
count_n(df, count = factor("b", levels = levels(df$x)), allow_coercion = FALSE)
#> [1] 0 1 0
# Case-insensitive match succeeds for both "b" and "B"
count_n(df, count = "b", ignore_case = TRUE)
#> [1] 1 2 0
Like datawizard::row_count(), this function also supports regex-based column selection, case-insensitive string comparison, and column exclusion.
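Regex-based selection followed by exclusion can be approximated in base R with grep() and setdiff(). select_cols_sketch() is hypothetical and not part of spicy; it only mirrors the order of operations suggested by the arguments (select first, then drop the names in exclude).

```r
# Hypothetical helper, not part of spicy: pick column names by regular
# expression, then drop any names listed in `exclude`.
select_cols_sketch <- function(data, pattern, exclude = NULL) {
  cols <- grep(pattern, names(data), value = TRUE)
  setdiff(cols, exclude)
}

df <- data.frame(
  score_math    = c(1, 2, 2, 3, NA),
  score_science = c(2, 2, NA, 3, 2),
  score_lang    = c("2", "2", "2", "3", "2"),
  name          = c("Jean", "Marie", "Ali", "Zoe", "Nina")
)
select_cols_sketch(df, "^score_", exclude = "score_lang")
#> [1] "score_math"    "score_science"
```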
Examples
library(dplyr)
library(tibble)
library(haven)
# Basic usage
df <- tibble(
x = c(1, 2, 2, 3, NA),
y = c(2, 2, NA, 3, 2),
z = c("2", "2", "2", "3", "2")
)
df
count_n(df, count = 2)
count_n(df, count = 2, allow_coercion = FALSE)
count_n(df, count = "2", ignore_case = TRUE)
df |> mutate(num_twos = count_n(count = 2))
# Mixed types and special values
df <- tibble(
num = c(1, 2, NA, -Inf, NaN),
char = c("a", "B", "b", "a", NA),
fact = factor(c("a", "b", "b", "a", "c")),
date = as.Date(c("2023-01-01", "2023-01-01", NA, "2023-01-02", "2023-01-01")),
lab = labelled(c(1, 2, 1, 2, NA), labels = c(No = 1, Yes = 2)),
logic = c(TRUE, FALSE, NA, TRUE, FALSE)
)
df
count_n(df, count = 2)
count_n(df, count = 2, allow_coercion = FALSE)
count_n(df, count = "b", ignore_case = FALSE)
count_n(df, count = "b", ignore_case = TRUE)
count_n(df, count = "a", select = fact)
count_n(df, count = as.Date("2023-01-01"), select = date)
count_n(df, count = TRUE, select = logic)
count_n(df, count = 2, select = lab)
df <- df |> mutate(lab_chr = as_factor(lab))
count_n(df, count = "Yes", select = lab_chr, allow_coercion = TRUE)
count_n(df, count = "Yes", select = lab_chr, allow_coercion = FALSE)
# Count special values
count_n(df, special = "NA")
count_n(df, special = "NaN")
count_n(df, special = "-Inf")
count_n(df, special = c("NA", "NaN"))
count_n(df, special = "all")
# Column selection strategies
df <- tibble(
score_math = c(1, 2, 2, 3, NA),
score_science = c(2, 2, NA, 3, 2),
score_lang = c("2", "2", "2", "3", "2"),
name = c("Jean", "Marie", "Ali", "Zoe", "Nina")
)
df
count_n(df, select = c(score_math, score_science), count = 2)
count_n(df, select = starts_with("score_"), exclude = "score_lang", count = 2)
count_n(df, select = everything(), exclude = "name", count = 2)
count_n(df, select = "^score_", regex = TRUE, count = 2)
count_n(df, select = "lang", regex = TRUE, count = "2")
df |> mutate(nb_two = count_n(count = 2))
df |> select(score_math, score_science) |> mutate(nb_two = count_n(count = 2))
df$nb_two <- count_n(df, select = starts_with("score_"), count = 2)
df[1:3, ] |> count_n(select = starts_with("score_"), count = 2)
# Strict type-safe matching with factor columns
df <- tibble(
x = factor(c("a", "b", "c")),
y = factor(c("b", "B", "a"))
)
df
# Coercion: character "b" matches both x and y
count_n(df, count = "b")
# Strict match: fails because "b" is character, not factor (returns only 0s)
count_n(df, count = "b", allow_coercion = FALSE)
# Strict match with factor value: works only where levels match
count_n(df, count = factor("b", levels = levels(df$x)), allow_coercion = FALSE)
# Using a value from the data: guarantees type and levels match for column x
count_n(df, count = df$x[2], allow_coercion = FALSE)
# Case-insensitive match (factors are converted to character internally)
count_n(df, count = "b", ignore_case = TRUE)
count_n(df, count = "B", ignore_case = TRUE)