remove_duplicates {cleanepi}    R Documentation
Remove duplicates
Description
When removing duplicates, users can specify a set of columns to consider with the target_columns argument.
Usage
remove_duplicates(data, target_columns = NULL)
Arguments
data | The input <data.frame> or <linelist>.
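target_columns | The columns to consider when identifying duplicates. Defaults to NULL, in which case duplicates are identified from all columns.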
Details
Caveat: In many epidemiological datasets, multiple rows may share the same value in one or more columns without being true duplicates. For example, several individuals might have the same symptom onset date and admission date. Be cautious when using this function, especially when applying it to a single target column, to avoid incorrectly identifying or removing valid entries.
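The caveat above can be illustrated with a minimal sketch. It uses base R's duplicated() on a made-up data frame rather than remove_duplicates() itself, so it shows only the general effect of narrowing the comparison to one column. The data frame, its column names, and its values are hypothetical, and nothing here assumes how remove_duplicates() is implemented.

```r
# Hypothetical toy linelist: three distinct patients, with the record
# for patient 3 entered twice (the only true duplicate).
cases <- data.frame(
  id = c(1, 2, 3, 3),
  date_onset = as.Date(
    c("2023-01-05", "2023-01-05", "2023-01-07", "2023-01-07")
  ),
  date_admission = as.Date(
    c("2023-01-06", "2023-01-06", "2023-01-08", "2023-01-08")
  )
)

# Comparing on every column flags only the repeated record for patient 3.
dup_all_columns <- duplicated(cases)

# Comparing on a single column also flags patient 2, who merely shares
# an onset date with patient 1 and is a valid, distinct entry.
dup_onset_only <- duplicated(cases["date_onset"])

# Number of rows that would be dropped under each comparison.
n_flagged_all <- sum(dup_all_columns)
n_flagged_onset <- sum(dup_onset_only)
```

Comparing n_flagged_all with n_flagged_onset before removing anything is a cheap way to check whether a narrow set of target columns is discarding valid records.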
Value
The input data <data.frame> or <linelist> without the duplicated rows identified from all or the specified columns.
Examples
library(cleanepi)

# load the example linelist shipped with the package
data <- readRDS(
  system.file("extdata", "test_linelist.RDS", package = "cleanepi")
)

# remove duplicates, considering only the tagged linelist columns
no_dups <- remove_duplicates(
  data = data,
  target_columns = "linelist_tags"
)

# print the removed duplicates
print_report(no_dups, "removed_duplicates")

# print the detected duplicates
print_report(no_dups, "found_duplicates")