bin_by_date {ggsurveillance} | R Documentation |
Aggregate data by time periods
Description
Aggregates data by specified time periods (e.g., weeks, months) and calculates (weighted)
counts. Incidence rates are also calculated using the provided population numbers.
This function is the core date binning engine used by geom_epicurve() and stat_bin_date()
for creating epidemiological time series visualizations.
Usage
bin_by_date(
  x,
  dates_from,
  n = 1,
  population = 1,
  fill_gaps = FALSE,
  date_resolution = "week",
  week_start = 1,
  .groups = "drop"
)
Arguments
x |
Either a data frame with a date column, or a date vector. |
dates_from |
Column name containing the dates to bin. Used when x is a data.frame. |
n |
Numeric column with case counts (or weights). Supports quoted and unquoted column names. |
population |
A number or a numeric column with the population size. Used to calculate the incidence. |
fill_gaps |
Logical; If |
date_resolution |
Character string specifying the time unit for date aggregation.
Possible values include:
|
week_start |
Integer specifying the start of the week (1 = Monday, 7 = Sunday).
Only used when |
.groups |
See |
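To illustrate what the start of the week means for binning, here is a small sketch that calls lubridate::floor_date() directly. It assumes that week_start is handed to floor_date() unchanged, which the Details section suggests but does not state outright; the date and the variable names are hypothetical.

```r
library(lubridate)

# Hypothetical onset date: 2024-12-11 is a Wednesday
onset <- as.Date("2024-12-11")

# Weeks starting on Monday (week_start = 1): the bin starts on 2024-12-09
monday_start <- floor_date(onset, unit = "week", week_start = 1)

# Weeks starting on Sunday (week_start = 7): the bin starts on 2024-12-08
sunday_start <- floor_date(onset, unit = "week", week_start = 7)

monday_start
sunday_start
```

The same case therefore lands in a bin labelled one day apart depending on the convention, which matters when comparing counts against sources that use a different week definition.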
Details
The function performs several key operations:
- Date coercion: Converts the date column to proper Date format
- Gap filling (optional): Generates complete temporal sequences to fill missing time periods with zeros
- Date binning: Rounds dates to the specified resolution using lubridate::floor_date()
- Weight and population handling: Processes count weights and population denominators
- Aggregation: Groups by binned dates and sums weights to get counts and incidence
Grouping behaviour: The function respects existing grouping in the input data frame.
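The operations listed above can be sketched with dplyr and lubridate. This is an illustration under stated assumptions, not the package's internal code: the data frame `reports`, the fixed population of 10,000, and the choice to fill gaps after aggregating (rather than before binning, as the list describes) are hypothetical simplifications.

```r
library(dplyr)
library(lubridate)

# Hypothetical input: dates stored as text, one weight (case count) per row
reports <- data.frame(
  onset_date = c("2024-12-10", "2024-12-12", "2024-12-17", "2025-01-02"),
  cases = c(2, 1, 4, 3)
)
population <- 10000 # assumed constant denominator

binned <- reports |>
  mutate(
    # Date coercion: text to Date
    onset_date = as.Date(onset_date),
    # Date binning: round down to the Monday that starts each week
    onset_date = floor_date(onset_date, unit = "week", week_start = 1)
  ) |>
  # Aggregation: sum the weights per bin, then divide by the population
  group_by(onset_date) |>
  summarise(n = sum(cases), .groups = "drop") |>
  mutate(incidence = n / population)

# Gap filling (simplified): build every week in the observed range
# and set weeks without reports to zero
all_weeks <- data.frame(
  onset_date = seq(min(binned$onset_date), max(binned$onset_date), by = "week")
)
filled <- all_weeks |>
  left_join(binned, by = "onset_date") |>
  mutate(n = coalesce(n, 0), incidence = coalesce(incidence, 0))

filled
```

In this sketch the week starting 2024-12-23 has no reports, so it is absent from `binned` and appears with a count of zero in `filled`.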
Value
A data frame with the following columns:
- A date column with the same name as dates_from, where values are binned to the start of the specified time period.
- n: Count of observations (sum of weights) for each time period.
- incidence: Incidence rate calculated as n / population for each time period.
- Any existing grouping variables are preserved.
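Because incidence is returned as a plain ratio n / population, it is often rescaled for reporting. The sketch below uses a hand-built data frame, `weekly`, that merely mimics the documented column layout; its values and the column `incidence_per_100k` are hypothetical and not produced by the package.

```r
# Hypothetical result shaped like the documented return value
weekly <- data.frame(
  onset_date = as.Date(c("2024-12-09", "2024-12-16")),
  n = c(12, 30)
)
weekly$incidence <- weekly$n / 10000 # n / population, population assumed 10,000

# Rescale the raw ratio to cases per 100,000 population
weekly$incidence_per_100k <- weekly$incidence * 1e5

weekly
```

With these assumed numbers, 12 and 30 cases in a population of 10,000 correspond to roughly 120 and 300 cases per 100,000.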
Examples
library(dplyr)
# Create sample data
outbreak_data <- data.frame(
  onset_date = as.Date("2024-12-10") + sample(0:100, 50, replace = TRUE),
  cases = sample(1:5, 50, replace = TRUE)
)
# Basic weekly binning
bin_by_date(outbreak_data, dates_from = onset_date)
# Weekly binning with case weights
bin_by_date(outbreak_data, onset_date, n = cases)
# Monthly binning
bin_by_date(outbreak_data, onset_date,
  date_resolution = "month"
)
# ISO week binning (Monday start)
bin_by_date(outbreak_data, onset_date,
  date_resolution = "isoweek"
) |>
  mutate(date_formatted = strftime(onset_date, "%G-W%V")) # Add correct date labels
# US CDC epiweek binning (Sunday start)
bin_by_date(outbreak_data, onset_date,
  date_resolution = "epiweek"
)
# With population data for incidence calculation
outbreak_data$population <- 10000
bin_by_date(outbreak_data, onset_date,
  n = cases,
  population = population
)