toy_data {MAPCtools}    R Documentation

Synthetic Age-Period-Cohort Dataset

Description

A toy dataset generated to illustrate modeling of age, period, and cohort effects, including interactions with education and sex. The data simulate count outcomes (e.g., disease incidence or event counts) as a function of demographic variables, with counts drawn from a Poisson distribution.

Usage

data(toy_data)

Format

A data frame with 10000 rows and 7 variables:

age

Age of individuals, sampled uniformly from 20 to 59.

period

Calendar year of observation, sampled uniformly from 1990 to 2019.

education

Factor for education level, with levels 1, 2 and 3.

sex

Factor indicating biological sex, with levels: "male", "female".

count

Simulated event count, generated from a Poisson distribution.

known_rate

The true Poisson rate used to generate count, computed from the log-linear model.

cohort

Derived variable indicating year of birth (period - age).
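
As a quick illustration of the derived variable, the following sketch builds a small stand-in data frame (hypothetical values, not rows of toy_data) and computes cohort as period - age. Given the documented sampling ranges, birth cohorts can run from 1931 (1990 - 59) to 1999 (2019 - 20).

```r
# Hypothetical stand-in rows; the values are illustrative, not taken from toy_data.
demo <- data.frame(
  age    = c(20L, 35L, 59L),
  period = c(2019L, 2000L, 1990L)
)

# cohort is the year of birth implied by period and age.
demo$cohort <- demo$period - demo$age
print(demo)

# Extreme cohorts implied by the documented age (20 to 59) and period (1990 to 2019) ranges.
cohort_range <- c(min = 1990L - 59L, max = 2019L - 20L)
print(cohort_range)
```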

Details

The underlying event rate is modeled on the log scale as a linear combination of period, sex, education, and a piecewise-linear age-education interaction. The count outcome is drawn from a Poisson distribution with this rate. Because the true rate is known for every row (known_rate), the dataset is useful for checking whether an APC model recovers the simulated effects.

The true log-rate is computed (for observation n) as:

\begin{aligned}
\log(\lambda_n) ={} & \beta_0 + \beta_{\text{period}}\,\bigl(2020 - \text{period}_n\bigr) + \beta_{\text{sex}}\,I(\text{sex}_n = \text{female}) \\
& + \beta_{\text{edu}}\,(\text{edu level}_n) + \beta_{\text{edu-age}}\,(\text{age}_n - 20)\,(\text{edu level}_n - 1)\,I(\text{age}_n \le 40) \\
& + \beta_{\text{edu-age}}\,(60 - \text{age}_n)\,(\text{edu level}_n - 1)\,I(\text{age}_n > 40)
\end{aligned}

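where the rate decreases over time (period) and, through the age-education interaction, increases with age up to age 40 and decreases thereafter (the interaction term vanishes for education level 1). The coefficients used are: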

Source

Simulated data, created using base R and tibble.
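
A minimal sketch of how a dataset with this structure could be simulated in base R, following the log-linear model given in Details. The coefficient values, the seed, and the uniform sampling of education and sex are assumptions made for illustration; they are not the values or the code used to create toy_data.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

n <- 10000
sim <- data.frame(
  age       = sample(20:59, n, replace = TRUE),
  period    = sample(1990:2019, n, replace = TRUE),
  # Assumption: education and sex are sampled uniformly over their levels.
  education = factor(sample(1:3, n, replace = TRUE), levels = 1:3),
  sex       = factor(sample(c("male", "female"), n, replace = TRUE),
                     levels = c("male", "female"))
)

# Hypothetical coefficients: placeholders, NOT the values used for toy_data.
b0 <- 1
b_period <- 0.01
b_sex <- 0.1
b_edu <- 0.05
b_edu_age <- 0.01

edu <- as.integer(as.character(sim$education))  # numeric education level (1, 2, 3)

# Piecewise-linear age term: rises from age 20 to 40, then falls towards age 60.
age_term <- ifelse(sim$age <= 40, sim$age - 20, 60 - sim$age)

log_rate <- b0 +
  b_period * (2020 - sim$period) +
  b_sex * (sim$sex == "female") +
  b_edu * edu +
  b_edu_age * age_term * (edu - 1)

sim$known_rate <- exp(log_rate)
sim$count      <- rpois(n, lambda = sim$known_rate)
sim$cohort     <- sim$period - sim$age

# Match the column order listed under Format.
sim <- sim[, c("age", "period", "education", "sex",
               "count", "known_rate", "cohort")]
str(sim)
```

Because known_rate is stored alongside count, fitted rates from any model can be compared directly against the rate that generated the data.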


[Package MAPCtools version 0.1.0 Index]