make_state_matrices {MSCA} | R Documentation |
Construct state matrices from longitudinal EHR data
Description
Builds a matrix with entries 0, 1, or NA encoding whether each individual had each long-term
condition (LTC) at each time point from 0 to l, based on their age of onset. The matrix
includes all LTCs, including those used to determine censoring and failure. However, time
points after the onset of fail_code or cens_code are still set to NA.
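The encoding described above can be illustrated for a single individual and a single condition. This is only a base R sketch, not the package's implementation: the helper toy_state_vector and its arguments are hypothetical, and it assumes that a condition counts as present from its onset time onward and that only time points strictly after the censoring or failure onset become NA.

```r
# Toy sketch of the 0/1/NA encoding for ONE individual (not the MSCA source).
# Assumptions: a condition is present from its onset time onward, and every
# time point strictly after a censoring/failure onset becomes NA.
toy_state_vector <- function(onset, stop_time, l) {
  times <- 0:l
  state <- as.integer(times >= onset)  # 0 before onset, 1 from onset onward
  state[times > stop_time] <- NA       # mask time points after censoring/failure
  state
}

# Condition with onset at time 2, individual censored at time 4, l = 6
toy_state_vector(onset = 2, stop_time = 4, l = 6)
# 0 0 1 1 1 NA NA
```

Whether the onset time point itself is coded 1, and whether the censoring time point itself is kept, should be checked against the actual output of make_state_matrices.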
Usage
make_state_matrices(
data,
id = "link_id",
ltc = "reg",
aos = "aos",
l = 111,
fail_code = "death",
cens_code = "cens"
)
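A possible call on toy data is sketched below. The data frame toy and all of its values are invented for illustration; the column names simply follow the defaults shown in Usage. The call is guarded so that the snippet also runs where the MSCA package is not installed.

```r
# Toy long-format input: one row per condition occurrence.
# Column names match the function defaults; all values are invented.
toy <- data.frame(
  link_id = c("A", "A", "A", "B", "B"),
  reg     = c("diabetes", "ckd", "death", "diabetes", "cens"),
  aos     = c(3, 5, 8, 2, 6),
  stringsAsFactors = FALSE
)

# Only call the function when the MSCA package is available.
if (requireNamespace("MSCA", quietly = TRUE)) {
  m <- MSCA::make_state_matrices(
    data = toy, id = "link_id", ltc = "reg", aos = "aos",
    l = 10, fail_code = "death", cens_code = "cens"
  )
  dim(m)  # rows: LTC x time combinations; columns: individuals
}
```

Here individual A fails ("death") at time 8 and individual B is censored at time 6, so their columns would be expected to contain NA after those times.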
Arguments
data |
A data frame containing one row per condition occurrence. |
id |
Name of the column identifying individuals. |
ltc |
Name of the column containing LTC labels. |
aos |
Name of the column giving age of onset (or time of onset) for each LTC. |
l |
The maximum time index (inclusive); the matrix has l + 1 time points (0 to l) per LTC. |
fail_code |
Label in the ltc column that marks the failure event (default "death"). |
cens_code |
Label in the ltc column that marks censoring (default "cens"). |
Value
A matrix with (l + 1) * number of LTCs rows and one column per unique individual.
Values are 1 after onset, 0 before, and NA after censoring or failure.
Rows are named <ltc>_<time>, and columns are individual IDs.
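The stated dimensions and row names can be worked out by hand for a small case. The sketch below uses base R only; the labels, l, and n_id are invented, and the row ordering (all time points of one LTC, then the next LTC) is an assumption that may differ from what the function returns.

```r
# Expected shape and row names for a toy setting.
# Assumed ordering: all time points of one LTC, then the next LTC.
ltcs <- c("ckd", "diabetes", "death", "cens")  # all labels, incl. fail/cens codes
l    <- 3
n_id <- 2                                      # number of unique individuals

expected_rows <- paste(rep(ltcs, each = l + 1),
                       rep(0:l, times = length(ltcs)),
                       sep = "_")

length(expected_rows)   # (l + 1) * number of LTCs = 16
head(expected_rows, 5)  # "ckd_0" "ckd_1" "ckd_2" "ckd_3" "diabetes_0"
c(rows = length(expected_rows), cols = n_id)
```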
Note
For large datasets, computations may be split into multiple jobs to manage memory and performance. Consider reducing the time granularity and/or the number of long-term conditions (events of interest) to improve efficiency and stability.
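One way to follow this advice is sketched below in base R. It is an illustration under assumptions, not a documented workflow of the package: the toy data, chunk_size, and the aos_band column are hypothetical. Each chunk could then be passed to make_state_matrices separately and the results combined column-wise, provided every chunk yields the same set of rows.

```r
# Toy long-format data (invented values; column names follow the defaults).
toy <- data.frame(
  link_id = rep(sprintf("id%02d", 1:10), each = 2),
  reg     = rep(c("diabetes", "cens"), times = 10),
  aos     = rep(c(42, 67), times = 10),
  stringsAsFactors = FALSE
)

# 1. Split individuals into chunks of at most `chunk_size` IDs.
chunk_size <- 4
ids    <- unique(toy$link_id)
groups <- split(ids, ceiling(seq_along(ids) / chunk_size))
chunks <- lapply(groups, function(g) toy[toy$link_id %in% g, ])
sapply(chunks, nrow)  # rows per chunk: 8, 8, 4

# 2. Coarsen the time scale (here 5-year bands) so that a smaller l suffices.
toy$aos_band <- floor(toy$aos / 5)  # 42 -> 8, 67 -> 13
```

With 5-year bands, an age range of 0 to 110 years needs l = 22 rather than l = 110, which reduces the number of rows per LTC from 111 to 23.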
Author(s)
Marc Delord
References
Delord M, Douiri A (2025) doi:10.1186/s12874-025-02476-7