make_state_matrices {MSCA} R Documentation

Construct state matrices from longitudinal EHR data

Description

Builds a binary matrix (0/1/NA) encoding whether each individual had each long-term condition (LTC) at each time point from 0 to l, based on age of onset. The matrix has rows for every LTC, including the labels given as fail_code and cens_code. The onset of fail_code or cens_code still sets the values after it to NA.
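The 0/1/NA encoding for a single individual and a single LTC can be sketched in base R as follows. This is an illustration only, not the package's implementation: encode_ltc is a hypothetical helper, and the boundary handling (1 from the onset time onward, NA strictly after the censoring or failure time) is an assumption that the documentation does not spell out.

```r
# Hypothetical helper, not part of MSCA: encode one LTC for one individual.
# onset: time at which the LTC starts (NA if it never occurs)
# stop:  time of censoring or failure (NA if neither is recorded)
# l:     maximum time index, so the result has l + 1 entries
encode_ltc <- function(onset, stop, l) {
  times <- 0:l
  # Assumption: the state is 1 from the onset time onward and 0 before it.
  state <- if (is.na(onset)) rep(0L, length(times)) else as.integer(times >= onset)
  # Assumption: time points strictly after censoring/failure become NA.
  if (!is.na(stop)) state[times > stop] <- NA_integer_
  state
}

encode_ltc(onset = 2, stop = 4, l = 6)
# returns 0 0 1 1 1 NA NA
```

Stacking one such vector per LTC, and placing individuals side by side as columns, gives a matrix of the kind described here.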

Usage

make_state_matrices(
  data,
  id = "link_id",
  ltc = "reg",
  aos = "aos",
  l = 111,
  fail_code = "death",
  cens_code = "cens"
)

Arguments

data

A data frame containing one row per condition occurrence.

id

Name of the column identifying individuals.

ltc

Name of the column containing LTC labels.

aos

Name of the column giving age of onset (or time of onset) for each LTC.

l

The maximum time index (inclusive); the matrix has l + 1 time rows per LTC.

fail_code

Label in ltc indicating a failure event (e.g., death).

cens_code

Label in ltc indicating censoring.
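As a rough illustration of how the arguments fit together, the snippet below builds a small long-format input and passes it to the function. The individuals, condition labels, and onset values are made up; the column names match the defaults shown in Usage. The call is guarded so that the snippet still runs when MSCA is not installed, and no output is shown because it has not been verified here.

```r
# Toy long-format input: one row per condition occurrence (made-up values).
toy <- data.frame(
  link_id = c("p1", "p1", "p1", "p2", "p2"),
  reg     = c("diabetes", "ckd", "death", "diabetes", "cens"),
  aos     = c(2, 4, 6, 3, 5),
  stringsAsFactors = FALSE
)

# Run the package function only if MSCA is available.
if (requireNamespace("MSCA", quietly = TRUE)) {
  m <- MSCA::make_state_matrices(
    toy,
    id = "link_id", ltc = "reg", aos = "aos",
    l = 8,                                   # time points 0..8
    fail_code = "death", cens_code = "cens"
  )
  print(dim(m))
}
```

Here "death" and "cens" are ordinary values of the ltc column; they are singled out only through fail_code and cens_code.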

Value

A matrix with (l + 1) * number of LTCs rows and one column per unique individual. Values are 1 after onset, 0 before, and NA after censoring or failure. Rows are named <ltc>_<time>, and columns are named by individual ID.
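The shape and naming of the result can be pictured with an empty base R matrix. This is a sketch under stated assumptions: the labels and IDs are made up, and the row order (grouped by LTC, with times 0 to l inside each group) is assumed rather than documented.

```r
# Made-up labels and IDs, for illustration only.
ltcs <- c("diabetes", "ckd")
ids  <- c("p1", "p2", "p3")
l    <- 3

# Assumption: rows are grouped by LTC, with times 0..l inside each group.
row_names <- paste(rep(ltcs, each = l + 1),
                   rep(0:l, times = length(ltcs)),
                   sep = "_")

layout <- matrix(NA_integer_,
                 nrow = (l + 1) * length(ltcs),
                 ncol = length(ids),
                 dimnames = list(row_names, ids))

dim(layout)            # 8 3
rownames(layout)[1:5]  # "diabetes_0" "diabetes_1" "diabetes_2" "diabetes_3" "ckd_0"

# Rows for one LTC can be selected by their name prefix.
ckd_rows <- layout[grepl("^ckd_", rownames(layout)), , drop = FALSE]
```

Selecting by the <ltc>_ prefix, as in the last line, does not depend on the assumed row order.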

Note

For large datasets, computations may be split into multiple jobs to manage memory and performance. Consider reducing the time granularity and/or the number of long-term conditions (events of interest) to improve efficiency and stability.
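One way to reduce time granularity before building the matrix is to map onset times to coarser bands, which shrinks l and therefore the number of rows per LTC. The sketch below is illustrative only: coarsen_onset is a hypothetical helper, onset is assumed to be recorded in years, and flooring to the start of each band is an assumption that may not suit every analysis.

```r
# Hypothetical helper, not part of MSCA: map onset times to coarser bands.
# Assumption: flooring to the start of each band is acceptable.
coarsen_onset <- function(aos, width) floor(aos / width)

aos_years <- c(0, 4, 5, 63, 111)
coarsen_onset(aos_years, width = 5)
# returns 0 0 1 12 22

# The maximum time index shrinks accordingly:
# 22 instead of 111, i.e. 23 time rows per LTC instead of 112.
l_coarse <- coarsen_onset(111, width = 5)
```

The coarsened values would then replace the aos column, with l set to l_coarse in the call.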

Author(s)

Marc Delord

References

Delord M, Douiri A (2025) doi:10.1186/s12874-025-02476-7


[Package MSCA version 1.2.1 Index]