simulatedsurvivaldata {CalibrationCurves} | R Documentation |
Breast Cancer Survival Data from Rotterdam and Germany
Description
The training dataset contains real-life survival data from patients who underwent primary surgery for breast cancer between 1978 and 1993 in Rotterdam. The patients were followed until 2007, resulting in a model development cohort of 2982 patients after exclusions. The primary outcome measured was recurrence-free survival, defined as the time from primary surgery to recurrence or death.
The validation dataset consists of 686 patients with primary node-positive breast cancer from the German Breast Cancer Study Group. In this cohort, 285 patients suffered a recurrence or died within 5 years of follow-up, while 280 were censored before 5 years. Five-year predictions were chosen as that was the lowest median survival from the two cohorts (Rotterdam cohort, 6.7 years; German cohort, 4.9 years).
Usage
data(trainDataSurvival)
data(testDataSurvival)
Format
A data frame with observations on the following 26 variables.
- pid
patient identifier
- year
year of surgery
- age
age at surgery
- meno
menopausal status (0 = premenopausal, 1 = postmenopausal)
- size
tumor size, a factor with levels <= 20, 20-50, >50
- grade
differentiation grade
- nodes
number of positive lymph nodes
- pgr
progesterone receptors (fmol/l)
- er
estrogen receptors (fmol/l)
- hormon
hormonal treatment (0 = no, 1 = yes)
- chemo
chemotherapy
- rtime
days to relapse or last follow-up
- recur
0 = no relapse, 1 = relapse
- dtime
days to death or last follow-up
- death
0 = alive, 1 = dead
- ryear
Follow-up time for RFS, in years (numeric)
- rfs
Recurrence-free survival status (0 = no event, 1 = event) (numeric)
- pgr2
Winsorized progesterone receptor level (numeric)
- nodes2
Winsorized node count (numeric)
- csize
Categorized tumor size, copied from size (factor)
- cnode
Categorized node involvement (factor: "0", "1-3", ">3")
- grade3
Recoded grade factor (levels: "1-2", "3")
- nodes3
Restricted cubic spline basis for nodes2 (numeric)
- pgr3
Restricted cubic spline basis for original pgr (numeric)
- epoch
Follow-up epoch indicator after splitting at 5 years (numeric)
Details
The data sets are based on the publicly available code and data used in the repository Prediction_performance_survival by Giardiello et al. (2023), which accompanies the Annals of Internal Medicine article "Assessing Performance and Clinical Usefulness in Prediction Models With Survival Outcomes: Practical Guidance for Cox Proportional Hazards Models".
All preprocessing steps were performed exactly as in the Giardiello code: converting survival time to years, defining recurrence-free survival status via 'rfs = pmax(recur, death)', correcting 43 discordant cases using death time, 99th-percentile winsorization of 'pgr' and 'nodes', spline transformations ('nodes3', 'pgr3'), splitting follow-up at 5 years ('epoch'), and recoding categorical variables ('csize', 'cnode', 'grade3').
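As a minimal sketch, the first few of these steps can be applied to a small made-up data frame instead of the real cohort. The column names follow the Format section. The toy values, the 365.25 days-per-year divisor, and the exact form of the discordance rule (a death without relapse recorded after the relapse follow-up time) are assumptions for illustration, not a copy of the Giardiello code.

```r
# Toy data (hypothetical values), using column names from the Format section
toy <- data.frame(
  rtime = c(730.5, 1461, 365.25, 2922),   # days to relapse or last follow-up
  recur = c(1, 0, 0, 0),                  # 1 = relapse
  dtime = c(1000, 1461, 1826.25, 2922),   # days to death or last follow-up
  death = c(1, 0, 1, 0),                  # 1 = dead
  pgr   = c(10, 50, 200, 5000)            # progesterone receptors
)

# Follow-up time in years (assumption: 365.25 days per year)
toy$ryear <- toy$rtime / 365.25

# Recurrence-free survival status: an event is a relapse or a death
toy$rfs <- pmax(toy$recur, toy$death)

# Discordant cases (assumed rule): death without relapse, recorded after rtime.
# For these rows the death time replaces the relapse follow-up time.
discordant <- toy$recur == 0 & toy$death == 1 & toy$dtime > toy$rtime
toy$ryear[discordant] <- toy$dtime[discordant] / 365.25

# Winsorize pgr at its 99th percentile: larger values are pulled down to the cap
cap <- unname(quantile(toy$pgr, 0.99))
toy$pgr2 <- pmin(toy$pgr, cap)

toy[, c("ryear", "rfs", "pgr2")]
```

Only the third toy row is discordant, so only its follow-up time changes. The spline and epoch-splitting steps are left out of this sketch.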
The training dataset, trainDataSurvival, consists of 2982 patients, with 1713 events occurring over a maximum follow-up time of 19.3 years. The estimated median potential follow-up time, calculated using the reverse Kaplan-Meier method, was 9.3 years. Of these patients, 1275 suffered a recurrence or died within the follow-up time of interest (5 years), and 126 were censored before 5 years.
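The median potential follow-up time can be estimated by swapping the roles of events and censoring in a Kaplan-Meier fit. The sketch below shows the idea on hypothetical toy vectors. It assumes the survival package is installed, and it is not the code used to obtain the figure quoted above.

```r
library(survival)

# Toy follow-up data (hypothetical values): time in years, status 1 = event
time   <- c(1, 2, 3, 4, 5, 6)
status <- c(1, 0, 1, 0, 0, 0)

# Reverse Kaplan-Meier: censoring is treated as the event of interest,
# so observed events become the censored observations
rev_km <- survfit(Surv(time, 1 - status) ~ 1)

# Median potential follow-up time, read off the reversed curve
med_followup <- as.numeric(quantile(rev_km, probs = 0.5)$quantile)
med_followup
```

With the package data, the same call would use the ryear and rfs columns described in the Format section in place of the toy vectors.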
The validation dataset, testDataSurvival, consists of 686 patients with primary node-positive breast cancer from the German Breast Cancer Study Group. In this cohort, 285 patients suffered a recurrence or died within 5 years of follow-up, while 280 were censored before 5 years. Five-year predictions were chosen as that was the lowest median survival from the two cohorts (Rotterdam cohort, 6.7 years; German cohort, 4.9 years).
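Counts such as "events within 5 years" and "censored before 5 years" can be derived from a follow-up time and a status indicator. The sketch below uses hypothetical toy vectors named after the ryear and rfs columns. The boundary conventions (events at exactly 5 years counted as within the horizon, censoring counted only when strictly before it) are assumptions.

```r
horizon <- 5  # prediction horizon in years

# Toy data (hypothetical values)
ryear <- c(1.2, 3.4, 4.9, 5.0, 6.1, 7.5)  # follow-up time in years
rfs   <- c(1,   0,   1,   0,   1,   0)    # 1 = recurrence or death

# Patients with an event at or before the horizon
n_event <- sum(rfs == 1 & ryear <= horizon)

# Patients censored strictly before the horizon
n_cens <- sum(rfs == 0 & ryear < horizon)

# Administrative censoring at the horizon: follow-up is truncated at 5 years
# and events occurring later are recoded as censored
time5   <- pmin(ryear, horizon)
status5 <- ifelse(ryear > horizon, 0, rfs)

c(events = n_event, censored = n_cens)
```

Patients who are neither counted as events nor as censored before the horizon are those still under follow-up at 5 years.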
References
David J. McLernon, Daniele Giardiello, Ben Van Calster, et al. (2023). Assessing Performance and Clinical Usefulness in Prediction Models With Survival Outcomes: Practical Guidance for Cox Proportional Hazards Models. Annals of Internal Medicine, 176(1), pp. 105-114, doi:10.7326/M22-0844
Examples
data(testDataSurvival)
## Explore the structure of the dataset
str(testDataSurvival)