Framework | Aim | Advantages | Limitations | Software | Extensions/variations | Examples |
---|---|---|---|---|---|---|
1. Time-dependent covariate modelling (TDCM) | A1 | Allows for updated predictions over time; simple to apply in available software. | Assumes no measurement error, cannot predict at future times (future covariate values unknown), correlations ignored, measurements assumed constant between time-points, requires complete predictors at event times. | Widely available (e.g. R, Stata, SAS). | Time-varying effects [25], time-since-measurement as a predictor [27], aggregated covariate [26], quantile residual life regression [109]. | Applied to assess the prognosis of patients with hepatocellular carcinoma, allowing for prediction at any stage of disease using their most recent information [110]. |
2. Generalised estimating equations (GEE) | A1 | Allows for updated predictions over time, accounts for correlation, can adjust for patient clustering. | Ignores underlying trajectory, does not account for changes in the at-risk population, and ignores time-dependency. | Widely available (e.g. R, Stata, SAS); geepack package in R. | — | Employed to identify patients at high risk of adverse events after cancer therapy, accounting for repeated pre-therapy measurements and outcomes per individual through repeated treatment cycles [28,29,30]. |
3. Landmark analysis (LA) | A1 | Avoids misspecification of the underlying trajectory, only uses patient information prior to the landmark time. | Ignores underlying covariate trajectory, correlations often ignored, requires complete follow-up, and the LOCF approach induces bias. | dynpred package and coxph function in R. | Competing risks [34, 41], recurrent events [36], combined with TSM [34, 38, 40], pseudo-observations [35, 41], cure fraction models [42]. | Employed to predict relapse/death for those in leukaemia remission after transplant, with landmark times 1, 6 and 12 months after bone marrow transplant [34]. Accounted for complications experienced by patients during follow-up. |
4. Two-stage modelling (TSM) | A2 or A3 | Simple to apply, flexible, can account for correlations, can handle irregularly spaced measurements. | Ignores model-specification error in the first stage; the first-stage model cannot account for drop-out bias. | refund and MFPCA R packages (FPC), merlin package in R (ME models). | Extends to TDCM [111] and LA [112]; calibration error included in stage II [52, 60]. | In conjunction with LA, TSM used to predict adverse events following endovascular abdominal aortic aneurysm repair: ME models for aneurysm sac diameter change over time, combined with a Cox model [44]. |
5. Joint modelling (JM) | A1 and (A2 or A3) | Addresses limitations of the TSM framework, allows updated predictions over time, flexible. | Complex to implement, strong parametric assumptions, computationally intensive. | JMbayes and JM R packages; lcmm R package for JLCMs; frailtypack R package for JFMs. | Time-varying effects [13], Bayesian moving average [53, 107], various functions of random effects [13, 53, 73,74,75, 113], third JM to handle missing data, and cure fraction models [66, 114,115,116]. | Shared random-effects JM employed for real-time predictions of prostate cancer recurrence: an ME sub-model for log PSA over time and a Cox sub-model for the time-to-event outcome, estimated using MCMC [74]. |
6. Trajectory classification (TC) | A1, A2 and A3 | Accounts for correlation, irregularly spaced measurements, informative processes, updated predictions, and the underlying trajectory. | Complex and computationally intensive for multivariate applications; parametric assumptions required for the covariate trajectory. | merlin package in R (ME models), rstan package (Gaussian processes). | Multivariate modelling using Gaussian processes [117], multivariate modelling and informative processes [76]. | Employed to classify repeated measurements of hormone levels in early pregnancy to predict pregnancy success in the context of in vitro fertilization [54]: a nonlinear ME model for hormone levels over time, with the binary outcome (pregnancy) modelled as an interaction. |
7. Machine learning (ML) | A1 and A3 | Few assumptions, handles high-dimensional data, can identify optimal trajectory characteristics. | Often predicts a binary outcome, ignores right-censoring, large datasets required to avoid overfitting, often ‘black box’ algorithms. | randomForest R package; AdaBoost or gbm R packages for boosting; LibSVM in R for SVMs. | Recurrent neural networks (RNNs) [98,99,100,101], multiple measurements and time-series SVM [118, 119], ME models and conditional inference trees [81]. | RNNs employed to predict heart failure from EHR data: the RNN identified patterns in previous and current diagnoses and quantified similarities with historic patients diagnosed with heart failure [98]. |
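The TDCM framework in row 1 relies on the counting-process data layout: each patient's follow-up is split into (start, stop] intervals, with the covariate held constant at its most recent measured value within each interval (this is the same "measurements assumed constant between time-points" limitation noted above). A minimal sketch of that expansion, in Python with illustrative names and toy data (not from the source):

```python
def counting_process_rows(measurements, event_time, event):
    """Expand one patient's record into (start, stop] intervals with the
    time-dependent covariate held constant between measurement times,
    as required by a time-dependent Cox model.

    measurements: list of (time, value) pairs; event: 1 = event, 0 = censored.
    """
    rows = []
    times = sorted(t for t, _ in measurements)
    values = dict(measurements)
    for i, t in enumerate(times):
        # Interval runs to the next measurement, or to the event/censoring time.
        stop = times[i + 1] if i + 1 < len(times) else event_time
        if stop <= t:  # measurement at or after follow-up end: no interval
            continue
        rows.append({
            "start": t,
            "stop": stop,
            "value": values[t],
            # The event indicator is only switched on in the final interval.
            "event": int(event and stop == event_time),
        })
    return rows

# One patient measured at times 0 and 3, event at time 10:
print(counting_process_rows([(0, 1.2), (3, 1.8)], event_time=10, event=1))
```

Each output row corresponds to one line of the (start, stop] data a time-dependent Cox fit would consume; only the last interval carries the event indicator.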
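Row 3's landmark analysis is likewise mechanical at the data-preparation step: pick a landmark time, keep only patients still at risk at that time, carry forward their most recent covariate value (LOCF, the step flagged above as bias-inducing), and reset the time origin. A hedged sketch in Python; the data structure and function name are hypothetical, chosen only to illustrate the construction:

```python
# Toy per-patient records: longitudinal (time, biomarker) pairs plus a
# survival outcome. All values are invented for illustration.
patients = {
    1: {"measurements": [(0, 1.2), (3, 1.8), (9, 2.5)], "event_time": 14, "event": 1},
    2: {"measurements": [(0, 0.9), (4, 1.0)],           "event_time": 5,  "event": 0},
    3: {"measurements": [(0, 2.0), (2, 2.2), (7, 2.9)], "event_time": 20, "event": 1},
}

def landmark_dataset(patients, landmark):
    """Build a landmark dataset: retain only patients still at risk at the
    landmark time, carry forward the latest pre-landmark measurement (LOCF),
    and measure survival time from the landmark onwards."""
    rows = []
    for pid, p in patients.items():
        if p["event_time"] <= landmark:   # event/censoring before landmark: drop
            continue
        past = [v for t, v in p["measurements"] if t <= landmark]
        if not past:                      # no measurement available at landmark
            continue
        rows.append({
            "id": pid,
            "biomarker": past[-1],        # LOCF value at the landmark
            "time": p["event_time"] - landmark,  # reset time origin
            "event": p["event"],
        })
    return rows

print(landmark_dataset(patients, landmark=6))
```

A standard Cox model is then fitted to each landmark dataset, which is how the leukaemia example cited in row 3 handles landmark times of 1, 6 and 12 months.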