Table 4 Methodological frameworks available to enhance clinical prediction models using longitudinal information

From: Harnessing repeated measurements of predictor variables for clinical risk prediction: a review of existing methods

| Framework | Aim | Advantages | Limitations | Software | Extensions/variations | Examples |
| --- | --- | --- | --- | --- | --- | --- |
| 1. Time-dependent covariate modelling (TDCM) | A1 | Allows for updated predictions over time, simple to apply in available software. | Assumes no measurement error, cannot predict the future, correlationsᵃ ignored, measurements assumed constant between time-points, requires complete predictors at event times. | Widely available (e.g. R, Stata, SAS). | Time-varying effects [25], time-since-measurement as a predictor [27], aggregated covariate [26]. Quantile residual life regression [109]. | Applied to assess the prognosis of patients with hepatocellular carcinoma, allowing for prediction at any stage of disease using their most recent information [110]. |
| 2. Generalised estimating equations (GEE) | A1 | Allows for updated predictions over time, accounts for correlationᵃ, can adjust for patient clustering. | Ignores underlying trajectory, does not account for changes in at-risk population, and ignores time-dependency. | Widely available (e.g. R, Stata, SAS). geepack package on R. | | Employed to identify patients at high risk of adverse events after cancer therapy [28, 29, 30], accounting for repeated pre-therapy measurements and outcomes per individual through repeated treatment cycles. |
| 3. Landmark analysis | A1 | Avoids misspecification of underlying trajectory, only uses patient information prior to landmark time. | Ignores underlying covariate trajectory, correlationsᵃ often ignored, requires complete follow-up, and LOCF approach induces bias. | dynpred package and coxph function in R. | Competing risks [34, 41], recurrent events [36], combined with TSM [34, 38, 40], pseudo-observations [35, 41], cure fraction models [42]. | Employed to predict relapse/death for those in leukaemia remission after transplant [34]. Landmark times 1, 6 and 12 months after bone marrow transplant [34]. Accounted for complications experienced by patients during follow-up. |
| 4. Two-stage modelling (TSM) | A2 or A3 | Simple to apply, flexible, can account for correlationsᵃ, can handle irregularly spaced measurements. | Ignores model-specification error in the first stage, first model cannot account for drop-out bias. | refund or MFPCA R packages (FPC), merlin package on R (ME models). | Extends to TDCM [111] and LA [112], calibration error included in stage II [52, 60]. | In conjunction with LA, TSM used to predict adverse events following endovascular abdominal aortic aneurysm repair [44]. ME models for aneurysm sac diameter change over time, with Cox model [44]. |
| 5. Joint-modelling | A1 and (A2 or A3) | Addresses limitations of TSM framework, allows updated predictions over time, flexible. | Complex to implement, strong parametric assumptions, computationally intensive. | JMbayes or JM R packages. lcmm R package for JLCMs, frailtypack R package for JFMs. | Time-varying effects [13], Bayesian moving average [53, 107], various functions of random effects [13, 53, 73, 74, 75, 113], third JM to handle missing data and cure fraction models [66, 114, 115, 116]. | Shared random effects JM employed for real-time predictions of prostate cancer recurrence [74]. An ME sub-model for log PSA over time, and a Cox sub-model used for the time-to-event outcome. Estimated using MCMC. |
| 6. Trajectory classification | A1, A2 and A3 | Accounts for correlationᵃ, irregularly-spaced measurements, informative processes, updated predictions, underlying trajectory. | Complex and computationally intensive for multivariate applications, parametric assumptions required for covariate trajectory. | merlin package on R (ME models), Rstan package (Gaussian processes). | Multivariate modelling using Gaussian processes [117]. Multivariate modelling and informative processes [76]. | Employed to classify repeated measurements of hormone levels in early pregnancy to predict pregnancy success in the context of in vitro fertilization [54]. A nonlinear ME model for hormone levels over time. The binary outcome (pregnancy) modelled as an interaction. |
| 7. Machine learning | A1 and A3 | Few assumptions, handles high-dimensional data, can identify optimal trajectory characteristics. | Often predicts binary outcome, ignores right-censoring, large datasets required to avoid overfitting, often ‘black box’ algorithms. | randomForest R package, Adaboost or gbm R packages for Boosting, LibSVM on R for SVMs. | Recurrent Neural Networks (RNNs) [98, 99, 100, 101], multiple measurements and time series SVM [118, 119], ME models and conditional inference trees [81]. | RNNs employed to predict heart failure based on EHR data [98]. RNN identified patterns in previous and current diagnoses and quantified similarities with historic patients diagnosed with heart failure [98]. |

ᵃ Correlations between and within individuals

Abbreviations: LOCF last observation carried forward, ME mixed effect, SVM support vector machine, MCMC Markov chain Monte Carlo, EHR electronic health record, JLCM joint latent class model
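
As an illustration of the landmark analysis row, the following is a minimal Python sketch of how a landmark dataset might be assembled with LOCF. It is not taken from the reviewed article or from any of the listed packages. The function name `build_landmark_rows`, the input layout (dictionaries keyed by patient), the prediction `horizon`, and the toy data are all assumptions made for illustration. The sketch keeps only patients still at risk at the landmark time, uses only measurements taken at or before it, and also records the time since the last measurement, which the table lists as a possible additional predictor.

```python
from dataclasses import dataclass


@dataclass
class LandmarkRow:
    patient_id: int
    locf_value: float              # last observation carried forward to the landmark
    time_since_measurement: float  # landmark time minus time of that observation
    time_from_landmark: float      # follow-up measured from the landmark
    event: int                     # 1 if the event occurred within the horizon


def build_landmark_rows(measurements, outcomes, landmark_time, horizon):
    """Assemble one row per patient at risk at the landmark time.

    measurements: {patient_id: [(time, value), ...]}   (hypothetical layout)
    outcomes:     {patient_id: (event_time, event)}    event: 1 = event, 0 = censored
    """
    end = landmark_time + horizon
    rows = []
    for pid, (event_time, event) in sorted(outcomes.items()):
        # Patients whose follow-up ended by the landmark are no longer at risk.
        if event_time <= landmark_time:
            continue
        # Only information available at the landmark may be used.
        past = [(t, v) for t, v in measurements.get(pid, []) if t <= landmark_time]
        if not past:
            continue  # no measurement to carry forward
        last_time, last_value = max(past, key=lambda tv: tv[0])
        rows.append(
            LandmarkRow(
                patient_id=pid,
                locf_value=last_value,
                time_since_measurement=landmark_time - last_time,
                # Follow-up is administratively censored at the horizon.
                time_from_landmark=min(event_time, end) - landmark_time,
                event=int(event == 1 and event_time <= end),
            )
        )
    return rows


# Toy data, invented for illustration only.
measurements = {
    1: [(0.0, 1.0), (2.0, 1.5), (5.0, 2.0)],
    2: [(0.0, 3.0), (4.0, 3.5)],
    3: [(1.0, 0.5), (8.0, 0.9)],
}
outcomes = {1: (10.0, 1), 2: (5.0, 1), 3: (20.0, 0)}

rows = build_landmark_rows(measurements, outcomes, landmark_time=6.0, horizon=6.0)
for row in rows:
    print(row)
```

The resulting rows would then be passed to a survival model (for example a Cox model) fitted separately at each landmark time. Because the value is held constant from the last measurement, this sketch inherits the LOCF bias noted under Limitations.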