
Table 4 Methodological frameworks available to enhance clinical prediction models using longitudinal information

From: Harnessing repeated measurements of predictor variables for clinical risk prediction: a review of existing methods

1. Time-dependent covariate modelling (TDCM)
Aims addressed: A1.
Advantages: allows predictions to be updated over time; simple to apply in available software.
Disadvantages: assumes no measurement error; cannot predict the future; correlations^a ignored; measurements assumed constant between time points; requires complete predictor values at event times.
Software: widely available (e.g. R, Stata, SAS).
Extensions: time-varying effects [25]; time-since-measurement as a predictor [27]; aggregated covariate [26]; quantile residual life regression [109].
Example: applied to assess the prognosis of patients with hepatocellular carcinoma, allowing prediction at any stage of disease using each patient's most recent information [110].
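The main practical step in TDCM is restructuring the data into counting-process (start–stop) intervals, with the covariate held constant between measurements, which is exactly the assumption criticised above. A minimal Python sketch with hypothetical measurement times and values (the function name and data are illustrative, not from the review):

```python
def to_start_stop(meas_times, meas_values, event_time, event):
    """Convert one patient's repeated measurements into counting-process
    (start, stop, value, event) intervals for a time-dependent Cox model.
    The covariate is carried forward (held constant) between measurements."""
    rows = []
    for i, (t, v) in enumerate(zip(meas_times, meas_values)):
        if t >= event_time:
            break  # measurements after follow-up ends are unusable
        stop = meas_times[i + 1] if i + 1 < len(meas_times) else event_time
        stop = min(stop, event_time)
        # The event indicator is attached only to the final interval.
        rows.append((t, stop, v, event if stop == event_time else 0))
    return rows

# Hypothetical patient: biomarker measured at 0, 3 and 8 months,
# event observed at 12 months.
print(to_start_stop([0, 3, 8], [1.2, 1.9, 2.7], 12, 1))
# → [(0, 3, 1.2, 0), (3, 8, 1.9, 0), (8, 12, 2.7, 1)]
```

In R, the `tmerge` function in the survival package performs this restructuring before fitting with `coxph`.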
2. Generalised estimating equations (GEE)
Aims addressed: A1.
Advantages: allows predictions to be updated over time; accounts for correlation^a; can adjust for patient clustering.
Disadvantages: ignores the underlying trajectory; does not account for changes in the at-risk population; ignores time-dependency.
Software: widely available (e.g. R, Stata, SAS); geepack package in R.
Example: employed to identify patients at high risk of adverse events after cancer therapy, accounting for repeated pre-therapy measurements and outcomes per individual through repeated treatment cycles [28–30].
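To illustrate the sandwich (robust) variance idea underlying GEE: an intercept-only GEE with an independence working correlation reduces to the sample mean with a cluster-robust standard error, which is what corrects inference for within-patient correlation. A pure-Python sketch on hypothetical data, not from the review:

```python
import math

def cluster_robust_mean(clusters):
    """Mean of repeated measurements with a cluster-robust (sandwich)
    standard error: the simplest GEE, an intercept-only model with an
    independence working correlation. `clusters` is a list of lists,
    one inner list of repeated measurements per patient."""
    obs = [y for c in clusters for y in c]
    n = len(obs)
    mean = sum(obs) / n
    # Sandwich variance: squared per-cluster sums of residuals ("scores"),
    # summed over clusters, scaled by the bread term (here simply n^2).
    var = sum(sum(y - mean for y in c) ** 2 for c in clusters) / n ** 2
    return mean, math.sqrt(var)

# Hypothetical data: three patients with 2-3 repeated measurements each.
mean, se = cluster_robust_mean([[5, 6], [7, 8, 9], [4, 5]])
```

Full GEE fits with other working correlations (e.g. exchangeable) are what geepack provides in practice.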
3. Landmark analysis (LA)
Aims addressed: A1.
Advantages: avoids misspecification of the underlying trajectory; uses only patient information available prior to the landmark time.
Disadvantages: ignores the underlying covariate trajectory; correlations^a often ignored; requires complete follow-up; LOCF approach induces bias.
Software: dynpred package and the coxph function in R.
Extensions: competing risks [34, 41]; recurrent events [36]; combined with TSM [34, 38, 40]; pseudo-observations [35, 41]; cure fraction models [42].
Example: employed to predict relapse/death for patients in leukaemia remission after transplant, with landmark times 1, 6 and 12 months after bone marrow transplantation, accounting for complications experienced during follow-up [34].
4. Two-stage modelling (TSM)
Aims addressed: A2 or A3.
Advantages: simple to apply; flexible; can account for correlations^a; can handle irregularly spaced measurements.
Disadvantages: ignores model-specification error in the first stage; the first-stage model cannot account for dropout bias.
Software: refund and MFPCA R packages (FPC); merlin package in R (ME models).
Extensions: extends to TDCM [111] and LA [112]; calibration error included in stage II [52, 60].
Example: in conjunction with LA, TSM was used to predict adverse events following endovascular abdominal aortic aneurysm repair: ME models for aneurysm sac diameter change over time, combined with a Cox model [44].
5. Joint modelling (JM)
Aims addressed: A1 and (A2 or A3).
Advantages: addresses limitations of the TSM framework; allows predictions to be updated over time; flexible.
Disadvantages: complex to implement; strong parametric assumptions; computationally intensive.
Software: JMbayes and JM R packages; lcmm R package for JLCMs; frailtypack R package for JFMs.
Extensions: time-varying effects [13]; Bayesian moving average [53, 107]; various functions of the random effects [13, 53, 73–75, 113]; a third JM to handle missing data, and cure fraction models [66, 114–116].
Example: a shared random effects JM was employed for real-time predictions of prostate cancer recurrence: an ME sub-model for log PSA over time and a Cox sub-model for the time-to-event outcome, estimated using MCMC [74].
6. Trajectory classification
Aims addressed: A1, A2 and A3.
Advantages: accounts for correlation^a, irregularly spaced measurements and informative processes; allows updated predictions; models the underlying trajectory.
Disadvantages: complex and computationally intensive for multivariate applications; parametric assumptions required for the covariate trajectory.
Software: merlin package in R (ME models); rstan package (Gaussian processes).
Extensions: multivariate modelling using Gaussian processes [117]; multivariate modelling and informative processes [76].
Example: employed to classify repeated measurements of hormone levels in early pregnancy to predict pregnancy success in the context of in vitro fertilisation: a nonlinear ME model for hormone levels over time, with the binary outcome (pregnancy) modelled as an interaction [54].
7. Machine learning
Aims addressed: A1 and A3.
Advantages: few assumptions; handles high-dimensional data; can identify optimal trajectory characteristics.
Disadvantages: often predicts a binary outcome; ignores right-censoring; large datasets required to avoid overfitting; often 'black box' algorithms.
Software: randomForest R package; Adaboost or gbm R packages for boosting; LibSVM in R for SVMs.
Extensions: recurrent neural networks (RNNs) [98–101]; multiple measurements and time-series SVMs [118, 119]; ME models and conditional inference trees [81].
Example: RNNs employed to predict heart failure from EHR data, identifying patterns in previous and current diagnoses and quantifying similarity to historic patients diagnosed with heart failure [98].
^a Correlations between and within individuals
Abbreviations: LOCF, last observation carried forward; ME, mixed effects; SVM, support vector machine; MCMC, Markov chain Monte Carlo; EHR, electronic health record; JLCM, joint latent class model