Skip to main content

Comparison of methods for predicting COVID-19-related death in the general population using the OpenSAFELY platform



Obtaining accurate estimates of the risk of COVID-19-related death in the general population is challenging in the context of changing levels of circulating infection.


We propose a modelling approach to predict 28-day COVID-19-related death which explicitly accounts for COVID-19 infection prevalence using a series of sub-studies from new landmark times incorporating time-updating proxy measures of COVID-19 infection prevalence. This was compared with an approach ignoring infection prevalence.

The target population was adults registered at a general practice in England in March 2020. The outcome was 28-day COVID-19-related death. Predictors included demographic characteristics and comorbidities. Three proxies of local infection prevalence were used: model-based estimates, rate of COVID-19-related attendances in emergency care, and rate of suspected COVID-19 cases in primary care.

We used data within the TPP SystmOne electronic health record system linked to Office for National Statistics mortality data, using the OpenSAFELY platform, working on behalf of NHS England.

Prediction models were developed in case-cohort samples with a 100-day follow-up. Validation was undertaken in 28-day cohorts from the target population. We considered predictive performance (discrimination and calibration) in geographical and temporal subsets of data not used in developing the risk prediction models. Simple models were contrasted to models including a full range of predictors.


Prediction models were developed on 11,972,947 individuals, of whom 7999 experienced COVID-19-related death. All models discriminated well between individuals who did and did not experience the outcome, including simple models adjusting only for basic demographics and number of comorbidities: C-statistics 0.92–0.94. However, absolute risk estimates were substantially miscalibrated when infection prevalence was not explicitly modelled.


Our proposed models allow absolute risk estimation in the context of changing infection prevalence but predictive performance is sensitive to the proxy for infection prevalence. Simple models can provide excellent discrimination and may simplify implementation of risk prediction tools.

Peer Review reports

Key messages

  • Evolving policies regarding restrictions on social contact and return to work require good information regarding risks of severe COVID-19 outcomes in the general population; however, obtaining accurate risk estimates is challenging in the context of changing levels of circulating infection, which standard risk prediction models commonly ignore.

  • We describe methods for risk prediction adopting a landmarking approach to dynamically incorporate time- and region-dependent information on infection prevalence.

  • We compare the discrimination and calibration of models that do and do not incorporate changing infection prevalence over time in different geographical and temporal settings, and compare simpler models to more richly specified ones.

  • Our study suggests that models that ignore the infection prevalence provide poorly calibrated estimates of absolute risk; models that include time-varying measures of the infection prevalence can provide more accurate estimates.

  • Simple models based only on a number of comorbidities and basic demographics performed almost as well as more complex risk prediction models, both within models including infection prevalence and models that ignore this, suggesting that policies targeting population-level reduction of COVID-19 mortality risk may not need to distinguish between all comorbidities in detail.


Characterised as a pandemic by the World Health Organization on 11 March 2020 [1], globally, the cumulative number of cases of COVID-19 has exceeded 180 million with almost 4 million deaths attributed to the virus at the time of writing [2]. Evolving policies such as return-to-work strategies and restrictions on social contact are heavily informed by the estimated risk of severe outcomes from COVID-19. Policy-making is often informed by absolute risks in the general population. This risk reflects the result of two processes: being infected and dying once infected, thus depends critically on infection prevalence. Transporting estimates of absolute risk from one context to another, such as a different time period or geographical region, is particularly challenging in COVID-19 due to substantial variation in the infection prevalence over time and by geography [3].

Prediction models that do not explicitly model the underlying infection prevalence may provide inaccurate estimates of absolute risk. Whether such models produce transportable rankings of risk, to different temporal and geographical contexts, is uncertain. Some predictors identified in these models, such as geographical region and characteristics that differ substantially by region, may be indirectly capturing differences in infection prevalence; models explicitly including the infection prevalence may allow simpler predictive models with equally good performance.

Incorporating the underlying infection prevalence is not straightforward because direct estimates are typically not available within small geographical areas. Whether easily accessible proxy measures are sufficiently accurate to produce reasonable risk estimates remains uncertain.

In this study, we compare modelling strategies to predict 28-day COVID-19 death which do and do not allow for dynamic predictions and illustrate the impact that these methodological choices have, in terms of model discrimination and calibration, using data from the first wave of COVID-19 in England held in the OpenSAFELY platform [4] on almost 12 million adults in England. Our overarching goal is to inform subsequent development of risk prediction models in this context, rather than to develop a risk prediction model to inform policy-making. Specifically, we aim to answer the following: (1) Do risk prediction models which do not explicitly model the underlying infection prevalence perform well in different geographical and temporal contexts? (2) Can transportable estimates of absolute risk of COVID-19 death be obtained by explicitly incorporating proxy estimates of the changing infection prevalence? (3) Can simpler prediction algorithms be used for predicting risk without losing substantial predictive ability, compared with more richly specified models?


Our statistical approach is based on a model for 28-day COVID-19-related death detailed in the Supplementary Appendix. Our reporting adheres to the TRIPOD statement for reporting of multivariable prediction models [5].


Study population

The target population is adults in England living in the community; residential settings are excluded since risks experienced in institutions such as care homes are likely to be very different to those in smaller households.

Data source

Primary care records managed by the GP software provider TPP were linked to Office for National Statistics (ONS) death data through OpenSAFELY, a data analytics platform created by our team on behalf of NHS England to address urgent COVID-19 research questions ( OpenSAFELY provides a secure software interface allowing the analysis of pseudonymised primary care patient records from England in near real-time within the electronic health record vendor’s highly secure data centre, avoiding the need for large volumes of potentially disclosive pseudonymised patient data to be transferred off-site. This, in addition to other technical and organisational controls, minimises any risk of re-identification. Similarly, pseudonymised datasets from other data providers are securely provided to the electronic health record vendor and linked to the primary care data. Data includes pseudonymised data such as coded diagnoses, medications, and physiological parameters. No free text data are included.

Base cohort

A base cohort was defined, comprising males and females aged 18 years or older registered as of 1 March 2020 in a general practice employing the TPP system, followed up for 100 days (1 March 2020 until 8 June 2020). Individuals with missing age, sex, postcode, ethnicity, a recorded age over 105 years, or living in households of >10 people were excluded.


The outcome was COVID-19-related death, defined by ICD-10 codes U07.1 or U07.2 anywhere on the death certificate.

Predictor variables

We selected candidate predictors based on known or plausible associations with exposure to COVID-19 infection, risk of severe illness or respiratory tract infection, and factors associated with healthcare access or level of care. Potential predictors were age, sex, ethnicity, deprivation, number in household, presence of young children in household, a rural indicator, obesity, smoking, blood pressure, and comorbidities (details in Supplementary Appendix).

Three different proxy measures of COVID-19 infection prevalence, measured daily, were available: (1) model-based estimates [6] available by region and age group, (2) rate of COVID-19-related A&E (emergency) attendances over the last week by local geographic area, and (3) rate of suspected COVID-19 cases over the last week by local geographic area (details in Supplementary Appendix).

Development of risk prediction algorithms

Our goal is to predict the risk of 28-day COVID-19 death. Note that this is not death within 28 days of infection, but the risk of experiencing COVID-19 death within a 28-day period for individuals in our target population. Two approaches were adopted: approach A—models which do not explicitly account for the time-changing prevalence of COVID-19 infection, and approach B—landmarking models [7], which use a series of sequential overlapping 28-day sub-studies incorporating time-updating proxy measures of infection prevalence. Risk prediction models were developed using data from the base cohort. Case-cohort sampling was required to enable model fitting for approach B (the stacked sub-studies contain nearly 900 million rows of data). To reduce computational burden and increase comparability between approaches, case-cohort sampling was used to fit models for both approaches.

To develop models within approach A, follow-up began 1 March 2020 and ended at the first of COVID-19-related death or 8 June 2020. The outcome was COVID-19-related death (any time during the 100-day follow-up). No censoring was applied at death due to non-COVID causes, to target the sub-distribution hazard [8]; no other censoring events occurred. The analysis sample included all cases of COVID-19-related death and a random age-stratified sample of the base cohort (the ‘sub-cohort’) [9, 10], with sampling fractions of 0.01 in the age group 18–<40, 0.02 in 40–<60, 0.025 in 60–<70, 0.05 in 70–<80 and 0.13 in 80+ years.

Four approach A models were fitted, using different sets of predictor variables. The first used penalised regression (lasso) [11] to select a parsimonious model (the “selected” model; models were subsequently re-fitted including only the selected variables). The second included age group (10-category), sex and their interaction (“age-sex” model). The third additionally included grouped number of comorbidities (0, 1, 2, 3+), ethnicity and rural/urban (“comorbidities” model). The fourth included all 36 potential predictors and all possible interactions with age and sex (“full” model, details in the Supplementary Appendix).

Cox proportional hazards model were fitted including the relevant predictors using time in study as the timescale with Barlow weights to account for the case-cohort design and robust standard errors [9, 10]. The baseline survivor function was estimated at day 28 (\( {\hat{S}}_{28} \)) and the estimated log hazard ratios (\( \hat{\beta} \)) were extracted. For an individual with predictor values xi, the predicted risk is then given by [12]: \( {p}_i=1-{\hat{\ S}}_{28}^{\exp \left({\hat{\beta}}^T{x}_i\right)} \).

To develop models within approach B, using landmarking models [7], a series of 73 overlapping sequential sub-studies were extracted from the base cohort. The sub-studies started 0, 1, 2, 3, 4…, 72 days after 1 March 2020 (Fig. 1). Follow-up started at the sub-study start date and ended at the first of COVID-19-related death or 28 days after sub-study entry. Individuals were not censored at deaths due to other causes. The outcome was COVID-19-related death during the 28-day period. Each sub-study had a case-cohort design, including all eligible individuals (those in the base cohort still alive at the sub-study start) who experienced a COVID-19-related death during the sub-study period and an age-stratified random sample of sub-study eligible individuals, with age group-specific sampling fractions equal to 1/70 of the sampling fractions for approach A. Data from all sub-studies were combined for analysis. Predictor variables were assessed at day 0 of each sub-study.

Fig. 1
figure 1

Schematic showing the design of the 28-day landmarking sub-studies (approach B)

Functional forms for the proxy measures of COVID-19 infection prevalence were selected using Akaike’s Information Criterion, with our selection of candidate functional forms guided by our proposed model for COVID-19 death (Supplementary Appendix).

A Poisson model for 28-day COVID-19-related death was fitted to the combined dataset using Barlow weights and robust standard errors, incorporating predictors—chosen via the lasso for the “selected” model, pre-specified for the “age-sex”, “comorbidities” and “full” models—and proxy measures of COVID-19 infection prevalence. In each case, 3 models were fitted: one for each proxy measure.

The predicted risk is given by the same equation as for the Cox model, replacing the hazard ratios by incidence rate ratios and obtaining the baseline survivor function from the estimated constant coefficient.

Model validation

Calibration and discrimination were assessed for 17 risk prediction algorithms (5 approach A; 12 approach B). For approach A, this was the “selected”, “age-sex”, “comorbidities” and “full” models, as well as the existing COVID-AGE risk tool (10th update) [13, 14]. For approach B, this was the “selected”, “age-sex”, “comorbidities” and “full” models incorporating each of the 3 proxy measures of infection prevalence.

Overall internal validation

From the base cohort, three validation cohorts were defined each lasting 28 days. The three validation cohorts covered periods with higher and lower infection prevalence (Fig. 2), to allow comparison of modelling strategies under these different conditions. Each validation cohort study comprised all individuals from the base cohort who remained alive at the start of the validation period. Predictor variables were assessed at day 0 of the validation cohort; predicted risk of 28-day COVID-19-related death was obtained using each of the 17 algorithms. Model performance was assessed by comparing these predicted risks to the observed binary outcome, 28-day COVID-19-related death (death within the 28-day period of the validation cohort).

Fig. 2
figure 2

Schematic showing the forms of validation undertaken

Discrimination was assessed by Harrell’s C-statistic [15, 16]. Calibration was assessed by estimating the calibration intercept and slope and comparing observed and predicted risk, overall and by groups of predicted risk [16, 17]. Flexible calibration curves were drawn. Model performance was assessed overall and within sex and broad age groups (18–<70, 70–<80 and 80+; insufficient events occurred in the youngest age group to split further) and regions.

COVID-AGE is primarily a risk stratification tool, so we did not obtain calibration measures for this tool [14].

Geographical and temporal internal validation

Although we were unable to perform external validation, we undertook additional forms of internal validation as recommended by Steyerberg et al. [18]. To assess model performance in different geographical contexts, the 16 risk prediction models developed within these data were re-fitted after excluding all individuals from a particular geographical region [18]. The resulting models were used to predict risk in the subset of the 3 validation cohorts comprising individuals from the excluded geographical region. Model performance was assessed within the excluded region (i.e. validation was performed using individuals whose data was not used in developing the risk prediction models). This process was repeated for each of the 7 regions.

To assess model performance in different temporal contexts, the 16 risk prediction models were re-fitted after excluding the last 28 days of data (i.e. the whole time period of validation period 3). The resulting models were used to predict risk in validation cohort 3. Individuals who remained alive at the start of validation cohort 3 appeared in both the model development data and the validation data, but their predictor values were updated at the start of validation period 3; individuals who experienced the event in the development data did not appear in the validation data.

Missing data

Ethnicity, body mass index (BMI) and smoking data are collected in response to clinical need, thus likely to be missing not at random. As smoking and obesity, if present, are likely to be recorded, individuals with missing BMI were assumed non-obese and individuals with no smoking information were assumed non-smokers. Individuals with no serum creatinine measurement were included in the “no evidence of poor kidney function” group. Individuals with diabetes but no glycosylated haemoglobin (HbA1c) measurement were included in a separate “diabetes, no HbA1c” category. Analysis was restricted to individuals with recorded age, sex and ethnicity data; such complete case analysis is valid under a range of missing not at random assumptions [19].

Sensitivity analysis

Additional sensitivity analyses were undertaken (Supplementary Appendix), including refitting the approach A “selected” model in the entire cohort (without case-cohort sampling), leading to very similar results, and using a Weibull model for both approaches A and B to increase comparability between approaches, resulting in similar conclusions about the comparison between approaches. Further details of the methods used can be found in our pre-published protocol [20].


Data management was performed using Python and Google BigQuery, with analysis carried out using Stata 16.1/Python. All code used is openly shared online for review and re-use under MIT open license (

Patient and public involvement

We invite any patient or member of the public to contact us regarding this study or the broader OpenSAFELY project through our website


The base cohort comprised almost 12 million individuals, of whom 7999 experienced a COVID-19-related death (Table 1, Fig. S1 in the Supplementary Appendix). For model development, the approach A case-cohort study included 7999 COVID-19-related deaths and a sub-cohort of 319,917. Approach B sub-studies included all 7999 COVID-19 deaths and 330,132 individuals. The fitted models are provided in the Supplementary Appendix.

Table 1 Description of base cohort


In all validation periods, the C-statistic for the “selected” models was high (0.92–0.94), indicating excellent ability to distinguish between individuals who experienced 28-day COVID-19 death and who did not (Table 2). No difference in discrimination was seen between approaches A and B.

Table 2 Measures of model performance in predicting 28-day risk of COVID-19 mortality, using the “selected” models

Among the oldest age group (80+ years), discrimination was much lower than in younger age groups (Table 3), reflecting the substantial discrimination that comes from age which is partly lost after age restriction. For example, for the approach A “selected” model in validation period 1, the C-statistic was 0.77 for females and 0.65 for males among the 80+ age group compared with 0.88 and 0.91 among the 18–<70 age group. Discrimination was similar across regions (Table S4; range of C-statistics across regions and validation periods: approach A 0.903–0.962, approach B 0.907–0.966, other than in the North West region in validation period 1, which had lower discrimination: approach A C-statistic 0.84; approach B 0.85–0.86).

Table 3 C-statistics for different sets of predictor variable sets, by sex and broad age group

Geographical internal validation showed that discrimination of models was largely insensitive to the removal of a region (Tables 4 and S7, Figure S9), with the exception of omission of the North-West region in validation period 1, which resulted in lower discrimination for all models. Removing validation period 3 from the development data, in the temporal validation, did not reduce discrimination for validation period 3.

Table 4 Measures of model performance in predicting 28-day risk of COVID-19 mortality in the temporal and geographical internal validation, using the “selected” models

Absolute risk estimation

For the approach A Cox model, the mean predicted and observed risks were very similar in the first validation period (the initial portion of the data on which the model was fitted), but different in the second and third (Table 2). In validation period 2, the mean observed risk was ten times higher than in validation period 1, but the mean predicted risk was (by design, since it ignores the infection prevalence) almost identical for all three periods.

For the approach B models incorporating model-based estimates of infection prevalence, the mean and observed risks were very similar in all validation periods. The calibration intercept was slightly less than zero in validation periods 1 and 3, indicating slight over-estimation on average, with calibration slopes close to one. Replacing model-based estimates by either the rate of A&E COVID-19 attendances or the rate of suspected COVID-19 cases in primary care resulted in poorer calibration, particularly in the first validation period which had a very low infection prevalence. All approach B models had worse calibration than the approach A model in validation period 1 but considerably better calibration than the approach A model in the other two validation periods. Figures S3-S6 show flexible calibration curves and predicted versus observed risks by twentieths of predicted risk.

Calibration intercepts varied by region (e.g. approach A in validation period 1, range of estimated calibration intercepts: −1, +1.12), but calibration slopes varied less (Table S4). The intercepts varied least for approach B in validation period 2 (range of estimated calibration intercepts: −0.28, 0.21), which was the period with the highest infection prevalence.

Geographical internal validation showed that estimated calibration intercepts were sensitive to removal of a geographical region, but slopes less so. Using approach A models, the temporal validation results were similar to the overall internal validation findings. In contrast, using approach B, we found worse calibration under the temporal validation (Table S6, Fig. S8).

Models with fewer predictors

A simple model including only age and sex provided reasonable discrimination (C-statistic ~0.80) among the 18–<70 age group (Tables 3 and S5). Among older age groups, in contrast, discrimination was substantially lower in the “age-sex” model compared to the models with more predictors.

Among females in the 18–<70 age group, both the “comorbidities” and “full” models had C-statistics of 0.89 in validation period 1. In males, the analogous numbers were 0.92 (“comorbidities”) and 0.93 (“full”). Among this age group, across the validation periods in both approaches A and B, discrimination was at most 0.03 lower for the “comorbidities” compared to the “full” model.

The COVID-AGE algorithm had C-statistics at most 0.03 lower than the most complex models used among the 18–<70 age group (e.g. validation period 1 C-statistic 0.91 (“COVID-AGE”) vs 0.89 (“full”) among females; 0.92 (“COVID-AGE”) vs 0.93 (“full”) among males). In older age groups, the “full” model had higher discrimination than “COVID-AGE” (e.g. validation period 2 C-statistic 0.70 (“COVID-AGE”) vs 0.78 (“full”) among females; 0.68 (“COVID-AGE”) vs 0.76 (“full”) among males).


We have proposed a modelling approach based on landmarking which enables dynamic incorporation of time- and region-dependent information on infection prevalence. We have demonstrated that our modelling approach can provide well-calibrated estimates, with good discrimination, of absolute risk of 28-day COVID-19 death in the general population. In contrast, absolute risk estimates cannot be transported from models which do not explicitly model the infection prevalence, although they did rank individuals well. We demonstrated that the performance of our proposed modelling approach depended critically on the proxy used for infection prevalence. We found that the calibration of our modelling approach worsened when applied to time periods with patterns of infection not seen in the model development data. We did not undertake external validation; performance may worsen in completely new settings.

Identifying a readily available proxy of infection prevalence is challenging in practice. Models including model-based estimates provided the best performance, but this may be impractical for most situations due to the complex modelling required to obtain these estimates. Models using proxies more readily available through automated data streams gave slightly inferior performance, albeit considerably better than ignoring the burden of infection entirely. Proxies involving more serious consequences of infection, such as A&E attendance, performed poorly when the underlying infection prevalence was low, since these serious consequences are rare and therefore imprecisely estimated in low prevalence periods. Furthermore, one limitation of this work was that we were unable to explore the level of granularity required in the data used as a proxy for infection prevalence. Where COVID-19 testing is geographically widespread, direct estimates of local infection are likely to provide the best measure. While this was unavailable in the UK at our study start, more recent testing data are much more comprehensive.

Joint modelling offers an alternative approach to obtaining dynamic predictions [21]. We chose a landmarking approach because it is computationally efficient and it is easier to extend to make predictions beyond the temporal subset of data used for model development.

Our underlying theoretical model for COVID-19-related death suggested it may be possible to separate the dynamics of the epidemic from the process of risk prediction based on patient characteristics, allowing the infection process to be ignored, provided that models carefully account for geographical region. Surprisingly, we found that our approach A models, which did not account for region, were still able to provide good discrimination which transported well geographically and temporally. This provides reassurance that these potential theoretical biases do not lead to substantial deterioration in practical performance in terms of ranking patients, but region-specific performance suggested accounting for region may be important when good calibration is desired.

We found that the existing COVID-AGE tool [13, 14] had very high discrimination, similar to the best performing models considered, suggesting this provides a reliable ranking of COVID-19 mortality risk. Interestingly, very simple models including only age, sex, ethnicity, a rural indicator, and a count of total comorbidities led to models with very good discrimination. When focusing on specific patient groups with higher morbidity, more complex models may provide useful additional discrimination, but in many cases much simpler models are able to discriminate well.

We did not correct any of the models for optimism [22, 23], although the similarity of the C-statistics in the overall and geographical internal validations suggests optimism was not a major problem. We compared a number of models that differed in terms of parsimony, ranging from no variable selection to a model containing only age and sex. The regression penalization used in the selected model has the effect of shrinking model coefficients; however, over-optimism can still remain [24].

Electronic health record data are not collected for research, so information on certain characteristics can be incomplete or absent. Our approach to missing data reflected the way in which these models might be used in practice if applied within electronic health record systems. Our measures of model performance reflect performance under this implementation [25]. Differences observed between approaches are unlikely to have been affected by the approach taken to missing data.


The pandemic and our response to it are evolving rapidly. As vaccines and booster programmes roll out, risk prediction models will need to be updated and re-calibrated. Our results have a number of implications for researchers developing risk prediction algorithms in COVID-19. First, models that do not explicitly model the infection prevalence can provide good discrimination which transports well geographically and temporally. However, the calibration of such models is poor when applied to different temporal and geographical contexts. When calibration is important, models must explicitly model the infection prevalence. However, performance of these models is very sensitive to the quality of the proxy measure used. Furthermore, our temporal validation suggested that for these models, calibration declines when applied to data which has patterns of infection prevalence not seen in the data used to develop the model, such as occurred in our third validation period. Discrimination, conversely, did not decline. For all models, region-specific calibration was worse than overall calibration, suggesting that there may be regional differences which are important for COVID-19 risk prediction beyond (our measures of) infection levels. While we have focused on COVID-19 death, these findings are relevant for models predicting other severe outcomes of COVID-19 in the general population, such as hospitalisation.

Finally, most of the discriminating power in each model evaluated here was driven by simple features such as age, sex and a count of comorbidities. Complex risk prediction models driven by multiple variables from diverse sources can be difficult, slow and expensive to implement in routine care: our results suggest that the opportunity costs and complexity of such implementations may not be warranted. The finding that simple models produce very high discrimination also suggests that policies targeting population-level reduction of COVID-19 mortality risk may not need to distinguish between all comorbidities in detail. For policy decisions, including ongoing restrictions on social contact and return-to-work strategies, a simple approach to predicting risk incorporating simple eligibility criteria, could accelerate programme rollout and delivery of policies.

Availability of data and materials

Access to the underlying identifiable and potentially re-identifiable pseudonymised electronic health record data is tightly governed by various legislative and regulatory frameworks and restricted by best practice.

The data in OpenSAFELY is drawn from General Practice data across England where TPP is the Data Processor. TPP developers (CB, RC, JP, FH, and SH) initiate an automated process to create pseudonymised records in the core OpenSAFELY database, which are copies of key structured data tables in the identifiable records. These are linked onto key external data resources that have also been pseudonymised via SHA-512 one-way hashing of NHS numbers using a shared salt. DataLab developers and PIs (BG, LS, CEM, SB, AJW, WJH, DE, PI, and CTR) holding contracts with NHS England have access to the OpenSAFELY pseudonymised data tables as needed to develop the OpenSAFELY tools. These tools in turn enable researchers with OpenSAFELY Data Access Agreements to write and execute code for data management and data analysis without direct access to the underlying raw pseudonymised patient data and to review the outputs of this code.

All code for the full data management pipeline—from raw data to completed results for this analysis—and for the OpenSAFELY platform as a whole is available for review at



Accident and Emergency Department


General practitioner


International Classification of Diseases


National Health Service (UK)


Office for National Statistics


  1. WHO. WHO Director-General’s opening remarks at the media briefing on COVID-19: 11 March 2020. Published 2020.

  2. WHO. COVID-19 situation reports. Published 2020.

  3. UK Government. COVID-19: comparison of geographic allocation of cases in England by lower tier local authority. Published 2020.

  4. Williamson EJ, Walker AJ, Bhaskaran K, Bacon S, Bates C, Morton CE, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584(7821):430–6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Collins G, Reitsma JB, Altman D, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350(jan07 4).

  6. Davies NG, Kucharski AJ, Eggo RM, Gimma A, Edmunds WJ, Jombart T, et al. Effects of non-pharmaceutical interventions on COVID-19 cases, deaths, and demand for hospital services in the UK: a modelling study. Lancet Public Health. 2020;5(7):e375–85.

    Article  PubMed  PubMed Central  Google Scholar 

  7. van Houwelingen HC, Putter H. Dynamic predicting by landmarking as an alternative for multi-state modeling: an application to acute lymphoid leukemia data. Lifetime Data Anal. 2008;14(4):447–63.

    Article  PubMed  PubMed Central  Google Scholar 

  8. Lau B, Cole SR, Gange SJ. Competing risk regression models for epidemiologic data. Am J Epidemiol. 2009;170(2):244–56.

    Article  Google Scholar 

  9. Barlow W, Ichikawa L, Rosner D, Izumi S. Analysis of case-cohort designs. J Clin Epidemiol. 1999;52(11):1165–72.

    Article  CAS  PubMed  Google Scholar 

  10. Onland-Moret NC, van der A DL, van der Schouw YT, et al. Analysis of case-cohort data: a comparison of different methods. J Clin Epidemiol. 2007;60(4):350–5.

    Article  PubMed  Google Scholar 

  11. Pavlou M, Ambler G, Seaman S, De Iorio M, Omar RZ. Review and evaluation of penalised regression methods for risk prediction in low-dimensional data with few events. Stat Med. 2016;35(7):1159–77.

    Article  PubMed  Google Scholar 

  12. Rahman MS, Ambler G, Choodari-Oskooei B, Omar RZ. Review and evaluation of performance measures for survival prediction models in external validation settings. BMC Med Res Methodol. 2017;17(1):60.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Covid-19 Medical Risk Assessment. Accessed Sept 2021.

  14. Coggon D, Croft P, Cullinan P, Williams A. Assessment of workers’ personal vulnerability to covid-19 using ‘covid-age’. Occup Med. 2020;70(7):461–4.

    Article  Google Scholar 

  15. Newson RB. Comparing the predictive powers of survival models using Harrell’s C or Somers’ D. Stata J. 2010;10(3):339–58.

    Article  Google Scholar 

  16. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiol Camb Mass. 2010;21(1):128–38.

    Article  Google Scholar 

  17. Royston P. Tools for checking calibration of a Cox model in external validation: approach based on individual event probabilities. Stata J. 2014;14(4):738–55.

    Article  Google Scholar 

  18. Steyerberg EW, Harrell FE Jr. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol. 2016;69:245–7.

    Article  PubMed  Google Scholar 

  19. Hughes RA, Heron J, Sterne JAC, Tilling K. Accounting for missing data in statistical analyses: multiple imputation is not always the answer. Int J Epidemiol. 2019;48(4):1294–304.

    Article  PubMed  PubMed Central  Google Scholar 

  20. Williamson E, Tazare J, Bhaskaran K, et al. Study protocol: comparison of different risk prediction modelling approaches for COVID-19 related death using the OpenSAFELY platform [version 1; peer review: 1 approved]. Wellcome Open Res. 2020;5(243).

  21. Ferrer L, Putter H, Proust-Lima C. Individual dynamic predictions using landmarking and joint modelling: validation of estimators and robustness assessment. Stat Methods Med Res. 2019;28(12):3649–66.

    Article  PubMed  Google Scholar 

  22. Smith GCS, Seaman SR, Wood AM, Royston P, White IR. Correcting for optimistic prediction in small data sets. Am J Epidemiol. 2014;180(3):318–24.

    Article  PubMed  PubMed Central  Google Scholar 

  23. Harrell FJ, Lee K, Mark D. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15(4):361-387, DOI: 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.

  24. Musoro JZ, Zwinderman AH, Puhan MA, ter Riet G, Geskus RB. Validation of prediction models based on lasso regression with multiply imputed data. BMC Med Res Methodol. 2014;14(1):116.

    Article  PubMed  PubMed Central  Google Scholar 

  25. Groenwold RHH. Informative missingness in electronic health record systems: the curse of knowing. Diagn Progn Res. 2020;4(1):8.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


This work uses data provided by patients and collected by the NHS as part of their care and support. We are very grateful for all the support received from the TPP Technical Operations team throughout this work and for generous assistance from the information governance and database teams at NHS England/NHSX. We thank Professor Rafael Perera, Nuffield Department of Primary Care Health Science, for useful comments.


This work was supported by the Medical Research Council [grant number MR/V015737/1]. EW was supported by the MRC [project grant MR/S01442X/1]. JT is funded by an MRC PhD Studentship [MR/N013638/1]. TPP provided technical expertise and infrastructure within their data centre pro bono in the context of a national emergency. BG’s work on better use of data in healthcare more broadly is currently funded in part by NIHR Oxford Biomedical Research Centre, NIHR Applied Research Collaboration Oxford and Thames Valley, the Mohn-Westlake Foundation, NHS England, and the Health Foundation; all DataLab staff are supported by BG’s grants on this work. LS reports grants from Wellcome, MRC, NIHR, UKRI, British Council, GSK, British Heart Foundation, and Diabetes UK outside this work. JPB is funded by a studentship from GSK. AS is employed by LSHTM on a fellowship sponsored by GSK. KB holds a Wellcome Senior Research Fellowship. HIM is funded by the National Institute for Health Research (NIHR) Health Protection Research Unit in Immunisation, a partnership between Public Health England and LSHTM. AYSW holds a fellowship from BHF. RG holds grants from NIHR and MRC. ID holds grants from NIHR and GSK. RM holds a Sir Henry Wellcome Fellowship funded by the Wellcome Trust. HF holds a UKRI fellowship. RME is funded by HDR-UK and the MRC. KDO holds a Sir Henry Dale fellowship jointly funded by Wellcome and the Royal Society [grant number 218554/Z/19/Z].

The views expressed are those of the authors and not necessarily those of the NIHR, NHS England, Public Health England or the Department of Health and Social Care.

Funders had no role in the study design, collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the article for publication.

Author information

Authors and Affiliations




BG conceived the platform and the approach; BG and LS led the project overall and are guarantors; SB led the software; EJW, JT and KB led the statistical analysis; CEM and AJW led on codelists and implementation; and AM led on information governance. Contributions are as follows: data curation, CB, JP, JC, SH, SB, DE, PI and CEM; analysis, EJW, KB, AJW and CEM; funding acquisition, BG and LS; information governance, AM, BG, CB and JP; methodology, EJW, KB, AJW, BG, LS, CB, JP, JC, SH, SB, DE, PI, CEM, RG, DH, RK KD-O, ES and RP; disease category conceptualization and codelists, CEM, AJW, PI, SB, DE, CB, JC, JP, SH, HJC, KB, SB, AM, BM, LT, IJD, HIM, RM and HF; ethics approval, HJC, EJW, LS and BG; project administration, CEM, HJC, CB, SB, AM, LS and BG; resources, BG, LS and FH; software, SB, DE, PI, AJW, CEM, CB, FH, JC and SH; supervision, BG, LS and SB; writing (original draft), EJW and JT. All authors were involved in the design and conceptual development and reviewed and approved the final manuscript.

BG affirms that this manuscript is an honest, accurate and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

Authors’ information

Not applicable

Corresponding author

Correspondence to Elizabeth J. Williamson.

Ethics declarations

Ethics approval and consent to participate

NHS England is the data controller; TPP is the data processor; and the key researchers on OpenSAFELY are acting on behalf of NHS England. This implementation of OpenSAFELY is hosted within the TPP environment which is accredited to the ISO 27001 information security standard and is NHS IG Toolkit compliant [18, 19]; patient data has been pseudonymised for analysis and linkage using industry standard cryptographic hashing techniques; all pseudonymised datasets transmitted for linkage onto OpenSAFELY are encrypted; access to the platform is via a virtual private network (VPN) connection, restricted to a small group of researchers; the researchers hold contracts with NHS England and only access the platform to initiate database queries and statistical models; all database activity is logged; only aggregate statistical outputs leave the platform environment following best practice for anonymisation of results such as statistical disclosure control for low cell counts [20]. The OpenSAFELY research platform adheres to the obligations of the UK General Data Protection Regulation (GDPR) and the Data Protection Act 2018. In March 2020, the Secretary of State for Health and Social Care used powers under the UK Health Service (Control of Patient Information) Regulations 2002 (COPI) to require organisations to process confidential patient information for the purposes of protecting public health, providing healthcare services to the public and monitoring and managing the COVID-19 outbreak and incidents of exposure; this sets aside the requirement for patient consent [21]. Taken together, these provide the legal bases to link patient datasets on the OpenSAFELY platform. GP practices, from which the primary care data are obtained, are required to share relevant health information to support the public health response to the pandemic and have been informed of the OpenSAFELY analytics platform.

This study was approved by the Health Research Authority (REC reference 20/LO/0651) and by the LSHTM Ethics Board (reference 21863).

Consent for publication

Not applicable

Competing interests

All authors have completed the ICMJE uniform disclosure form at and declare the following: BG has received research funding from the Laura and John Arnold Foundation, the NHS National Institute for Health Research (NIHR), the NIHR School of Primary Care Research, the NIHR Oxford Biomedical Research Centre, the Mohn-Westlake Foundation, NIHR Applied Research Collaboration Oxford and Thames Valley, the Wellcome Trust, the Good Thinking Foundation, Health Data Research UK (HDRUK), the Health Foundation, and the World Health Organisation; he also receives personal income from speaking and writing for lay audiences on the misuse of science. IJD has received unrestricted research grants and holds shares in GlaxoSmithKline (GSK).

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Supplementary materials

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

The OpenSAFELY Collaborative., Williamson, E.J., Tazare, J. et al. Comparison of methods for predicting COVID-19-related death in the general population using the OpenSAFELY platform. Diagn Progn Res 6, 6 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: