Development, validation and clinical usefulness of a prognostic model for relapse in relapsing-remitting multiple sclerosis

Background Prognosis for the occurrence of relapses in individuals with relapsing-remitting multiple sclerosis (RRMS), the most common subtype of multiple sclerosis (MS), could support individualized decisions and disease management and could be helpful for efficiently selecting patients for future randomized clinical trials. There are only three previously published prognostic models on this, all of them with important methodological shortcomings.
Objectives We aim to present the development, internal validation, and evaluation of the potential clinical benefit of a prognostic model for relapses for individuals with RRMS using real-world data.
Methods We followed seven steps to develop and validate the prognostic model: (1) selection of prognostic factors via a review of the literature, (2) development of a generalized linear mixed-effects model in a Bayesian framework, (3) examination of sample size efficiency, (4) shrinkage of the coefficients, (5) dealing with missing data using multiple imputations, (6) internal validation of the model. Finally, we evaluated the potential clinical benefit of the developed prognostic model using decision curve analysis. For the development and the validation of our prognostic model, we followed the TRIPOD statement.
Results We selected eight baseline prognostic factors: age, sex, prior MS treatment, months since last relapse, disease duration, number of prior relapses, expanded disability status scale (EDSS) score, and number of gadolinium-enhanced lesions. We also developed a web application that calculates an individual’s probability of relapsing within the next 2 years. The optimism-corrected c-statistic is 0.65 and the optimism-corrected calibration slope is 0.92. For threshold probabilities between 15 and 30%, the “treat based on the prognostic model” strategy leads to the highest net benefit and hence is considered the most clinically useful strategy.
Conclusions The prognostic model we developed offers several advantages in comparison to previously published prognostic models on RRMS. Importantly, we assessed the potential clinical benefit to better quantify the clinical impact of the model. Our web application, once externally validated in the future, could be used by patients and doctors to calculate the individualized probability of relapsing within 2 years and to inform the management of their disease.


Introduction
Multiple sclerosis (MS) is an immune-mediated disease of the central nervous system with several subtypes. The most common subtype is relapsing-remitting multiple sclerosis (RRMS) [1]. Patients with RRMS present with acute or subacute symptoms (relapses) followed by periods of complete or incomplete recovery (remissions) [2]. Effective treatment of patients with RRMS can prevent disease progression and associated severe consequences, like spasticity, fatigue, cognitive dysfunction, depression, bladder dysfunction, bowel dysfunction, sexual dysfunction, pain, and death [3].
Relapses have been commonly used as a primary efficacy endpoint in phase III randomized clinical trials leading to market approval of RRMS therapies, although the strength of the association between relapses and disease progression (an outcome of highest interest to patients) is still debated [4,5,6]. Prognosis for relapses in individuals with RRMS could support individualized decisions and disease management. A prognostic model for relapses may also be helpful for the efficient selection of patients in future randomized clinical trials and, therefore, for the reduction of type II errors in these trials [7]. In addition, such a model could support individualized decisions on initiation or switch of disease-modifying treatment (DMT). To our knowledge, no widely accepted prognostic model for MS has been used in clinical practice yet.
A recent systematic review of prediction models in RRMS [8] identified only three prognostic models (i.e. models that focus on predicting the outcome instead of predicting treatment response) with relapses as the outcome of interest [7,9,10]. However, all three studies had methodological shortcomings. Only one small study, with 127 patients, used a cohort of patients, which is considered the best source of prognostic information [8,10]. All three studies used complete-case analysis (i.e. they excluded cases with missing data) without justifying the assumptions underlying this approach; given the potential non-random distribution of missing data, the results might be biased [11]. In addition, none of them internally validated their model or presented calibration or discrimination measures. Hence, they might be at risk of misspecification [12]. None of them used shrinkage to avoid overfitting [13]. Finally, none of the studies evaluated the clinical benefit of the model, an essential step, which quantifies whether and to what extent a prognostic model is potentially useful in decision-making and clinical practice. Similar limitations exist in other published prognostic models, which commonly have serious deficiencies in the statistical methods, are based on small datasets, handle missing data inappropriately, and lack validation [14].
In this research work, we aim to fill the gap of prognostic models on relapses for RRMS patients. We present the development, the internal validation, and the evaluation of the clinical benefit of a prognostic model for relapses for individuals with RRMS using real-world data from the Swiss Multiple Sclerosis Cohort (SMSC) [15]. The cohort is comprised of patients diagnosed with RRMS who are followed bi-annually or annually in major MS centres with full standardized neurological examinations, MRIs and laboratory investigations [15]. Our prognostic model is designed for a patient who, within the Swiss health care system and standard MS treatment protocols, would like to estimate their probability of having at least one relapse within the next 2 years.

Data and methods
In "Data description", we describe the data available for the model development. We followed seven steps (described in detail in "Steps in building the prognostic model") to build and evaluate the prognostic model: (1) selection of prognostic factors via a review of the literature, (2) development of a generalized linear mixed-effects model in a Bayesian framework, (3) examination of sample size efficiency, (4) shrinkage of the coefficients, (5) dealing with missing data using multiple imputations, (6) internal validation of the model. Finally, we evaluated the potential clinical benefit of the developed prognostic model. For the development and the validation of our prognostic model we followed the TRIPOD statement [16]; the TRIPOD checklist is presented in Appendix Table 3.

Data description
We analysed observational data on patients diagnosed with relapsing-remitting multiple sclerosis (RRMS) provided by the Swiss Multiple Sclerosis Cohort (SMSC) study [15], which has been recruiting patients since June 2012. SMSC is a prospective multicentre cohort study performed across seven Swiss centres. Every patient included in the cohort is followed up every 6 or 12 months, and the occurrence of relapses, disability progression, DMT initiation or interruption, adverse events, and concomitant medications are recorded at each visit. Brain MRI and serum samples are also collected at each visit. The strength of SMSC is the high quality of the data collected, including MRI scans and body fluid samples, in a large group of patients. In addition, several internal controls and validation procedures are performed to ensure the quality of the data.
We included patients with at least 2 years of follow-up. The drop-out rate in the entire SMSC cohort was 15.8%. Drop-out was primarily associated with change of address and health care provided by a physician not associated with SMSC. Therefore, we assume that patients dropping out of the cohort before completing 2 years were not more likely to have relapsed than those remaining in the cohort, and hence the risk of attrition bias is low. The dataset includes 935 patients, and each patient has one, two, or three 2-year follow-up cycles. At the end of each 2-year cycle, we measured relapse occurrence as a dichotomous outcome. At the beginning of each cycle, several patient characteristics were measured, and we considered them as baseline characteristics for this specific cycle. In total, we included 1752 cycles from the 935 study participants. Patients could be prescribed several potential DMTs during their follow-up period, i.e. a patient during a 2-year follow-up cycle could either take no DMT or one of the available DMTs. We used the treatment status only at baseline of each 2-year cycle to define the dichotomous prognostic factor "currently on treatment" or not.
We transformed some of the continuous variables to better approximate normal distributions and merged categories with very low frequencies in categorical variables. Table 1 presents summary statistics of some important baseline characteristics using all cycles (n = 1752), while in Appendix Table 4, we present the outcome of interest (frequency of relapse within 2 years), as well as several baseline characteristics separately for patients that were included in 1 cycle, patients that were included in 2 cycles, and patients that were included in 3 cycles.

Notation
Let $Y_{ij}$ denote the dichotomous outcome for individual $i$, where $i = 1, 2, \dots, n$, at the $j$th 2-year follow-up cycle out of $c_i$ cycles. $PF_{ijk}$ is the $k$th prognostic factor, $k = 1, \dots, np$. An individual develops the outcome ($Y_{ij} = 1$) or not ($Y_{ij} = 0$) according to the probability $p_{ij}$.
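
Steps in building the prognostic model
Step 1-Selection of prognostic factors
Developing a model using a set of predictors informed by prior knowledge (either in the form of expert opinion or previously identified variables in other prognostic studies) has conceptual and computational advantages [17,18,19]. Hence, in addition to the information obtained from the three prognostic models included in the recent systematic review discussed in the Introduction [7,9,10], we searched PubMed for further models that predict relapses or treatment response in RRMS, and we retained the prognostic factors that were included in at least two of the pre-existing models.
Step 2-Logistic mixed-effects model
We developed a logistic mixed-effects model in a Bayesian framework:
$$Y_{ij} \sim \text{Bernoulli}(p_{ij}), \qquad \text{logit}(p_{ij}) = \beta_0 + u_{0i} + \sum_{k=1}^{np} (\beta_k + u_{ki}) \times PF_{ijk} \qquad \text{(Model 1)}$$
We used a fixed-effect intercept ($\beta_0$), fixed-effect slopes ($\beta_k$), an individual-level random-effects intercept ($u_{0i}$), and individual-level random-effects slopes ($u_{ki}$) to account for information about the same patient from different cycles.
We define $u = (u_0, u_k)^T$ to be the $(np + 1) \times n$ matrix of all random parameters, and we assume it is normally distributed, $u \sim N(0, D_u)$, with mean zero and a $(np + 1) \times (np + 1)$ variance-covariance matrix
$$D_u = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho\sigma^2 & \rho\sigma^2 & \cdots & \sigma^2 \end{pmatrix}$$
This structure assumes that the variances of the impact of the variables on multiple observations for the same individual are equal ($\sigma^2$) and that the covariances between the effects of the variables are equal too ($\rho \times \sigma^2$).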
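To make the random-effects structure concrete, the following minimal sketch builds an exchangeable variance-covariance matrix of the kind described above and simulates outcomes from a logistic mixed-effects model of this form. It is an illustration only: the function names are hypothetical, the prognostic factors are drawn as standard normal noise, and all numeric values (coefficients, number of patients and cycles) are assumptions rather than estimates from the SMSC data.

```python
import numpy as np

def exchangeable_cov(n_params, sigma2, rho):
    """(np+1) x (np+1) matrix: sigma2 on the diagonal, rho*sigma2 elsewhere."""
    identity = np.eye(n_params)
    ones = np.ones((n_params, n_params))
    return sigma2 * ((1.0 - rho) * identity + rho * ones)

def simulate_cycles(beta, D_u, n_patients, cycles_per_patient, rng):
    """Simulate Y_ij ~ Bernoulli(p_ij), logit(p_ij) = (beta + u_i)' [1, PF_ij].

    Returns an array with columns: patient index, cycle index, p_ij, Y_ij.
    """
    n_params = beta.size
    # One row of random effects (intercept + slopes) per patient.
    u = rng.multivariate_normal(np.zeros(n_params), D_u, size=n_patients)
    rows = []
    for i in range(n_patients):
        for j in range(cycles_per_patient):
            pf = rng.normal(size=n_params - 1)      # hypothetical standardized factors
            x = np.concatenate(([1.0], pf))         # leading 1 for the intercept
            eta = x @ (beta + u[i])                 # linear predictor
            p = 1.0 / (1.0 + np.exp(-eta))          # inverse logit
            rows.append((i, j, p, rng.binomial(1, p)))
    return np.array(rows, dtype=float)

# Illustrative values only (not taken from the paper's fitted model).
rng = np.random.default_rng(0)
D_u = exchangeable_cov(n_params=3, sigma2=1e-4, rho=0.5)
sim = simulate_cycles(np.array([-1.4, 0.3, -0.2]), D_u, 50, 2, rng)
print("simulated relapse proportion:", sim[:, 3].mean())
```

With a variance as small as the one used here, the patient-specific coefficients are almost identical to the fixed effects, so the simulated probabilities are driven almost entirely by the fixed part of the model.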
Step 3-Examination of sample size efficiency
We examined if the available sample size was enough for the development of a prognostic model [16]. We calculated the events per variable (EPV) accounting for both fixed effects and random effects and for categorical variables [20]. We also used the method by Riley et al. to calculate the efficient sample size for the development of a logistic regression model, using the R package pmsampsize [21]. We set Nagelkerke's $R^2 = 0.15$ (Cox-Snell's adjusted $R^2 = 0.09$) and the desired shrinkage equal to 0.9 as recommended [21].
Step 4-Shrinkage of the coefficients
The estimated effects of the covariates need some form of penalization to avoid extreme predictions [13,22]. In a Bayesian setting, recommended shrinkage methods use a prior on the regression coefficients [23]. For logistic regression, a Laplace prior distribution for the regression coefficients is recommended [24] (i.e. double exponential, also called Bayesian LASSO):
$$\pi(\beta_1, \dots, \beta_{np}) = \prod_{k=1}^{np} \frac{\lambda}{2} \exp(-\lambda |\beta_k|)$$
where $\lambda$ is the shrinkage parameter. A Laplace prior allows small coefficients to shrink towards 0 faster, while it applies smaller shrinkage to large coefficients [25].
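One way to see why a Laplace prior acts as a LASSO-type penalty is to write out the log-posterior of the fixed-effect slopes. The derivation below is a sketch under simplifying assumptions: the shrinkage parameter $\lambda$ is treated as fixed, the random effects and the prior on $\beta_0$ are left out for brevity, and $\ell(\beta)$ (the log-likelihood) is notation introduced here for illustration.

```latex
\begin{aligned}
\log \pi(\beta \mid \text{data})
  &= \ell(\beta) + \sum_{k=1}^{np} \log\!\left( \frac{\lambda}{2}\, e^{-\lambda |\beta_k|} \right) + \text{const} \\
  &= \ell(\beta) - \lambda \sum_{k=1}^{np} |\beta_k| + np \,\log\frac{\lambda}{2} + \text{const}
\end{aligned}
```

Under these assumptions, maximizing the log-posterior over $\beta$ is the same as maximizing the log-likelihood with an L1 penalty $\lambda \sum_k |\beta_k|$, so a larger $\lambda$ pulls the coefficients more strongly towards 0.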
Step 5-Multiple imputations for missing data
In the case of missing values in the covariates, we assumed that these are missing at random (MAR), meaning that, given the observed data, the occurrence of missing values is independent of the actual missing values. Appropriate multiple imputation models should provide valid and efficient estimates if data are MAR. As our substantive model is hierarchical, we used multilevel joint modelling multiple imputation, using the mitml R package [26].
First, we checked for variables not included in the substantive model that could predict the missing values (i.e. auxiliary variables). Then, we built the imputation model, using both fixed effects and individual-level random-effects intercept and slopes as in our substantive model (Model 1), where the dependent variables are the variables that include missing values for imputation, and the independent variables are all complete variables included in the substantive model and the identified auxiliary variables.
We generated 10 imputed datasets, using the jomoImpute R function, and we applied the Bayesian model (Model 1) to each of the imputed datasets. We checked convergence of the imputations using the plot R function in the mitml R package. Finally, we obtained the pooled estimates for the regression coefficients, $\hat{\beta}_0$ and $\hat{\beta}_k$, using Rubin's rules [27] (testEstimates R function) with two matrices containing the mean and the variance estimates, respectively, from each imputed dataset as arguments.
Step 6-Internal validation
First, we assessed the calibration ability of the developed model, via a calibration plot with loess smoother, for the agreement between the estimated probabilities of the outcome and the observed outcome's proportion (val.prob.ci.2 R function). We used bootstrap internal validation to correct for optimism in the calibration slope and in discrimination, measured via the AUC [13]. For each one of the 10 imputed datasets, we created 500 bootstrap samples and in each one of them: (1) we constructed a generalized linear model with the prespecified predictors, using the glm R function, denoted as Model*, (2) we calculated the bootstrap performance as the apparent performance of Model* on the bootstrap sample, (3) we applied Model* to the corresponding imputed dataset to determine the test performance, (4) we calculated the optimism as the difference between bootstrap performance and test performance. Then, we calculated the average optimism across the 500 bootstrap samples and used Rubin's rules to summarize the optimism for the AUC and the calibration slope across the 10 imputed datasets. We calculated the optimism-corrected AUC and calibration slope of our prognostic model by subtracting the optimism estimate from the apparent performance.
Ideally, the bootstrap would refit the Bayesian logistic mixed-effects model exactly as we developed the original model. However, this would need 15000 h to run, as the Bayesian model would need to run for 500 bootstrap samples in each one of the 10 imputed datasets (i.e. 5000 times) and a single run of the Bayesian model needs 3 h. Hence, the bootstrap internal validation we performed yields a rough estimate of the optimism that ignores the dependence between observations from the same individual.
We used custom R routines to validate the model via bootstrapping.
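The four bootstrap steps can be sketched for a single dataset as follows. This is not the authors' R code: it is a minimal Python illustration that fits an ordinary (fixed-effects) logistic regression by Newton-Raphson in place of the glm R function, uses simulated data, and, like the procedure described above, ignores the clustering of cycles within patients. The function names, the simulated coefficients, and the number of bootstrap samples are all assumptions made for the example; pooling across imputed datasets is not shown.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Maximum likelihood logistic regression via Newton-Raphson.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def auc(y, score):
    """c-statistic: P(score of an event > score of a non-event), ties count 0.5."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def calibration_slope(y, lp):
    """Slope of a logistic regression of the outcome on the linear predictor."""
    Z = np.column_stack([np.ones_like(lp), lp])
    return fit_logistic(Z, y)[1]

def optimism_corrected(X, y, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    beta = fit_logistic(X, y)
    app_auc = auc(y, X @ beta)
    app_slope = calibration_slope(y, X @ beta)   # equals 1 for the ML fit
    opt_auc, opt_slope = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, y.size, y.size)    # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        if yb.min() == yb.max():                 # skip samples with one class only
            continue
        bb = fit_logistic(Xb, yb)                                # step 1: Model*
        boot = (auc(yb, Xb @ bb), calibration_slope(yb, Xb @ bb))  # step 2
        test = (auc(y, X @ bb), calibration_slope(y, X @ bb))      # step 3
        opt_auc.append(boot[0] - test[0])                        # step 4
        opt_slope.append(boot[1] - test[1])
    return {
        "auc_apparent": app_auc,
        "auc_corrected": app_auc - np.mean(opt_auc),
        "slope_apparent": app_slope,
        "slope_corrected": app_slope - np.mean(opt_slope),
    }

# Simulated example data (illustrative coefficients, not from the paper).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 3))])
true_beta = np.array([-1.4, 0.5, -0.4, 0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))
result = optimism_corrected(X, y, n_boot=100, seed=0)
print(result)
```

Because the apparent calibration slope of a maximum likelihood fit is 1 by construction, the corrected slope is effectively the average slope obtained when the bootstrap models are applied back to the original data.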

Clinical benefit of the developed model
Decision curve analysis is a widely used method to evaluate the clinical consequences of a prognostic model. This method aims to overcome some weaknesses of the traditional measures (i.e. discrimination and calibration) that are not informative about the clinical value of the prognostic model [28]. Briefly, decision curve analysis calculates a clinical "net benefit" for a prognostic model and compares it with the default strategies of treating all or treating none of the patients. Net benefit (NB) is calculated across a range of threshold probabilities, defined as the minimum probability of the outcome for which a decision will be made.
In more detail, information about their risk of relapsing within the next 2 years might help patients to reconsider whether their current treatment and approach should continue to follow the established standards of care in Switzerland. If the probability of relapsing is considered too high, RRMS patients might be interested in taking a more radical stance towards the management of their condition: discuss with their treating doctors more active disease-modifying drugs (which might also have a high risk of serious adverse events), explore the possibility of stem cell transplantation, etc. Let us call this the "more active approach". If the probability of relapsing is higher than a threshold a%, then a patient will take a "more active approach" to the management of their condition; otherwise, they will continue "as per standard care".
We examined the net benefit of our final model, via the estimated probabilities provided, by using decision curve analysis and plotting the NB of the developed prognostic model, using the dca R function, over a range of threshold probabilities a%. The NB of a decision based on the model is equal to
$$NB_{\text{decision based on the model}} = (\text{True positive}\%) - (\text{False positive}\%) \times \frac{a\%}{1 - a\%}$$
We compare the results with those from two default strategies: "as per standard care for all" and "more active approach for all". The NB of "as per standard care for all" is equal to zero over the whole range of threshold probabilities, as there are no true positives and no false positives. "More active approach for all" does not imply that the threshold probability a% has been set to 0, and its NB is calculated for the whole range of threshold probabilities using the formula:
$$NB_{\text{more active approach for all}} = (\text{prevalence}) - (1 - \text{prevalence}) \times \frac{a\%}{1 - a\%}$$
These two strategies mean that the more active treatment options will be discussed and considered by all patients ("more active approach for all") or by none ("as per standard care for all"). A decision based on a prognostic model is only clinically useful at threshold a% if it has a higher NB than both "more active approach for all" and "as per standard care for all". If a prognostic model has a lower NB than any default strategy, the model is considered clinically harmful, as one of the default strategies leads to better decisions [28,29,30,31,32].
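The net benefit of the three strategies at a single threshold can be computed in a few lines. The sketch below is a plain Python illustration of the formulas for a model-based decision and for "more active approach for all"; it is not the dca R function, the function names are hypothetical, and the five outcomes and predicted probabilities are made-up values chosen so the arithmetic is easy to follow.

```python
import numpy as np

def net_benefit_model(y, p_hat, threshold):
    """NB of acting on the model: TP% - FP% * a / (1 - a).

    Patients whose estimated probability is higher than the threshold
    take the "more active approach".
    """
    y = np.asarray(y)
    p_hat = np.asarray(p_hat)
    active = p_hat > threshold
    true_pos = np.sum(active & (y == 1)) / y.size    # proportion of all patients
    false_pos = np.sum(active & (y == 0)) / y.size
    return true_pos - false_pos * threshold / (1.0 - threshold)

def net_benefit_all(y, threshold):
    """NB of "more active approach for all": prevalence - (1 - prevalence) * a / (1 - a)."""
    prevalence = np.mean(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

def net_benefit_none():
    """NB of "as per standard care for all": no true or false positives."""
    return 0.0

# Made-up example: 5 patients, 2 of whom relapsed.
y = [1, 0, 0, 1, 0]
p_hat = [0.40, 0.10, 0.30, 0.20, 0.05]
a = 0.25
print(net_benefit_model(y, p_hat, a), net_benefit_all(y, a), net_benefit_none())
```

Repeating the calculation over a grid of thresholds and plotting the three values against the threshold gives a decision curve; the model is useful at a given threshold only where its value is above both default strategies.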
We made the analysis code available in a GitHub repository: https://github.com/htx-r/Reproduce-results-frompapers/tree/master/PrognosticModelRRMS

Results
For the model development, we used 1752 observations coming from 2-year repeated cycles of 935 patients who experienced 302 relapses.
First, we took into account the three prognostic models included in the recent systematic review [7,9,10] that predict relapse (not the treatment response to relapses) in patients with RRMS. Our search in PubMed identified 87 research articles. After reading the abstracts, we ended up with seven models that predicted either relapses or treatment response to relapses. Three of them were already included in the recent systematic review, as they predicted relapses. Hence, we identified three additional models that predict the treatment response to relapses [33,34,35], and one research work aiming to identify subgroups of RRMS patients who are more responsive to treatments [36]. Figure 1 shows which prognostic factors were selected and which pre-existing prognostic models were included [7,9,33,34,35,36]. We included none of the prognostic factors in the model of Liguori et al. [10], as none of the prognostic factors they used (i.e. MRI predictors) were included in any other of the available models. We briefly summarize these models in Section 1 of the Appendix file, and some important characteristics of these models are shown in Appendix Table 5.
The prognostic factors included in our model are presented in Table 2 with their pooled estimates $\hat{\beta}_k$, ORs, and corresponding 95% credible intervals (CrIs). We have also developed a web application where the personalized probabilities of relapsing within 2 years are calculated automatically. This is available as an R Shiny app at https://cinema.ispm.unibe.ch/shinies/rrms/. In our model, the variance $\sigma^2$ is estimated to be 0.0001 and the covariance $\rho \times \sigma^2$ to be 0.00005. Hence, the random intercept and all random slopes were estimated close to 0. For convenience and speed of estimation, predictions were made using only the fixed-effects estimates. In the Supplementary file, Appendix Table 6, we present the estimated coefficients in each of the ten imputed datasets.
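Because predictions use only the fixed effects, an individual probability reduces to the inverse logit of the fixed-effects linear predictor. The expression below is a sketch of that calculation under the model form described in the methods; it assumes that each prognostic factor is first transformed and centred at its mean as noted in the caption of Table 2, and the bar notation for the centring means is introduced here for illustration only.

```latex
\hat{p} = \frac{1}{1 + \exp\left\{ -\left( \hat{\beta}_0 + \sum_{k=1}^{np} \hat{\beta}_k \left( PF_k - \overline{PF}_k \right) \right) \right\}}
```

Under this centring assumption, a patient whose prognostic factors all equal the cohort means has a linear predictor of $\hat{\beta}_0$, and each $\exp(\hat{\beta}_k)$ is the odds ratio for a one-unit increase in the corresponding (transformed) factor.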
The full model's degrees of freedom were 22 (for 10 predictors with random intercept and slope) and the events per variable (EPV) was 13.7. The efficient sample size was calculated as 2084 (to avoid optimism in the regression coefficients), 687 (for agreement between apparent and adjusted model performance), and 220 (for a precise estimation of risk in the whole population) [21]. Our available sample size suggests that there might be optimism in our regression coefficients. However, this should have been addressed via the shrinkage we performed.
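The EPV reported above can be checked with simple arithmetic. The snippet below is a minimal sketch that divides the number of events by the model's degrees of freedom, using the 302 relapses and 22 degrees of freedom stated in the text; the function name is hypothetical, and treating the EPV as exactly this ratio is an assumption about how the figure was obtained.

```python
def events_per_variable(n_events, n_parameters):
    """Events per variable: number of outcome events per estimated parameter."""
    return n_events / n_parameters

n_events = 302            # relapses observed across the 1752 cycles
degrees_of_freedom = 22   # degrees of freedom reported for the full model

epv = events_per_variable(n_events, degrees_of_freedom)
print(round(epv, 1))
```

The result rounds to the EPV of 13.7 quoted in the text.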
In Fig. 2, we show the distributions of the calculated probability of relapsing for individuals by relapse status. The overlap in the distributions of the probabilities is large, as also shown by the optimism-corrected AUC (Table 2). The overall mean probability of relapsing is 19.1%. For patients who relapsed, the corresponding mean is 23.4%, whereas for patients who did not relapse it is 18.0%. Figure 3 shows the calibration plot, with some apparent performance measures and their 95% confidence intervals (CIs), of the developed prognostic model and represents the agreement between the estimated probabilities and the observed proportion relapsing within 2 years.
In Fig. 4, the exploration of the net benefit of our prognostic model is presented [29,30,31,32]. In the figure, the vertical axis corresponds to the NB and the horizontal axis corresponds to the preferences presented as threshold probabilities. The NB weighs the benefit of identifying, and consequently correctly treating, individuals who relapsed against the harm (e.g. side effects) of wrongly prescribing the "more active approach" to patients because of false-positive results. Threshold probabilities refer to how decision makers value the risk of relapsing related to a harmful condition for a given patient, a decision that is often influenced by a discussion between the decision maker and the patient. The dashed line, corresponding to decisions based on the developed prognostic model, has the highest NB compared to the default strategies for threshold probabilities between 15 and 30%.
Nearly half of the patients (46.5%) in our dataset have calculated probabilities within this range in at least one follow-up cycle. Hence, for patients who consider the relapse occurrence to be 3.3 to 6.6 times worse (1/a%) than the risks, costs, and inconvenience of the "more active approach", the prognostic model can lead to better decisions than the default strategies.
Fig. 1 Venn diagram of the prognostic factors included at least two times in pre-existing models and included in our prognostic model. The names with an asterisk refer to the first author of each prognostic model or prognostic factor research [7,9,10,33,34,35,36]. EDSS, Expanded Disability Status Scale; Gd, gadolinium
Table 2 Pooled estimates of the regression coefficients $\hat{\beta}_k$, ORs, and the 95% CrIs for each one of the parameters in the model (centred at the mean), using Rubin's rules. The estimated $\sigma$ (standard deviation of the impact of the variables on multiple observations for the same individuals) is 0.01. The estimated correlation $\rho$ between the effects of the variables is 0.49. The pooled optimism-corrected AUC is 0.65 and the pooled optimism-corrected calibration slope is 0.91. Disease duration was transformed to log(disease duration + 10), and months since last relapse was transformed to log(months since last relapse + 10)
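The pooling by Rubin's rules that underlies the estimates in Table 2 can be sketched as follows. This is a minimal Python illustration of the standard formulas for the pooled point estimate and total variance, not the implementation of the testEstimates R function; the function name is hypothetical, the three "imputed dataset" estimates are made-up numbers, and degrees-of-freedom calculations for interval estimation are omitted.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates, variances: arrays of shape (m, n_params), one row per
    imputed dataset. Returns the pooled estimates and total variances.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.shape[0]
    pooled = q.mean(axis=0)                   # average of the m point estimates
    within = u.mean(axis=0)                   # average within-imputation variance
    between = q.var(axis=0, ddof=1)           # variance of estimates across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Made-up example: one coefficient estimated in m = 3 imputed datasets.
est = [[0.10], [0.20], [0.30]]
var = [[0.04], [0.04], [0.04]]
pooled, total = pool_rubin(est, var)
print(pooled, total)
```

The total variance is larger than the average within-imputation variance whenever the estimates differ across imputed datasets, which is how the uncertainty due to the missing data enters the pooled result.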

Discussion
We developed a prognostic model that predicts relapse within 2 years for individuals diagnosed with RRMS, using observational data from the SMSC [15], a prospective multicentre cohort study, to inform clinical decisions. Prognostication is essential for the disease management of RRMS patients, and until now, no widely accepted prognostic model for MS has been used in clinical practice. A recent systematic review on prognostic models for RRMS [8] reports that most of the prognostic models, regardless of the outcome of interest, lack statistical quality in the development steps (introducing potential bias), did not perform internal validation, did not report important performance measures like calibration and discrimination, and did not present the clinical impact of the models. More specifically, only three studies examined relapses as an outcome of interest, and none of them satisfied the criteria above.
Our model aimed to fill the existing gap, by satisfying all the above criteria, to enhance the available information for predicting relapses and to inform decision-making.
Given that a manageable number of characteristics is needed to establish the risk score, doctors and patients can enter these using our online tool (https://cinema.ispm.unibe.ch/shinies/rrms/), estimate the probability of relapsing within the next 2 years, and take treatment decisions based on the patient's risk score. This tool shows the potential of the proposed approach; however, it may not yet be ready for use in clinical practice, as decision-making tools need external validation with an independent cohort of patients.
We included eight prognostic factors (all measured at baseline where also the risk was estimated): age, disease duration, EDSS, number of gadolinium-enhanced lesions, number of previous relapses 2 years prior, months since last relapse, treatment naïve, gender, and "currently on treatment". The EPV of our model is 13.7, and the sample size is larger than that of all three pre-existing prognostic models. The optimism-corrected AUC of our model is 0.65, indicating a relatively modest discriminative ability of the model. However, in the literature, only Stühler et al. [33] reported the AUC of their model, which was also equal to 0.65. In our previous work [37], the optimism-corrected AUC using the LASSO model, with many candidate predictors, was 0.60, whereas that of the pre-specified model was 0.62. This could indicate that, in general, relapses are associated with unknown factors. The prognostic model we developed seems to be potentially useful, as it is preferred over the "Treat all" or "Treat none" approaches for threshold probabilities between 15 and 30%.
The applicability of our model is limited by several factors. First, the risk of relapsing is not the only outcome that patients will consider when making decisions; long-term disability status would also determine their choice [4], and there is an ongoing debate about whether the relapse rate is associated with long-term disability [5,6,7,38]. That could be a further line of future research, and prognostic models with good statistical quality for long-term disability still need to be developed. In addition, the sample size of the SMSC is relatively small compared to other observational studies; this study, though, is of high quality. Furthermore, the bootstrap internal validation we performed ignores the dependence between observations from the same individual. In each one of the 10 imputed datasets and the 500 bootstrap samples, we constructed a frequentist logistic regression model. Ideally, we should have constructed the Bayesian logistic mixed-effects model exactly as we developed the original model. In addition, for reasons of model parsimony, our model assumes that the variances of the impact of the variables on multiple observations for the same individual are equal and that the covariances between the effects of the variables are equal too. This assumption might be relaxed by, e.g. assuming covariate-specific correlations. Finally, our model was not validated externally, something essential for decision-making tools. In the near future, independent researchers, as recommended by Collins et al. [39], should externally validate our model before it is ready for clinical use.

Conclusions
The prognostic model we developed offers several advantages in comparison to previously published prognostic models in RRMS. We performed multiple imputations for the missing data to avoid the potential bias induced by excluding incomplete cases [11], we used shrinkage of the coefficients to avoid overfitting [13], and we internally validated our model, presenting calibration and discrimination measures, an essential step in prognosis research [13]. Importantly, we assessed the net benefit of our prognostic model, which helps to quantify the potential clinical impact of the model. Our web application, when externally validated, could be used by patients and doctors to calculate the individualized risk of relapsing within the next 2 years and to inform their decision-making.
Fig. 4 Decision curve analysis showing the net benefit of the prognostic model per cycle. The horizontal axis is the threshold estimated probability of relapsing within 2 years, a%, and the vertical axis is the net benefit. The plot compares the clinical benefit of three approaches: "as per standard care for all" approach, "more active care for all" approach, and "decision based on the prognostic model" approach (see definitions in "Clinical benefit of the developed model"). For a given threshold probability, the approach with the highest net benefit is considered the most clinically useful. The "decision based on the prognostic model" approach provides the highest net benefit for threshold probabilities ranging from 15 to 30%

Appendix
Section 1. Summary of pre-existing models on RRMS used in our model
[35] presented an individualized prediction model using demographic and clinical predictors in patients with MS. Treatment response was analysed separately for disability progression, disability regression, relapse frequency, conversion to secondary progressive disease, change in the cumulative disease burden, and the probability of treatment discontinuation. They used a large cohort study, MSBase, with seven disease-modifying therapies. They externally validated the prediction model in a geographically distinct cohort, the Swedish Multiple Sclerosis Registry. Pre-treatment relapse activity and age were associated with the relapse incidence.
3. Liguori et al. [10] aimed to investigate the prognostic value of 1-year subtraction MRI (sMRI) on change in T2 lesion volume, relapse rate, and change in brain parenchyma fraction. They used 127 patients from a cohort followed in a single centre, the Partners MS Center. They used only MRI and sMRI measures as prognostic factors.
4. Pellegrini et al. [34] developed a prediction model to predict treatment response in patients with relapsing-remitting multiple sclerosis, using an individual treatment response score, regressing on a set of baseline predictors. They used two randomized clinical trials: the CONFIRM and DEFINE studies. The outcome of interest was the annualized relapse rate. The prognostic factors they used are age, short form-36 mental component summary, short form-36 physical component summary, visual function test 2.5%, prior MS treatment (yes or no), EDSS, timed 25-foot walk, paced auditory serial addition test (known as PASAT), months since last relapse, number of prior relapses, 9-hole peg test, ethnicity, and sex.
5. Signori et al. [36] aimed to examine whether there are subgroups of RRMS patients who are more responsive to treatments. They collected all published randomized clinical trials in RRMS reporting a subgroup analysis of treatment effect. Two main outcomes were studied: the annualized relapse rate and the disability progression. The authors meta-analysed the results of the identified studies to compare the relative treatment effects between subgroups. Age, gadolinium activity, and EDSS were identified as the statistically important subgroups regarding the response to treatments for annualized relapse rate.
6. Sormani et al. [7] developed and validated a prognostic model to identify RRMS patients with a high risk of experiencing relapses in the short term. They used 539 patients from the placebo arm of a double-blind, placebo-controlled trial (CORAL study) of oral glatiramer acetate in RRMS. The validation sample consisted of 117 patients from the placebo arm of a double-blind, placebo-controlled trial of subcutaneous glatiramer acetate in RRMS (European/Canadian Glatiramer Acetate study). The variables included in the final model as independent predictors of relapse occurrence were the number of gadolinium-enhanced lesions and the number of previous relapses.
7. Stühler et al. [33] presented a framework for a personalized prediction model of treatment response based on real-world data from the NeuroTransData network for patients diagnosed with RRMS. They examined two outcomes of interest: the number of relapses and the disability progression. They used three different approaches (10-fold cross-validation, leave-one-site-out cross-validation, and excluding a test set) to validate their model. The predictors included for the number of relapses are age, gender, EDSS, current treatment, previous treatment, disease duration, months since last relapse, number of prior relapses, number of prior therapies, prior second-line therapy (yes or no), duration of the current treatment, duration of the previous treatment, and clinical site.
Table 3 TRIPOD checklist followed for the development and the validation of the prognostic model

| Section/topic | Item | Checklist item | Page |
|---|---|---|---|
| Title and abstract | | | |
| Title | 1 | Identify the study as developing and/or validating a multivariable prediction model, the target population, and the outcome to be predicted. | 1 |
| Abstract | 2 | Provide a summary of objectives, study design, setting, participants, sample size, predictors, outcome, statistical analysis, results, and conclusions. | 2-3 |
| Introduction | | | |
| Background and objectives | 3a | Explain the medical context (including whether diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models. | 3-4 |
| | 3b | Specify the objectives, including whether the study describes the development or validation of the model or both. | 4-5 |
| Methods | | | |
| Source of data | 4a | Describe the study design or source of data (e.g. randomized trial, cohort, or registry data), separately for the development and validation data sets, if applicable. | |
| | 4b | Specify the key study dates, including start of accrual, end of accrual, and, if applicable, end of follow-up. | 5 |
| Participants | 5a | Specify key elements of the study setting (e.g. primary care, secondary care, general population) including number and location of centres. | |
| Missing data | 9 | Describe how missing data were handled (e.g. complete-case analysis, single imputation, multiple imputation) with details of any imputation method. | 9-10 |
| Statistical analysis methods | 10a | Describe how predictors were handled in the analyses. | 7-10 |
| | 10b | Specify type of model, all model-building procedures (including any predictor selection), and method for internal validation. | 7-10 |
| | 10d | Specify all measures used to assess model performance and, if relevant, to compare multiple models. | 10-12 |
| Risk groups | 11 | Provide details on how risk groups were created, if done. | Not relevant |
| Results | | | |
| Participants | 13a | Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful. | Table 1 and Appendix Table 4 |
| | 13b | Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and outcome. | Table 1 and Appendix Table 4 |
| Model development | 14a | Specify the number of participants and outcome events in each analysis. | Table 1 and Appendix Table 4 |
| | 14b | If done, report the unadjusted association between each candidate predictor and outcome. | Not relevant |
| Model specification | 15a | Present the full prediction model to allow predictions for individuals (i.e., all regression coefficients, and model intercept or baseline survival at a given time point). | |

Availability of data and materials
The data that support the findings of this study were available from the Swiss Multiple Sclerosis Cohort (SMSC). Restrictions apply to the availability of these data, which were used under licence for this study.

Declarations
Ethics approval and consent to participate
The use of data for this study was approved by the Cantonal Ethics Commission of Bern (Kantonale Ethikkommission für die Forschung, KEK Bern) for the project with ID 2019-02151.

Consent for publication
The manuscript does not contain any individual person's data in any form.
Competing interests
KC, ES, PB, SS, PB, JK, GD, ME, and GS declare that they have no conflict of interest with respect to this paper. LK's institution (University Hospital Basel) has received in the last 3 years and used exclusively for research support:

Table 6 The estimation of all parameters in the complete dataset and in each one of the imputed datasets
Parameters  CC  ID 1  ID 2  ID 3  ID 4  ID 5  ID 6  ID 7  ID 8  ID 9  ID 10
Intercept