An enhanced version of FREM (Fracture Risk Evaluation Model) using national administrative health data: analysis protocol for development and validation of a multivariable prediction model

Background Osteoporosis poses a growing healthcare challenge owing to its rising prevalence and a significant treatment gap, as patients are widely underdiagnosed and consequently undertreated, leaving them at high risk of osteoporotic fracture. Several tools aim to improve case-finding in osteoporosis. One such tool is the Fracture Risk Evaluation Model (FREM), which in contrast to other tools focuses on imminent fracture risk and holds potential for automation as it relies solely on data that is routinely collected via the Danish healthcare registers. The present article is an analysis protocol for a prediction model that is to be used as a modified version of FREM, with the intention of improving the identification of subjects at high imminent risk of fracture by including pharmacological exposures and using more advanced statistical methods compared to the original FREM. Its main purposes are to document and motivate various aspects and choices of data management and statistical analyses. Methods The model will be developed by employing logistic regression with grouped LASSO regularization as the primary statistical approach and gradient-boosted classification trees as a secondary statistical modality. Hyperparameter choices as well as computational considerations on these two approaches are investigated by an unsupervised data review (i.e., blinded to the outcome), which also investigates and handles multicollinarity among the included exposures. Further, we present an unsupervised review of the data and testing of analysis code with respect to speed and robustness on a remote analysis environment. The data review and code tests are used to adjust the analysis plans in a blinded manner, so as not to increase the risk of overfitting in the proposed methods. Discussion This protocol specifies the planned tool development to ensure transparency in the modeling approach, hence improving the validity of the enhanced tool to be developed. Through an unsupervised data review, it is further documented that the planned statistical approaches are feasible and compatible with the data employed.


Background
Osteoporosis is a prevalent disease which represents a considerable and growing public health problem.The number of patients with osteoporosis in Europe is projected to rise from 27.5 million in 2010 to 33.9 million in 2025, corresponding to an increase of 23%, as a result of the increasing number of elderly in the population [1].
Furthermore, osteoporosis causes debilitating fragility fractures which are significant causes of mortality and morbidity for patients [2][3][4].It is estimated that the annual number of osteoporotic fractures in Europe will rise from 4.28 million fractures in 2019 to 5.34 million fractures in 2034, corresponding to an expected increase of 24.8% [5].Even though osteoporosis represents a significant healthcare burden on both patients and society, patients at risk of fracture do not receive appropriate treatment, resulting in a large treatment gap [6].Results from the ICUROS study showed that among patients who sustained a low-energy hip fracture in 10 countries (Australia, Austria, Estonia, France, Italy, Lithuania, Mexico, Russia, Spain, and the UK), only 27% were prescribed fracture prevention treatment following the fracture, even though prior fracture is a known risk factor of subsequent fracture [7], showing that even secondary prevention is lacking [8].Insufficient case-finding strategies cause osteoporosis to remain underdiagnosed and undertreated [1].The treatment gap in osteoporosis therefore calls for improved detection of high-risk individuals.
Several fracture risk assessment tools have been developed to provide risk stratification of patients [9].Such tools integrate known risk factors of osteoporosis and fracture into an estimate of fracture risk for patients.Some of the most commonly used tools include the fracture risk assessment tool (FRAX) [10] and the Garvan Institute of Medical Research-Bone Fracture Risk Calculator [11].FRAX and Garvan use a set of cross-sectional clinical risk factors to estimate fracture risk, which typically requires manual data entry from physicians to retrieve fracture risk predictions.Furthermore, these tools predict long-term fracture risk, thus excluding patients with high imminent fracture risk.
However, new minimal physician-effort strategies are needed to systematically detect high-risk individuals.The Fracture Risk Evaluation Model (FREM) is a fracture risk assessment model that was developed to identify individuals at high imminent (1 year) risk of fracture, using hospital diagnoses from Danish registries and age as predictors.Predictors were chosen solely on the basis of their statistical predictive performance, without any assumptions on causality or biological relationship between predictor and outcome.Further, the model was developed for each sex, separately, to account for sex differences in fracture risk [12].FREM exclusively utilizes routinely collected administrative health data from the Danish national registers, and the model therefore lends itself towards automatic risk predictions.FREM identified 38 and 43 hospital diagnoses in women and men, respectively, as risk factors for major osteoporotic fracture (MOF).Further, FREM identified 32 hospital diagnoses in both women and men as risk factors of hip fracture (HF).FREM showed good accuracy in prediction of MOF resulting in AUC = 0.750 (95% CI 0.741; 0.795) and AUC = 0.752 (95% CI 0.743; 0.761) for women and men, respectively.FREM showed even higher accuracy in prediction of HF resulting in AUC = 0.874 (95%CI 0.869; 0.879) and AUC = 0.851 (95% CI 0.841; 0.861) for women and men, respectively [12].A validation study of FREM was recently performed in an updated extraction of Danish registry data which still proved good accuracy of FREM in prediction of MOF and HF [13].Additionally, FREM has been externally validated in Canadian hospitalization and physician claims data proving significant fracture risk stratification [14].FREM has the potential to be integrated into the primary health care system and can-without any manual data entry required by the general practitioners-provide a single easy-to-interpret estimate for each patients' risk of MOF and HF.FREM has recently been suggested in international publications as a practicable future code of practice to identify patients at very high risk of osteoporotic fractures [15].
However, FREM holds the potential for further improvements in fracture risk prediction through the inclusion of data from additional health registries and the utilization of more advanced statistical methods including approaches from machine learning in model development.In particular, positive predictive values of the original FREM are fairly low [13], which is in accordance with an algorithm predicting a low prevalence outcome but also indicates a potential for reducing the number of false positives implied by FREM.Moreover, the original development utilized stepwise selection methods for identifying risk factors and employed data-splitting for assessing model performance, both techniques for which more modern equivalents have been developed [16].
Therefore, the aim of this article is to describe the development of an enhanced version of FREM and the evaluation of its performance in a Danish context.
The article follows the reporting guidelines set forth in the TRIPOD statement [17].

Data and data management
Data currently consists of the entire Danish population aged 45 or older on January 1, 2018, coupled with a 15-year look-back (2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017) in the Danish national registries, and these data are used in the analyses reported in the current article.We intend to update the data to include later years, before performing the analyses outlined in this protocol.Included individuals are observed for a period of 1 year from the index date, during which outcomes may occur.The 15-year look-back is used to collect information on personal characteristics and exposures, and we refer to these collectively as predictors.Other fracture prediction studies employ cohorts that are slightly younger (40 years and older [18][19][20]) or older [21][22][23].Trémollieres and colleagues argue that a cutoff at 45 years allows for the inclusion of early menopausal women, who may be at increased risk of osteoporotic fracture [24].The sample size is dictated by the study design and the data obtained from the Danish national registries.The original FREM study included a sample of 2,495,339 individuals [12].
Data sources largely coincide with those used for the original FREM tool [12], except that the present model also utilizes information on medicine redemptions and uses an updated data extraction from the registries.All data management and analyses are performed on a secure server at the Danish Health Data Authority.

Data sources
Data is extracted from a collection of Danish health registries.The Danish health registries cover all citizens of Denmark and offer long-term data on the entire population and may generally be characterized as having a high degree of validity and completeness [25].Data from the following three national registries are used in this study: • The Danish Civil Registration System (CRS) includes all individuals residing in Denmark [26].Upon birth or immigration, an individual is assigned a unique person identification number which may be used to identify the individual across registries [27].The CRS is used to identify the study population and to determine age and sex.
• The Danish National Patient Register (NPR) is an administrative register containing all inpatient, outpatient, and emergency department contacts from Danish hospitals, since 1977 (outpatient and emergency departments since 1995) for both public and private hospitals (private hospitals since 2003) [28].
Records contain information on clinical characteristics along with procedures and treatment, in particular the diagnosis (-es) associated to the contact.The NPR is used to identify the outcomes of interest along with predictors using ICD-10 (International Classification of Diseases 10th Revision)-coding.• The Danish National Prescription Registry (DNPR) contains individual-level data on all dispensed prescription pharmaceuticals sold in Danish community pharmacies [29].Data comprises information on ATC (Anatomical Therapeutic Chemical Classification System)-codes, price reimbursement, etc., at the redemption level.The DNPR is used to define predictors.

Outcome definitions
Two outcomes will be considered, a major osteoporotic fracture (MOF) or a hip fracture (HF).The primary outcome is MOF, defined as fracture of hip, clinical vertebral, wrist, or humerus (given as a primary or secondary ICD-10 code in the NPR: S120, S121, S122, S220, S221, S320, T08, S422, S423, S720, S721, S722, S525, or S526) [12] (Table 1).The secondary outcome is HF, which is given as a primary or secondary ICD-10 code in the NPR (S720, S721, or S722) [12] in combination with a surgical code (KNFB, or one of its sub-codes, or KNFJ4x-9x) [30].Both definitions of MOF and hip fractures will be evaluated in a forthcoming medical record review and will potentially be updated following the results of this review.

Diagnoses as predictors
Information is collected on all non-administrative level 3 ICD-10 codes.We record both the occurrence and date of each ICD-10 code.If more than one instance of the same diagnosis occurs in the look-back period, the first record is used.Note that previous experiences of an osteoporotic fracture thus enter as predictors through the ICD10 codes and that those with previous fractures are not excluded.The diagnosis exposure is defined as the time from the first diagnosis to the end of the look-back period.Rare diagnoses are excluded based on the unsupervised review as described below.

Pharmacological exposures from redemptions
Redemptions (i.e., prescriptions filled) are perceived as relevant risk predictors in two ways.First, the redemption may act as a proxy for a diagnosis that is not recorded through the administrative health registers under consideration.This may occur since the NPR contains information on diagnoses from hospital contacts only and not from general practice consultations.Diagnosis registrations from general practice are difficult to retrieve for research purposes as they are not registered in a national database but in systems provided by privately owned software providers.However, data on redemptions of prescription drugs prescribed by general practitioners can be retrieved from the DNPR and redemptions may therefore act as proxies for diagnoses from general practice.Secondly, the redemption may signify exposure to a pharmaceutical with effects on fracture risk, which may influence the risk of outcome by e.g., affecting the individual's bone density or by altering the risk of experiencing a serious fall.The second class of pharmaceuticals with effects on fracture risk are further subsets depending on their assumed type of effect.We record whether the ATC code was redeemed during the look-back period and further record a measure of the extent of exposure.The following pharmaceutical exposures are used: 1. Medications associated with risk of developing osteoporosis: Defined by the ATC codes in Table 2 (taken from [31]).Exposure is coded as time (in days) from the earliest redemption in the look-back period and defined as zero if no redemption occurred.2. Fall-specific: Table 3 gives an overview of pharmaceuticals related to fall risk and the associated ATC codes.Their exposure is coded as 15 times 365.25 (i.e., the number of days in the lookback period) minus the number of days from the last redemption during the look-back period and is defined to be zero if no redemption occurred.Thus, a high exposure signifies a very recent redemption.

Diagnosis proxies: ATC codes to be used as proxies
for diagnoses are all non-osteoporosis-specific codes included on the therapeutic level of the ATC hierarchy (level 3), meant to reflect the same therapeutic areas as the ICD-10 codes.For these ATC codes, we follow the principle described above for diagnoses and record the time of the first redemption.Exposure is the time from the first redemption to the index  date (January 1, 2018).Note that a single redemption of an osteoporosis-specific drug may contribute both to the ATC-specific variable and to a diagnosis proxy.
In the unsupervised review, we investigate any potentially ensuing problems of collinearity.
We retain all pharmaceutical exposures regardless of their prevalence of redemption.

Loss to follow-up
We record loss to follow-up due to emigration prior to an outcome.This is expected to occur in only a very small fraction of the over-45-year-old population within a year, and we assume that any censoring of the outcome is the same as no fracture.Death before the outcome is treated as no fracture.

Incomplete look-back
Not all included individuals have a full 15-year record in the registries, most commonly due to immigration.We record whether the look-back period is complete along with the number of years for which records could be obtained from the most recent immigration in the look-back period.An incomplete lookback period is likely to cause misclassification of risk factors as being absent, and we attempt to address this by including an incomplete lookback period as a predictor as detailed below.

Prediction models
Analogous to the data management steps, analyses are performed on a secure server using R (version ≥ 4.1 ) [32].

Models
Let Y MOF i and Y HF i be indicators for subject i experienc- ing a MOF or HF, respectively, in 2018 and let be a row vector with predictors (age, sex, indicator for an incomplete look-back period (ILB), diagnoses, and redemptions in the look-back period) for subject i .Here, diagnoses and redemptions are themselves vectors.For a vector v i we denote by v i,k its k'th entry.
We consider two types of prediction models, the primary model being LASSO-regularized logistic regression and the secondary-boosted decision trees.The purpose of both is to predict the 1-year fracture risks.using the predictors in X i .

Primary: logistic regression with grouped LASSO regularization
We describe our proposed model for MOFs, but the same approach is used for HFs.Note that in accordance with the original FREM model, sex is considered an effect modifier for all predictor effects.The logistic regression model originally planned is, where f 0 , f k , and g k are natural cubic splines (i.e., piece- wise polynomial functions that are twice continuously differential, the polynomial being linear outside the outer knots and cubic inside).The spline function does not include an intercept and is determined by a number of parameters equal to the number of knots plus one.Due to the lack of intercept, whenever the exposure argument is zero, the spline is zero and consequently the intercept α sex may be interpreted as the log-odds of MOF for an individual of sex sex , who is 45 years old, and has a com- plete look-back period in which no relevant diagnoses or redemptions were recorded.It was realized a priori that most diagnoses and redemption exposures have high proportions of observations at zero (i.e., unexposed individuals).While this does not constitute a computational problem for the spline regression (as it would eg for fractional polynomials), it influences the choice of knots and perhaps the interpretation of the spline effect [33].The choice of knots was informed from the unsupervised data review.Due to low prevalences of most diagnoses and redemptions, it was decided to use no inner spline knots (thus modeling the effects as linear due to the boundary constraint).We retain the spline on age using four knots.
The spline is estimated by expanding the function in its basis functions.We originally planned to apply grouped LASSO-regularization when estimating the logistic model, so that every basis function belonging to a specific spline was grouped together.This means that the effect of an exposure to a diagnosis or pharmaceutical is either selected or not in the sparse solution provided by the LASSO.However, numerical experimentation in the unsupervised data showed that the group-regularization was infeasible due to the size of the data set.Since there is only the age spline in the model, we instead apply ungrouped LASSO regularization.The amount of regularization is controlled by the parameter .

Tuning the shrinkage parameter
The choice of the regularization parameter is often referred to as "tuning" the model.We tune the LASSO model using the cross-entropy loss, for a class y and predicted class probability π .This is the negative log-likelihood of a binomial variable or, up to a constant, the deviance.Tuning is based on estimates of the loss function on a -grid.We use the default grid chosen by the software.To ameliorate overfitting, estimates are obtained from 10-fold cross-validation.The shrinkage parameter is then taken to be the which minimizes the loss on the grid.

Secondary: gradient-boosted classification trees
As above, we focus on the MOF outcome.The basic model is, where h is a prediction rule that is to be determined by gradient-boosting classification trees [34].A sequential method, boosting focuses on the part of the data that is not well-classified by the rule and adds a learner to better target this.Specifically, in gradient boosting, the new learner is chosen to predict the loss function's gradient (corresponding intuitively to the part of the data that is not well predicted) and the rule is updated by weighting the new learner using a weight (the "step size") estimated from the data.The learning rate may be dampened by introducing a multiplier for the step size.
As a loss function, we use the cross-entropy defined in [35].Each learner is a classification tree of fixed depth (i.e., the number of terminal nodes).Since the primary analysis model includes no interactions, we take the tree depth to be 2, thus allowing each tree to contribute with up to two-factor interactions between predictors.The learning rate is chosen to be 0.1 with considerations of computational speed as investigated in the outcomeblinded code tests.Similarly, the number of trees is initially taken to be 1000, a choice which is updated using out-of-bag samples to estimate the loss function as a function of the number of trees.Aside from predictions from the model, we calculate the relative importance of the predictors in X (as described in [34]).

Evaluation of model performance
For both models, we consider measures of discriminative ability (how well does the algorithm separate those with fracture from those without fracture) as well as their ability to perform absolute risk prediction (how precisely does the model predict the fracture risks), i.e., the model's calibration.
As a measure of discriminative ability, we use the area under the receiver operating characteristic curve (AUC), which may be interpreted as the probability that an individual with a fracture is assigned higher risk by the model than an individual with no fracture.
To assess calibration, we inspect calibration plots that arise by plotting observations against predicted probabilities and smoothing the relationship using a LOWESS smoother (e.g., [36]).Plots are constructed for the primary and secondary models and for both MOF and HF.As a measure of calibration, we further calculate the measure [16,36], where π and π c are the model-predicted risk probabili- ties and the corresponding value of the calibration curve, respectively.Thus, E max (0, b) is a measure of the largest discrepancy between the predicted and "true" risks in the range 0 to b .Varying b shows calibration in different ranges of risks (where lower risks are presumed much more prevalent in the population).We plot the measure for increasing b and report E max (0, 1) as the primary sta- tistic for calibration.As a supplementary measure of calibration, we use the integrated calibration index, which is the expected absolute discrepancy over the distribution of risk probabilities [36].

Internal validation by bootstrapping
Non-parametric bootstrapping is used to perform interval validation [16,37].For both the AUC and E max (0, 1) measure, the optimism due to overfitting is estimated: A bootstrap sample is formed from the data set by sampling with replacement, and the outlined analysis procedures are performed using the bootstrap sample as a training set and the original data as a test set.The performance measure in the training set minus that on the test set is the observed optimism, which is then estimated by averaging over the bootstrap samples.The apparent performance measure may then be corrected by subtracting the optimism.
Note that for the LASSO regression, the choice of lambda grid is also performed in each bootstrap sample.

Unsupervised data review (i.e., blinded to the outcome)
The data management steps described above were performed on the raw data from the Danish Health Data Authority.Several considerations emerging from the review have already been reported above in their appropriate context.
Diagnosis exposures and diagnosis proxies from redemption data were formed.We excluded rare diagnoses and proxy diagnoses that had a prevalence below 0.1%.Generally, predictor prevalences were low, and consequently, it was decided to omit inner knots from splines (leading to a linear effect due to the boundary condition for the natural spline) to avoid overfitting the exposure effects.As described above, we modeled the age effect using 4 knots.
Investigations of collinearity revealed 11 predictors (6 medications and 5 diagnoses), which were highly collinear with other predictors and hence were removed from the selection procedure.
Following the blinded review of predictors, 785 potential predictors remained (including age and sex).
To guide the choice of hyperparameters, the primary and secondary analyses described above were implemented, and the code was tested in an approach blinded to the outcome by simulating an outcome independently from all predictors with an overall prevalence of 1%. Figure 1 includes diagnostic plots for the primary and secondary analyses.Although based upon a single simulation, the results are promising as they show that there is no tendency of the methods to overfit the data: Both methods correctly identify no relevant predictors as they should since the outcomes have been simulated independently from risk factors.
The code was implemented in R version 4.1 using the glmnet ([38], version 4.1-4) package to fit the primary analysis and the gbm package ([34], version 2.1.8.1) for the secondary analysis.The mock analysis differed from those described above in that it was implemented on the entire data set, not stratified by sex.

Discussion
In this protocol paper, we described the planned enhancement of the FREM tool by including additional data sources and applying a more advanced statistical methodology.Furthermore, the exposures and statistical methodology were investigated in an unsupervised data review.
The main strength of the planned study is building on top of an existing tool, which has already been validated in multiple settings [13,14], hence ensuring that the resulting enhanced version of FREM will potentially improve on top of an already applicable tool.The enhanced FREM has the potential to be integrated into the primary health care system as a decision support tool to optimize the identification of individuals at high risk of MOF.Moreover, the use of administrative registry data including a complete population ensures a limited risk of selection bias in the tool development.
The main limitations of the planned study are the validity of exposures and outcomes obtained from administrative registries.While some exposures have been shown to be valid in the literature [28], this has not been investigated for all included exposures.The validity of outcomes will be investigated in a forthcoming medical record review, which will be concluded before the planned study is performed.Furthermore, the statistical methods planned and applied in the unsupervised data review make some assumptions on association structure, and misspecification of these models could influence the validity of the results.The planned approach of using two separate statistical approaches (LASSO and gradient-boosted classification trees) will ensure that model misspecification will be detectable as a discrepancy between those two strategies.Following the development of an updated version of FREM, external validations will be performed in future studies, both in different populations outside of Denmark, as well as temporal validations to investigate a possible drift in the model performance.These external validations are not part of the current protocol.
Results of the planned study will be reported in one or multiple scientific papers in international peerreviewed journals, and deviations from this protocol, if any occur, will be pointed out in detail and substantiated in these publications.
In this paper, we have described the forthcoming development of an updated version of the FREM tool including additional data sources and more advanced statistical methods.We find that our approach will take into account 785 possible predictors after selecting against collinearity and sparsity.Moreover, our unblinded review has identified appropriate choices for the hyperparameters to be used in the development of an enhanced version of FREM.

Table 1
Outcome definition of MOF with ICD-10 coding

Table 2
Osteoporosis-specific medications

Table 3
Fall risk medications and associated ATC codes