Development of risk prediction models to predict urine culture growth for adults with suspected urinary tract infection in the emergency department: protocol for an electronic health record study from a single UK university hospital

Background Urinary tract infection (UTI) is a leading cause of hospital admissions and is diagnosed based on urinary symptoms and microbiological cultures. Due to lags in the availability of culture results of up to 72 h, and the limitations of routine diagnostics, many patients with suspected UTI are started on antibiotic treatment unnecessarily. Predictive models based on routinely collected clinical information may help clinicians to rule out a diagnosis of bacterial UTI in low-risk patients shortly after hospital admission, providing additional evidence to guide antibiotic treatment decisions.
Methods Using electronic hospital records from Queen Elizabeth Hospital Birmingham (QEHB) collected between 2011 and 2017, we aim to develop a series of models that estimate the probability of bacterial UTI at presentation in the emergency department (ED) among individuals with suspected UTI syndromes. Predictions will be made during ED attendance and at different time points after hospital admission to assess whether predictive performance may be improved over time as more information becomes available about patient status. All models will be externally validated for expected future performance using QEHB data from 2018/2019.
Discussion Risk prediction models using electronic health records offer a new approach to improve antibiotic prescribing decisions, integrating clinical and demographic data with test results to stratify patients according to their probability of bacterial infection. Used in conjunction with expert opinion, they may help clinicians to identify patients that benefit the most from early antibiotic cessation.


Background
Urinary tract infection (UTI) is a leading cause of hospital admissions, accounting for 16% of all avoidable emergency admissions [1]. UTI presents with a clinical spectrum that ranges from urosepsis and pyelonephritis to mild urinary symptoms, each of which merits different durations of antibiotic treatment or potentially no antibiotics at all [2,3]. The diagnosis of UTI syndromes is based on a combination of symptoms and microbiological culture of urine (bacteriuria) and/or blood (bacteraemia) [4]. Obtaining microbiological results introduces a bottleneck for evidence-based diagnosis, since cultures often take 48-72 h to grow. In the meantime, patients are often treated with antibiotics. Previous studies have found that up to 50% of such antibiotic use is unnecessary [5-7]. A wide range of additional information is collected as part of routine hospital care, which may provide an opportunity to reduce the diagnostic uncertainty introduced by the delay in culture results. Stored within electronic health records (EHR), these auxiliary data may help to create risk prediction models that can be used to predict the likely culture result and identify patients who are highly unlikely to have bacterial UTI.
We are aware of very few studies that have looked into using routine health data to predict bacteriuria in emergency department (ED) settings [8,9]. In a recent study, Taylor et al. predicted bacterial growth in urine sampled from more than 80,000 patients with potential UTI symptoms in four US EDs [8]. Their best performing model achieved an area under the receiver operating characteristic curve (AUROC) of 0.90, with a sensitivity of 61.7% and a specificity of 94.9%. However, there are several reasons why it is difficult to apply this model in an NHS hospital, including the inclusion of urinalysis results that are not regularly performed in the UK, a relatively broad definition of the population at risk and the exclusion of microbiological culture of blood. In the only other study that we are aware of that attempted to predict bacteriuria in the ED, Wigton et al. achieved a lower AUROC of 0.78 on a sample of 506 patients [9]. Several further studies were performed in primary care settings [10-14], but their generalisability to the typically sicker ED population is questionable.
In this study, we will expand on previously published work [8,9] and develop a model that estimates the probability of bacterial UTI in UK patients who present with suspected UTI in the ED. The models will be developed and tested using data on individuals presenting in the ED at Queen Elizabeth Hospital Birmingham (QEHB). QEHB has EHR which are ideally suited for this purpose, containing high-quality and detailed information on diagnoses, outcomes, investigations, vital signs, drug treatments and diagnostic coding dating back to 2011 [15]. Using these hospital records, our model aims to predict the probability that urinary pathogens will grow in urine and/or blood cultures collected during ED attendance. For admitted patients, additional predictions will be made at specific intervals throughout the first three days of their hospital stay to investigate whether additional information gathered during their inpatient stay, but before availability of culture results, allows culture growth to be predicted with increased certainty. Finally, we will explore differences in model performance and clinical progression for important subpopulations, including the elderly and patients with a recorded alternative infective syndrome (e.g. pneumonia) at arrival or discharge, who do not require antibiotics for UTI but may need them for the treatment of the other infection.

Aims and objectives
Aim To use EHR data from a large UK teaching hospital to predict patients' probability of bacterial UTI at arrival among individuals with suspected UTI in the ED.
Objectives a) To develop models that predict bacterial growth in urine and/or blood samples collected during ED attendance based on clinical information recorded in the patient's medical history and in the ED
b) To assess the change in predictive performance at pre-defined times after admission (0, 12 [...]

[...] [16]. Detailed information on all patients admitted to QEHB is recorded within its electronic patient management system, including clinical diagnoses, observations, assessments and laboratory results [15]. Unlike many other trusts in England, QEHB has also recorded drug prescriptions electronically for more than 10 years, making it an invaluable resource for research linked to antibiotic prescribing.

Development dataset
To develop the predictive models, we will use data from all eligible patients who attended the ED at QEHB between 1 November 2011 and 31 December 2017 (electronic recording of ED diagnosis at QEHB started after a system change at the end of October 2011).

Validation dataset
We will use data collected at QEHB between 1 January 2018 and 31 March 2019 to externally validate the model. Patients who were included in the development dataset due to an earlier attendance will be excluded from the validation dataset. We will additionally validate our models in a geographically independent dataset from University College London Hospitals NHS Foundation Trust.
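The temporal split described above can be sketched as follows. This is a minimal illustration and not the study's actual code: the function name and the `(patient_id, arrival_date)` record layout are hypothetical, and only the date boundaries are taken from the protocol.

```python
from datetime import date

# Date boundaries as stated in the protocol.
DEV_START, DEV_END = date(2011, 11, 1), date(2017, 12, 31)
VAL_START, VAL_END = date(2018, 1, 1), date(2019, 3, 31)


def split_attendances(attendances):
    """Split (patient_id, arrival_date) records into development and validation sets.

    Patients who appear in the development period are removed from the
    validation set, so the two sets contain disjoint patients.
    """
    development = [a for a in attendances if DEV_START <= a[1] <= DEV_END]
    development_patients = {patient_id for patient_id, _ in development}
    validation = [
        a for a in attendances
        if VAL_START <= a[1] <= VAL_END and a[0] not in development_patients
    ]
    return development, validation
```

Excluding previously seen patients keeps the temporal validation free of individuals whose earlier attendances already informed the fitted model.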

Inclusion and exclusion criteria
All patients who attended the ED at QEHB within the study period and who had a urine sample submitted for microbiological testing within 24 h of arrival are eligible for inclusion in the study. A window of 24 h was chosen to account for discrepancies between when the sample was collected and when the urine sample was recorded in the laboratory system (particularly overnight). Patients enter the study at registration in the ED and exit the study on the earliest of the following dates: date of discharge, date of death, date of transfer to a different hospital or date of urine culture results. Individuals aged < 18 years, pregnant women, patients who were not admitted via the ED and patients whose urine sample was submitted for culture but was not cultured due to standard laboratory protocols at QEHB (see the "Outcome" section for details) will be excluded from the analysis.
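As an illustration, the eligibility rules above could be expressed as a single filter. The record type and all field names below are hypothetical, and the assumption that a sample cannot be registered before ED arrival is ours rather than the protocol's.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Attendance:
    # Hypothetical field names, chosen for illustration only.
    age_years: int
    pregnant: bool
    arrived_via_ed: bool
    ed_arrival: datetime
    urine_received: Optional[datetime]  # time the sample was recorded in the laboratory system
    urine_cultured: bool                # False if the laboratory protocol skipped culture


def is_eligible(a: Attendance) -> bool:
    """Apply the inclusion and exclusion criteria to one ED attendance."""
    if a.urine_received is None:
        return False
    delay = a.urine_received - a.ed_arrival
    # Assumption: samples registered before arrival are not counted.
    within_window = timedelta(0) <= delay <= timedelta(hours=24)
    return (
        within_window
        and a.age_years >= 18
        and not a.pregnant
        and a.arrived_via_ed
        and a.urine_cultured
    )
```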

Outcome
The principal outcome of interest is microbiological growth (≥ 10⁴ colony-forming units/mL). Only urine samples that were eventually cultured will be included in the analysis. Microbiological cultures at QEHB are performed in accordance with standard laboratory procedures (UK Standards for Microbiology Investigations: SMI B41, Investigation of Urine; SMI B37, Investigation of blood cultures (for organisms other than Mycobacterium species)) [17]. The decision whether to culture a urine sample depends on cell count results performed in the laboratory. Only urines with white blood cell counts or bacteria counts above a threshold value are cultured. At the start of the study, the threshold value for proceeding to culture was white cell counts > 40/μL or bacteria counts > 4000/μL. This was adjusted to white cell counts > 80/μL or bacteria counts > 8000/μL following the introduction of a revised standard operating procedure in the microbiology laboratory in October 2015. Performing cell counts is not possible for urine samples of less than 4 mL or for samples too viscous to pass through the instrument. Samples for which cell counts could not be performed are always cultured and included in the analysis. Following the standard procedure at QEHB, (heavy) mixed growth in the urine sample will be considered as contamination, except where E. coli was present. In addition, samples will be classified as positive if there are < 10⁴ colony-forming units/mL but the same urinary pathogen is identified from a blood culture, implying urosepsis.
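The culture decision and the outcome definition can be summarised in a short sketch. Several details are assumptions made for illustration: the exact changeover date within October 2015 (taken here as the 1st), the use of `None` to represent cell counts that could not be performed, the organism labels, and the order in which the blood-culture and mixed-growth rules are applied.

```python
from datetime import date

# Assumption: the protocol only states "October 2015" for the revised procedure.
REVISED_SOP_DATE = date(2015, 10, 1)


def should_culture(wcc_per_ul, bacteria_per_ul, received):
    """Return True if the laboratory rule sends the sample for culture.

    Counts of None stand for samples where cell counts were not possible
    (e.g. low volume or high viscosity); these are always cultured.
    """
    if wcc_per_ul is None or bacteria_per_ul is None:
        return True
    if received >= REVISED_SOP_DATE:
        wcc_cut, bacteria_cut = 80, 8000
    else:
        wcc_cut, bacteria_cut = 40, 4000
    return wcc_per_ul > wcc_cut or bacteria_per_ul > bacteria_cut


def is_positive(cfu_per_ml, urine_organisms, mixed_growth=False, blood_organisms=()):
    """Classify a cultured sample as positive (True) or negative (False)."""
    urine = set(urine_organisms)
    # Same pathogen in urine and blood implies urosepsis, whatever the count.
    # Checking this rule first is an assumption about rule precedence.
    if urine & set(blood_organisms):
        return True
    # Mixed growth counts as contamination unless E. coli is present.
    if mixed_growth and "Escherichia coli" not in urine:
        return False
    return cfu_per_ml >= 1e4
```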

Predictors
We will consider a wide range of candidate predictors relating to characteristics of the urine sample, a patient's clinical presentation at the start of and throughout the hospital stay, and to risk factors encoded in a patient's medical history (Table 1). Candidate predictors were chosen based on clinical experience, the frequency with which variables are measured in the clinical context where the model is likely to be applied, and existing literature [8].

Sample size
Each year, around 60,000 patients are seen in the ED at QEHB. In 2014, more than 4500 patients were admitted to QEHB and prescribed an antibiotic. Preliminary analysis suggests that 20% of these prescriptions were for suspected UTI syndromes; hence, we expect ~5400 admitted patients using data from late 2011 to the end of 2017 (6 years) [19]. Based on clinical experience, we expect a similar number of patients with suspected UTI syndromes to be discharged directly from the ED, resulting in an estimated total training sample of ~10,800 patients. Assuming a prevalence of bacteriuria of 30%, similar to that reported by Taylor et al. [8], this would imply > 30 events per variable when including all variables defined in Table 1.
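The arithmetic behind these estimates can be retraced in a few lines. The variable names are ours; the inputs are the figures quoted above, and the final line simply shows how many candidate parameters would still be compatible with at least 30 events per variable.

```python
# Inputs quoted in the protocol (integer percentages avoid rounding issues).
admitted_with_antibiotics_per_year = 4500
suspected_uti_percent = 20
study_years = 6
prevalence_percent = 30
target_events_per_variable = 30

admitted_uti = admitted_with_antibiotics_per_year * suspected_uti_percent // 100 * study_years
total_sample = 2 * admitted_uti  # a similar number is assumed to be discharged from the ED
expected_events = total_sample * prevalence_percent // 100
max_parameters = expected_events // target_events_per_variable

print(admitted_uti, total_sample, expected_events, max_parameters)
# prints: 5400 10800 3240 108
```

Under these assumptions, the target of more than 30 events per variable holds as long as the candidate predictors contribute fewer than 108 model parameters.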

Feature engineering and selection
All continuous predictors will be winsorized at the 1st and 99th percentile to account for outliers and normalised to lie within the range (0, 1). Categorical predictors will be encoded in a full-rank encoding, combining levels with a small number of cases (< 5%). Predictors with zero variance will be excluded before analysis. For highly correlated predictors (correlation coefficient > 0.9 using Spearman's rank correlation), one predictor will be removed before analysis based on clinical judgement. Similarly, predictors which are found to be largely missing, and which might thus not be expected to be present when the model is used in practice at QEHB, will be removed from the analysis before fitting the models. We will consider the use of fractional polynomials (FP) with up to four degrees of freedom (i.e. 2 fractional polynomial terms) for each numerical predictor [20,21]. We will estimate the optimal number of FPs using the Akaike Information Criterion. Once the best-fitting FPs have been determined, we will consider models with all predictors and parsimonious models selected via backwards feature elimination based on Wald statistics and Rubin's rules [22]. Since the large number of possible predictors might limit the model's usability in clinical practice, we follow Taylor et al. and consider a minimal model based on age, sex, urinalysis results and history of UTI [8].
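A minimal sketch of the winsorizing and normalisation step for one continuous predictor is given below. The function names are hypothetical, the choice of the "inclusive" percentile definition is an assumption, and min-max scaling is one plausible reading of "normalised". The limits are learned on the training data and then reused unchanged on new data.

```python
from statistics import quantiles


def fit_limits(training_values):
    """Return the 1st and 99th percentile of a continuous predictor."""
    cuts = quantiles(training_values, n=100, method="inclusive")
    return cuts[0], cuts[-1]


def winsorize_and_scale(values, limits):
    """Clip values to the fitted limits, then min-max scale to the unit interval."""
    low, high = limits
    if high == low:
        # Degenerate predictor with no spread between the limits.
        return [0.0 for _ in values]
    clipped = [min(max(v, low), high) for v in values]
    return [(v - low) / (high - low) for v in clipped]
```

Fitting the limits separately from applying them matters later in the protocol, where every preprocessing step is repeated inside each bootstrap sample to avoid data leakage.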

Type of model
Baseline model in the ED We will first develop a multivariable logistic regression model to predict bacterial growth in the urine and/or blood sample at the end of ED attendance. A prediction will be made for each patient based on the fitted value, which will serve as a baseline comparison for all further models considered.
Landmarking models at distinct time points after hospital admission Additional measurements taken during the first couple of days in hospital may further improve the predictive power of our risk prediction models. We will develop a set of landmarking logistic regression models [23] that predict the probability of bacterial growth in the ED urine sample at pre-defined times t = {0, 12, 24, 36, 48, 60} hours after the patient has left the ED and been admitted to the hospital ward. In order to do so, we require a value for each included predictor at time t. Since predictors are measured irregularly throughout the patient's hospital stay, we will first train a multivariate generalized linear mixed model (MGLMM) on all past predictor values up to time t to estimate the most likely value of each predictor at time t (see the "Missing data" section below for details). Values at time t will be estimated using the best linear unbiased predictors from the empirical Bayes posterior distribution of the random effects, conditional on past predictor measurements [23]. The estimated predictor values will then be fed to a logistic regression model that predicts the probability of microbiological growth in the ED sample after having observed the patient for t hours. As a result, patients might have more than one prediction, one for each time t at which they were still part of the at-risk population. Only patients still admitted and without a culture result at time t will be considered at-risk and will be included in the fitting and evaluation of the logistic regression model for time t.
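The at-risk population at each landmark time can be illustrated with a small helper. The input format (hours from ward admission to the earliest of discharge, death, transfer or culture result) and the treatment of patients who exit exactly at time t as no longer at risk are assumptions made for this sketch.

```python
LANDMARK_HOURS = (0, 12, 24, 36, 48, 60)


def at_risk_sets(exit_hours):
    """Map each landmark time to the patients still at risk at that time.

    exit_hours: dict of patient_id -> hours from ward admission until the
    patient leaves the study (discharge, death, transfer or culture result).
    """
    return {
        t: sorted(pid for pid, exit_h in exit_hours.items() if exit_h > t)
        for t in LANDMARK_HOURS
    }
```

Each landmark model would then be fitted and evaluated only on the patients listed for its time t, so a long-stay patient contributes to several models.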

Missing data
In EHR data, information is only recorded when events take place and we cannot distinguish between cases in which a test or diagnosis was not made and cases in which they were made but not recorded. Consequently, if historical variables such as co-morbidities, procedures, admission records and test results are not recorded (e.g. because they were performed at another hospital), we will have to assume that these events did not take place. For other variables with missing values that should have been obtained during the current visit (particularly vital signs and laboratory measurements), we will examine the pattern of missingness and impute values where appropriate depending on the type of prediction model.
Our baseline model is a logistic regression, which requires a non-missing value for each included predictor. We will use multivariate imputation by chained equations (MICE) based on the assumption that data are missing at random, i.e. whether a variable is missing or not only depends on the values of observed variables [24]. Following standard MICE procedures [25], we will include all predictors as well as the prediction outcome in the imputation procedure and impute 5 datasets with 10 iterations per dataset (Table 2). Depending on computational feasibility, we will aim to impute up to 100 datasets for our final model to ensure that we obtain robust imputations. Model training will be performed on the imputed development dataset. However, we cannot use the same imputation procedure to evaluate our models since we expect predictors to also be missing during model deployment. When used in practice, our model must impute any missing data in real-time before making a prediction, but at this point, no outcome will be available yet to use in the imputation. This will tend to result in suboptimal imputations when the model is used in practice [25]. To obtain an honest estimate of the performance of our models, we will evaluate them on a second set of imputations that were fit without using the outcome in the imputation procedure, emulating the situation in which the model will ultimately be used [26].
For our time-dependent models, the nature of missing data differs slightly. Values for each predictor might have been recorded never, once or multiple times before time t, and we are interested in estimating the most likely value at time t. To estimate a good approximation for each predictor, we will separately fit an MGLMM at each landmarking time [23]. Each model will include fixed intercepts and slopes for each predictor and a time-dependent covariate indicating concurrent antibiotic treatment. We will consider correlation structures of varying complexity, with uncorrelated and correlated patient-specific random intercepts and/or slopes for each predictor. If the MGLMM is intractable, we will consider a simpler last observation carried forward (LOCF) method to estimate predictor values at time t, or a mixture of LOCF and MGLMM.
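The LOCF fallback is simple enough to sketch directly. The function name and the `(time_h, value)` measurement format are hypothetical; a predictor that was never measured before time t is returned as `None` and would still need to be handled by imputation.

```python
def locf(measurements, t):
    """Return the most recent value recorded at or before landmark time t.

    measurements: list of (time_h, value) pairs in any order.
    Returns None if nothing was recorded up to time t.
    """
    past = [(time_h, value) for time_h, value in measurements if time_h <= t]
    if not past:
        return None
    # The latest measurement time wins.
    return max(past, key=lambda m: m[0])[1]
```

For example, a heart rate recorded at 2 h, 14 h and 30 h would be carried forward as the 14 h value for the landmark at t = 24.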

Model validation
Clinical diagnosis of bacterial UTI requires the presence of urinary symptoms in addition to microbiological culture. Bacteriuria in the absence of urinary symptoms (called asymptomatic bacteriuria) should not be treated with antibiotics [2]. Prevalence of asymptomatic bacteriuria differs between patient groups and increases, for example, with age. Whereas a urine sample might be sent for culture in many different patients "just in case", a clinically usable model to confirm or rule out suspected bacterial UTI needs to perform especially well in patients with urinary symptoms. In our main analysis, we will therefore validate our models in the subgroup of patients with a suspected ED diagnosis of lower UTI or pyelonephritis, and our final model will be chosen based on the performance in this group. This group differs from the training population, which will include all patients irrespective of ED diagnosis to increase sample size and provide our model with enough power to learn general relationships. In a secondary analysis, we will also evaluate the performance of our models in patients without an ED diagnosis of UTI as well as in different age groups, by sex and by outcome (i.e. discharge diagnosis, death, admission to intensive care unit, length of stay). We will further consider training our model using only data from patients with a suspected ED diagnosis of lower UTI or pyelonephritis to ensure that a heterogeneous training population is not obscuring important relationships in patients with suspected UTI. Finally, we will perform secondary analyses limited to the first visit of each patient and to data after 2015, assessing the impact of repeated patient visits and the impact of increased culture thresholds on our models.
Internal validation Model performance in each scenario will be assessed via multiple metrics: AUROC, Brier score, area under the precision-recall curve (AUPRC), specificity and negative predictive value (NPV). We will estimate each model's specificity and NPV at a pre-set sensitivity of 95%, which will evaluate the model's ability to be used as a screening tool to rule out bacterial UTI. We will assess how well predicted and observed probabilities correspond within each predicted decile (model calibration) by creating a calibration plot and estimating the calibration slope. An estimated slope > 1 indicates underfitting, whereas a slope < 1 indicates overfitting.
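The rule-out metrics can be computed as in the sketch below. It assumes that a patient is classified as positive when the predicted probability is at or above the threshold, and it picks the highest threshold that still reaches the target sensitivity; the function name and return format are ours. At least one positive and one negative case are required.

```python
def rule_out_metrics(y_true, scores, target_sensitivity=0.95):
    """Return (threshold, sensitivity, specificity, npv) at the target sensitivity.

    y_true: list of 0/1 outcomes; scores: predicted probabilities.
    """
    pairs = list(zip(y_true, scores))
    n_pos = sum(y_true)
    # Walk thresholds from high to low until enough positives are captured.
    # The lowest score always gives sensitivity 1, so the loop always breaks.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
        if tp / n_pos >= target_sensitivity:
            break
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = n_pos - tp
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    return threshold, tp / n_pos, specificity, npv
```

A high NPV at this operating point is what would make the model usable for ruling out bacterial UTI, since few patients below the threshold would have a positive culture.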
Evaluating the model only on the development dataset or a single validation dataset leads to optimistic estimations of the true model performance (henceforth called the apparent performance) [27]. To obtain a more reliable estimate of model performance, we will draw at least 100 bootstrap samples of the development dataset. Where computation time allows for it, we will consider up to 1000 bootstrap samples. All preprocessing and analysis steps including missing data imputation, estimation of fractional polynomials, feature selection and model evaluation will be carried out independently within each bootstrapped sample to avoid any data leakage [28]. The result will be one final model per bootstrapped sample. Evaluating each model on the bootstrap sample in which it was developed provides another estimate of the apparent performance, this time within the bootstrap. To estimate the magnitude of optimism in this bootstrapped apparent performance, we will simultaneously evaluate the bootstrapped model in the original development dataset (called test performance). The difference between test performance and bootstrapped apparent performance will be an estimate of model optimism.
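Averaging estimates of the optimism across all bootstrapped samples results in a stable estimate of the optimism [27]. The final, optimism-corrected ("true") estimate of model performance will then be calculated as follows:
optimism-corrected performance = apparent performance − average optimism,
where the apparent performance is that of the model developed and evaluated on the full development dataset, and the optimism in each bootstrapped sample is the bootstrapped apparent performance minus the test performance. All metrics used in the model evaluation (AUROC, AUPRC, specificity and NPV) will be adjusted for optimism.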
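The bootstrap procedure can be sketched generically as below. The function signature is hypothetical: `fit` stands for the entire modelling pipeline (imputation, fractional polynomials, feature selection and model fitting) and `metric` for any performance measure. The toy prevalence model and Brier score at the end are illustrative stand-ins, not the study's models.

```python
import random
from statistics import mean


def optimism_corrected(data, fit, metric, n_boot=100, seed=0):
    """Optimism-corrected performance via the bootstrap.

    fit(sample) -> model; metric(model, sample) -> float.
    """
    rng = random.Random(seed)
    apparent = metric(fit(data), data)
    optimism = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))   # resample with replacement
        model = fit(sample)                       # refit the whole pipeline
        boot_apparent = metric(model, sample)     # evaluated where it was fitted
        test = metric(model, data)                # evaluated on the original data
        optimism.append(boot_apparent - test)
    return apparent - mean(optimism)


# Toy usage: a "model" that predicts the sample prevalence for everyone.
def fit_prevalence(outcomes):
    return mean(outcomes)


def brier(predicted_probability, outcomes):
    return mean((predicted_probability - y) ** 2 for y in outcomes)


outcomes = [1] * 30 + [0] * 70
corrected_brier = optimism_corrected(outcomes, fit_prevalence, brier)
```

Because the optimism is defined as a signed difference, the same correction applies to metrics where higher is better (AUROC) and to those where lower is better (Brier score).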

External validation
The performance of the model (AUROC, AUPRC, specificity and NPV) in a new dataset will be evaluated using EHRs from patients with suspected UTI who were admitted to QEHB between 1 January 2018 and 31 March 2019. We will summarise average performance and calibration in this temporally independent sample. We will further validate the model in a geographically independent sample of patients from University College London Hospitals NHS Foundation Trust.

Discussion
The need to reduce inappropriate antibiotic prescribing in secondary care is widely acknowledged, but progress is thwarted by the lack of rapid and reliable diagnostic tests for bacterial infection. Risk prediction models using data contained within EHR offer a new approach to improve antibiotic prescribing decisions, by integrating clinical and demographic data with test results to stratify patients according to their likelihood of bacterial infection.
However, diagnostic uncertainty represents a major obstacle in the application of risk prediction models for bacterial infection. Clinical infection syndromes often overlap, and diagnoses are often not confirmed by microbial culture. This makes it difficult to reliably distinguish infection from non-infectious conditions, but also to discriminate between clinical infection syndromes.
For these reasons, we have not attempted to develop a model which supports decisions around antibiotic initiation in the ED, recognising that few doctors will be willing to withhold antibiotics if patients are unwell and the diagnosis is uncertain. Instead, we have opted for a model that identifies patients who may benefit from early antibiotic cessation since they are actually at low risk of bacterial UTI. Descriptive analyses of patients who have been categorised by the model as low/high risk of bacterial UTI will identify categories of patients who are most likely to be low risk, for example based on age, sex and UTI syndrome at presentation. This will be used in conjunction with expert clinical opinion to define a "low-risk" population of patients who have been treated with antibiotics for suspected UTI but are unlikely to benefit from antibiotic treatment. Individuals from this population sub-group will be asked to participate in a proof-of-concept trial, and randomised to either stop antibiotics early, or to continue antibiotics as per standard care. The trial will assess the safety and feasibility of early antibiotic cessation in these patients and lay the foundation for a future multi-centre trial. It will also demonstrate the potential use of EHR datasets to guide prescribing decisions.