Weight loss as a predictor of cancer and serious disease in primary care: an ISAC-approved CPRD protocol for a retrospective cohort study using routinely collected primary care data from the UK

Background Unexpected weight loss is a symptom of serious disease in primary care; for example, between 1 in 200 and 1 in 30 patients with unexpected weight loss go on to develop cancer. However, it remains unclear how and when general practitioners (GPs) should investigate unexpected weight loss. Without clarification, GPs may wait too long before referring (choosing to watch and wait and potentially missing a diagnosis) or not long enough (overburdening hospital services and exposing patients to the risks of investigation). The overall aim of this study is to provide the evidence necessary to allow GPs to more effectively manage patients with unexpected weight loss.
Methods A retrospective cohort analysis of UK Clinical Practice Research Datalink (CPRD) data to: (1) describe how often in UK primary care the symptom of reported weight loss is coded, when weight is measured, and how GPs respond to a patient attending with unexpected weight loss; (2) identify the predictive value of recorded weight loss for cancer and serious disease in primary care, using cumulative incidence plots to compare outcomes between subgroups and Cox regression to explore and adjust for covariates. Preliminary work in CPRD estimates that weight loss as a symptom is recorded for approximately 148,000 eligible patients > 18 years and is distributed evenly across decades of age, providing adequate statistical power and precision in relation to cancer overall and common cancers individually. Further stratification by cancer stage will be attempted but may not be possible, as not all practices within CPRD are eligible for cancer registry linkage and staging information is often incomplete. The feasibility of using multiple imputation to address missing covariate values will be explored.
Discussion This will be the largest reported retrospective cohort of primary care patients with weight measurements and unexpected weight loss codes used to understand the association between weight measurement, unexpected weight loss, and serious disease including cancer. Our findings will directly inform international guidelines for the management of unexpected weight loss in primary care populations.


Background
A 2014 systematic review suggests that the positive predictive value (PPV) for cancer is 33% in patients with an unexpected 10% loss of weight from baseline over 6-12 months. The same review reported a wide range of differential diagnoses for patients with unexpected weight loss, including advanced heart failure, chronic obstructive pulmonary disease, renal disease, pancreatic insufficiency, malabsorption, and endocrine disease, with up to 25% of patients remaining without a diagnosis to explain their weight loss after extended follow-up [1]. However, these data mainly come from hospital inpatient populations or patients referred to outpatient clinics, where the prevalence of cancer and serious disease is much higher than in primary care because GPs have already filtered out many cases of weight loss that are more likely to be attributable to another cause. Given the absence of appropriate clinical guidelines or standardised practice, clinicians have been reported to take a wide range of actions in response to patients with unexpected weight loss, from doing nothing through to ordering "extensive blind investigations" because of the fear of underlying cancer [2].
On the basis of primary care research, NICE (2015) has since suggested that unexpected weight loss is a sign of seven cancers, citing evidence from 14 studies reporting PPVs of 0.4-3% [3]. The problem for GPs is how to interpret and implement the term weight loss in these cancer guidelines: NICE do not define the degree of weight loss, or the time period of loss, that should prompt referral. Most studies cited in the NICE guidelines define weight loss on the basis of a coded entry in the GP record, often based on a report of weight loss (volunteered by, or elicited from, the patient) rather than measured weight change [4][5][6]. Only one study referred to by NICE quantified the degree of weight loss that predicts colorectal cancer in primary care, reporting odds ratios of 1.2 (95% CI 0.99-1.5) for 5-9.9% and 2.5 (2.1-3) for ≥ 10% weight loss [7]. However, in this study, weight loss was defined by comparing the last recorded weight with the highest recorded weight in the preceding 2 years [7], as weight is not routinely recorded in primary care and is considered a common missing variable in primary care databases [8].
Two evidence gaps remain: a comprehensive study describing the use of weight measurement and coding for unexpected weight loss in primary care, and a study determining the association between unexpected weight loss and cancer and serious disease that may lead to a comprehensive recommendation for the investigation of unexpected weight loss in primary care.

Objective
The overall objective is to provide the evidence necessary to allow GPs to more effectively manage unexpected weight loss.

Aims and rationale
Aim 1.1 To describe how often and when weight is measured, and the symptom of unexpected weight loss recorded as a code, in adults aged > 18 years, in NHS primary care.
Aim 1.2 To describe what action is taken in response to unexpected weight loss, in adults aged > 18 years, in NHS primary care.
Weight measurements and weight loss codes will be categorised using a rule-based search strategy developed as part of this project to identify the clinical purpose and clinical condition related to each weight entry in the primary care record, and the investigations requested, medications prescribed, and referrals made in response to the symptom of weight loss.
Aim 2.1 To identify the predictive value of unexpected weight loss recorded as a symptom for cancer in primary care in adults aged > 18 years.
Aim 2.2 If the symptom of unexpected weight loss predicts cancer, to explore if it is (i) independent of other symptoms, signs, and test results and (ii) restricted to late-stage disease.

Aim 2.3 To ascertain the predictive value of unexpected weight loss recorded as a symptom for serious disease in primary care.
The evidence regarding the predictive value of unexpected weight loss for cancer in primary care, which underpins the 2015 NICE guideline, does not cover all cancer types or take cancer stage at diagnosis into account. We will identify the predictive value of unexpected weight loss in primary care across all cancer types, explore the incremental predictive value of symptom combinations, and examine the association with cancer stage at diagnosis using a matched open cohort study design. In cases where cancer is excluded, an understanding of which alternative diagnoses are related to unexpected weight loss will inform subsequent management decisions in primary care. We will therefore identify the disease groups for which unexpected weight loss is also predictive to develop clinical guidance for the investigation of unexpected weight loss in primary care.

Aim 1: Descriptive
The descriptive epidemiology of weight measurement and weight loss coding in NHS primary care.

Aims 2.1 and 2.3: Hypothesis testing
A cohort study of weight loss as a sign of cancer and serious disease in NHS primary care.

Aim 2.2: Exploratory
Exploratory analysis to investigate the influence of covariates on the relationship between weight loss and the occurrence of cancer and serious disease.

Study design
The study uses an open cohort design.

Sample size
In preparing this ISAC application, a preliminary search of 20 GP practices from 2000 to 2013 was conducted. Of 127,024 patients > 40 years with acceptable records, 80,562 (63.4%) had at least one weight measurement recorded during that period, 30,728 (24.1%) had two weight measurements within 6 months of each other, and 40,436 (31.8%) within 1 year; 3079 (2.4%) of patients had a Read code for weight loss but only half of these had an accompanying weight measurement.
Two thousand one hundred eighty-four patients with weight loss are required to detect a hazard ratio of 2 (a change in incidence of 1.5 to 3%) at 99% power (0.05% alpha) using a ratio of one case to five controls. It is anticipated that the study will therefore have sufficient power for stratification by cancer type, cancer stage, and using symptom combinations even though linkage to cancer registry may only be possible in approximately 60% of cancer cases [9].
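The protocol does not state which formula produced the figure of 2184. As a rough illustration only, the sketch below applies a standard normal-approximation sample size formula for comparing two proportions with unequal group sizes. Reading "0.05% alpha" as a two-sided α of 0.05 and omitting a continuity correction are both assumptions, so the result is of the same order as the protocol's figure but is not expected to match it exactly.

```python
import math
from statistics import NormalDist


def exposed_sample_size(p_exposed, p_unexposed, ratio, alpha, power):
    """Exposed-group size for comparing two proportions, `ratio` unexposed per exposed.

    Normal-approximation formula without continuity correction (an assumption;
    the protocol does not say which method was used).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    # Pooled proportion across both groups under the null hypothesis.
    p_bar = (p_exposed + ratio * p_unexposed) / (1 + ratio)
    null_sd = math.sqrt((1 + 1 / ratio) * p_bar * (1 - p_bar))
    alt_sd = math.sqrt(
        p_exposed * (1 - p_exposed) + p_unexposed * (1 - p_unexposed) / ratio
    )
    n = (z_alpha * null_sd + z_beta * alt_sd) ** 2 / (p_exposed - p_unexposed) ** 2
    return math.ceil(n)


# Inputs taken from the protocol: 3% vs 1.5% incidence, 1:5 ratio, 99% power.
# alpha=0.05 (two-sided) is our reading of "0.05% alpha".
n_exposed = exposed_sample_size(0.03, 0.015, ratio=5, alpha=0.05, power=0.99)
print(n_exposed, "patients with weight loss,", 5 * n_exposed, "matched comparators")
```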
Preliminary work in Clinical Practice Research Datalink (CPRD) estimated that unexpected weight loss is coded as a symptom for approximately 148,000 patients > 18 years and is distributed evenly across decades of age, providing adequate statistical power and precision for a comprehensive cohort study investigating cancer and serious disease in adults (> 18 years). For example, if 3% of patients with weight loss develop cancer, the number of Events Per Variable will far exceed the minimum number required for robust statistical modelling.
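The Events Per Variable argument can be checked with simple arithmetic. The sketch below uses the figures quoted in this protocol (148,000 patients, 3% developing cancer, and the minimum of ten events per variable cited in the Discussion); the function name is illustrative.

```python
def max_model_variables(n_patients, event_percent, min_epv=10):
    """Return (expected events, number of model variables supportable at min_epv)."""
    # Integer arithmetic avoids floating-point rounding on the percentage.
    events = n_patients * event_percent // 100
    return events, events // min_epv


events, variables = max_model_variables(148_000, 3)
print(events, variables)  # 4440 444
```

On these figures, about 4440 cancer events would be expected, enough to support several hundred candidate variables at ten events per variable.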

Data linkage
NCDR Cancer Registry Data
Linkage to the cancer register is required as cancer is a major outcome variable in this cohort study. Cancer registry data will provide more accurate information on cancer site and stage than reliance on the primary care record.

Office for National Statistics (ONS) mortality data
Linkage is required to cross-validate cause of death for patients confirmed to have died of cancer using Cancer Registry Data linkage and to identify or confirm the cause of death in patients with and without serious disease as identified by the GP record.

Index of Multiple Deprivation (IMD) scores
IMD scores are required to provide a GP (and where possible patient) level proxy for socioeconomic status, to be used when describing the baseline characteristics in both the descriptive analysis of Aim 1 and the cohort analysis of Aim 2. IMD score will also be used as a covariate in the multivariate Cox regression analysis as part of Aim 2 (see below).

Study population
The study population is summarised in Fig. 1.
Exclusions: patients with a diagnosis of cancer prior to the index symptom of weight loss.

Selection of comparison group(s) or controls
Aim 1: Descriptive study
No comparison group is required.

Aim 2: Cohort analysis
- A matched cohort of patients without weight loss: patients without a coded entry for weight loss will be matched for age and sex and selected from the population of patients registered with the same practice who consulted within ± 3 months of the index weight loss code.
- Matching for age and sex will ensure there are sufficient patients without weight loss in each age and sex stratum.
- A 1:5 sampling ratio achieves the best balance between data cost and statistical power (see sample size).
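The matching rules above can be sketched as a simple sampling procedure. This is a minimal illustration, not the extraction code used for the study: the `Patient` fields, matching age on year of birth, treating ± 3 months as ± 91 days, and sampling without replacement are all assumptions, and comparator candidates are assumed to have been pre-filtered to patients with no weight loss code.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Patient:
    patient_id: int
    practice_id: int
    sex: str
    birth_year: int
    # Index weight loss code date (exposed) or a consultation date (comparator).
    event_date: date


def match_controls(exposed, candidates, ratio=5, window_days=91, seed=0):
    """Pick up to `ratio` comparators per exposed patient, without replacement."""
    rng = random.Random(seed)
    window = timedelta(days=window_days)
    used = set()
    matches = {}
    for case in exposed:
        pool = [
            c for c in candidates
            if c.patient_id not in used
            and c.practice_id == case.practice_id      # same practice
            and c.sex == case.sex                      # same sex
            and c.birth_year == case.birth_year        # age (assumed: same birth year)
            and abs(c.event_date - case.event_date) <= window  # consulted within window
        ]
        chosen = rng.sample(pool, min(ratio, len(pool)))
        used.update(c.patient_id for c in chosen)
        matches[case.patient_id] = [c.patient_id for c in chosen]
    return matches


# Made-up example: one exposed patient, eight eligible and three ineligible candidates.
index_date = date(2010, 6, 1)
index_patients = [Patient(1, 10, "F", 1950, index_date)]
eligible = [
    Patient(100 + i, 10, "F", 1950, index_date + timedelta(days=10 * i - 40))
    for i in range(8)
]
ineligible = [
    Patient(200, 11, "F", 1950, index_date),        # different practice
    Patient(201, 10, "M", 1950, index_date),        # different sex
    Patient(202, 10, "F", 1950, date(2011, 1, 1)),  # consulted outside the window
]
matched = match_controls(index_patients, eligible + ineligible)
print(matched)
```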

Exposures, outcomes and covariates
Aim 1: Descriptive study
Outcome 1: Objective weight measurement (quantitative weight measurements).
Outcome 2: Weight loss code (Read Codes defined in Table 1).
Patients with objective weight measurements, or with the symptom of unexpected weight loss recorded using the Medcodes and Read Codes listed in Table 1, will be included.

Exposure-weight loss
Patients with the symptom of weight loss recorded using the unexpected weight loss Medcodes and Read Codes listed in Table 1. Weight loss codes will be independently categorised for clinical relevance by four co-investigators based on the results of the descriptive analysis, with consensus then reached through discussion.

Outcome-cancer
A library of over 1600 Read Codes and ICD-10 codes (grouped by site; see Table 2) developed by Hamilton and colleagues will be reviewed, updated using Read Code searches, and validated through consensus amongst co-investigators. All new cancer diagnoses in the 24 months following the weight loss code will be identified in CPRD and linked cancer registry data. To inform this analysis, data will also be extracted on cancer stage, grade, tumour size, and histology at diagnosis.

Outcome-serious disease
A library of candidate Read Codes for the most common serious diseases related to unexpected weight loss will be developed by combining two approaches: (i) review of the most frequent diagnostic codes entered in the clinical record within the period surrounding the unexpected weight loss code (descriptive study analysis section); (ii) review of the literature on causes of unexpected weight loss [1,2]. A list of these candidate conditions will be reviewed independently by four co-investigators until consensus is reached on up to 20 serious diseases to be identified in the 24 months following the weight loss code.

Covariates
Data will also be extracted to explore the effect of the following factors, which could independently affect the recording of weight and the occurrence of cancer:
1. Personal characteristics: age, gender, ethnicity, smoking history, alcohol intake, family history of cancer, and IMD score recorded before the date of the weight loss code (index date).
2. Co-morbidity: recorded before the index date (no time limit) or implied from the prescribing record at the index date.
3. Other cancer symptoms and signs: using Read Codes for symptoms shown to have an independent association with cancer as described by NICE [3]. These will be sought from 3 months before to 2 years after the index date.
4. Results of basic cancer investigations used routinely in primary care: CxR, FBC, LFTs (inc. alkaline phosphatase), calcium, PSA, CA125, and inflammatory markers. These will be sought from 3 months before to 2 years after the index date.

Aim 1: Descriptive study
To describe how often and when weight is recorded, we will request preliminary CPRD searches to identify all: (1) Read coded entries for weight loss and (2) quantitative weight measurements. A subset of patients with weight measurements and unexpected weight loss codes will be used to develop a rule-based search strategy to categorise: (1) the clinical purpose (e.g. prevention, monitoring, diagnosis); (2) the related clinical condition (e.g. diabetes, heart failure, cancer). The GPs' subsequent actions will be described in terms of (1) investigations requested, (2) medications prescribed, and (3) referrals made. The search strategy will then be applied to the entire cohort of weight measurements and weight loss codes.
The most effective method to identify the reason for the weight entry and the subsequent action will be investigated. For example, codelists will be developed to capture the clinical purpose of the consultation associated with each weight measurement or weight loss code: health check codes will be used to identify prevention activity; chronic disease review codes will be used to identify monitoring. For associated clinical conditions, symptom and diagnostic codes entered at the same time as each weight measurement or weight loss code will be ascertained and frequency ranked for the entire descriptive study population. Initially, searches will be performed on the day of the weight entry, then a sensitivity analysis will be performed increasing the time window to ± 1 day of the weight entry, then 1 week, 1 month, and so on. This strategy will be repeated to identify investigation and referral codes following entry of the weight loss code.
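The widening-window sensitivity analysis described above could be prototyped as follows. This is a sketch under stated assumptions: the data layout, the function names, the code labels (plain-text stand-ins rather than real Read Codes), and the choice of 7 and 30 days for "1 week" and "1 month" are all illustrative.

```python
from collections import Counter
from datetime import date


def codes_within_window(weight_date, coded_entries, window_days):
    """Codes entered within +/- window_days of a weight entry (0 = same day)."""
    return [
        code for entry_date, code in coded_entries
        if abs((entry_date - weight_date).days) <= window_days
    ]


def rank_codes_by_window(weight_entries, coded_entries_by_patient, windows=(0, 1, 7, 30)):
    """Frequency-rank co-occurring codes for each window width."""
    ranked = {}
    for window_days in windows:
        counts = Counter()
        for patient_id, weight_date in weight_entries:
            entries = coded_entries_by_patient.get(patient_id, [])
            counts.update(codes_within_window(weight_date, entries, window_days))
        ranked[window_days] = counts.most_common()
    return ranked


# Made-up records: (patient_id, date of weight entry) and coded entries per patient.
weight_entries = [(1, date(2010, 3, 10)), (2, date(2011, 1, 20))]
coded_entries_by_patient = {
    1: [
        (date(2010, 3, 10), "diabetes review"),
        (date(2010, 3, 11), "fatigue"),
        (date(2010, 3, 15), "referral"),
        (date(2010, 4, 5), "abdominal pain"),
    ],
    2: [
        (date(2011, 1, 20), "diabetes review"),
        (date(2011, 1, 14), "fatigue"),
    ],
}
ranked = rank_codes_by_window(weight_entries, coded_entries_by_patient)
for window_days, codes in ranked.items():
    print(window_days, codes)
```

Comparing the ranked lists across window widths shows how many additional related codes each wider window captures, which is the information needed to choose a final window.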

Aim 2: Cohort analysis
Cumulative incidence plots
Cumulative incidence plots will be used to describe the probability of cancer or serious disease over time for those with and without weight loss. These will be assessed in aggregate and stratified by disease type, cancer stage, grade, tumour size, histology, and covariates.
Differences between those with and without weight loss will be assessed using the log-rank test.
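For readers unfamiliar with these methods, the sketch below computes a cumulative incidence curve as one minus the Kaplan-Meier estimate and a two-group log-rank test using only the Python standard library. It is illustrative: it ignores competing risks (an assumption), the follow-up data are made up, and a real analysis would use an established survival analysis package.

```python
import math


def cumulative_incidence(times, events):
    """Return [(time, 1 - KM survival)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, 1 - survival))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square on 1 df, p-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up follow-up in months (event = 1 for diagnosis, 0 for censored).
loss_times, loss_events = [2, 4, 6, 8, 10], [1, 1, 1, 0, 1]
none_times, none_events = [5, 9, 12, 15, 24, 24], [0, 1, 0, 1, 0, 0]
print(cumulative_incidence(loss_times, loss_events))
chi2, p_value = logrank(loss_times, loss_events, none_times, none_events)
print(round(chi2, 2), round(p_value, 3))
```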

Multivariate Cox regression
Cox regression will be used to estimate the adjusted hazard ratios (HR) for cancer or serious disease associated with weight loss recorded as a symptom. The impact of choosing to restrict the follow-up period on the predictive value of weight loss will be explored by limiting the analysis by time period (0-6, 6-12, 12-18, and 18-24 months) and by including weight loss as a time-dependent variable.
Age at index date, sex, ethnicity, IMD score, comorbidity, smoking, and alcohol intake will be included, and the predictive value of other symptoms and investigations will be explored for (1) all cancers in aggregate, (2) cancer type, (3) cancer stage, (4) tumour size, (5) cancer grade, and (6) serious disease type.
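To make the hazard ratio estimate concrete, the sketch below fits a Cox model with a single binary covariate (weight loss yes/no) by Newton-Raphson on the partial likelihood, using Breslow-style risk sets. It is a teaching illustration on made-up data, not the adjusted multivariate model described above, which would be fitted with an established statistical package; the function name and data are assumptions.

```python
import math


def cox_log_hazard_ratio(times, events, exposed, max_iter=50, tol=1e-10):
    """Estimate the log hazard ratio for one binary covariate.

    Returns (beta, approximate standard error). Assumes the partial likelihood
    has a finite maximum, i.e. the groups are not perfectly separated in time.
    """
    subjects = list(zip(times, events, exposed))
    beta = 0.0
    information = 0.0
    for _ in range(max_iter):
        gradient = 0.0
        information = 0.0
        for t_i, event_i, x_i in subjects:
            if not event_i:
                continue  # censored patients contribute only through risk sets
            weight_sum = exposed_weight_sum = 0.0
            for t_j, _event_j, x_j in subjects:
                if t_j >= t_i:  # still at risk at time t_i
                    w = math.exp(beta * x_j)
                    weight_sum += w
                    exposed_weight_sum += x_j * w
            mean_x = exposed_weight_sum / weight_sum
            gradient += x_i - mean_x
            information += mean_x * (1 - mean_x)  # variance of a 0/1 covariate
        step = gradient / information
        beta += step
        if abs(step) < tol:
            break
    return beta, 1 / math.sqrt(information)


# Made-up data: months to diagnosis, all observed; first four patients had weight loss.
times = [1, 3, 5, 7, 2, 4, 6, 8, 10, 12]
events = [1] * 10
exposed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
beta, se = cox_log_hazard_ratio(times, events, exposed)
print("HR =", round(math.exp(beta), 2))
```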

Performance of diagnostic strategies
To allow clinical guidance to be developed on how to rule in or rule out cancer or serious disease in adult patients (> 18 years) with unexpected weight loss, diagnostic accuracy measures will be calculated for investigative strategies, including those described in the literature, overall and within subgroups defined by (1) gender and (2) age group.

Plan for addressing confounding
Aim 1: Descriptive study
Not required.
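The diagnostic accuracy measures referred to above follow from a two-by-two table of test result against outcome. A minimal sketch, with invented counts chosen only to give a PPV of 3%:

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table of test result vs outcome."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # probability of disease given a positive result
        "npv": tn / (tn + fn),  # probability of no disease given a negative result
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Invented counts: 1000 patients test positive, of whom 30 have cancer.
measures = diagnostic_accuracy(tp=30, fp=970, fn=70, tn=8930)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```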

Aim 2: Cohort analysis
Patients who have conditions which might explain the weight loss (e.g. co-morbidities at the time of entry to the cohort or planned dieting) will be included and the impact of their inclusion assessed in multivariate and sensitivity analyses.
Patients with coded weight loss will be matched with patients without a weight loss code based on GP practice to account for systematic biases in coding between practices.
Age at index date, sex, IMD score, co-morbidity, smoking, and alcohol intake will be adjusted for in the multivariate modelling.

Plan for addressing missing data
Aim 1: Descriptive study
Weight is cited as a missing variable in CPRD as GPs do not routinely measure weight in NHS primary care [8]. This descriptive analysis will add to our understanding of how often and when weight is recorded. We will also describe the completeness of personal characteristics (as defined above) in relation to weight measurements and weight loss codes.
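Completeness of personal characteristics could be summarised as the proportion of patients with a recorded value for each field. The sketch below is illustrative only; the record layout and field names are assumptions rather than CPRD variable names.

```python
def completeness(records, fields):
    """Proportion of records with a non-missing value for each field."""
    total = len(records)
    return {
        field: sum(1 for record in records if record.get(field) is not None) / total
        for field in fields
    }


# Made-up patient records; None marks a value missing from the record.
records = [
    {"ethnicity": "A", "smoking": "never", "alcohol": None},
    {"ethnicity": None, "smoking": "current", "alcohol": "light"},
    {"ethnicity": "B", "smoking": "ex", "alcohol": None},
    {"ethnicity": "A", "smoking": "never", "alcohol": "none"},
]
summary = completeness(records, ["ethnicity", "smoking", "alcohol"])
print(summary)  # {'ethnicity': 0.75, 'smoking': 1.0, 'alcohol': 0.5}
```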

Aim 2: Cohort analysis
As measurements appear to be too infrequent to allow us to identify weight loss from serial weight measurement data, the cohort design will make best use of the coded weight loss information available in CPRD. For this reason, we do not intend to impute missing weight measurement values in the primary analysis, although the feasibility of using multiple imputation to address missing covariate values will be explored [10].

Discussion
Within this section, we expand on the protocol as submitted to ISAC to elucidate decisions made about study design and to report developments made since commencing the study. We have incorporated and expanded upon the "Limitations of the study design, data sources and analytical methods" section of the original ISAC protocol.

Reliance on weight loss coding
It appears from our preliminary searches that weight measurement is infrequent for the majority of patients in primary care, most likely initiated by a concern for underlying disease or existing chronic disease management. This is consistent with studies that acknowledge weight measurement as a source of missing data in NHS primary care records [8]. Consequently, the detection of weight loss from serial weight measurements cannot be relied on as a method of defining weight loss. Our descriptive analysis is designed to identify whether a group of patients exists who undergo weight measurements more frequently, in which a future analysis involving serial weight measurements may be feasible. However, any subgroup is unlikely to be representative of the NHS primary care population. We have therefore chosen to focus on weight loss coding.
As with previous primary care studies using routinely collected data, an assumption will be made that the absence of a symptom code represents the absence of the symptom [5,11]. This assumption has two major limitations: firstly, a coded entry relies on the patient visiting the GP and reporting the symptom; and secondly, it relies on the GP choosing to enter the code in the record. Failure of the former would lead to an underestimation of the associated HR, whereas for the latter, selective recording of only those symptoms deemed severe by the GP could lead to overestimated HRs. The latter is likely to differ by GP but cluster by GP practice, as GPs within the same practice are likely to have more similar approaches to coding. One method to address these limitations would be to analyse free-text entries to identify reported but uncoded symptoms, but at present CPRD does not allow requests for free-text entries and we will cite this as a weakness of our study [12]. We decided to adjust for age and sex in multivariate analysis as the association between weight loss and cancer is not established for these variables.

Sample size for cohort analysis
Progress since the initial ISAC application has established that there are 148,000 eligible patients aged > 18 years with an unexpected weight loss code as described in Appendix 1 (preliminary pilot work had suggested there were at least 30,000). This will therefore be the largest primary care CPRD cohort study using unexpected weight loss coding as the exposure variable. We originally calculated that only 2184 patients with weight loss are required to detect a hazard ratio of 2 at 99% power (0.05% alpha) using an enrolment ratio of 1:5. That is, a change in cancer risk from a PPV of 1.5% in patients without weight loss to 3% in patients with weight loss. An alternative approach to estimating sample size is the number of Events Per Variable in multivariate modelling. If 3% of patients with weight loss develop cancer, the number of Events Per Variable will far exceed the minimum number of ten required for robust multivariate modelling. It is anticipated that the study will therefore have sufficient power for stratification by cancer type.
We aim to understand the association between weight loss and cancer in as much detail as the data permit. However, we accept it may not be possible to stratify for cancer stage or for other covariates with sufficient numbers remaining in each stratum. Cancer stage information is unsatisfactory in CPRD, which is why we have requested data linkage to the cancer registry (which will also be incomplete, but less so). Lifestyle covariates are non-essential for our main aim (to determine the predictive value of weight loss for cancer), and we will only perform analysis on substrata when numbers permit. Multiple imputation will be explored for these (and all other relevant missing) variables.

Investigation and referral outcomes
The completeness of investigation and referral data will remain uncertain until the descriptive analysis has been conducted. Data for laboratory investigations are likely to be more complete than data on radiological and endoscopic investigations, as laboratory results are commonly transmitted directly into the electronic health record from the laboratory whereas results for the other tests are not. Further linkage to the Diagnostic Imaging Dataset (for radiology activity) and Hospital Episode Statistics (for endoscopy activity) may be necessary if these data are judged to be incomplete following the descriptive analysis; this linkage would also allow a formal comparison of data completeness between these datasets and CPRD.

Implications
A second cohort study using American primary care data is also in set-up to assess whether there is greater value in defining weight loss using serial weight measurements rather than relying on patient-reported weight loss and a GP-entered code. In particular, this study aims to establish whether weight loss detected using change in serial weight measurements leads to less advanced disease at diagnosis.
Together, these studies will provide the largest reported retrospective cohorts of primary care patients with unexpected weight loss used to understand the association between unexpected weight loss and serious disease including cancer. We hope our findings will directly inform international guidelines for the management of unexpected weight loss in primary care populations.