Step 1: Benefit is good
Figure 1 shows only the most essential elements of a decision curve analysis. The result for the prediction model is the light gray line, and the diagnostic test is the dashed line. The two other lines are for “intervention for all” (thin black line) and “intervention for none” (thick black line).
“Intervention” is used in a general sense: it might refer to drugs or surgery, but it could also encompass lifestyle advice, additional diagnostic workup, or subsequent monitoring. Indeed, intervention reflects any action that a patient classified as high risk by a model, or with a positive result on a diagnostic test, would consider to improve their health, or their life in general. The exact intervention depends on the clinical setting. In our study of prostate cancer in men with elevated PSA, intervention would mean prostate biopsy. To give other examples, in a study of infection, intervention might be giving antibiotics; in a study of heart disease prevention, intervention might be giving statins. In a study of palliative surgery for advanced cancer with an endpoint of death within 3 months, however, the idea would be to avoid surgery in patients at high risk, and intervention would be “best supportive care.” Note that in the original paper describing decision curve analysis, and in many empirical applications, the word “treat” is used in place of “intervention.”
Decision curve analysis includes results for “intervention for all” and “intervention for none” because these are often reasonable clinical strategies [10, 11]. To give a specific example, one reasonable strategy in the prostate biopsy study would be to biopsy all patients with elevated PSA irrespective of the results of the diagnostic test or prediction model. Indeed, this is generally what happens in contemporary practice, where men who have a PSA above a certain threshold are routinely biopsied without additional testing. On the other hand, we might imagine a study of men with low PSA, who are not subject to biopsy in routine clinical practice. Some of these men do have high-grade prostate cancer, and researchers might be investigating a suitable test. In this case, the reference strategy would be “intervention for none.”
In the figure, the y-axis is benefit and the x-axis is preference. The benefit of a test or model is that it correctly identifies which patients do and do not have disease (in our example, high-grade cancer). Preference refers to how doctors value different outcomes for a given patient, a decision that is often influenced by a discussion between the doctor and that patient. Both preference and benefit are described in further detail below: at this stage, it is only important to know that benefit is good and that preferences vary. The light gray line, corresponding to the prediction model, has the highest benefit across a wide range of values of preference. Hence, we can conclude that, except for a small range of preferences at the low end of the x-axis, intervening on (i.e., biopsying) patients on the basis of the prediction model leads to higher benefit than the alternative strategies of biopsying all patients, biopsying no patients, or biopsying only those patients who are positive on the diagnostic test. For the prostate biopsy study, the conclusion is that using the model to determine whether patients should have a biopsy would lead to improved clinical outcomes.
Step 2: Preference refers to how doctors value different outcomes for their patients
Following a consultation and a discussion with some patients, a doctor might be particularly worried about missing disease; for other patients, the doctor may be more concerned about avoiding unnecessary intervention. Doctors may also vary in their propensity to intervene, some being more conservative, others more aggressive. In Fig. 1, the extremes of the x-axis for preference are “I’m worried about disease” and “I’m worried about biopsy.” In the case of prostate cancer biopsy, a doctor who, for a given patient, has a preference towards the left end of the x-axis weighs the harm of missing a high-grade cancer as much greater than the harm of unnecessary biopsy. This may be, for instance, because the patient is younger and has school-age children, and so very much prioritizes finding any lethal cancer at a curable stage: this patient is clearly “worried about disease,” consistent with a low threshold for continuing diagnostic workup. A doctor with a preference for a given patient towards the right of the x-axis wants to avoid biopsy if possible. This might reflect a patient who does not like the idea of invasive medical procedures, or a doctor who is treating an older patient and is skeptical about the value of early detection in that population: they are “worried about biopsy” and will opt for biopsy only if the patient is at particularly high risk.
This helps us take our interpretation a little further. We can see that the model has higher benefit than the other approaches, except for doctors who are very worried about disease (the far left of the x-axis), for whom the benefit is actually slightly higher for the strategy of “intervention for all.” This makes intuitive sense: a patient with an elevated PSA who has a strong preference for early identification of potentially lethal cancer might want to go straight ahead and get a biopsy rather than depend on a second model or test that is not 100% accurate.
Step 3: The unit of preference is threshold probability
Our model gives a patient’s predicted probability of high-grade cancer. One might assume that if the model estimated the patient’s risk as 1%, both the patient and the doctor would agree that there was no need for biopsy; if the risk was 99%, however, the doctor would advise, and the patient would accept, that biopsy was indicated. Comparable conclusions would be drawn if the risks were 2% versus 98%. We might imagine that we vary the risks, counting up from 2% and down from 98% until the doctor is no longer sure. For instance, a doctor might say “Thinking about this patient, I wouldn’t do more than 10 biopsies to find one high-grade cancer in patients with similar health and who think about the risks and benefits of biopsy vs. finding cancer in the same way. So if a patient’s risk were above 10%, I would do a biopsy; otherwise, I would just carefully monitor the patient and perhaps do a biopsy later if I saw a reason to.”
The relationship between preference and threshold probability is perhaps easiest to see when using odds. A risk of 10% is an odds of 1:9, so in using a threshold probability of 10%, the doctor is telling us “missing a high-grade cancer is 9 times worse than doing an unnecessary biopsy” [2]. The threshold can also be interpreted as a “number-needed-to-test”: a threshold of 10% corresponds to a number-needed-to-test of 10. Figure 2 shows threshold probabilities on the x-axis. Odds are also shown for didactic purposes, although these are omitted when presenting decision curves. This helps us to understand our previous conclusion that patients who are particularly worried about disease do not benefit from using the model. We can now see that it is only if threshold probabilities are less than 2 or 3% that we should avoid using the model. Such a low threshold would be a stretch in prostate cancer, where biopsy is invasive, painful, and associated with the risk of sepsis. However, it might be plausible in some other scenarios, for instance, biopsy for skin cancer, which is a far less risky and less invasive procedure. Note also that the curve is only plotted up to 20%. This is because, given the relative risks of missing a high-grade prostate cancer compared to the harms of biopsy, we would consider it unreasonable for any patient or doctor to demand greater than 20% risk before accepting biopsy. The plausible range of thresholds hence depends critically on context. Elsewhere, we describe in detail the process by which a reasonable range of thresholds can be agreed upon [2].
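The arithmetic linking a threshold probability to odds and to a number-needed-to-test can be sketched in a few lines of Python. The function names are illustrative only and do not come from the original paper or from any software package; exact fractions are used simply to avoid rounding noise.

```python
from fractions import Fraction


def threshold_odds(threshold):
    """Odds of disease at the threshold probability: p / (1 - p)."""
    p = Fraction(str(threshold))  # str() keeps 0.1 as exactly 1/10
    if not 0 < p < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    return p / (1 - p)


def harm_ratio(threshold):
    """How many times worse a missed disease is than an unnecessary intervention."""
    return 1 / threshold_odds(threshold)


def number_needed_to_test(threshold):
    """Most interventions the doctor would accept to find one case: 1 / p."""
    return 1 / Fraction(str(threshold))


for pt in (0.02, 0.10, 0.20):
    odds = threshold_odds(pt)
    print(
        f"threshold {pt:.0%}: odds {odds.numerator}:{odds.denominator}, "
        f"missed disease {harm_ratio(pt)}x worse than unnecessary biopsy, "
        f"number-needed-to-test {number_needed_to_test(pt)}"
    )
```

For the 10% threshold discussed above, this gives odds of 1:9, a harm ratio of 9, and a number-needed-to-test of 10.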
Step 4: Benefit is actually net benefit
Figure 2 also shows the correct units for benefit, what is known as “net benefit.” The “net” in “net benefit” is the same as in “net profit,” that is, income minus expenditure. If, say, a wine importer buys €1m of wine from France and sells it in the USA for $1.5m, then if the exchange rate is €1 to $1.25, the net profit is income in dollars (1.5m) − expenditure in euros (1m) × exchange rate (1.25) = $250,000. Leaving aside, for the sake of simplicity, the issue of risk and the time and trouble to trade, this is equivalent to being given $250,000 without having to do any trading. In the case of diagnosis, the income is true positives (e.g., finding a cancer) and the expenditure is false positives (e.g., unnecessary biopsies), with the “exchange rate” being the number of false positives that are worth one true positive. The exchange rate will depend on the relative seriousness of the intervention and outcome. For instance, we will be willing to conduct more unnecessary biopsies to find one cancer if the biopsy procedure is safe rather than dangerous, or if the cancer is aggressive rather than indolent. The exchange rate is calculated, as explained above, from the threshold probability. Another analogy is with net health benefit or net monetary benefit, both of which depend on a willingness-to-pay threshold to exchange health benefits and costs [12].
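In symbols, the analogy can be written as follows. This is a sketch using the conventional decision curve formulation, which this section gives only in words. The symbols are introduced here for illustration: $n$ is the number of patients, $p_t$ is the threshold probability, and TP and FP are the numbers of true and false positives.

```latex
\begin{align*}
\text{net profit}  &= \text{income} - \text{expenditure} \times \text{exchange rate}
                    = 1.5\text{m} - 1\text{m} \times 1.25 = 0.25\text{m (dollars)}, \\
\text{net benefit} &= \frac{\text{TP}}{n} - \frac{\text{FP}}{n} \times \frac{p_t}{1 - p_t}.
\end{align*}
```

At a threshold of 10%, the weight $p_t / (1 - p_t)$ is 1/9, so nine unnecessary biopsies offset one cancer found.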
The unit of net benefit is true positives. A net benefit of 0.07, for instance, means “7 true positives for every 100 patients in the target population.” So just like in the example of net profit for the wine trader, a net benefit of 0.07 would be the equivalent of identifying 7 patients per 100, all of whom had disease. In the prostate biopsy example, a 0.07 net benefit would be equivalent to a strategy where 7 patients per 100 were biopsied and all were found to have high-grade tumors. Also comparable to the business example, where a profit of $250,000 could result from various combinations of income and expenditure, a net benefit of 0.07 could result from different combinations of true and false positives.
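To illustrate how different combinations of true and false positives can give the same net benefit, the sketch below applies the conventional net benefit calculation (true positives per patient minus false positives per patient weighted by the odds at the threshold) to three scenarios. The counts are hypothetical and chosen only for the arithmetic; they are not results from the prostate biopsy study, and the function name is our own.

```python
from fractions import Fraction


def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit in units of true positives per patient.

    Each false positive is weighted by the odds at the threshold,
    threshold / (1 - threshold).
    """
    weight = threshold / (1 - threshold)
    return Fraction(true_pos, n) - Fraction(false_pos, n) * weight


threshold = Fraction(1, 10)  # 10% threshold, so the weight is 1/9

# Hypothetical (true positives, false positives) per 100 patients.
scenarios = {
    "7 cancers found, 0 unnecessary biopsies": (7, 0),
    "8 cancers found, 9 unnecessary biopsies": (8, 9),
    "10 cancers found, 27 unnecessary biopsies": (10, 27),
}

results = {
    label: net_benefit(tp, fp, 100, threshold)
    for label, (tp, fp) in scenarios.items()
}
for label, nb in results.items():
    print(f"{label}: net benefit = {float(nb):.2f}")
```

All three scenarios have a net benefit of 0.07 at this threshold: every 9 unnecessary biopsies cancel out exactly one additional cancer found.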
Step 5: Net benefit can also be expressed as interventions avoided
In many scenarios, the most common strategy is “intervention for all” rather than “intervention for none.” Indeed, this is the case for our prostate cancer example, where urologists routinely biopsy all patients with an elevated PSA. In these scenarios, a model or test would aim to reduce unnecessary intervention, and net benefit can be expressed in terms of true negatives rather than true positives. Figure 3 shows an example of this type of decision curve. It can be interpreted as showing that, at a risk threshold of 10%, use of the prediction model would be the equivalent of a strategy that reduced the number of unnecessary biopsies by about 40 per 100 without missing any patients with high-grade cancer. Expressing net benefit in terms of avoided unnecessary diagnostic procedures or avoided unnecessary treatments is recommended if the reference strategy is “intervention for all.” Note that doing so does not change any conclusions as to which model or test has the highest net benefit.
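The conversion from net benefit to interventions avoided is not spelled out in this section. A minimal sketch, assuming the conventional formulation in the decision curve literature, divides the difference in net benefit between the model and “intervention for all” by the odds at the threshold. The cohort below is hypothetical (100 men, 25 with high-grade cancer) and was chosen so that the result lands near the figure of 40 per 100 quoted above; it is not the study data, and the function names are our own.

```python
from fractions import Fraction


def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit in units of true positives per patient."""
    weight = threshold / (1 - threshold)
    return Fraction(true_pos, n) - Fraction(false_pos, n) * weight


def net_interventions_avoided(nb_strategy, nb_all, threshold, per=100):
    """Net reduction in interventions per `per` patients,
    relative to intervening on everyone."""
    weight = threshold / (1 - threshold)
    return (nb_strategy - nb_all) / weight * per


threshold = Fraction(1, 10)  # 10% risk threshold

# Hypothetical cohort: biopsy everyone -> 25 cancers found, 75 unnecessary biopsies.
nb_all = net_benefit(25, 75, 100, threshold)
# Hypothetical model: biopsies 50 men, finds 24 cancers, 26 unnecessary biopsies.
nb_model = net_benefit(24, 26, 100, threshold)

avoided = net_interventions_avoided(nb_model, nb_all, threshold)
print(f"net biopsies avoided per 100 patients: {float(avoided):.0f}")
```

In this hypothetical cohort the model avoids 49 unnecessary biopsies but misses one cancer. At a 10% threshold a missed cancer counts the same as 9 unnecessary biopsies, so the net reduction is 49 − 9 = 40 per 100.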