Table 1 Summary of performance measures for quantifying added value

From: Quantifying the added value of new biomarkers: how and how not

Measure | Advantages | Disadvantages
Likelihood-based measures | Reflects probability of obtaining the observed data | Based on assumed model
  Likelihood ratio (LR), change in AIC or BIC | The LR test is the uniformly most powerful test for nested models. The AIC and BIC can be used to assess non-nested models. | While powerful, statistical association or model improvement may not be of clinical importance.
Discrimination | Assesses separation of cases and non-cases | Only one component of model fit
  Difference in ROC curves, AUC, c-statistic | Assesses discrimination between those with and without the outcome of interest across the whole range of a continuous predictor or score. Useful for classification. | Based on ranks only. Does not assess calibration. Differences may not be of clinical importance.
Clinical risk reclassification | Examines difference in assigning to clinically important risk strata | Strata should be pre-defined. Loses information if strata are not clinically important.
  Reclassification calibration statistic | Assesses calibration within cross-classified risk strata | A test for each model is needed
  Categorical NRI | Can assess changes in important risk strata. Cases and non-cases can be considered separately. | Depends on the number of categories and cut points used
  NRI(p) | Nice statistical properties. Does not vary by event rate in the data. | May not be clinically relevant
  Conditional NRI | Indicates improvement within clinically important risk subgroups | Biased in its crude form, and a correction based on the full data is needed.
Category-free measures | Does not require cut points | May lose clinical intuition
  Brier score | Proper scoring rule | May be difficult to interpret; the maximum value depends on incidence of the outcome.
  NRI(0) | Continuous, does not depend on categories | Based on ranks only. Measure of association rather than model improvement. Behavior may be erratic if the new predictor is not normally distributed.
  IDI | Nice statistical properties. Related to the difference in model R² | Depends on event rate. Values are low and may be difficult to interpret.
Decision analytics | Estimates clinical impact of using model | Not a direct estimate of model fit or improvement. Need reasonable estimates of decision thresholds.
  Decision curve | Displays the net benefit across a range of thresholds | Does not compare model improvement directly but clinical consequences of using the models for treatment decisions
  Cost-benefit analysis | Compares costs and benefits of one model or treatment strategy vs. another | Need detailed estimates of costs and benefits of misclassification, including further diagnostic workup and treatments
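
To make several of the measures in Table 1 concrete, the following is a minimal Python sketch of how the c-statistic, Brier score, categorical NRI, NRI(0), IDI, and net benefit might be computed from predicted risks under an old and a new model. It is an illustration under stated assumptions, not the article's code: the function names, the cut points (0.10 and 0.20), the threshold, and the toy data are all invented for this example, and the formulas follow the commonly used definitions of these measures. The likelihood-based measures, the reclassification calibration statistic, NRI(p), and the conditional NRI are not covered.

```python
from bisect import bisect_right

def mean(xs):
    return sum(xs) / len(xs)

def split(y, v):
    """Split values v into (cases, non-cases) using 0/1 outcomes y."""
    cases = [vi for yi, vi in zip(y, v) if yi == 1]
    noncases = [vi for yi, vi in zip(y, v) if yi == 0]
    return cases, noncases

def auc(y, p):
    """c-statistic: share of case/non-case pairs ranked correctly (ties count 1/2)."""
    cases, noncases = split(y, p)
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in noncases)
    return wins / (len(cases) * len(noncases))

def brier(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return mean([(pi - yi) ** 2 for yi, pi in zip(y, p)])

def net_reclassification(y, old, new):
    """Net upward movement among cases minus net upward movement among non-cases."""
    move = [(n > o) - (n < o) for o, n in zip(old, new)]
    cases, noncases = split(y, move)
    return mean(cases) - mean(noncases)

def categorical_nri(y, p_old, p_new, cuts):
    """NRI over risk strata defined by sorted cut points (hypothetical here)."""
    cat_old = [bisect_right(cuts, p) for p in p_old]
    cat_new = [bisect_right(cuts, p) for p in p_new]
    return net_reclassification(y, cat_old, cat_new)

def nri0(y, p_old, p_new):
    """Category-free NRI: any increase or decrease in risk counts as movement."""
    return net_reclassification(y, p_old, p_new)

def idi(y, p_old, p_new):
    """Change in discrimination slope (mean risk in cases minus non-cases)."""
    c_old, n_old = split(y, p_old)
    c_new, n_new = split(y, p_new)
    return (mean(c_new) - mean(n_new)) - (mean(c_old) - mean(n_old))

def net_benefit(y, p, t):
    """Net benefit of treating everyone with predicted risk >= threshold t."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= t and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= t and yi == 0)
    return tp / n - (fp / n) * t / (1 - t)

# Made-up toy data: 4 cases, 6 non-cases, risks from an old and a new model.
y     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_old = [0.30, 0.20, 0.15, 0.05, 0.25, 0.10, 0.10, 0.05, 0.05, 0.02]
p_new = [0.40, 0.25, 0.10, 0.08, 0.20, 0.12, 0.05, 0.04, 0.03, 0.02]

if __name__ == "__main__":
    print("AUC old/new:", auc(y, p_old), auc(y, p_new))
    print("Brier old/new:", brier(y, p_old), brier(y, p_new))
    print("Categorical NRI:", categorical_nri(y, p_old, p_new, [0.10, 0.20]))
    print("NRI(0):", nri0(y, p_old, p_new))
    print("IDI:", idi(y, p_old, p_new))
    print("Net benefit at 0.10 old/new:",
          net_benefit(y, p_old, 0.10), net_benefit(y, p_new, 0.10))
```

Evaluating `net_benefit` over a grid of thresholds for each model gives the points of a decision curve. Note how the categorical NRI changes with the choice of `cuts`, which is the dependence on categories and cut points listed among its disadvantages in the table.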