From: Quantifying the added value of new biomarkers: how and how not
Measure | Advantages | Disadvantages |
---|---|---|
Likelihood-based measures | Reflects probability of obtaining the observed data | Based on assumed model |
Likelihood ratio (LR), change in AIC or BIC | The LR test is a powerful test for nested models (most powerful for simple hypotheses, by the Neyman-Pearson lemma). The AIC and BIC can be used to assess non-nested models. | While powerful, statistical association or model improvement may not be of clinical importance. |
Discrimination | Assesses separation of cases and non-cases | Only one component of model fit |
Difference in ROC curves, AUC, c-statistic | Assesses discrimination between those with and without the outcome of interest across the whole range of a continuous predictor or score. Useful for classification. | Based on ranks only. Does not assess calibration. Differences may not be of clinical importance. |
Clinical risk reclassification | Examines difference in assigning to clinically important risk strata | Strata should be pre-defined. Loses information if strata are not clinically important |
Reclassification calibration statistic | Assesses calibration within cross-classified risk strata | A test for each model is needed |
Categorical NRI | Can assess changes in important risk strata. Cases and non-cases can be considered separately | Depends on the number of categories and cut points used |
NRI(p) | Nice statistical properties. Does not vary by event rate in the data | May not be clinically relevant |
Conditional NRI | Indicates improvement within clinically important risk subgroups | Biased in its crude form, and a correction based on the full data is needed. |
Category-free measures | Does not require cut points | May lose clinical intuition |
Brier score | Proper scoring rule | May be difficult to interpret; the maximum value depends on incidence of the outcome. |
NRI(0) | Continuous, does not depend on categories | Based on ranks only. Measure of association rather than model improvement. Behavior may be erratic if the new predictor is not normally distributed. |
IDI | Nice statistical properties. Related to the difference in model R² | Depends on event rate. Values are low and may be difficult to interpret. |
Decision analytics | Estimates clinical impact of using model | Not a direct estimate of model fit or improvement. Need reasonable estimates of decision thresholds |
Decision curve | Displays the net benefit across a range of thresholds | Does not compare model improvement directly but clinical consequences of using the models for treatment decisions |
Cost-benefit analysis | Compares costs and benefits of one model or treatment strategy vs. another | Need detailed estimates of costs and benefits of misclassification, including further diagnostic workup and treatments |
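
To make a few of the measures in the table concrete, the following is a minimal sketch of how the Brier score, the IDI, the categorical NRI, and the net benefit at a single threshold could be computed using their standard definitions. It is an illustration under stated assumptions, not the authors' code: the function names (`brier_score`, `idi`, `categorical_nri`, `net_benefit`), the toy outcomes and predicted risks, the 10% and 20% cut points, and the 20% decision threshold are all hypothetical.

```python
from bisect import bisect_right

def brier_score(y, p):
    """Mean squared difference between predicted risk and observed outcome (0/1)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def idi(y, p_old, p_new):
    """Integrated discrimination improvement: change in mean predicted risk
    among events minus the change in mean predicted risk among non-events."""
    def mean(xs):
        return sum(xs) / len(xs)
    ev_old = [po for yi, po in zip(y, p_old) if yi == 1]
    ev_new = [pn for yi, pn in zip(y, p_new) if yi == 1]
    ne_old = [po for yi, po in zip(y, p_old) if yi == 0]
    ne_new = [pn for yi, pn in zip(y, p_new) if yi == 0]
    return (mean(ev_new) - mean(ev_old)) - (mean(ne_new) - mean(ne_old))

def categorical_nri(y, p_old, p_new, cuts):
    """Categorical NRI with pre-defined cut points (sorted ascending).
    Returns (NRI among events, NRI among non-events, overall NRI)."""
    def stratum(p):
        # A risk equal to a cut point is assigned to the higher stratum.
        return bisect_right(cuts, p)
    up_e = down_e = up_n = down_n = 0
    for yi, po, pn in zip(y, p_old, p_new):
        move = stratum(pn) - stratum(po)
        if yi == 1:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    n_events = sum(y)
    n_nonevents = len(y) - n_events
    nri_events = (up_e - down_e) / n_events           # moving up is correct for events
    nri_nonevents = (down_n - up_n) / n_nonevents     # moving down is correct for non-events
    return nri_events, nri_nonevents, nri_events + nri_nonevents

def net_benefit(y, p, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold:
    true positives per person minus false positives weighted by threshold odds."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Toy data (hypothetical): 4 events, 6 non-events, risks from an old and a new model.
y     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_old = [0.30, 0.15, 0.08, 0.25, 0.12, 0.04, 0.22, 0.07, 0.03, 0.15]
p_new = [0.40, 0.22, 0.06, 0.30, 0.08, 0.03, 0.18, 0.12, 0.02, 0.10]
cuts  = [0.10, 0.20]  # assumed strata: <10%, 10% to <20%, >=20%

brier_old, brier_new = brier_score(y, p_old), brier_score(y, p_new)
idi_value = idi(y, p_old, p_new)
nri_events, nri_nonevents, nri_total = categorical_nri(y, p_old, p_new, cuts)
nb_old, nb_new = net_benefit(y, p_old, 0.20), net_benefit(y, p_new, 0.20)

print(f"Brier old/new: {brier_old:.4f} / {brier_new:.4f}")
print(f"IDI: {idi_value:.4f}")
print(f"NRI events/non-events/total: {nri_events:.4f} / {nri_nonevents:.4f} / {nri_total:.4f}")
print(f"Net benefit at 20% old/new: {nb_old:.4f} / {nb_new:.4f}")
```

Reporting the NRI separately for events and non-events, as the sketch does, follows the table's note that cases and non-cases can be considered separately. Changing `cuts` changes the categorical NRI, which illustrates its dependence on the number of categories and the cut points used. Repeating `net_benefit` over a grid of thresholds would give the points of a decision curve.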