
Table 1 Performance characteristics of binary tests and continuous prediction models with various degrees of miscalibration. All values given were calculated directly from the formulae in the text and independently verified using a simulation approach (Appendix)

From: The Brier score does not evaluate the clinical utility of diagnostic tests or prediction models

| Test | Specificity | Sensitivity | AUC | Brier score | Net benefit (threshold: 5%) | Net benefit (threshold: 10%) | Net benefit (threshold: 20%) |
|---|---|---|---|---|---|---|---|
| Binary tests | | | | | | | |
| Assume all negative | 100% | 0% | 0.500 | 0.2000 | 0.0000 | 0.0000 | 0.0000 |
| Assume all positive | 0% | 100% | 0.500 | 0.8000 | 0.1579 | 0.1111 | 0.0000 |
| Highly specific | 95% | 50% | 0.725 | 0.1400* / 0.1169 | 0.0979 | 0.0956 | 0.0900 |
| Highly sensitive | 50% | 95% | 0.725 | 0.4100* / 0.1386 | 0.1689 | 0.1456 | 0.0900 |
| Continuous prediction models | | | | | | | |
| Well calibrated | | | 0.75 | 0.1386 | 0.1595 | 0.1236 | 0.0716 |
| Overestimating risk | | | 0.75 | 0.1708 | 0.1583 | 0.1160 | 0.0423 |
| Underestimating risk | | | 0.75 | 0.1540 | 0.1483 | 0.0986 | 0.0413 |
| Severely underestimating risk | | | 0.75 | 0.1760 | 0.0921 | 0.0372 | 0.0076 |
  1. AUC, Brier score, and net benefit for various threshold probabilities corresponding to binary tests and continuous prediction models with various degrees of miscalibration predicting an outcome with prevalence of 20%, as shown in Fig. 1. Higher values of AUC and net benefit are desirable whereas lower values of the Brier score are desirable
  2. *Method 1 calculation: binary test is considered to produce probabilities of 1 and 0 for a positive and negative test, respectively
  3. Method 2 calculation (the second, unmarked Brier score value): binary test is considered to produce probabilities of the positive predictive value and 1 − negative predictive value for a positive and negative test, respectively
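
The binary-test rows can be checked from sensitivity, specificity, and the 20% prevalence alone. The following is a minimal Python sketch, not the authors' code. It assumes the usual net benefit definition (true-positive rate minus false-positive rate weighted by the odds of the threshold probability) and the two Brier score methods described in the footnotes. The function names are illustrative, and the continuous model rows are not reproduced because they depend on the risk distributions in Fig. 1.

```python
# Sketch: recompute the binary-test rows of Table 1.
# Names are hypothetical; formulas are assumed standard definitions.

PREVALENCE = 0.20  # outcome prevalence stated in the table footnote


def cell_rates(sens, spec, prev=PREVALENCE):
    """Population proportions of TP, FP, FN, TN for a binary test."""
    tp = sens * prev
    fn = (1 - sens) * prev
    fp = (1 - spec) * (1 - prev)
    tn = spec * (1 - prev)
    return tp, fp, fn, tn


def auc_binary(sens, spec):
    # A binary test has one operating point on the ROC curve,
    # so the area is the mean of sensitivity and specificity.
    return (sens + spec) / 2


def net_benefit(sens, spec, threshold, prev=PREVALENCE):
    tp, fp, _, _ = cell_rates(sens, spec, prev)
    return tp - fp * threshold / (1 - threshold)


def brier_method1(sens, spec, prev=PREVALENCE):
    # Predictions are exactly 1 or 0, so the squared error is 1 for
    # every misclassified case and 0 otherwise.
    _, fp, fn, _ = cell_rates(sens, spec, prev)
    return fp + fn


def brier_method2(sens, spec, prev=PREVALENCE):
    # Predictions are PPV after a positive test and 1 - NPV after a
    # negative test. Undefined if the test is never positive or never
    # negative (division by zero), so use it only for the two real tests.
    tp, fp, fn, tn = cell_rates(sens, spec, prev)
    p_pos = tp / (tp + fp)  # PPV
    p_neg = fn / (fn + tn)  # 1 - NPV
    return (tp * (1 - p_pos) ** 2 + fp * p_pos ** 2
            + fn * (1 - p_neg) ** 2 + tn * p_neg ** 2)


if __name__ == "__main__":
    for name, sens, spec in [("Highly specific", 0.50, 0.95),
                             ("Highly sensitive", 0.95, 0.50)]:
        nb = [round(net_benefit(sens, spec, t), 4) for t in (0.05, 0.10, 0.20)]
        print(name, auc_binary(sens, spec),
              round(brier_method1(sens, spec), 4),
              round(brier_method2(sens, spec), 4), nb)
```

Under these assumptions the sketch agrees with the tabulated values to four decimal places. It also makes the table's main contrast concrete: at the 5% and 10% thresholds the highly sensitive test has the higher net benefit, yet its Method 1 Brier score (0.4100) is far worse than that of the highly specific test (0.1400).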