A generic nomogram for multinomial prediction models: theory and guidance for construction

Background The use of multinomial logistic regression models is advocated for modeling the associations of covariates with three or more mutually exclusive outcome categories. As compared to a binary logistic regression analysis, the simultaneous modeling of multiple outcome categories using a multinomial model often better resembles the clinical setting, where a physician typically must distinguish between more than two possible diagnoses or outcome events for an individual patient (e.g., the differential diagnosis). A disadvantage of the multinomial logistic model is that the interpretation of its results is often complex. In particular, the calculation of predicted probabilities for the various outcomes requires a series of careful calculations. Nomograms are widely used in studies reporting binary logistic regression models to facilitate the interpretation of the results and allow the calculation of the predicted probability for individuals. Methods and results In this paper we outline an approach for deriving a generic nomogram for multinomial logistic regression models and an accompanying scoring chart that can further simplify the calculation of predicted multinomial probabilities. We illustrate the use of the nomogram and scoring chart and their interpretation using a clinical example. Conclusions The generic multinomial nomogram and scoring chart can be used irrespective of the number of outcome categories that are present.


Background
The use of multinomial logistic regression modeling has been encouraged to study multiple unordered outcome categories (and possibly their combination) simultaneously [1,2]. The multinomial logistic model can be considered to be an extension of the popular binary logistic regression model, which is often used in the presence of two mutually exclusive outcome categories. More specifically, multinomial logistic regression analysis can be viewed as a series of binary logistic regression analyses where one of the outcome categories is the reference category in each binary sub-model. Researchers often collapse multiple outcome categories into a single binary (composite) outcome and use binary logistic regression models to analyze their data. Consequently, potentially important information about the different outcome categories is lost. It can also be argued that a binary outcome often does not accurately reflect clinical practice, where physicians commonly have to make decisions while considering more than two relevant choices. For instance, physicians often consider the presence of differential diagnoses (and prognoses) for an individual patient simultaneously [1].
One of the key reasons why researchers might refrain from multinomial logistic regression analysis is that the results from these models are more complex to interpret and more elaborate than results from a binary logistic regression analysis. In the multinomial context, regression coefficients are estimated for each binary sub-model reflecting the relation of covariates to one outcome category relative to the reference category. The number of estimated parameters quickly increases with additional outcome categories considered. The large amount of information from multinomial models can easily overwhelm researchers and clinicians. In addition, when using the multinomial logistic model for estimating probabilities for individual patients, the computation involved for the various outcomes requires a series of careful calculations.
Nomograms found many applications in the reporting of binary logistic regression models (for recent examples, see [3,4]). A nomogram can not only improve insights of clinicians into the results of a logistic model, it can also be used to arrive at a predicted probability of outcome(s) of interest that is (are) tailored to the profile of an individual patient in a graphical manner. Nomograms can thus facilitate clinical decision making during clinical encounters. So far, nomograms have been used primarily for improving the reporting of models with only two outcome categories. We are aware of one recent paper that reported on the construction and one on the application of a nomogram for multinomial models. However, the focus of these papers was limited to constructing a nomogram for a limited number of outcome categories [5] and the reporting of results of one specific dataset [6].
In this manuscript, we present how to construct, interpret, and use a generic nomogram for a multinomial logistic prediction model. We will first specify the multinomial regression model and then present a general approach for deriving the nomogram and accompanying scoring chart for such models irrespective of the number of outcome categories that are present. We will illustrate the use of the nomogram and its interpretation using a clinical example on the risk of operative delivery [7].

Multinomial logistic model
Let y i denote the single observed outcome category of individual i. Assuming that this outcome is in one of K categories (e.g., a disease among K possible diseases), we may assume Y i to be a multinomial random variable with probabilities π i1 , . . . , π iK . Conditional on J observed covariate values in vector x i , x i = {x i1 , . . . , x ij , . . . , x iJ }, the probability of observing category k is denoted by π k (x i ). The multinomial logistic model where category K is treated as the reference category, can then be defined as where α k is an intercept term, β k = {β k1 , . . . , β kJ } is a vector of regression coefficients and lp k (x i ) is one of K − 1 linear predictors for individual i. We can now define the probability of each possible category k by (2)

Methods: Constructing the multinomial nomogram
The nomogram we suggest is a special case of a group of nomograms that are formally known as parallel scale nomograms. Doerfler [8] outlined the parallel scale nomogram that can be constructed if a particular value can be calculated from the sum of two functions. To use this approach for multinomial logistic models, we make use of a natural logarithm transformation applied to the elements of Eq. 2, such that, (3) To simplify notation, in the following we let The nomogram is depicted in Fig. 1. Below we detail the four-step procedure to arrive at this nomogram. For further details about the construction of the parallel scale nomogram, we refer to Doerfler [8].

Step 1: placing the outer axes (L and S)
To obtain an adequately sized nomogram, determine the desired common height (h) for the outer two axes (L and S) and the horizontal distance between them (d).
The two parallel axes are placed in the vertical direction. The values for h and d are assigned at the discretion of the researcher in a common metric (e.g., centimeters or inches). Larger values for h and d will allow for more precise reading of values. For determining the scaling factors for the outer axes (m 1 and m 3 ), the relevant ranges of l ik and s i need to be considered. The limits of these ranges (l low , l up , s low , s up ) may be determined by the range observed in the data set where the model was developed, e.g., l low = min(l ik ) and l up = max(l ik ). These limits define the corresponding limits of the axes. Once these ranges are chosen, the scaling factors m 1 and m 3 are computed by m 1 = h/(l up − l low ) and m 3 = h/(s up − s low ). The remaining scaling factor is given by

Step 3: placing the middle axis (O)
The O axis is placed parallel to the outer axes. The horizontal distance between the axes L and O is given by

Step 4: placing tick marks and labels
For the outer axes L and S, two sequences of values for the tick marks and corresponding labels are defined: l T = (l low , . . . , l t , . . . , l up ) and s T = (s low , . . . , s t , . . . , s up ). Tick mark t on the axis should be placed relative to the lower end of that axis at a distance of m 1 × (l t − l low ) for axis L and m 3 × (s t − l low ) for axis S. For axis O, we define: o low = exp{l low + s low }. Because the axis O represents a log transformed probability scale, first a sequence of arithmetic probabilities o * T is defined with values between exp{o low } and 1. Then, tick mark t for this sequence may be placed at

Methods: Constructing the scoring chart
The use of the nomogram by health professionals can be improved by additionally presenting a scoring chart. This scoring chart provides a graphical approach to arriving at the values for the two outer axes L and S of the nomogram for any relevant combination of values on the covariates (x i ). For brevity, in this section we only consider the case of a multinomial logistic regression model with first order main effects. The scoring chart (and nomogram) can be extended to accommodate situations where higher order and interaction effects are present.
To make the scoring chart user-friendly, the individual effects of covariate j, (β jk x ij ), that make up the linear predictor k are rescaled to a "standardized" score. The sum over these individual effects together with a baseline score make up a "standardized" total score. This total score is a linear transformation of l ik . To facilitate the applicability of this standardized total score approach to the nomogram, the scaling of axis L should be adjusted accordingly. Below we detail the three-step procedure to arrive at the scoring system that makes up the scoring chart.
Step 5: standardized covariate effects: points The estimated multinomial logistic regression coefficients,β jk , are rescaled relative to the largest (conditional) covariate effect on a scale that has a minimum of 0 and a maximum of 100. First, the relevant ranges for each of the covariate variables are considered. Let the boundaries of these relevant ranges be denoted: x low j and x up j . The rescaling factor and rescaled coefficients are then computed by r = 100/max j,k (|β jk x up j −β jk x low j |) and β * jk = r ×β jk . The covariate effects are "standardized" by Points jk (x ij ) =β * jk x ij − min(β * jk x up j ,β * jk x low j ).
Step 6: standardized total effect: total points A baseline score for each category (except the reference category) is defined that takes into account the standardization that has been performed at step 5. The baseline score is computed by bl k = r ×α k + j min(β * jk x up j ,β * jk x low j ). To also obtain a "standardized" baseline score such that the minimum rescaled baseline score is zero, we subtract the minimum baseline score, bl * k = bl k − min k (bl k ). The standardized total effect for category k given covariate values is then given by Total k = bl * k + j Points jk (x ij ). Notice that l ik = lp k (x i ) = (Total k + min k (bl k ))/r.
Step 7: connecting the standardized total effects to the S axis A horizontal axes representing Total k is placed near the lower end of the scoring chart. Another parallel horizontal axis is placed: the values on this axis are related to the former axis by exp{(Total k + min k (bl k ))/r}. Taking the sum over the values that can be read off from the axis for all categories (except the reference category) is all the information necessary for determining the value on the S axis.

Empirical example: predicting the risk of operative delivery
To illustrate the suggested scoring chart and nomogram for the reporting of multinomial logistic models, we use a previously published model on predicting the risk of operative delivery [7]. We detail their use by considering a specific hypothetical subject (described below). The prediction model used in this illustration was developed using data from a randomized clinical trial conducted in the Netherlands [9].
In brief, the multinomial prediction model was developed in 5667 laboring women with high-risk vertex (i.e., babies in a normal position in the uterus) singleton pregnancies beyond 36 weeks of gestation that met the inclusion criteria of the randomized clinical trial. Based on the combination of the intervention (i.e., instrumental vaginal delivery (IVD) or caesarean section (CS)) and the indication for the intervention (i.e., fetal distress (FD) or failure to progress (FTP)), women were assigned to one of five distinctive outcome categories: spontaneous vaginal delivery (reference category); instrumental vaginal delivery due to suspected fetal distress (IVD-FD); caesarean section due to suspected fetal distress (CS-FD); instrumental vaginal delivery due to failure to progress (IVD-FTP); or caesarean section due to failure to progress (CS-FTP). The multinomial regression model included the antepartum variables: maternal age, parity, gestational age, maternal diabetes mellitus, previous caesarean delivery, fetal gender, maternal hypertensive disorder, suspected intrauterine growth restriction, and antepartum estimated fetal weight. An antepartum prediction model was developed using this set of variables (i.e., model 1 in Schuit et al. 2012; see Table 1). For more details on the various outcome categories and candidate predictors, we refer to the original publication [7]. Consider the following hypothetical subject: a nulliparous diabetic subject with a maternal age of 32 years, a gestational age of 40 weeks, expecting a boy with an estimated birth weight of 3540 g. Using the results presented in Table 1 and a series of calculations (that follow from Eqs. 1 and 2), one may calculate the predicted probabilities for the hypothetical case study for each of the five outcomes as 0.096 (IVD-FD), 0.051 (CS-FD), 0.069 (IVD-FTP), 0.268 (CS-FTP), and 0.516 (spontaneous delivery). To facilitate the calculation of the predicted probabilities, we constructed a scoring chart (Fig. 2) and nomogram ( Fig. 2) using the reported regression coefficients by Schuit et al. [7]. For illustrative purposes we have added some information specific to the case study in Figs. 2 and 2. The scoring chart and nomogram without the case study information can be found in Webfigure A and Webfigure B, respectively.
Using the scoring chart (Fig. 2), each subject characteristic can be converted into a score for a particular outcome category (except the reference category). We have illustrated this procedure by drawing vertical dashed lines between the axes only for the subject characteristic "maternal age" (32 years) to the Points axis (upper end scoring chart) in the figure. We leave it to the interested reader to draw the lines for the remaining characteristics. By adding the points of the separate predictors to the baseline points for each of the chosen outcomes, one obtains the total points for this particular outcome. Each of these total points should be marked at the line indicated by Total Points (lower end scoring chart). For the considered subject, the total points are 159 (IVD-FD), 140 (CS-FD), 149 (IVD-FTP), and 189 (CS-FTP). Further, by drawing vertical lines from the marked points on Total Points axis to the Exp(l ik ) axis (illustrated using dashed lines), one can read-off the exponentiated linear predictor for each of the outcome categories. This information is then put in the boxes right under the Exp(l ik ) axis. Finally, by taking the sum over these values (registered under box "S") one obtains all information needed to use the nomogram (Fig. 2).
To use the nomogram (Fig. 2), one first marks on the Total Points axis (left axis) the total points for each outcome category except the reference category, as derived from the scoring chart. One than marks the value registered under box "S" in the scoring chart on the S axis of (Fig. 2). The points marked on the Total points axis and point on the S axis are then connected using straight lines. From the middle "O" axis, one can now read-off the predicted probabilities for each of the four outcomes for this particular case. Finally, by subtracting these four probabilities from one , the probability of the reference category is obtained, corresponding to the probability of a "spontaneous delivery".

Conclusion
Nomograms are not likely to be used very often in contemporary clinical practice for calculating probabilities for individual patients given the considerable burden placed on the user and also because its accuracy is limited by the precision with which physical markings can be drawn, reproduced, viewed, and aligned. For this purpose, electronic implementation of a prediction model by means of a website calculator, mobile app, or an algorithm implemented into the electronic medical file of the patients to directly calculate the relevant probabilities may be considered. However, we still think that nomograms remain quite useful, even today, as they do offer great visual insight a (multinomial) prediction model, which is often perceived as a black-box (for a further discussion on the usefulness of nomograms we refer to [10,11]). Therefore, we do expect that our general approach to construct and report scoring charts and nomograms for multinomial logistic regression models will facilitate the interpretation and use of such models to a wider audience.
Our approach is flexible and generalizable and can be used irrespective of the number of outcome categories and types of covariates present. R-code for developing the nomogram is available from http://mvansmeden.net/ software/multinomial-nomogram.