Estimate a Multinomial Logistic regression (MNL) for classification

Functionality

To estimate a Multinomial Logistic regression (MNL) we need a categorical response variable with two or more levels and one or more explanatory variables. We also need to specify the level of the response variable to count as the base level for comparison (i.e., use the Choose level: dropdown). In the example data file, ketchup, we can use heinz28 as the base for comparison.
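Outside the Radiant interface, the same setup can be sketched directly with multinom from the nnet package, which this document names as the key estimation function. The data below are simulated and purely hypothetical (three of the brands mentioned in the text, with made-up prices); only the choice of heinz28 as the base level mirrors the description above.

```r
library(nnet)  # provides multinom()

set.seed(1234)
n <- 500
# Simulated stand-in for the ketchup data (hypothetical values, not the real panel)
sim <- data.frame(
  price.heinz28 = runif(n, 2, 5),
  price.heinz32 = runif(n, 2, 5),
  price.hunts32 = runif(n, 2, 5)
)
# Choice probabilities fall with an alternative's own price
u <- -1 * as.matrix(sim)
p <- exp(u) / rowSums(exp(u))
brands <- c("heinz28", "heinz32", "hunts32")
sim$choice <- factor(apply(p, 1, function(pr) sample(brands, 1, prob = pr)))

# Set the base level used for comparison
sim$choice <- relevel(sim$choice, ref = "heinz28")

fit <- multinom(
  choice ~ price.heinz28 + price.heinz32 + price.hunts32,
  data = sim, trace = FALSE
)
coef(fit)  # one row of coefficients per non-base level
```

Each row of the coefficient matrix compares one brand against the base level, so the base level itself does not appear as a row.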

To access this dataset go to Data > Manage, select examples from the Load data of type dropdown, and press the Load button. Then select the ketchup dataset.

In the Summary tab we can test if two or more variables together add significantly to the fit of a model by selecting them in the Variables to test dropdown. This functionality can be very useful to test if the overall influence of a variable of type factor is statistically significant.
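The idea of testing several variables together can be illustrated with a comparison of nested models. This document lists linearHypothesis from the car package among the key functions; the sketch below swaps in a likelihood-ratio test so that it relies only on nnet and base R. It is not necessarily the test Radiant reports. The data are simulated and hypothetical.

```r
library(nnet)

set.seed(1234)
n <- 500
# Simulated stand-in for the ketchup data (hypothetical values)
sim <- data.frame(
  price.heinz28 = runif(n, 2, 5),
  price.heinz32 = runif(n, 2, 5),
  price.hunts32 = runif(n, 2, 5)
)
u <- -1 * as.matrix(sim)
p <- exp(u) / rowSums(exp(u))
brands <- c("heinz28", "heinz32", "hunts32")
sim$choice <- factor(apply(p, 1, function(pr) sample(brands, 1, prob = pr)), levels = brands)

# Full model and a restricted model that drops the two variables under test
fit_full <- multinom(choice ~ price.heinz28 + price.heinz32 + price.hunts32,
                     data = sim, trace = FALSE)
fit_restricted <- multinom(choice ~ price.heinz28, data = sim, trace = FALSE)

# Likelihood-ratio statistic and its degrees of freedom
lr_stat <- as.numeric(2 * (logLik(fit_full) - logLik(fit_restricted)))
df <- attr(logLik(fit_full), "df") - attr(logLik(fit_restricted), "df")
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

c(lr_stat = lr_stat, df = df, p_value = p_value)
```

A small p-value suggests that the dropped variables, taken together, add to the fit of the model.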

Additional output that requires re-estimation:

Additional output that does not require re-estimation:

We can use the Predict tab to predict probabilities for different values of the explanatory variable(s) (i.e., a common use of MNL models). First, select the type of input for prediction using the Prediction radio buttons. Choose either an existing dataset for prediction (“Data”) or specify a command (“Command”) to generate the prediction inputs. If you choose to enter a command you must specify at least one variable and one value to get a prediction. If you do not specify a value for each variable in the model, either the mean value or the most frequent level will be used. It is only possible to predict outcomes based on variables used in the model (e.g., price.heinz32 must be one of the selected explanatory variables to predict the probability of choosing heinz32 when it is priced at $3.80).
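As a rough sketch of what such a prediction involves, the code below sets price.heinz32 to 3.80 and fills in the remaining variables with their means by hand, then asks multinom for predicted probabilities. The data are simulated and hypothetical, and filling in the means manually is an assumption about how to mimic the behavior described above outside Radiant.

```r
library(nnet)

set.seed(1234)
n <- 500
# Simulated stand-in for the ketchup data (hypothetical values)
sim <- data.frame(
  price.heinz28 = runif(n, 2, 5),
  price.heinz32 = runif(n, 2, 5),
  price.hunts32 = runif(n, 2, 5)
)
u <- -1 * as.matrix(sim)
p <- exp(u) / rowSums(exp(u))
brands <- c("heinz28", "heinz32", "hunts32")
sim$choice <- factor(apply(p, 1, function(pr) sample(brands, 1, prob = pr)), levels = brands)

fit <- multinom(choice ~ price.heinz28 + price.heinz32 + price.hunts32,
                data = sim, trace = FALSE)

# One value is specified; the other variables are held at their means
new_data <- data.frame(
  price.heinz28 = mean(sim$price.heinz28),
  price.heinz32 = 3.80,
  price.hunts32 = mean(sim$price.hunts32)
)

# One predicted probability per level of the response variable
probs <- predict(fit, newdata = new_data, type = "probs")
probs
```

The predicted probabilities sum to one across the levels of the response variable.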

To generate predicted values for all cases in, for example, the ketchup dataset, select Data from the Prediction input dropdown and then select the ketchup dataset. You can also create a dataset for input in Data > Transform using Expand grid, or in a spreadsheet, and then paste it into Radiant through the Data > Manage tab. You can also load CSV data as input for prediction.
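A grid of prediction inputs like the one Expand grid produces can be sketched in base R with expand.grid. The example varies price.heinz32 over a range of values while holding the other prices at their means. The data are simulated and hypothetical, and the price range is chosen only for illustration.

```r
library(nnet)

set.seed(1234)
n <- 500
# Simulated stand-in for the ketchup data (hypothetical values)
sim <- data.frame(
  price.heinz28 = runif(n, 2, 5),
  price.heinz32 = runif(n, 2, 5),
  price.hunts32 = runif(n, 2, 5)
)
u <- -1 * as.matrix(sim)
p <- exp(u) / rowSums(exp(u))
brands <- c("heinz28", "heinz32", "hunts32")
sim$choice <- factor(apply(p, 1, function(pr) sample(brands, 1, prob = pr)), levels = brands)

fit <- multinom(choice ~ price.heinz28 + price.heinz32 + price.hunts32,
                data = sim, trace = FALSE)

# All combinations of the listed values: here only price.heinz32 varies
grid <- expand.grid(
  price.heinz28 = mean(sim$price.heinz28),
  price.heinz32 = seq(2, 5, by = 0.5),
  price.hunts32 = mean(sim$price.hunts32)
)

# One row of predicted probabilities per row of the grid
probs <- predict(fit, newdata = grid, type = "probs")
cbind(grid["price.heinz32"], round(probs, 3))
```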

Once the desired predictions have been generated they can be saved to a CSV file by clicking the download button on the top right of the screen. To add predictions to the dataset used for estimation, click the Store button. Note that MNL models generate as many columns of probabilities as there are levels in the categorical response variable. If you want to store only the predictions for the first level (e.g., heinz28), provide only one name in the Store predictions input. If you want to store predictions for all ketchup brands, enter four variable names, separated by commas.
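The one-column-per-level structure of MNL predictions can be seen in the sketch below, which adds predicted probabilities to the estimation data and writes them to a CSV file. The simulated data contain only three brands, so three columns are produced; the column names (pred_heinz28 and so on) and the temporary file are hypothetical choices for this illustration.

```r
library(nnet)

set.seed(1234)
n <- 500
# Simulated stand-in for the ketchup data (hypothetical values)
sim <- data.frame(
  price.heinz28 = runif(n, 2, 5),
  price.heinz32 = runif(n, 2, 5),
  price.hunts32 = runif(n, 2, 5)
)
u <- -1 * as.matrix(sim)
p <- exp(u) / rowSums(exp(u))
brands <- c("heinz28", "heinz32", "hunts32")
sim$choice <- factor(apply(p, 1, function(pr) sample(brands, 1, prob = pr)), levels = brands)

fit <- multinom(choice ~ price.heinz28 + price.heinz32 + price.hunts32,
                data = sim, trace = FALSE)

# Matrix with one column of probabilities per level of the response
pred <- predict(fit, newdata = sim, type = "probs")

# Store only the first level ...
first_only <- sim
first_only$pred_heinz28 <- pred[, "heinz28"]

# ... or store a column for every level
colnames(pred) <- paste0("pred_", colnames(pred))
stored <- cbind(sim, pred)

# Save the result to a CSV file
out_file <- tempfile(fileext = ".csv")
write.csv(stored, out_file, row.names = FALSE)
```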

Example: Choice of ketchup brands

As an example we will use a dataset on choice behavior for 300 individuals in a panel of households in Springfield, Missouri (USA). The data captures information on 2,798 purchase occasions over a period of around 2 years and includes the following variables:

Suppose we want to investigate how prices of the different products influence the choice of ketchup brand and package size. In Model > Multinomial logistic regression (MNL), select choice as the response variable and heinz28 from the Choose base level dropdown menu. Select price.heinz28 through price.hunts32 as the explanatory variables. In the screenshot below we see that several, but not all, of the coefficients are statistically significant (p.value < .05) and that the model has some predictive power (p.value for the Chi-squared statistic < .05). The left-most output column shows which brand the coefficients apply to.
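For readers working in R directly, a coefficient table with p-values can be approximated from a multinom fit as sketched below. multinom does not print p-values itself, so they are computed here from Wald z-statistics; this is an assumption about how to reproduce such a table and may differ in detail from Radiant's output. The data are simulated and hypothetical.

```r
library(nnet)

set.seed(1234)
n <- 500
# Simulated stand-in for the ketchup data (hypothetical values)
sim <- data.frame(
  price.heinz28 = runif(n, 2, 5),
  price.heinz32 = runif(n, 2, 5),
  price.hunts32 = runif(n, 2, 5)
)
u <- -1 * as.matrix(sim)
p <- exp(u) / rowSums(exp(u))
brands <- c("heinz28", "heinz32", "hunts32")
sim$choice <- factor(apply(p, 1, function(pr) sample(brands, 1, prob = pr)), levels = brands)

fit <- multinom(choice ~ price.heinz28 + price.heinz32 + price.hunts32,
                data = sim, trace = FALSE)

s <- summary(fit)
# Wald z-statistics: coefficient divided by its standard error
z_value <- s$coefficients / s$standard.errors
# Two-sided p-values from the standard normal distribution
p_value <- 2 * pnorm(abs(z_value), lower.tail = FALSE)

round(s$coefficients, 3)  # rows are the non-base brands
round(p_value, 3)
```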

Unfortunately, the coefficients from a logistic regression model are difficult to interpret. The OR column provides estimated odds-ratios. As an illustration from a logistic regression of passenger survival, the odds of survival were significantly lower for 2nd and 3rd class passengers than for 1st class passengers. The odds of survival for males were also lower than for females. While the effect of age is statistically significant, the odds of survival are not as strongly affected by each extra year of age (see also the standardized coefficient).
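Ratios of this kind are obtained by exponentiating the model coefficients. In a multinomial model each ratio compares the odds of one level against the base level. The sketch below shows the calculation for a multinom fit on simulated, hypothetical choice data, not on the data discussed in the text.

```r
library(nnet)

set.seed(1234)
n <- 500
# Simulated stand-in for the ketchup data (hypothetical values)
sim <- data.frame(
  price.heinz28 = runif(n, 2, 5),
  price.heinz32 = runif(n, 2, 5),
  price.hunts32 = runif(n, 2, 5)
)
u <- -1 * as.matrix(sim)
p <- exp(u) / rowSums(exp(u))
brands <- c("heinz28", "heinz32", "hunts32")
sim$choice <- factor(apply(p, 1, function(pr) sample(brands, 1, prob = pr)), levels = brands)

fit <- multinom(choice ~ price.heinz28 + price.heinz32 + price.hunts32,
                data = sim, trace = FALSE)

# Exponentiated coefficients: odds of each brand relative to the base level
ratios <- exp(coef(fit))
round(ratios, 3)

# Multiplicative change in the odds of heinz32 versus heinz28
# for a one-unit increase in price.heinz32
ratios["heinz32", "price.heinz32"]
```

A ratio below 1 means that a one-unit increase in the variable lowers the odds of that brand relative to the base level; a ratio above 1 means it raises them.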

For each of the explanatory variables the following null and alternate hypotheses can be formulated for the odds ratios:

The odds-ratios from the logistic regression can be interpreted as follows:

Report > Rmd

Add code to Report > Rmd to (re)create the analysis by clicking the icon on the bottom left of your screen or by pressing ALT-enter on your keyboard.

If a plot was created it can be customized using ggplot2 commands or with gridExtra. See example below and Data > Visualize for details.

plot(result, plots = "coef", custom = TRUE) +
  labs(title = "Coefficient plot")

R-functions

This document is a work in progress. For a worked example using the multinom function, see the link below.

For an overview of related R-functions used by Radiant to estimate a multinomial logistic regression model see Model > Multinomial logistic regression.

The key functions used in the mnl tool are multinom from the nnet package and linearHypothesis from the car package.