First, load the packages we will use.
library("ggplot2")
library("readr")
library("dplyr")
Define the inverse of the logit function (i.e., the logistic function).
invlogit <- function(x) {
p <- 1 / (1 + exp(-x))
return(p)
}
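As a quick sanity check of the function we just defined, invlogit(0) should be exactly 0.5, and large positive (negative) arguments should return values close to 1 (0).
invlogit(0)
## [1] 0.5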
For illustration, we use the dataset that appears in Asano and Yanai (2013): hr96-09.csv.
Using this dataset, we will try to explain the victory in single member districts (SMDs) of the Japanese House of Representatives elections by logistic regression.
First, load the dataset into R. Here, let’s use the dataset we modified (labelled) and saved before.
load(url("http://www2.kobe-u.ac.jp/~yyanai/classes/rm1/contents/data/HR-labelled.Rda"))
head(HR)
## Source: local data frame [6 x 16]
##
## year ku kun party name age status nocand
## (int) (chr) (int) (fctr) (chr) (int) (fctr) (int)
## 1 1996 aichi 1 NFP KAWAMURA, TAKASHI 47 incumbent 7
## 2 1996 aichi 1 LDP IMAEDA, NORIO 72 ex 7
## 3 1996 aichi 1 DPJ SATO, TAISUKE 53 incumbent 7
## 4 1996 aichi 1 JCP IWANAKA, MIHOKO 43 challenger 7
## 5 1996 aichi 1 bunka ITO, MASAKO 51 challenger 7
## 6 1996 aichi 1 NP YAMADA, HIROSHIB 51 challenger 7
## Variables not shown: wl (fctr), rank (int), previous (int), vote (int),
## voteshare (dbl), eligible (int), turnout (dbl), exp (int)
This dataset contains a variable recording the election result, wl, whose values are “won”, “lost”, and “zombie” (a candidate who lost the SMD but was resurrected in a PR block). Using this variable, let’s create a binary variable, wlsmd, indicating victory in an SMD.
myd <- mutate(HR, wlsmd = as.numeric(wl == "won"))
with(myd, table(wlsmd))
## wlsmd
## 0 1
## 4114 1500
In addition, let’s create three party dummy variables, ldp (Liberal Democratic Party), komei (New Komeito, or Clean Government Party), and dpj (Democratic Party of Japan), as well as a variable expm measuring electoral expenditure in millions of yen.
myd <- myd %>%
mutate(ldp = as.numeric(party == "LDP"),
komei = as.numeric(party == "CGP"),
dpj = as.numeric(party == "DPJ"),
expm = exp / 10^6)
For simplicity, we keep only the data for the 2009 election and delete the observations with missing values. (In a real data analysis, you should not simply discard observations because of missing values.)
myd_2009 <- myd %>%
filter(year == 2009) %>%
na.omit()
Now we’d like to model the victory in SMDs with two predictors. One is the number of previous wins in HR elections, previous; the other is electoral expenditure in millions of yen, expm. (In a research paper, we should explicitly explain why we use this model. In other words, we should provide theoretical reasons why these predictors explain the response. Here, however, we focus on the quantitative methods.)
This logistic regression is estimated by the following code.
fit_logit_1 <- glm(wlsmd ~ previous + expm, data = myd_2009,
family = binomial(link = "logit"))
summary(fit_logit_1)
##
## Call:
## glm(formula = wlsmd ~ previous + expm, family = binomial(link = "logit"),
## data = myd_2009)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.0691 -0.5922 -0.5111 0.5219 1.9741
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.03746 0.13296 -15.323 <2e-16
## previous 0.36052 0.03826 9.422 <2e-16
## expm 0.03825 0.01841 2.077 0.0378
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1300.1 on 1123 degrees of freedom
## Residual deviance: 1071.7 on 1121 degrees of freedom
## AIC: 1077.7
##
## Number of Fisher Scoring iterations: 4
This result shows that \[\mathrm{Pr(victory)} = \frac{1}{1 + \exp[- (-2.04 + 0.36 \mathrm{previous~wins} + 0.04\mathrm{expenditure})]}.\]
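To see what this equation implies, we can plug in illustrative values. For example, for a hypothetical candidate with two previous wins and an expenditure of 10 million yen (values chosen only for illustration), the estimated coefficients above give:
## Predicted probability for previous = 2 and expm = 10 (illustrative values)
invlogit(-2.03746 + 0.36052 * 2 + 0.03825 * 10)  ## roughly 0.28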
Next, let’s add the candidate’s age to the linear predictor.
fit_logit_2 <- glm(wlsmd ~ previous + expm + age , data = myd_2009,
family = binomial(link = "logit"))
summary(fit_logit_2)
##
## Call:
## glm(formula = wlsmd ~ previous + expm + age, family = binomial(link = "logit"),
## data = myd_2009)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2093 -0.6602 -0.4414 0.4761 2.2864
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.766843 0.387483 1.979 0.0478
## previous 0.517282 0.046734 11.069 < 2e-16
## expm 0.044772 0.019158 2.337 0.0194
## age -0.064545 0.008818 -7.319 2.49e-13
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1300.1 on 1123 degrees of freedom
## Residual deviance: 1012.1 on 1120 degrees of freedom
## AIC: 1020.1
##
## Number of Fisher Scoring iterations: 5
This shows that \[\mathrm{Pr(victory)} = \frac{1}{1 + \exp[-(0.77 + 0.52 \mathrm{previous~wins} + 0.04 \mathrm{expenditure} -0.06 \mathrm{age})]}.\]
Furthermore, let’s add dummies for LDP, CGP, and DPJ.
fit_logit_3 <- glm(wlsmd ~ previous + expm + age + ldp + komei + dpj,
data = myd_2009,
family = binomial(link = "logit"))
summary(fit_logit_3)
##
## Call:
## glm(formula = wlsmd ~ previous + expm + age + ldp + komei + dpj,
## family = binomial(link = "logit"), data = myd_2009)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8819 -0.2646 -0.1906 0.2406 2.7992
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.49795 0.62996 -3.965 7.33e-05
## previous 0.48450 0.05767 8.402 < 2e-16
## expm 0.04006 0.03037 1.319 0.18706
## age -0.03316 0.01256 -2.640 0.00829
## ldp 0.45332 0.42776 1.060 0.28925
## komei -14.35608 589.26975 -0.024 0.98056
## dpj 4.18928 0.33995 12.323 < 2e-16
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1300.13 on 1123 degrees of freedom
## Residual deviance: 584.78 on 1117 degrees of freedom
## AIC: 598.78
##
## Number of Fisher Scoring iterations: 14
Write an equation representing the estimated results yourself.
Comparing the three models above, we find that the AIC is smallest for the third model. This implies that, among the three, the third model has the strongest explanatory power.
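As a side note, instead of reading the AIC values off the three summaries, we could collect them with a single call, which should reproduce the values reported above (1077.7, 1020.1, and 598.78).
## Compare the AICs of the three models in one table
AIC(fit_logit_1, fit_logit_2, fit_logit_3)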
Now, let’s calculate the “hit ratio” of each model. Here, we predict that candidates whose estimated probability of winning exceeds 0.5 win the SMD and that the others lose.
First, we’ll obtain the predicted probabilities with predict(). Setting the argument type to “response” gives us predicted probabilities; otherwise, predict() returns the linear predictor (i.e., the log odds) by default.
myd_2009 <- myd_2009 %>%
mutate(pred_1 = predict(fit_logit_1, type = "response"),
pred_2 = predict(fit_logit_2, type = "response"),
pred_3 = predict(fit_logit_3, type = "response"))
Using these predicted probabilities, we’ll calculate the “hit ratio”.
## Error rates
ert_1 <- with(myd_2009, mean((pred_1 > 0.5 & wlsmd == 0) | (pred_1 < 0.5 & wlsmd == 1)))
ert_2 <- with(myd_2009, mean((pred_2 > 0.5 & wlsmd == 0) | (pred_2 < 0.5 & wlsmd == 1)))
ert_3 <- with(myd_2009, mean((pred_3 > 0.5 & wlsmd == 0) | (pred_3 < 0.5 & wlsmd == 1)))
## Hit rate = 1 - error rate
1 - c(ert_1, ert_2, ert_3)
## [1] 0.7464413 0.7704626 0.8825623
The hit ratio is the highest in the third model.
The observed ratio of victories is
(wr <- with(myd_2009, mean(wlsmd)))
## [1] 0.2651246
Thus, the baseline of prediction is
max(c(wr, 1 - wr))
## [1] 0.7348754
That is, without a model, we can achieve a 73% hit rate by predicting that everybody loses. The predictions based on Models 1 and 2 are only slightly better than this baseline, so we consider them “bad” fits. In contrast, Model 3 clearly improves the accuracy of prediction.
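To see where Model 3’s hits and misses come from, we can also cross-tabulate the 0.5-cutoff predictions against the observed outcomes (a minimal sketch using the variables created above).
## Confusion table for Model 3 with a 0.5 cutoff
with(myd_2009, table(predicted = as.numeric(pred_3 > 0.5), observed = wlsmd))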
Next, we’ll compare the ROC curves of these three models. We define the functions to calculate the true positive rate (TPR, sensitivity) and the false positive rate (FPR, \(1 -\) specificity) and draw ROC curves with ggplot2.
## Function to calculate the true positive rate (sensitivity)
true_pos <- function(predicted, observed, cutoff = .5) {
n <- length(cutoff)
res <- rep(NA, n)
for (i in 1:n) {
cond_met <- predicted >= cutoff[i] & observed == 1
res[i] <- mean(cond_met) / mean(observed)
}
return(res)
}
## Function to calculate the false positive rate (1 - specificity)
false_pos <- function(predicted, observed, cutoff = .5) {
n <- length(cutoff)
res <- rep(NA, n)
for (i in 1:n){
cond_met <- predicted >= cutoff[i] & observed == 0
res[i] <- mean(cond_met) / (1 - mean(observed))
}
return(res)
}
## Get TPRs and FPRs by changing the cutoff from 0 to 1
cutoff_vec <- seq(0, 1, length = 1000)
tp_1 <- true_pos(myd_2009$pred_1, myd_2009$wlsmd, cutoff = cutoff_vec)
tp_2 <- true_pos(myd_2009$pred_2, myd_2009$wlsmd, cutoff = cutoff_vec)
tp_3 <- true_pos(myd_2009$pred_3, myd_2009$wlsmd, cutoff = cutoff_vec)
fp_1 <- false_pos(myd_2009$pred_1, myd_2009$wlsmd, cutoff = cutoff_vec)
fp_2 <- false_pos(myd_2009$pred_2, myd_2009$wlsmd, cutoff = cutoff_vec)
fp_3 <- false_pos(myd_2009$pred_3, myd_2009$wlsmd, cutoff = cutoff_vec)
## Create data frame to use ggplot2
df_roc <- data.frame(tpr = c(tp_1, tp_2, tp_3),
fpr = c(fp_1, fp_2, fp_3),
model = rep(1:3, rep(length(cutoff_vec), 3)))
## Draw ROC cuves
p <- ggplot(df_roc, aes(x = fpr, y = tpr, color = as.factor(model))) +
geom_line() +
geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
scale_color_discrete(name = "", labels = paste("Model", 1:3)) +
labs(x = "1 - specificity", y = "sensitivity", title = "ROC Curves")
print(p)
This figure also shows that the third model is the best of the three.
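If we want a single number summarizing each ROC curve, the area under the curve (AUC) can be computed in base R from the rank-based (Mann–Whitney) formula. This is a sketch, not part of the original analysis, but we would expect Model 3 to have the largest AUC.
## AUC via the rank-based (Mann-Whitney) formula, using base R only
auc <- function(predicted, observed) {
  r  <- rank(predicted)     ## ranks of the predicted probabilities
  n1 <- sum(observed == 1)  ## number of winners
  n0 <- sum(observed == 0)  ## number of losers
  (sum(r[observed == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
with(myd_2009, c(auc(pred_1, wlsmd), auc(pred_2, wlsmd), auc(pred_3, wlsmd)))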
The residual is the difference between the observed value and the predicted value, \[\mathrm{residual}_i = y_i - \mathrm{logit}^{-1}({\bf x}_i^T {\bf \beta}).\] Since the response \(y_i\) takes only the two values 0 and 1, a specific \({\bf x}_i^T {\bf \beta}\) gives us only two possible residuals. For instance, for \(\mathrm{logit}^{-1}({\bf x}_i^T {\bf \beta})=0.4\), the only possible residuals are \(-0.4\) and \(0.6\).
Plotting the residuals for Model 3, we get the following.
myd_2009$res_3 <- with(myd_2009, wlsmd - pred_3)
res_p <- ggplot(myd_2009, aes(x = pred_3, y = res_3)) +
geom_point() +
geom_hline(yintercept = 0, color = "royalblue") +
labs(x = "predicted prob. of victory", y = "observed - predicted",
title = "Residual Plot of Model 3")
print(res_p)
As this figure shows, an ordinary residual plot is (almost) useless for logistic regression. Instead, we create binned residual plots. To make this type of plot, we group the observations by their predicted values and calculate the group mean of the residuals. Then, we plot the mean residual against the mean predicted value for each group.
Let’s create one for Model 3.
## We make 20 groups based on predicted values: num of groups is arbitrary
## Create a data frame with the predicted values and residuals
df <- myd_2009[, c("pred_3", "res_3")]
## Sort the observations by predicted values
df <- arrange(df, pred_3)
N <- nrow(df)
n_group <- round(N / 20) ## Observations in each group
## If N / 20 is not an integer, the number of observations differs in the last group
df$group <- rep(20, N) ## Category var for groups
df$group[1:(n_group * 20)] <- rep(1:20, rep(n_group, 20))
## Calculate the group means of predicted values and residuals
df_group <- df %>%
group_by(group) %>%
summarize(pred = mean(pred_3),
res = mean(res_3),
obs = n())
## Calculate +-2se
df_group <- df_group %>%
mutate(lb = -2 * sqrt(pred * (1 - pred) / obs),
ub = 2 * sqrt(pred * (1 - pred) / obs))
## Plot
p_br <- ggplot(df_group, aes(x = pred, y = res)) +
geom_point() +
geom_line(aes(y = lb), linetype = "dotted") +
geom_line(aes(y = ub), linetype = "dotted") +
geom_hline(yintercept = 0, color = "royalblue") +
labs(x = "predicted probability of victory", y = "mean residual",
title = "Binned Residual Plot of Model 3")
print(p_br)
This figure shows the binned residual plot with 20 groups for Model 3. The number of groups, 20, was chosen arbitrarily; we should choose it so that the number of observations in each group is not too small. Here, we have 1,124 observations in total, and each group has 56 observations except the 20th, which has 60.
Looking at the figure, we cannot find an obvious pattern in the residuals. However, we cannot be very sure, because the number of points is only 20. (Assignment: Create a similar figure with 40 groups and check whether you can find any pattern.) The curves in the figure display \(\pm2\)se\(=\pm 2\sqrt{p(1-p)/n}\), where \(p\) is the predicted probability for each group and \(n\) is the number of observations in each group. We expect about 95% of the points to fall between the two curves. In this figure, 18 out of 20 points are in that area; that is, 90% of the points lie between the curves, which is not far from the expectation.
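As an aside, a similar plot can reportedly be produced automatically with the binnedplot() function in the arm package (by Gelman and Hill); the sketch below assumes that package is installed and exposes that function.
## A one-line alternative, assuming the arm package is installed
# install.packages("arm")
library("arm")
binnedplot(myd_2009$pred_3, myd_2009$res_3)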
Using Model 3, I’ll explain how to conduct significance tests. First, let’s consider the null hypothesis that all parameters other than the intercept are zero: \(\beta_2 = \beta_3 = \cdots = \beta_k = 0\).
To decide whether to reject the null, we use the deviance. The output of summary(fit_logit_3) displays the Null deviance and the Residual deviance. The former is the deviance when all parameters (except the intercept) are set to 0, and the latter is the deviance of the estimated model. We consider a model with smaller deviance to be better. If the null is true, the difference between these two deviances follows the \(\chi^2\) distribution with \(\nu\) degrees of freedom, where \(\nu\) is the difference in degrees of freedom between the two deviances (i.e., the number of predictors in the model). Therefore, we should conduct a \(\chi^2\) test for the difference of the deviances.
The difference of the deviances and the corresponding degrees of freedom are obtained as follows.
X2_logit_3 <- with(fit_logit_3, null.deviance - deviance)
df_logit_3 <- with(fit_logit_3, df.null - df.residual)
Thus, the \(p\) value of the \(\chi^2\) test is
pchisq(X2_logit_3, df = df_logit_3, lower.tail = FALSE)
## [1] 2.97e-151
This is practically 0, so the null hypothesis \(\beta_2 = \beta_3 = \cdots = \beta_k=0\) is rejected at the 0.01 significance level.
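For reference, the same test can be obtained by fitting the intercept-only model and comparing it to Model 3 with anova(); this is simply an equivalent way of carrying out the computation above.
## Equivalent test: compare the null (intercept-only) model to Model 3
fit_null <- glm(wlsmd ~ 1, data = myd_2009, family = binomial(link = "logit"))
anova(fit_null, fit_logit_3, test = "Chisq")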
To evaluate the addition of a predictor, we check how much the deviance declines as we add the predictor. We expect the deviance to decrease by 1 on average if we add a predictor that has no systematic effect on the response. Thus, to argue that the addition is meaningful, we should show by a \(\chi^2\) test that the reduction in deviance is significantly larger than 1.
We can do that with anova() (ANalysis Of VAriance).
anova(fit_logit_3, test = "Chi")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: wlsmd
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 1123 1300.13
## previous 1 224.216 1122 1075.91 < 2.2e-16
## expm 1 4.226 1121 1071.69 0.0398
## age 1 59.553 1120 1012.13 1.190e-14
## ldp 1 167.361 1119 844.77 < 2.2e-16
## komei 1 36.216 1118 808.56 1.766e-09
## dpj 1 223.775 1117 584.78 < 2.2e-16
In the output, the Deviance column shows how much the deviance is reduced by adding each variable to the variables listed above it. The deviance decreases by more than 1 for every variable examined here.
Next, let’s consider the null hypothesis that “coefficient = 0” for each variable. We can use the \(p\) values displayed in the summary to test these hypotheses. Let’s also check the results using a caterpillar plot.
df_logit_3 <- data.frame(
order = 1:7,
predictor = c("intercept", "previous wins", "expenditure", "age",
"LDP", "CGP", "DPJ"),
coef = coef(fit_logit_3),
se = summary(fit_logit_3)$coefficient[, 2])
df_logit_3 <- df_logit_3 %>%
mutate(lb = coef - qt(.975, df = df.residual(fit_logit_3)) * se,
ub = coef + qt(.975, df = df.residual(fit_logit_3)) * se)
ctplr_1 <- ggplot(filter(df_logit_3, predictor != "intercept"),
aes(x = reorder(predictor, order), y = coef,
ymin = lb, ymax = ub)) +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_pointrange(color = "darkblue") +
labs(x = "predictor", y = "estimate",
title = "Coefficient Estimates and 95% CI: Response is Victory in SMD") +
coord_flip()
print(ctplr_1)
This figure reveals that the effect of being CGP is not statistically significant. Let’s draw a similar figure without CGP.
ctplr_2 <- ctplr_1 %+%
filter(df_logit_3, !(predictor %in% c("intercept", "CGP")))
print(ctplr_2)
This figure shows that the predictors “previous wins” and “DPJ” have statistically significant effects, while “expenditure” and “LDP” (and “CGP”) do not. It is hard to tell from this figure whether the effect of age is statistically significant. Since the estimates for the party dummies are huge compared to the other variables, let’s make a figure without the party dummies to see the estimation uncertainty in the other variables.
ctplr_3 <- ctplr_1 %+%
filter(df_logit_3, predictor %in% c("previous wins", "expenditure", "age"))
print(ctplr_3)
Looking at these figures, we reject the null hypothesis “coefficient = 0” for the predictors “DPJ”, “previous wins”, and “age”, but not for the others.
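As a cross-check on these Wald-type intervals, confint() applied to a glm object computes profile-likelihood confidence intervals, which need not coincide exactly with the intervals plotted above.
## Profile-likelihood 95% confidence intervals (may take a moment to compute)
confint(fit_logit_3)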
Since the relationship between the response and the linear predictor is not linear in logistic regression, it is not trivial to interpret the estimated coefficients. One way to tackle the problem is to consider how much the predicted probability changes in response to a specific change in each predictor. For illustration, we use Model 2 estimated above.
Here is the result of Model 2.
summary(fit_logit_2)
##
## Call:
## glm(formula = wlsmd ~ previous + expm + age, family = binomial(link = "logit"),
## data = myd_2009)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2093 -0.6602 -0.4414 0.4761 2.2864
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.766843 0.387483 1.979 0.0478
## previous 0.517282 0.046734 11.069 < 2e-16
## expm 0.044772 0.019158 2.337 0.0194
## age -0.064545 0.008818 -7.319 2.49e-13
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1300.1 on 1123 degrees of freedom
## Residual deviance: 1012.1 on 1120 degrees of freedom
## AIC: 1020.1
##
## Number of Fisher Scoring iterations: 5
First, let’s examine the effect of previous wins on the probability of victory. Consider how much the probability changes as the number of previous wins increases from 0 to 1, other things equal (fixed). Let \(\delta\) denote the change in the probability of victory. \[\delta(\mathrm{expenditure, age}) = \mathrm{logit}^{-1}(0.77 + 0.52 \cdot 1 + 0.04 \mathrm{expenditure} - 0.06 \mathrm{age})\\ - \mathrm{logit}^{-1}(0.77 + 0.52 \cdot 0 +0.04 \mathrm{expenditure} - 0.06 \mathrm{age}).\]
Calculating this for each observation and averaging, we get the mean predicted difference (MPD): \[ \mathrm{MPD} = \frac{1}{n}\sum_{i=1}^n \delta(\mathrm{expenditure}_i, \mathrm{age}_i).\]
Let’s compute it in R, using the invlogit() function we defined above.
## Extract the coefficient estimates
b <- coef(fit_logit_2)
## Calculate the predicted difference for each observation
delta <- with(myd_2009,
invlogit(b[1] + b[2] * 1 + b[3] * expm + b[4] * age) -
invlogit(b[1] + b[2] * 0 + b[3] * expm + b[4] * age))
## Calculate the mean predicted difference
mean(delta)
## [1] 0.06078672
This shows that the probability of winning the SMD increases by 6.08 points on average as the number of previous wins increases from 0 to 1.
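The same quantity can also be obtained without writing out the linear predictor by hand: make two counterfactual copies of the data, set previous to 0 in one and 1 in the other, and average the difference of the predicted probabilities from predict(). This sketch should reproduce the 6.08-point figure above.
## Alternative: the same MPD via predict() on two counterfactual data sets
d0 <- d1 <- myd_2009
d0$previous <- 0
d1$previous <- 1
mean(predict(fit_logit_2, newdata = d1, type = "response") -
     predict(fit_logit_2, newdata = d0, type = "response"))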
Similarly, let’s calculate the change of the probability if the previous wins increases from 1 to 5.
delta <- with(myd_2009,
invlogit(b[1] + b[2] * 5 + b[3] * expm + b[4] * age) -
invlogit(b[1] + b[2] * 1 + b[3] * expm + b[4] * age))
mean(delta)
## [1] 0.4096711
Thus, if the previous wins increases from 1 to 5, the probability of winning increases by 40.97 points on average.
Next, let’s examine the effect of electoral spending. We’ll check how much the winning probability changes as the spending changes from half a standard deviation below the mean to half a standard deviation above it (i.e., a one-standard-deviation change).
lo <- with(myd_2009, mean(expm) - 0.5 * sd(expm))
hi <- with(myd_2009, mean(expm) + 0.5 * sd(expm))
delta <- with(myd_2009,
invlogit(b[1] + b[2] * previous + b[3] * hi + b[4] * age) -
invlogit(b[1] + b[2] * previous + b[3] * lo + b[4] * age))
mean(delta)
## [1] 0.03342016
This shows that a one-standard-deviation change in electoral spending (from half an sd below the mean to half an sd above it) is associated with a rise of 3.34 points in the winning probability on average.
Similarly, what happens if the spending changes from the minimum to the maximum?
delta <- with(myd_2009,
invlogit(b[1] + b[2] * previous + b[3] * max(expm) + b[4] * age) -
invlogit(b[1] + b[2] * previous + b[3] * min(expm) + b[4] * age))
mean(delta)
## [1] 0.1860051
This shows that the winning probability increases by 18.6 points on average if the expenditure changes from the minimum to the maximum.
Lastly, let’s check the effect of age on the probability of winning the SMDs. How does the probability change if a candidate’s age changes from 30 to 40?
delta <- with(myd_2009,
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 40) -
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 30))
mean(delta)
## [1] -0.1116126
This shows that the probability of electoral victory changes by \(-11.16\) points (i.e., decreases by 11.16 points) on average as the age changes from 30 to 40.
Similarly, what is the effect of age changing from 40 to 50?
delta <- with(myd_2009,
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 50) -
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 40))
mean(delta)
## [1] -0.09267792
This shows that aging from 40 to 50 is associated with a decrease of 9.27 points in the winning probability on average.
How about the change from 50 to 60?
delta <- with(myd_2009,
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 60) -
invlogit(b[1] + b[2] * previous + b[3] * expm + b[4] * 50))
mean(delta)
## [1] -0.07292911
This shows that aging from 50 to 60 is associated with a decrease of 7.29 points in the winning probability on average.
As these examples show, we can substantively interpret the results of logistic regression by reporting the average change in the predicted probability corresponding to specific changes in the predictors. It is even better to show these results graphically (e.g., with bar graphs).
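For example, a bar graph of the mean predicted differences could be sketched as follows; the values are simply the rounded MPDs reported in this section.
## Bar graph of the mean predicted differences reported above (rounded)
mpd <- data.frame(
  change = c("previous: 0 to 1", "previous: 1 to 5",
             "expm: -0.5 sd to +0.5 sd", "age: 30 to 40"),
  delta = c(0.061, 0.410, 0.033, -0.112))
ggplot(mpd, aes(x = change, y = delta)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "change in predictor", y = "mean predicted difference")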
Next, let’s try to explain the winning probability with a probit model.
First, we’ll consider the model explaining wlsmd by previous and expm. The probit regression model can be estimated by the following code.
fit_probit_1 <- glm(wlsmd ~ previous + expm, data = myd_2009,
family = binomial(link = "probit"))
summary(fit_probit_1)
##
## Call:
## glm(formula = wlsmd ~ previous + expm, family = binomial(link = "probit"),
## data = myd_2009)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3842 -0.5862 -0.4922 0.5449 1.9910
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.24950 0.07392 -16.903 <2e-16
## previous 0.20837 0.02178 9.566 <2e-16
## expm 0.02509 0.01078 2.327 0.02
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1300.1 on 1123 degrees of freedom
## Residual deviance: 1066.2 on 1121 degrees of freedom
## AIC: 1072.2
##
## Number of Fisher Scoring iterations: 5
The result can be written as \[\mathrm{Pr(victory)} = \Phi(-1.25 + 0.21 \mathrm{previous~wins} + 0.03\mathrm{expenditure}),\] where \(\Phi\) is the standard normal CDF.
Next, let’s add age to the linear predictor.
fit_probit_2 <- glm(wlsmd ~ previous + expm + age , data = myd_2009,
family = binomial(link = "probit"))
summary(fit_probit_2)
##
## Call:
## glm(formula = wlsmd ~ previous + expm + age, family = binomial(link = "probit"),
## data = myd_2009)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.5465 -0.6569 -0.4291 0.5177 2.3166
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.351364 0.222982 1.576 0.1151
## previous 0.292056 0.025633 11.394 < 2e-16
## expm 0.027782 0.011136 2.495 0.0126
## age -0.036245 0.004929 -7.353 1.94e-13
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1300.1 on 1123 degrees of freedom
## Residual deviance: 1009.1 on 1120 degrees of freedom
## AIC: 1017.1
##
## Number of Fisher Scoring iterations: 5
This result can be written as \[\mathrm{Pr(victory)} = \Phi(0.35 + 0.29 \mathrm{previous~wins} + 0.03\mathrm{expenditure} -0.04\mathrm{age}).\]
Furthermore, let’s add dummies for LDP, CGP, and DPJ.
fit_probit_3 <- glm(wlsmd ~ previous + expm + age + ldp + komei + dpj, data = myd_2009,
family = binomial(link = "probit"))
summary(fit_probit_3)
As with logistic regression, the third probit model has the smallest AIC and is the best fit among the three.
As with logistic regression, the relationship between the response and the linear predictor is not linear, so it is not straightforward to interpret the results of probit regression. We therefore again consider how much the predicted probability changes in response to a specific change in each predictor. For illustration, we use Model 2 estimated above, and for comparison we examine the same changes we used for logistic regression.
Here’s the result of Model 2.
fit_probit_2 <- glm(wlsmd ~ previous + expm + age , data = myd_2009,
family = binomial(link = "probit"))
summary(fit_probit_2)
##
## Call:
## glm(formula = wlsmd ~ previous + expm + age, family = binomial(link = "probit"),
## data = myd_2009)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.5465 -0.6569 -0.4291 0.5177 2.3166
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.351364 0.222982 1.576 0.1151
## previous 0.292056 0.025633 11.394 < 2e-16
## expm 0.027782 0.011136 2.495 0.0126
## age -0.036245 0.004929 -7.353 1.94e-13
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1300.1 on 1123 degrees of freedom
## Residual deviance: 1009.1 on 1120 degrees of freedom
## AIC: 1017.1
##
## Number of Fisher Scoring iterations: 5
First, let’s examine the effect of previous wins on the probability of victory. Consider how much the probability changes as the number of previous wins increases from 0 to 1, other things equal (fixed). Let \(\delta\) denote the change in the probability of victory. Then, \[\delta(\mathrm{expenditure, age}) = \Phi(0.35 + 0.29 \cdot 1 +0.03 \mathrm{expenditure} - 0.04 \mathrm{age})\\ - \Phi(0.35 + 0.29 \cdot 0 +0.03 \mathrm{expenditure} - 0.04 \mathrm{age}).\]
We calculate this for each observation and get the mean predicted difference: \[\mathrm{MPD} = \frac{1}{n}\sum_{i=1}^n \delta(\mathrm{expenditure}_i, \mathrm{age}_i).\]
Let’s calculate it in R. R has a built-in function, pnorm(), for the normal CDF.
## Extract the coefficient estimates
b <- coef(fit_probit_2)
## Calculate the difference in predicted prob for each observation
delta <- with(myd_2009,
pnorm(b[1] + b[2] * 1 + b[3] * expm + b[4] * age) -
pnorm(b[1] + b[2] * 0 + b[3] * expm + b[4] * age))
## Obtain the mean predicted difference
mean(delta)
## [1] 0.06134222
This shows that the probability of winning the SMD increases by 6.13 points on average as the previous wins increases from 0 to 1.
Similarly, let’s calculate the change of the probability if the previous wins increases from 1 to 5.
delta <- with(myd_2009,
pnorm(b[1] + b[2] * 5 + b[3] * expm + b[4] * age) -
pnorm(b[1] + b[2] * 1 + b[3] * expm + b[4] * age))
mean(delta)
## [1] 0.3861617
Thus, if the previous wins increases from 1 to 5, the probability of winning increases by 38.62 points on average.
Next, let’s examine the effect of electoral spending. Again, we’ll check how much the winning probability changes as the spending changes from half a standard deviation below the mean to half a standard deviation above it (a one-standard-deviation change).
## The following two lines were already run when we examined the logistic results
# lo <- with(myd_2009, mean(expm) - 0.5 * sd(expm))
# hi <- with(myd_2009, mean(expm) + 0.5 * sd(expm))
delta <- with(myd_2009,
pnorm(b[1] + b[2] * previous + b[3] * hi + b[4] * age) -
pnorm(b[1] + b[2] * previous + b[3] * lo + b[4] * age))
mean(delta)
## [1] 0.03586274
This shows that a one-standard-deviation change in electoral spending (from half an sd below the mean to half an sd above it) is associated with a rise of 3.59 points in the winning probability on average.
Similarly, what happens if the spending changes from the minimum to the maximum?
delta <- with(myd_2009,
pnorm(b[1] + b[2] * previous + b[3] * max(expm) + b[4] * age) -
pnorm(b[1] + b[2] * previous + b[3] * min(expm) + b[4] * age))
mean(delta)
## [1] 0.198816
This shows that the winning probability increases by 19.88 points on average if the expenditure changes from the minimum to the maximum.
Lastly, let’s check the effect of age on the probability of winning the SMDs. How does the probability change if a candidate’s age changes from 30 to 40?
delta <- with(myd_2009,
pnorm(b[1] + b[2] * previous + b[3] * expm + b[4] * 40) -
pnorm(b[1] + b[2] * previous + b[3] * expm + b[4] * 30))
mean(delta)
## [1] -0.1071702
This shows that the probability of electoral victory changes by \(-10.72\) points (i.e., decreases by 10.72 points) on average as the age changes from 30 to 40.
Similarly, what is the effect of age changing from 40 to 50?
delta <- with(myd_2009,
pnorm(b[1] + b[2] * previous + b[3] * expm + b[4] * 50) -
pnorm(b[1] + b[2] * previous + b[3] * expm + b[4] * 40))
mean(delta)
## [1] -0.09135181
This shows that aging from 40 to 50 is associated with a decrease of 9.14 points in the winning probability on average.
How about the change from 50 to 60?
delta <- with(myd_2009,
pnorm(b[1] + b[2] * previous + b[3] * expm + b[4] * 60) -
pnorm(b[1] + b[2] * previous + b[3] * expm + b[4] * 50))
mean(delta)
## [1] -0.07284359
This shows that aging from 50 to 60 is associated with a decrease of 7.28 points in the winning probability on average.
Let’s compare the logistic regression results with the probit regression results obtained above.
First, looking at the effects of the predictors on the winning probability, there is no real difference in substantive meaning between the two. For instance, the increase in the winning probability corresponding to a change in previous wins from 0 to 1 is 6.08 points for logit and 6.13 points for probit; rounded to one decimal place, the two are the same. The same holds for the other mean predicted differences. Thus, the choice between logit and probit rarely affects the political-science interpretation of the results (except in rare, singular cases), and in most cases you can choose either model according to your preference.
To simplify the explanation, let’s compare Model 1 of the logit and probit regressions.
b_logit <- coef(fit_logit_1)
b_probit <- coef(fit_probit_1)
cbind(b_logit, b_probit)
## b_logit b_probit
## (Intercept) -2.03745863 -1.24949910
## previous 0.36051937 0.20836892
## expm 0.03824864 0.02509093
Based on these results, we can write the logistic regression as \[\mathrm{Pr(victory)} = \frac{1}{1 + \exp[-(-2.04 + 0.36 \mathrm{previous~wins} + 0.04\mathrm{expenditure} )]}\] and the probit regression as \[\mathrm{Pr(victory)} = \Phi(-1.25 + 0.21 \mathrm{previous~wins} + 0.03\mathrm{expenditure}).\]
As explained in class, the logit model should be approximately equivalent to a probit model whose latent scale is multiplied by 1.6, so the logit coefficients should be roughly 1.6 times the probit coefficients. To verify this, let’s multiply the probit coefficient estimates by 1.6.
b_probit_1.6 <- 1.6 * b_probit
cbind(b_logit, b_probit_1.6, b_probit)
## b_logit b_probit_1.6 b_probit
## (Intercept) -2.03745863 -1.99919856 -1.24949910
## previous 0.36051937 0.33339028 0.20836892
## expm 0.03824864 0.04014549 0.02509093
As shown, the results are very close, though not exactly the same.
The notable difference between the standard normal and logistic distributions is the thickness of their tails. The logistic distribution has fatter tails than the standard normal; in other words, it assigns higher probability to values far from the center. Accordingly, we expect the difference between logit and probit to grow as the value of the linear predictor moves away from 0.
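As a quick numerical illustration of the fatter tails (a sketch, rescaling the logistic distribution to unit standard deviation so that the two are comparable), we can compare the upper-tail probabilities.
## Upper-tail probabilities: standard normal vs. logistic rescaled to sd 1
## (the standard logistic has sd pi / sqrt(3), so we set scale = sqrt(3) / pi)
x <- c(2, 3, 4, 5)
cbind(x,
      normal   = pnorm(x, lower.tail = FALSE),
      logistic = plogis(x, scale = sqrt(3) / pi, lower.tail = FALSE))
## For these values of x, the logistic tail probabilities are the larger ones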